I built a production ML inference API with FastAPI, Celery and Docker — here's the full architecture

A developer built a production ML inference API using FastAPI, Celery, and Docker. The architecture uses FastAPI for async HTTP handling, Celery for background task processing, and Redis for queue and result storage. The project includes a testing strategy with in-memory Celery eager mode to avoid Redis dependency during tests.

Para 1: The problem
"Most ML tutorials end at model.fit. Getting a model into production is a completely different skill. Here's how I built a real async inference microservice."

Para 2: Architecture diagram
Paste the ASCII diagram from your ARCHITECTURE.md.

Para 3: The three components
FastAPI handles HTTP (why async matters)
Celery handles background work (why not just threads)
Redis handles both queue and results (why one service)

Para 4: Key code snippet (predict async endpoint)
Show 15 lines of code: the async endpoint that dispatches to Celery and returns the task id immediately.

Para 5: Testing strategy
"I used in-memory Celery eager mode so tests run without Redis. Here's the conftest pattern."
Show 10 lines of conftest.py.

Para 6: The result
Screenshot of the UI dashboard
Screenshot of 47 tests passing

Closing line: "If you want the full source code with Docker, CI pipeline, Postman collection and deployment guide, I packaged it here: Gumroad link"