Most AI demos work perfectly on a laptop.
But production AI systems can become fragile when everything is handled inside one synchronous API call.
A user sends a request.
The API extracts text.
The API chunks the content.
The API generates embeddings.
The API stores data.
The API waits for everything to finish.
This may look simple in a demo, but it quickly becomes a problem in real systems.
The problem with one giant API call
In many AI applications, the API is expected to do too much.
For example, in a document processing or RAG pipeline, one request may trigger multiple heavy steps:
text extraction
chunking
embedding generation
indexing
summarization
database updates
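If all of this happens inside one synchronous request, the API becomes slow and fragile: the caller waits for the combined latency of every step, and the request can hit a client or gateway timeout before the work finishes.
If one downstream step fails, the whole request may fail, even when the earlier steps succeeded.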
If traffic increases suddenly, the API may become overloaded.
This is why event-driven architecture becomes useful for AI workloads.
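To make the failure mode concrete, here is a minimal sketch of the synchronous version. The step functions (`extract_text`, `chunk`, `embed`, `store`) are hypothetical stand-ins, not code from the demo, and `embed` is hard-wired to fail so the effect is visible.

```python
from typing import List

# Hypothetical stand-ins for the heavy steps. Real ones would call an
# extraction library, an embedding model, a vector store, and so on.
def extract_text(doc: str) -> str:
    return doc.strip()

def chunk(text: str) -> List[str]:
    return text.split(". ")

def embed(chunks: List[str]) -> List[List[float]]:
    # Simulate a flaky downstream dependency (for example an embedding API).
    raise TimeoutError("embedding service did not respond")

def store(vectors: List[List[float]]) -> int:
    return len(vectors)

def handle_request_sync(doc: str, completed: List[str]) -> int:
    """Run every step inside the request; any exception aborts the whole call."""
    text = extract_text(doc)
    completed.append("extract")
    chunks = chunk(text)
    completed.append("chunk")
    vectors = embed(chunks)
    completed.append("embed")
    stored = store(vectors)
    completed.append("store")
    return stored

completed: List[str] = []
try:
    handle_request_sync("First sentence. Second sentence.", completed)
    outcome = "ok"
except TimeoutError as exc:
    outcome = f"request failed: {exc}"

print(completed)  # steps that finished before the failure
print(outcome)
```

Extraction and chunking both succeed, yet the caller only sees a failed request, and that finished work is thrown away with it.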
A better approach: API + Kafka + workers
Instead of making the API do everything, we can split the workflow into smaller services.
The API accepts the request and publishes an event.
Background workers consume events and continue the processing asynchronously.
A simple flow looks like this:
User Request
↓
FastAPI
↓
Kafka / Redpanda Topic
↓
Python Worker
↓
Next Processing Stage
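The flow above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the demo's code: two in-memory `queue.Queue` objects stand in for Redpanda topics, and the names `documents_uploaded`, `chunks_ready`, `accept_upload`, and `chunking_worker_step` are made up for the example. In a real setup the API would be a FastAPI route and the queues would be replaced by a Kafka producer and consumer.

```python
import json
import queue
import uuid

# In-memory stand-ins for two topics. With Redpanda or Kafka these would be
# real topics written by a producer client and read by a consumer.
documents_uploaded = queue.Queue()
chunks_ready = queue.Queue()


def publish(topic: queue.Queue, event: dict) -> None:
    """Serialize an event as JSON bytes, the way it would travel through a topic."""
    topic.put(json.dumps(event).encode("utf-8"))


def accept_upload(text: str) -> dict:
    """API side: record the request as an event and return immediately."""
    job_id = str(uuid.uuid4())
    publish(documents_uploaded, {"job_id": job_id, "text": text})
    # No extraction, chunking, or embedding happens here.
    return {"job_id": job_id, "status": "accepted"}


def chunking_worker_step() -> bool:
    """Worker side: consume one event, do the work, publish to the next stage."""
    try:
        raw = documents_uploaded.get_nowait()
    except queue.Empty:
        return False
    event = json.loads(raw)
    chunks = [s for s in event["text"].split(". ") if s]
    publish(chunks_ready, {"job_id": event["job_id"], "chunks": chunks})
    return True


response = accept_upload("Kafka decouples services. Workers do the heavy lifting")
while chunking_worker_step():
    pass
next_stage_event = json.loads(chunks_ready.get_nowait())
print(response["status"], next_stage_event["chunks"])
```

The `job_id` travels with the event, so a later stage (or a status endpoint) can tie its output back to the original request.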
In my practical demo, I am using:
FastAPI
Redpanda
Python workers
Docker Compose
Kafka-compatible messaging
Why Redpanda?
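Redpanda is Kafka-compatible: it implements the Kafka API, so standard Kafka clients can connect to it without code changes. That makes it useful for local demos and event-driven architecture experiments.
It allows us to work with Kafka-style topics, producers, and consumers while keeping the setup simple for development, since it runs as a single binary with no JVM or ZooKeeper to manage.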
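In practice, Kafka compatibility means the client code does not need to know it is talking to Redpanda. The sketch below assumes the `confluent-kafka` Python package and a broker reachable at `localhost:9092`; the demo may use a different client library, and the actual port depends on how the container's listeners are mapped. The `client.id` value is a made-up name.

```python
# Assumed address: a Redpanda broker exposing its Kafka API on localhost:9092.
# Adjust to match the listener configured in your Docker Compose file.
BOOTSTRAP_SERVERS = "localhost:9092"


def producer_config(bootstrap_servers: str = BOOTSTRAP_SERVERS) -> dict:
    """Settings in the dict format the confluent-kafka client expects."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "client.id": "fastapi-ingest",  # hypothetical name
    }


def make_producer(config: dict):
    """Create a regular Kafka producer; nothing here is Redpanda-specific."""
    # Imported lazily so this module loads even without the package installed.
    from confluent_kafka import Producer
    return Producer(config)


def send_event(producer, topic: str, payload: bytes) -> None:
    """Publish one message and wait for outstanding deliveries to finish."""
    producer.produce(topic, value=payload)
    producer.flush()
```

Pointing the same code at an Apache Kafka cluster should only require changing `bootstrap.servers`.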
What this architecture gives us
This approach helps with:
decoupling services
handling bursty workloads
moving long-running tasks to background workers
improving scalability
isolating failures
building production-style AI pipelines
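Two of these benefits, handling bursty workloads and isolating failures, can be illustrated with a small sketch. It is not taken from the demo: a `queue.Queue` stands in for the topic, and the retry limit and the dead-letter list are assumptions added for illustration. The point is that a burst simply waits in the queue, and one bad event does not stop the others from being processed.

```python
import queue
from typing import Callable, List, Tuple


def drain(topic: queue.Queue, handler: Callable[[dict], None],
          max_attempts: int = 2) -> Tuple[int, List[dict]]:
    """Process every queued event; set aside the ones that keep failing."""
    processed = 0
    dead_letters: List[dict] = []
    while True:
        try:
            event = topic.get_nowait()
        except queue.Empty:
            return processed, dead_letters
        for attempt in range(1, max_attempts + 1):
            try:
                handler(event)
                processed += 1
                break
            except Exception as exc:
                # Give up on this event after the last attempt, then move on.
                if attempt == max_attempts:
                    dead_letters.append({"event": event, "error": str(exc)})


def summarize(event: dict) -> None:
    """Hypothetical processing step that rejects empty documents."""
    if not event["text"]:
        raise ValueError("empty document")


# A burst of 100 events arrives at once; event 42 is malformed.
topic = queue.Queue()
for i in range(100):
    topic.put({"id": i, "text": "" if i == 42 else f"document {i}"})

processed, dead_letters = drain(topic, summarize)
print(processed, [d["event"]["id"] for d in dead_letters])  # 99 [42]
```

With a real broker, the buffering comes from the topic itself, and failed events would typically be published to a separate topic rather than kept in a list.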
This pattern is especially useful for AI systems involving:
document processing
chunking
embeddings
RAG indexing
summarization
long-running background jobs
Key architecture idea
The API should not behave like a worker.
The API should accept the request, publish an event, and return quickly.
Workers should handle the heavy processing in the background.
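That separation makes the system easier to scale (add workers without touching the API), debug (each stage fails on its own), and extend (a new stage can consume from an existing topic).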
Video demo
I created a practical video where I build this Kafka-based AI pipeline step by step using FastAPI, Redpanda, Docker Compose, and Python workers.
Watch the video here:
[https://youtu.be/c2ijN2KAWXw](https://youtu.be/c2ijN2KAWXw)
Final thought
AI architecture is not only about calling an LLM.
The real challenge is designing the system around the AI workload.
For many production AI applications, especially those involving document processing, RAG, embeddings, or summarization, event-driven architecture can make the system much more resilient. This is the kind of foundation we need before building more advanced AI pipelines.