**The Wrong Approach**

Most applications follow this flow:
User Query
↓
LLM Request
↓
Wait 5-10 Seconds
↓
Return Full Response
**The Better Architecture**

Use Spring AI's streaming support combined with Server-Sent Events (SSE).
User Query
↓
Spring AI
↓
Streaming Tokens
↓
SSE Endpoint
↓
Browser Updates UI Instantly
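
To make the "Streaming Tokens → SSE Endpoint" step more concrete, here is a minimal plain-Java sketch of what an SSE event looks like on the wire: each payload line is sent in a `data:` field, and a blank line ends the event. This is an illustration of the format only, not Spring's own encoder; the class name, method name, and sample tokens are hypothetical.

```java
import java.util.List;

final class SseFrameSketch {

    // Encodes one chunk of text as a single SSE event.
    // Each line of the payload gets its own "data:" field; a blank line ends the event.
    public static String toSseEvent(String chunk) {
        StringBuilder event = new StringBuilder();
        for (String line : chunk.split("\n", -1)) {
            event.append("data: ").append(line).append('\n');
        }
        return event.append('\n').toString();
    }

    public static void main(String[] args) {
        // Hypothetical token chunks, as a model might emit them.
        List<String> tokens = List.of("Spring", " AI", " streams", " tokens");
        for (String token : tokens) {
            System.out.print(toSseEvent(token));
        }
    }
}
```

Because every token becomes its own small event, the browser can render each one as soon as it arrives instead of waiting for the full response.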
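
**Spring AI Streaming Example**

```java
import lombok.RequiredArgsConstructor;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
@RequiredArgsConstructor
public class ChatController {

    // Assumes a ChatClient bean exists, e.g. built from the auto-configured ChatClient.Builder
    private final ChatClient chatClient;

    // Emits each generated chunk to the client as a Server-Sent Event
    @GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> streamResponse(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .stream()
                .content();
    }
}
```
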
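**Frontend Integration**

```javascript
const message = encodeURIComponent("Explain Spring AI");
const eventSource = new EventSource(`/chat/stream?message=${message}`);
const output = document.getElementById("output");

eventSource.onmessage = (event) => {
  // textContent keeps model output from being interpreted as HTML
  output.textContent += event.data;
};

// EventSource reconnects when the stream ends; close it so the prompt is not resent
eventSource.onerror = () => eventSource.close();
```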
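
For readers who want to see what happens before `onmessage` fires, the following plain-Java sketch mimics how a client might turn raw SSE text back into individual messages: `data:` lines are buffered, one optional leading space is dropped, and a blank line dispatches the event. It is a simplified illustration rather than a full parser (other fields such as `event:` and `id:` are ignored), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SseParseSketch {

    // Turns raw SSE text into the payloads a client would see as separate messages.
    public static List<String> parseEvents(String stream) {
        List<String> events = new ArrayList<>();
        List<String> dataLines = new ArrayList<>();
        for (String line : stream.split("\n", -1)) {
            if (line.isEmpty()) {
                // Blank line: dispatch the buffered event, if any.
                if (!dataLines.isEmpty()) {
                    events.add(String.join("\n", dataLines));
                    dataLines.clear();
                }
            } else if (line.startsWith("data:")) {
                String value = line.substring("data:".length());
                // The format drops exactly one leading space after the colon.
                dataLines.add(value.startsWith(" ") ? value.substring(1) : value);
            }
            // Other fields (event:, id:, retry:) and comments are ignored in this sketch.
        }
        return events;
    }

    public static void main(String[] args) {
        // Two hypothetical events; the second payload starts with a space.
        String wire = "data: Spring\n\ndata:  AI\n\n";
        System.out.println(parseEvents(wire));
    }
}
```

Note that only a single space after `data:` is stripped, so a token that itself begins with a space keeps it only when the sender adds the separating space first. This is worth checking if words appear glued together in the UI.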

**Performance Benefits**

**Faster Perceived Response Time**

Even if the model takes 8 seconds to complete:

- Without streaming → first visible output after 8s
- With streaming → first token after 200-500ms

The total generation time remains the same, but users perceive the application as significantly faster.
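
The arithmetic behind this comparison can be sketched in a few lines of Java. The numbers below are illustrative assumptions chosen to add up to the 8-second example (300 ms to the first token, 50 ms per token afterwards, 155 tokens), not measurements, and the class and method names are hypothetical.

```java
final class FirstTokenLatencySketch {

    // Time until the user sees anything when the whole response is buffered:
    // the first-token delay plus the time to generate every remaining token.
    public static long bufferedFirstOutputMs(long firstTokenMs, long perTokenMs, int tokenCount) {
        return firstTokenMs + perTokenMs * (tokenCount - 1);
    }

    // With streaming, the first token is shown as soon as it is produced.
    public static long streamedFirstOutputMs(long firstTokenMs) {
        return firstTokenMs;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not measurements.
        long firstTokenMs = 300;
        long perTokenMs = 50;
        int tokenCount = 155;

        System.out.println("Buffered: first output after "
                + bufferedFirstOutputMs(firstTokenMs, perTokenMs, tokenCount) + " ms");
        System.out.println("Streamed: first output after "
                + streamedFirstOutputMs(firstTokenMs) + " ms");
    }
}
```

Under these assumptions the last token arrives at the same moment in both modes; only the moment of first visible output changes.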

**Reduced Bounce Rate**

Users are less likely to leave while waiting because they can see progress immediately.

**Better AI UX**

Streaming makes even slower, locally hosted models, such as those served through Ollama, feel responsive.