# Streaming a LangGraph Agent as OpenAI-Compatible SSE (with a Thinking Panel)

A developer built an adapter that converts LangGraph agent event streams into OpenAI-compatible Server-Sent Events, enabling tools like Open WebUI to display a "thinking" panel showing the agent's tool calls in real time. The 90-line solution handles the strict OpenAI chunk format, including role, content, finish reason, and the required `[DONE]` sentinel, while also wrapping tool activity in tags for a collapsible reasoning display.

In [Part 1](https://dev.to/javaking1129/running-a-langgraph-react-agent-in-production-openai-compatible-api-multi-model-gateway--emi) I built a LangGraph ReAct agent behind an OpenAI-compatible API and waved at one line:

```python
return StreamingResponse(
    graph_to_openai_sse(graph, inputs, model_name, config=config),
    media_type="text/event-stream",
)
```

That `graph_to_openai_sse` is where the real work hides. An OpenAI client like Open WebUI doesn't want "a LangGraph run". It wants a very specific stream of `chat.completion.chunk` JSON objects over Server-Sent Events, terminated by a `[DONE]` sentinel. LangGraph, meanwhile, emits its own rich event stream. This post is the adapter between the two: about 90 lines that also give you a free "thinking" panel showing the agent's tool calls as they happen.

## What the client expects

Each token arrives as an SSE line, `data: {json}\n\n`, where the JSON is an OpenAI chunk:

```python
# app/api/openai_compat.py
import time


def make_chunk(delta, model_name, completion_id, finish_reason=None):
    """Build one chat.completion.chunk payload around the given delta."""
    return {
        "id": completion_id,  # e.g. "chatcmpl-..."
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model_name,
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
```

The stream has a strict shape:

1. A first chunk with `delta = {"role": "assistant"}`.
2. Content chunks with `delta = {"content": "..."}`, one per token.
3. A final chunk with `finish_reason = "stop"`.
4. The literal terminator `data: [DONE]\n\n`.

Miss the `[DONE]` and the client spins forever. Skip the role chunk and some clients drop the first token.
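To make the four-part shape concrete, here is a minimal sketch of a checker that walks a list of raw SSE frames and reports which rule is broken. The helpers `parse_frame`, `check_stream_shape`, and `frame` are hypothetical names made up for this illustration. They are not part of the adapter or of any OpenAI library, and the checker only covers the rules listed above (a real stream may end with other finish reasons).

```python
import json

DONE = b"data: [DONE]\n\n"


def parse_frame(frame: bytes) -> dict:
    """Decode one 'data: {json}' frame into its JSON payload."""
    text = frame.decode("utf-8")
    if not (text.startswith("data: ") and text.endswith("\n\n")):
        raise ValueError(f"not an SSE data frame: {text!r}")
    return json.loads(text[len("data: "):-2])


def check_stream_shape(frames: list) -> list:
    """Return a list of contract violations; an empty list means the shape is OK."""
    problems = []
    if frames and frames[-1] == DONE:
        body = frames[:-1]
    else:
        problems.append("missing [DONE] sentinel")
        body = frames
    choices = [parse_frame(f)["choices"][0] for f in body]
    if not choices or choices[0]["delta"].get("role") != "assistant":
        problems.append("first chunk must carry the assistant role")
    if not choices or choices[-1]["finish_reason"] != "stop":
        problems.append("last chunk must have finish_reason='stop'")
    for choice in choices[1:-1]:
        if "content" not in choice["delta"] or choice["finish_reason"] is not None:
            problems.append("middle chunks must be content deltas")
            break
    return problems


def frame(delta, finish_reason=None) -> bytes:
    """Build a demo frame with placeholder id, timestamp, and model values."""
    payload = {
        "id": "chatcmpl-demo",
        "object": "chat.completion.chunk",
        "created": 0,
        "model": "demo",
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n".encode("utf-8")


good = [
    frame({"role": "assistant"}),
    frame({"content": "Hel"}),
    frame({"content": "lo"}),
    frame({}, "stop"),
    DONE,
]
print(check_stream_shape(good))       # []
print(check_stream_shape(good[:-1]))  # ['missing [DONE] sentinel']
print(check_stream_shape(good[1:]))   # ['first chunk must carry the assistant role']
```

A checker like this could be pointed at the bytes collected from a test request, which catches a dropped role chunk or sentinel before a real client ever sees it.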
The contract is small but unforgiving.

## What LangGraph emits

`astream_events` is a single async stream of typed events for everything happening inside the graph: model tokens, tool calls, node transitions. We subscribe once and translate each event we care about into chunks.

```python
# app/api/streaming.py
from langchain_core.messages import AIMessageChunk

# make_chunk, sse, and new_completion_id are this project's own helpers.


async def graph_to_openai_sse(graph, inputs, model_name, config=None):
    completion_id = new_completion_id()
    yield sse(make_chunk({"role": "assistant"}, model_name, completion_id))  # 1) role

    def emit(text):
        return sse(make_chunk({"content": text}, model_name, completion_id))

    async for event in graph.astream_events(inputs, config=config, version="v2"):
        kind = event.get("event")
        if kind == "on_chat_model_stream":
            chunk = event["data"]["chunk"]
            # Only forward plain-text token chunks.
            if isinstance(chunk, AIMessageChunk) and isinstance(chunk.content, str):
                yield emit(chunk.content)  # 2) tokens

    yield sse(make_chunk({}, model_name, completion_id, finish_reason="stop"))  # 3) stop
    yield b"data: [DONE]\n\n"  # 4) done
```

Three things to notice:

- `version="v2"` pins the event schema, so the `metadata.langgraph_node` and `data.chunk` keys don't silently move under you.
- In `on_chat_model_stream` events, `data.chunk` is an `AIMessageChunk`, but only when the LLM is actually streaming. Guarding with `isinstance(...)` avoids crashing on the non-streaming events that also flow through.
- A single `completion_id` is generated once and reused for the whole response.

`sse` is just the wire framing. Note `ensure_ascii=False`, which matters the moment your tokens are Korean, Japanese, or emoji:

```python
import json


def sse(payload):
    # Frame one payload as an SSE data line, keeping non-ASCII text unescaped.
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n".encode("utf-8")
```

Streaming the final answer is table stakes. The interesting part of a ReAct agent is what it did before answering: which document it searched, what came back. Open WebUI renders any text wrapped in