{"slug": "i-scanned-langfuse-it-observes-its-own-llm-calls-through-its-own-platform", "title": "I scanned Langfuse. It observes its own LLM calls through its own platform.", "summary": "Langfuse, an open-source LLM observability platform, uses its own product to trace its internal large language model calls. A code scan revealed that the platform's internal LangChain calls flow through the same `processEventBatch` ingestion pipeline as customer traces, creating a self-observing system. The architecture includes a web app and a worker with 24 queue processors, and ships a Model Context Protocol server for prompt management.", "body_md": "Post 3 of \"Scanning Open Source.\" So far: [Dub hides a fraud engine](https://dev.to/link). [Inbox Zero has prompt injection defense](https://dev.to/link). The pattern: every project is architecturally bigger than its tagline.\n\nToday: [Langfuse](https://langfuse.com), an open source LLM observability platform. YC W23. 8K+ stars.\n\n``` bash\n$ npx anatomia-cli scan .\n\nlangfuse                                                  web-app\nTypeScript · Next.js · Prisma → PostgreSQL (65 models) · 7 packages\n\nStack\n─────\nLanguage     TypeScript\nFramework    Next.js\nDatabase     Prisma → PostgreSQL (65 models)\nAuth         NextAuth\nAI           LangChain\nPayments     Stripe\nTesting      Vitest, Playwright, Testing Library\nUI           shadcn/ui (Tailwind)\nServices     AWS S3 · Nodemailer · Sentry · PostHog · tRPC (+6 more)\nDeploy       Docker · GitHub Actions\nWorkspace    Turborepo (pnpm)\n\nSurfaces\n────────\nweb      Next.js · Vitest\nworker   TypeScript · Vitest\n\n⚠ ~75 of 93 API route files may lack input validation\n```\n\n5 seconds. Two surfaces: a web app and a worker. The validation warning needs context: Langfuse uses tRPC extensively, where validation happens via `.input()` schemas in the router layer. The scanner checks file-level imports and may not detect middleware-based validation. 
Here's what I found when I pulled threads.\n\nThis is the finding that made me stop and reread the code.\n\nLangfuse uses LangChain internally to power features like the playground (where users test prompts against different models) and LLM-as-judge evaluations. The scan detected `AI: LangChain`, but the interesting part isn't that they use LangChain. It's HOW they trace those calls.\n\nIn `getInternalTracingHandler.ts`, Langfuse creates a callback handler using `langfuse-langchain`, their own open source LangChain integration package. Every internal LLM call flows through `processEventBatch`, the same ingestion pipeline that handles customer traces. The observability tool is observing itself.\n\nThis isn't debugging. It's architectural dogfooding. The team's own LLM usage generates production traces through the same pipeline their customers use. If tracing breaks, they would likely see it on their own dashboard before any customer reports it.\n\nThe scan detected LangChain as the AI SDK. When I traced the imports in `fetchLLMCompletion.ts`, I found six providers wired up:\n\n``` js\nimport { ChatAnthropic } from \"@langchain/anthropic\";\nimport { ChatVertexAI } from \"@langchain/google-vertexai\";\nimport { ChatBedrockConverse } from \"@langchain/aws\";\nimport { ChatGoogleGenerativeAI } from \"@langchain/google-genai\";\nimport { ChatOpenAI, AzureChatOpenAI } from \"@langchain/openai\";\n```\n\nAnthropic, Google Vertex, AWS Bedrock, Google Generative AI, OpenAI, and Azure OpenAI, all through LangChain as a unified interface. This powers the playground where users can test prompts across different models and the evaluation system where LLMs judge other LLMs' outputs.\n\nThe scan detected two surfaces: `web` and `worker`. The worker has 253 source files and 24 separate queue processors: ingestion, evaluations, experiments, batch exports, data retention, integrations (PostHog, Mixpanel), OpenTelemetry ingestion, and more. 
Langfuse processes traces asynchronously: the web app accepts data, and the worker processes, aggregates, evaluates, and routes it. The separation keeps trace processing from blocking the dashboard.\n\nThere are 26 TypeScript files in `web/src/features/mcp/`. Langfuse ships a Model Context Protocol server, so you can manage prompts and query observation data directly from Claude Code or any MCP-compatible tool. Create a prompt, version it, label it, without leaving your editor. If you use Langfuse for prompt management AND Claude Code for development, this closes the loop between the two.\n\nThe model count alone (65 Prisma models) isn't the story. It's what the models ARE:\n\n**Core tracing:** traces, observations, sessions, media attachments\n\n**Evaluation:** eval templates, job configurations, job executions, score configs\n\n**Human review:** annotation queues, queue items, queue assignments\n\n**Prompt management:** prompts, prompt dependencies, protected labels, LLM schemas, LLM tools\n\n**Automation:** automations, triggers, actions, automation executions, monitors\n\n**Integrations:** PostHog, Mixpanel, Slack, blob storage, each with its own model\n\nThe annotation queue system is worth noting. It's a human-in-the-loop review workflow: assign traces to reviewers, score them against configurable criteria, track completion. That's the bridge between \"the AI said this\" and \"a human confirmed this was correct.\" Most observability tools stop at dashboards. Langfuse has a structured process for human judgment on AI output.\n\nThe self-tracing pattern is the thread that ties everything together. Langfuse runs LLM calls for the playground and evaluations. Those calls flow through their own ingestion pipeline, are processed by their own worker queues, and are visible on their own dashboard. 
If you're evaluating Langfuse as an observability platform, the fact that they trust their own product with their own AI workload is the strongest signal in the codebase.\n\nThe annotation queue system is the second finding worth noting: a human-in-the-loop review workflow where you assign traces to reviewers, score them against configurable criteria, and track completion. Most observability tools stop at dashboards. Langfuse has structured the bridge between \"the AI said this\" and \"a human confirmed this was correct.\"\n\n*Post 3 of \"Scanning Open Source.\" Tomorrow: Formbricks.*\n\n`npx anatomia-cli scan .` ([GitHub](https://github.com/anatomia-dev/anatomia))", "url": "https://wpnews.pro/news/i-scanned-langfuse-it-observes-its-own-llm-calls-through-its-own-platform", "canonical_source": "https://dev.to/ryan_patrick_smith/i-scanned-langfuse-it-observes-its-own-llm-calls-through-its-own-platform-5577", "published_at": "2026-05-28 16:15:00+00:00", "updated_at": "2026-05-28 16:25:40.222998+00:00", "lang": "en", "topics": ["ai-tools", "ai-infrastructure", "mlops", "large-language-models", "ai-startups"], "entities": ["Langfuse", "LangChain", "Next.js", "Prisma", "PostgreSQL", "NextAuth", "Stripe", "tRPC"], "alternates": {"html": "https://wpnews.pro/news/i-scanned-langfuse-it-observes-its-own-llm-calls-through-its-own-platform", "markdown": "https://wpnews.pro/news/i-scanned-langfuse-it-observes-its-own-llm-calls-through-its-own-platform.md", "text": "https://wpnews.pro/news/i-scanned-langfuse-it-observes-its-own-llm-calls-through-its-own-platform.txt", "jsonld": "https://wpnews.pro/news/i-scanned-langfuse-it-observes-its-own-llm-calls-through-its-own-platform.jsonld"}}