[ARTICLE · art-16622] src=dev.to pub= topic=ai-tools verified=true sentiment=· neutral

I scanned Langfuse. It observes its own LLM calls through its own platform.

Langfuse, an open-source LLM observability platform, uses its own product to trace its internal large language model calls. A code scan revealed that the platform's internal LangChain calls flow through the same `processEventBatch` ingestion pipeline as customer traces, creating a self-observing system. The architecture includes a web app and a worker with 24 queue processors, and ships a Model Context Protocol server for prompt management.

4 min read · published May 28, 2026

Post 3 of "Scanning Open Source." So far: Dub hides a fraud engine. Inbox Zero has prompt injection defense. The pattern: every project is architecturally bigger than its tagline.

Today: Langfuse, an open source LLM observability platform. YC W23. 8K+ stars.

$ npx anatomia-cli scan .

langfuse                                                  web-app
TypeScript · Next.js · Prisma → PostgreSQL (65 models) · 7 packages

Stack
─────
Language     TypeScript
Framework    Next.js
Database     Prisma → PostgreSQL (65 models)
Auth         NextAuth
AI           LangChain
Payments     Stripe
Testing      Vitest, Playwright, Testing Library
UI           shadcn/ui (Tailwind)
Services     AWS S3 · Nodemailer · Sentry · PostHog · tRPC (+6 more)
Deploy       Docker · GitHub Actions
Workspace    Turborepo (pnpm)

Surfaces
────────
web      Next.js · Vitest
worker   TypeScript · Vitest

⚠ ~75 of 93 API route files may lack input validation

5 seconds. Two surfaces: a web app and a worker. The validation warning needs context: Langfuse uses tRPC extensively, where validation happens via `.input()` schemas in the router layer. The scanner checks file-level imports and may not detect middleware-based validation. Here's what I found when I pulled threads.
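To see why a file-level check can miss this, here is a minimal, dependency-free sketch of router-layer validation. It is not tRPC's implementation: `procedure`, `Schema`, and `promptInput` are hypothetical stand-ins for a procedure builder and a Zod-style schema. The point is that the handler never imports a validator, yet its input is always parsed before it runs.

```typescript
// Hypothetical stand-in for a schema library: parse() returns a typed value or throws.
interface Schema<T> {
  parse(raw: unknown): T;
}

// A tiny schema for objects shaped like { name: string } with a non-empty name.
const promptInput: Schema<{ name: string }> = {
  parse(raw) {
    if (typeof raw !== "object" || raw === null) {
      throw new Error("expected an object");
    }
    const name = (raw as Record<string, unknown>).name;
    if (typeof name !== "string" || name.length === 0) {
      throw new Error("name must be a non-empty string");
    }
    return { name };
  },
};

// Hypothetical procedure builder: validation is attached once, in the router layer.
function procedure() {
  return {
    input<T>(schema: Schema<T>) {
      return {
        handler<R>(fn: (input: T) => R) {
          // Every call is parsed before the handler sees it.
          return (raw: unknown): R => fn(schema.parse(raw));
        },
      };
    },
  };
}

// The handler body contains no validation code and imports no validator.
const createPrompt = procedure()
  .input(promptInput)
  .handler((input) => `created ${input.name}`);
```

A scanner that only looks for validator imports in the handler's file would flag `createPrompt` even though invalid input never reaches it.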

This is the finding that made me stop and reread the code.

Langfuse uses LangChain internally to power features like the playground (where users test prompts against different models) and LLM-as-judge evaluations. The scan detected `AI: LangChain`, but the interesting part isn't that they use LangChain. It's HOW they trace those calls.

In `getInternalTracingHandler.ts`, Langfuse creates a callback handler using `langfuse-langchain`, their own open source LangChain integration package. Every internal LLM call flows through `processEventBatch`, the same ingestion pipeline that handles customer traces. The observability tool is observing itself.
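The shape of that pattern can be sketched in a few lines. The names `processEventBatch` and `getInternalTracingHandler` come from the codebase, but everything else here is an assumption for illustration: the signatures, the event shape, the `internal-project` ID, and the in-memory store are mine, not Langfuse's.

```typescript
// All shapes below are assumptions for illustration, not Langfuse's real types.
type TraceEvent = {
  projectId: string;
  name: string;
  input: string;
  output: string;
};

// In-memory stand-in for the ingestion store.
const ingested: TraceEvent[] = [];

// Stand-in for the shared ingestion entry point (the real signature may differ).
function processEventBatch(events: TraceEvent[]): number {
  ingested.push(...events);
  return events.length;
}

// Customer path: an SDK sends a batch for the customer's project.
function ingestCustomerBatch(
  projectId: string,
  events: Omit<TraceEvent, "projectId">[],
): number {
  return processEventBatch(events.map((e) => ({ ...e, projectId })));
}

// Internal path: a callback handler that reports the platform's own LLM calls.
function getInternalTracingHandler(internalProjectId: string) {
  return {
    onLLMEnd(name: string, input: string, output: string): void {
      // Same function as the customer path; only the project differs.
      processEventBatch([{ projectId: internalProjectId, name, input, output }]);
    },
  };
}

// Hypothetical internal LLM call wrapped with the handler.
function runInternalCompletion(prompt: string): string {
  const handler = getInternalTracingHandler("internal-project");
  const output = `echo: ${prompt}`; // placeholder for a real model call
  handler.onLLMEnd("llm-as-judge", prompt, output);
  return output;
}
```

Because both paths end in the same function, a regression in ingestion would affect the internal traces in the same way it affects customer traces.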

This isn't debugging. It's architectural dogfooding. The team's own LLM usage generates production traces through the same pipeline their customers use. If tracing breaks, they would see it on their own dashboard before any customer reports it.

The scan detected LangChain as the AI SDK. When I traced the imports in `fetchLLMCompletion.ts`, I found six providers wired up:

import { ChatAnthropic } from "@langchain/anthropic";
import { ChatVertexAI } from "@langchain/google-vertexai";
import { ChatBedrockConverse } from "@langchain/aws";
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
import { ChatOpenAI, AzureChatOpenAI } from "@langchain/openai";

Anthropic, Google Vertex, AWS Bedrock, Google Generative AI, OpenAI, and Azure OpenAI, all through LangChain as a unified interface. This powers the playground where users can test prompts across different models and the evaluation system where LLMs judge other LLMs' outputs.
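What a unified interface buys you is easiest to see with stubs. The sketch below does not use LangChain: `ChatModel`, `Provider`, `registry`, and `comparePrompt` are hypothetical names, and the stub models just echo the prompt instead of calling a provider. It only shows how one common interface lets playground-style code stay provider-agnostic.

```typescript
// Hypothetical common interface; LangChain's real chat model classes are richer.
interface ChatModel {
  invoke(prompt: string): Promise<string>;
}

type Provider =
  | "anthropic"
  | "google-vertex"
  | "bedrock"
  | "google-genai"
  | "openai"
  | "azure-openai";

// Stub model that tags output with its provider; a real one would call the provider's API.
function stubModel(provider: Provider): ChatModel {
  return {
    invoke: async (prompt) => `[${provider}] ${prompt}`,
  };
}

// One registry hides provider differences from callers.
const registry: Record<Provider, () => ChatModel> = {
  anthropic: () => stubModel("anthropic"),
  "google-vertex": () => stubModel("google-vertex"),
  bedrock: () => stubModel("bedrock"),
  "google-genai": () => stubModel("google-genai"),
  openai: () => stubModel("openai"),
  "azure-openai": () => stubModel("azure-openai"),
};

// Playground-style helper: run the same prompt against several providers.
async function comparePrompt(prompt: string, providers: Provider[]): Promise<string[]> {
  return Promise.all(providers.map((p) => registry[p]().invoke(prompt)));
}
```

Adding a seventh provider in this sketch means one new union member and one registry entry; `comparePrompt` does not change.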

The scan detected two surfaces: `web` and `worker`. The worker has 253 source files and 24 separate queue processors: ingestion, evaluations, experiments, batch exports, data retention, integrations (PostHog, Mixpanel), OpenTelemetry ingestion, and more. Langfuse processes traces asynchronously: the web app accepts data; the worker processes, aggregates, evaluates, and routes it. The separation means trace ingestion never blocks the dashboard.
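Here is a minimal sketch of that accept-then-process split, using an in-memory array where a real deployment would use a durable queue. The job shape, the two queue names, and the 202 status code are assumptions for illustration; the scan tells me which processor categories exist, not how they are coded.

```typescript
// Hypothetical job shape and queue names, chosen for illustration only.
type Job = { queue: string; payload: string };

const pending: Job[] = [];
const processed: string[] = [];

// Web side: accept the request, enqueue, return immediately.
function acceptTrace(payload: string): { status: number } {
  pending.push({ queue: "ingestion", payload });
  return { status: 202 }; // accepted, not yet processed
}

// Worker side: one processor per queue name.
const processors: Record<string, (job: Job) => void> = {
  ingestion: (job) => {
    processed.push(`stored:${job.payload}`);
    // Follow-up work goes onto another queue instead of running inline.
    pending.push({ queue: "evaluation", payload: job.payload });
  },
  evaluation: (job) => {
    processed.push(`evaluated:${job.payload}`);
  },
};

// Drain the queue; a real worker would poll a durable queue continuously.
function runWorker(): void {
  let job = pending.shift();
  while (job !== undefined) {
    const processor = processors[job.queue];
    if (processor === undefined) {
      throw new Error(`no processor for queue ${job.queue}`);
    }
    processor(job);
    job = pending.shift();
  }
}
```

The web path does a single push and returns, so its latency does not depend on how much work the processors do afterwards.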

There are 26 TypeScript files in `web/src/features/mcp/`. Langfuse ships a Model Context Protocol server: you can manage prompts and query observation data directly from Claude Code or any MCP-compatible tool. Create a prompt, version it, label it, without leaving your editor. If you use Langfuse for prompt management AND Claude Code for development, this closes the loop between the two.

The model count alone isn't the story. It's what the models ARE:

Core tracing: traces, observations, sessions, media attachments

Evaluation: eval templates, job configurations, job executions, score configs

Human review: annotation queues, queue items, queue assignments

Prompt management: prompts, prompt dependencies, protected labels, LLM schemas, LLM tools

Automation: automations, triggers, actions, automation executions, monitors

Integrations: PostHog, Mixpanel, Slack, blob storage, each with its own model

The annotation queue system is worth noting. It's a human-in-the-loop review workflow: assign traces to reviewers, score them against configurable criteria, track completion. That's the bridge between "the AI said this" and "a human confirmed this was correct." Most observability tools stop at dashboards. Langfuse has a structured process for human judgment on AI output.
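The assign, score, and track steps can be modeled roughly like this. It is a sketch under my own assumptions: `AnnotationQueue`, `QueueItem`, and their fields are hypothetical and do not mirror Langfuse's Prisma schema.

```typescript
// Hypothetical model of an annotation queue; field names are not Langfuse's schema.
type QueueItem = {
  traceId: string;
  reviewer?: string;
  scores: Record<string, number>;
};

class AnnotationQueue {
  private items: QueueItem[] = [];
  // Score names every item must receive before it counts as complete.
  private readonly criteria: string[];

  constructor(criteria: string[]) {
    this.criteria = criteria;
  }

  add(traceId: string): void {
    this.items.push({ traceId, scores: {} });
  }

  assign(traceId: string, reviewer: string): void {
    this.find(traceId).reviewer = reviewer;
  }

  score(traceId: string, criterion: string, value: number): void {
    if (!this.criteria.includes(criterion)) {
      throw new Error(`unknown criterion: ${criterion}`);
    }
    this.find(traceId).scores[criterion] = value;
  }

  // Fraction of items scored on every configured criterion.
  completion(): number {
    if (this.items.length === 0) return 0;
    const done = this.items.filter((item) =>
      this.criteria.every((c) => c in item.scores),
    ).length;
    return done / this.items.length;
  }

  private find(traceId: string): QueueItem {
    const item = this.items.find((i) => i.traceId === traceId);
    if (item === undefined) throw new Error(`no such trace: ${traceId}`);
    return item;
  }
}
```

An item only counts as reviewed once it has a score for every configured criterion, which is what turns a pile of traces into a trackable review process.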

The self-tracing pattern is the thread that ties everything together. Langfuse runs LLM calls for the playground and evaluations. Those calls flow through their own ingestion pipeline, processed by their own worker queues, visible on their own dashboard. If you're evaluating Langfuse as an observability platform, the fact that they trust their own product with their own AI workload is the strongest signal in the codebase.

The annotation queue system is the second finding worth noting: a human-in-the-loop review workflow where you assign traces to reviewers, score them against configurable criteria, and track completion. Most observability tools stop at dashboards. Langfuse has structured the bridge between "the AI said this" and "a human confirmed this was correct."

Post 3 of "Scanning Open Source." Tomorrow: Formbricks.

npx anatomia-cli scan .

