{"slug": "never-trust-an-llm-s-output-directly-here-s-the-validation-layer-i-put-on-every", "title": "Never trust an LLM's output directly. Here's the validation layer I put on every agent.", "summary": "A developer built a validation layer for LLM agent outputs that catches structural and semantic errors before code acts on the data. The three-stage pipeline—parse, validate, classify—uses Zod schemas to enforce type and semantic constraints, returning a discriminated union that forces callers to handle failure paths. The approach addresses common failure modes where models emit valid JSON that is structurally or semantically incorrect.", "body_md": "Here's a failure mode I've seen in nearly every AI agent codebase I've reviewed: the agent receives a model response, trusts the JSON it contains, and calls `.result.items[0].id`, which throws `Cannot read properties of null` at 2 AM because the model returned `{\"result\": null}` on an edge case.\n\nThe model didn't hallucinate the content. It hallucinated the *structure*.\n\nThis is surprisingly common, and the fix isn't \"use a better prompt.\" The fix is a validation layer that runs between the raw model output and the code that acts on it.\n\nClaude and GPT-4 both support structured output modes that constrain the model to emit valid JSON matching a given schema. This is genuinely useful and you should use it. But it doesn't fully solve the problem, for two reasons:\n\n**1. JSON-valid is not semantically valid.**\n\nThe model can emit perfectly valid JSON that conforms to your schema and still be wrong. A string field that should be a UUID might contain a made-up identifier that fails a database lookup. A numeric field labeled `confidence_score` might be 847 when your code expects a float between 0 and 1. The schema enforces types, not semantics.\n\n**2. 
Not all LLM calls use structured output.**\n\nIf you're doing multi-step reasoning, chain-of-thought steps, tool call parsing, or processing outputs from models that don't support native JSON mode, you're parsing free-text responses. You need to handle that robustly.\n\nEvery agent call I build now goes through three stages:\n\n```\nraw model output\n     ↓\n  [PARSE]   – extract the structure from the text\n     ↓\n [VALIDATE] – assert the structure matches expectations\n     ↓\n [CLASSIFY] – categorize the outcome so the caller can handle it\n```\n\nHere's the TypeScript implementation I actually use:\n\n``` typescript\nimport { z } from \"zod\";\n\n// 1. Define the schema for what you expect\nconst AnalysisResultSchema = z.object({\n  sentiment: z.enum([\"positive\", \"negative\", \"neutral\"]),\n  confidence: z.number().min(0).max(1),\n  key_points: z.array(z.string()).min(1).max(10),\n  action_required: z.boolean(),\n  follow_up: z.string().optional(),\n});\n\ntype AnalysisResult = z.infer<typeof AnalysisResultSchema>;\n\n// 2. The parse-validate-classify wrapper\ntype AgentOutput<T> =\n  | { ok: true; data: T }\n  | { ok: false; reason: \"parse_failure\" | \"validation_failure\" | \"empty_response\"; raw: string; error?: string };\n\nfunction parseAgentOutput<T>(\n  raw: string,\n  schema: z.ZodSchema<T>\n): AgentOutput<T> {\n  // Guard: empty or whitespace-only response\n  if (!raw.trim()) {\n    return { ok: false, reason: \"empty_response\", raw };\n  }\n\n  // Extract JSON from the response: models often wrap it in prose or code fences\n  const jsonMatch = raw.match(/```(?:json)?\\s*([\\s\\S]*?)```/) ||\n                    raw.match(/(\\{[\\s\\S]*\\}|\\[[\\s\\S]*\\])/);\n\n  const jsonString = jsonMatch ? jsonMatch[1] ?? 
jsonMatch[0] : raw.trim();\n\n  let parsed: unknown;\n  try {\n    parsed = JSON.parse(jsonString);\n  } catch (err) {\n    return {\n      ok: false,\n      reason: \"parse_failure\",\n      raw,\n      error: err instanceof Error ? err.message : \"JSON.parse failed\",\n    };\n  }\n\n  const result = schema.safeParse(parsed);\n  if (!result.success) {\n    return {\n      ok: false,\n      reason: \"validation_failure\",\n      raw,\n      // One \"path: message\" entry per failed field\n      error: result.error.issues.map(e => `${e.path.join(\".\")}: ${e.message}`).join(\"; \"),\n    };\n  }\n\n  return { ok: true, data: result.data };\n}\n```\n\nThe `AgentOutput<T>` discriminated union forces the caller to handle both the happy path and the failure paths. You can't accidentally access `output.data` without first checking `output.ok`.\n\n``` typescript\nimport Anthropic from \"@anthropic-ai/sdk\";\n\nconst client = new Anthropic();\n\nasync function analyzeCustomerFeedback(\n  feedback: string\n): Promise<AgentOutput<AnalysisResult>> {\n  const response = await client.messages.create({\n    model: \"claude-sonnet-4-5\",\n    max_tokens: 512,\n    system: `You analyze customer feedback. Always respond with JSON matching this schema exactly:\n{\n  \"sentiment\": \"positive\" | \"negative\" | \"neutral\",\n  \"confidence\": number between 0 and 1,\n  \"key_points\": array of strings (1-10 items),\n  \"action_required\": boolean,\n  \"follow_up\": optional string\n}\nNo prose. No markdown. 
Just the JSON object.`,\n    messages: [{ role: \"user\", content: feedback }],\n  });\n\n  const rawText = response.content\n    .filter((b): b is Anthropic.TextBlock => b.type === \"text\")\n    .map(b => b.text)\n    .join(\"\");\n\n  return parseAgentOutput(rawText, AnalysisResultSchema);\n}\n\n// Calling code handles both outcomes explicitly\nconst result = await analyzeCustomerFeedback(userFeedback);\n\nif (!result.ok) {\n  // Log the failure with full context for debugging\n  console.error(\"Agent output invalid\", {\n    reason: result.reason,\n    error: result.error,\n    raw: result.raw.slice(0, 500), // don't log huge payloads\n  });\n\n  // Decide what to do: retry, fall back, surface to user, etc.\n  return handleValidationFailure(result.reason);\n}\n\n// TypeScript knows result.data is AnalysisResult here\nconst { sentiment, confidence, key_points } = result.data;\n```\n\nNot all validation failures are permanent. Sometimes the model produces malformed JSON on the first try but gets it right on a retry. The key is distinguishing which failures are worth retrying.\n\n```\nasync function analyzeWithRetry(\n  feedback: string,\n  maxAttempts = 3\n): Promise<AnalysisResult> {\n  let lastError = \"\";\n\n  for (let attempt = 1; attempt <= maxAttempts; attempt++) {\n    const result = await analyzeCustomerFeedback(feedback);\n\n    if (result.ok) return result.data;\n\n    lastError = result.error ?? result.reason;\n\n    // Don't retry empty responses — something else is wrong\n    if (result.reason === \"empty_response\") break;\n\n    // On validation failure, give the model the error as feedback\n    if (attempt < maxAttempts && result.reason === \"validation_failure\") {\n      // Could pass the error back in the next prompt: \"Your last response failed \n      // validation: {lastError}. 
Try again.\"\n      console.warn(`Attempt ${attempt} failed validation: ${lastError}`);\n      continue;\n    }\n  }\n\n  throw new Error(`Failed after ${maxAttempts} attempts. Last error: ${lastError}`);\n}\n```\n\nThe pattern of feeding the validation error back to the model in the retry prompt is particularly effective. Instead of blindly retrying, you're telling the model what went wrong. In my experience this gets you to a valid output on the second attempt about 80% of the time when the first attempt had a validation failure.\n\nWhen validation fails in production, you need enough information to understand and fix the problem — but not so much that you're logging personally identifiable information or burning storage costs.\n\n```\n// Good: structured, queryable, safe\nconsole.error(JSON.stringify({\n  event: \"agent_validation_failure\",\n  reason: result.reason,\n  error_path: result.error, // which field failed\n  response_length: result.raw.length,\n  response_prefix: result.raw.slice(0, 100), // enough to see the pattern\n  model: \"claude-sonnet-4-5\",\n  timestamp: new Date().toISOString(),\n}));\n```\n\nAfter a week of production logs, you'll see patterns. Maybe the model consistently omits the `confidence`\n\nfield for certain categories of input. Maybe it returns arrays as strings when the input contains newlines. 
Those patterns tell you where to strengthen your prompt or add extra coercion logic.\n\nIf Zod feels like overkill, here's a minimal Python version that still catches the most common failures:\n\n``` python\nimport json\nfrom typing import TypedDict\n\nclass AnalysisResult(TypedDict):\n    sentiment: str\n    confidence: float\n    action_required: bool\n\nREQUIRED_KEYS = {\"sentiment\", \"confidence\", \"action_required\"}\nVALID_SENTIMENTS = {\"positive\", \"negative\", \"neutral\"}\n\ndef parse_analysis(raw: str) -> AnalysisResult | None:\n    # Strip code fences if present\n    text = raw.strip()\n    if text.startswith(\"```\"):\n        text = text.split(\"```\")[1]\n        if text.startswith(\"json\"):\n            text = text[4:]\n\n    try:\n        data = json.loads(text.strip())\n    except json.JSONDecodeError:\n        return None\n\n    # Check required keys (the payload must be a JSON object)\n    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data.keys()):\n        return None\n\n    # Check semantic constraints\n    sentiment = data[\"sentiment\"]\n    if not isinstance(sentiment, str) or sentiment not in VALID_SENTIMENTS:\n        return None\n    confidence = data[\"confidence\"]\n    # Reject non-numeric values (bool is a subclass of int, so exclude it explicitly)\n    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):\n        return None\n    if not (0 <= confidence <= 1):\n        return None\n\n    return data\n```\n\nNot as composable as Zod, but it catches the common failure modes: missing keys, wrong enum values, out-of-range numbers.\n\nLLMs are probabilistic. They do not guarantee that their structured output will be valid, even when you ask nicely. A production agent needs a deterministic layer that classifies every output as valid or invalid before any code acts on it. Build that layer first, log its failures, and let the failure data tell you where your prompt needs to improve.\n\nThe validation layer doesn't slow you down; it makes your agent debuggable. 
Without it, you're flying blind.\n\nI cover validation patterns, retry logic, and production reliability in the free **Reliable Agent Field Guide**: [penloomstudio.com/field-guide.html](https://penloomstudio.com/field-guide.html)", "url": "https://wpnews.pro/news/never-trust-an-llm-s-output-directly-here-s-the-validation-layer-i-put-on-every", "canonical_source": "https://dev.to/penloom_studio_829b7817d3/never-trust-an-llms-output-directly-heres-the-validation-layer-i-put-on-every-agent-207c", "published_at": "2026-07-01 02:19:27+00:00", "updated_at": "2026-07-01 02:48:57.826330+00:00", "lang": "en", "topics": ["large-language-models", "ai-agents", "developer-tools"], "entities": ["Claude", "GPT-4", "Anthropic", "Zod"], "alternates": {"html": "https://wpnews.pro/news/never-trust-an-llm-s-output-directly-here-s-the-validation-layer-i-put-on-every", "markdown": "https://wpnews.pro/news/never-trust-an-llm-s-output-directly-here-s-the-validation-layer-i-put-on-every.md", "text": "https://wpnews.pro/news/never-trust-an-llm-s-output-directly-here-s-the-validation-layer-i-put-on-every.txt", "jsonld": "https://wpnews.pro/news/never-trust-an-llm-s-output-directly-here-s-the-validation-layer-i-put-on-every.jsonld"}}