{"slug": "structured-output-from-local-llms-json-that-never-breaks-ollama-zod", "title": "Structured Output From Local LLMs: JSON That Never Breaks (Ollama + Zod)", "summary": "A developer created a TypeScript helper, generateStructured<T>(schema), that combines Ollama's native JSON modes with Zod validation to produce reliable structured output from local LLMs. The approach uses Ollama's format parameter with JSON Schema derived from Zod schemas via zod-to-json-schema, reducing parse failures on small models. The helper also includes retry loops and repair logic for truncated output, addressing common issues like missing closing braces or markdown fences.", "body_md": "A 1.5B model running on your laptop will return JSON that almost parses. The closing brace is missing. A trailing comma sneaks in. The whole thing is wrapped in a markdown fence with a chirpy \"Sure! Here's your JSON:\" on top. Cloud models do this too, but small local models do it constantly, and that is exactly where most \"just prompt it harder\" advice falls apart.\n\nI wrote about [validating LLM responses with Zod](https://pavelespitia.hashnode.dev) before: schemas as contracts, `safeParse`, extracting JSON from chaos. That post is the foundation. This one is the local-model-specific layer on top: Ollama's native JSON modes, retry loops that actually converge, repairing truncated output, and a single `generateStructured<T>(schema)` helper that ties it all together so you never hand-roll this again.\n\nOllama gives you two ways to force structure before you ever touch Zod. Use them. They cut your parse-failure rate dramatically.\n\nThe first is `format: \"json\"`. It constrains decoding so the model can only emit syntactically valid JSON. 
No markdown fences, no preamble, no trailing prose.\n\n``` js\nconst res = await fetch(\"http://localhost:11434/api/chat\", {\n  method: \"POST\",\n  headers: { \"Content-Type\": \"application/json\" },\n  body: JSON.stringify({\n    model: \"qwen2.5-coder:7b\",\n    messages: [\n      { role: \"system\", content: \"Output a JSON object describing the code.\" },\n      { role: \"user\", content: \"function add(a, b) { return a + b }\" },\n    ],\n    format: \"json\",\n    stream: false,\n    options: { temperature: 0 },\n  }),\n});\n\nconst data = await res.json();\nconst obj = JSON.parse(data.message.content); // already clean JSON\n```\n\n`format: \"json\"` guarantees valid syntax. It does not guarantee your shape. The model can still invent fields or skip required ones. That is what Zod is for.\n\nThe second mode is the one people miss: pass a full JSON Schema to `format`. Ollama then constrains generation to match the schema's structure, not just \"valid JSON.\"\n\n``` js\nconst schema = {\n  type: \"object\",\n  properties: {\n    language: { type: \"string\" },\n    purpose: { type: \"string\" },\n    isPure: { type: \"boolean\" },\n  },\n  required: [\"language\", \"purpose\", \"isPure\"],\n};\n\nconst res = await fetch(\"http://localhost:11434/api/chat\", {\n  method: \"POST\",\n  headers: { \"Content-Type\": \"application/json\" },\n  body: JSON.stringify({\n    model: \"qwen2.5-coder:7b\",\n    messages: [{ role: \"user\", content: \"Describe: function add(a,b){return a+b}\" }],\n    format: schema,\n    stream: false,\n    options: { temperature: 0 },\n  }),\n});\n```\n\nYou do not want to write JSON Schema by hand and keep it in sync with your TypeScript types. You already have a Zod schema. 
Convert it.\n\n``` js\nimport { z } from \"zod\";\nimport { zodToJsonSchema } from \"zod-to-json-schema\";\n\nconst CodeInfo = z.object({\n  language: z.string(),\n  purpose: z.string(),\n  isPure: z.boolean(),\n});\n\nconst jsonSchema = zodToJsonSchema(CodeInfo, { target: \"openApi3\" });\n// pass jsonSchema as `format` to Ollama\n```\n\nOne source of truth. Zod drives both the generation constraint and the runtime validation.\n\n| Mode | Forces valid syntax | Forces your shape | Cost |\n|---|---|---|---|\n| Plain prompt | No | No | Free, unreliable |\n| `format: \"json\"` | Yes | No | Negligible |\n| `format: <schema>` | Yes | Mostly | Slower decode, fewest retries |\n\nOn small models, `format: <schema>` is worth the slightly slower decode because it turns most three-attempt loops into one.\n\nSchema-constrained decoding still breaks in one nasty way: the model hits its token limit mid-object. You get `{\"vulnerabilities\": [{\"id\": \"V1\", \"severity\": \"hi` and a dead parse.\n\nTwo defenses. First, raise `num_predict` so the model has room to finish.\n\n```\noptions: { temperature: 0, num_predict: 2048 }\n```\n\nSecond, attempt a repair before you give up. The common failure is unclosed brackets and a dangling partial value. 
You can salvage a surprising amount by trimming to the last complete token and closing what is open.\n\n``` js\nfunction repairJson(raw: string): string {\n  let text = raw.trim();\n\n  // Drop a trailing partial string/number/key after the last comma or brace\n  const lastSafe = Math.max(text.lastIndexOf(\"}\"), text.lastIndexOf(\"]\"));\n  if (lastSafe !== -1) text = text.slice(0, lastSafe + 1);\n\n  // Walk the string, tracking open brackets outside of string literals\n  const stack: string[] = [];\n  let inString = false;\n  let escaped = false;\n  for (const ch of text) {\n    if (escaped) { escaped = false; continue; }\n    if (ch === \"\\\\\") { escaped = true; continue; }\n    if (ch === '\"') inString = !inString;\n    if (inString) continue;\n    if (ch === \"{\") stack.push(\"}\");\n    else if (ch === \"[\") stack.push(\"]\");\n    else if (ch === \"}\" || ch === \"]\") stack.pop();\n  }\n\n  // Close whatever is still open, innermost first\n  while (stack.length) text += stack.pop();\n  return text;\n}\n```\n\nThis is a last resort, not a primary strategy. If you trim a truncated array, you lose the cut-off element, which is fine for \"best effort\" reads and wrong for anything that must be complete. I gate it: try the raw parse, then the repaired parse, and if the repaired version loses data the schema requires, Zod rejects it and the retry loop takes over.\n\nThe naive retry resends the same prompt and prays. It does not converge because nothing changed. 
The version that works feeds the specific Zod error back into the next attempt, the same idea from the earlier post but tuned for local models: lower the temperature on retries and tighten the instruction.\n\n``` js\nasync function withRetry<T>(\n  attempt: (feedback: string | null) => Promise<string>,\n  parse: (raw: string) => z.SafeParseReturnType<unknown, T>,\n  maxAttempts = 3,\n): Promise<T> {\n  let feedback: string | null = null;\n\n  for (let i = 1; i <= maxAttempts; i++) {\n    const raw = await attempt(feedback);\n    const parsed = parse(raw);\n    if (parsed.success) return parsed.data;\n\n    feedback = parsed.error.issues\n      .map((issue) => `${issue.path.join(\".\") || \"<root>\"}: ${issue.message}`)\n      .join(\"\\n\");\n  }\n\n  throw new Error(`No valid output after ${maxAttempts} attempts:\\n${feedback}`);\n}\n```\n\nThe discipline is: the model never sees a generic \"that was wrong.\" It sees `vulnerabilities.0.severity: Invalid enum value. Expected 'high', received 'High'`. Small models self-correct from that. They cannot self-correct from silence.\n\nYou want streaming for UX (tokens appearing live) but you cannot parse JSON until it is complete. Resolve the tension by streaming for display and accumulating for parsing. 
Do not try to parse each chunk.\n\n``` js\nasync function streamAccumulate(\n  body: object,\n  onToken?: (t: string) => void,\n): Promise<string> {\n  const res = await fetch(\"http://localhost:11434/api/chat\", {\n    method: \"POST\",\n    headers: { \"Content-Type\": \"application/json\" },\n    body: JSON.stringify({ ...body, stream: true }),\n  });\n  if (!res.body) throw new Error(\"No response body from Ollama\");\n\n  const reader = res.body.getReader();\n  const decoder = new TextDecoder();\n  let full = \"\";\n  let buffer = \"\";\n\n  while (true) {\n    const { done, value } = await reader.read();\n    if (done) break;\n    buffer += decoder.decode(value, { stream: true });\n\n    // Ollama streams newline-delimited JSON objects, one per chunk\n    const lines = buffer.split(\"\\n\");\n    buffer = lines.pop() ?? \"\";\n    for (const line of lines) {\n      if (!line.trim()) continue;\n      const token = JSON.parse(line).message?.content ?? \"\";\n      full += token;\n      onToken?.(token);\n    }\n  }\n\n  return full;\n}\n```\n\nThe trap people fall into: Ollama's `/api/chat` stream is newline-delimited JSON, one envelope per line, and a single network chunk can split a line in half. That is why `buffer` keeps the trailing partial line and only parses complete ones. Parse the accumulated `full` once the stream ends. Never on a partial.\n\nHere is the piece I actually reuse. 
One generic function: pass a Zod schema, get back a typed, validated object, with schema-constrained generation, repair, and retry all handled.\n\n``` js\nimport { z } from \"zod\";\nimport { zodToJsonSchema } from \"zod-to-json-schema\";\n\ninterface StructuredOptions {\n  model?: string;\n  temperature?: number;\n  maxAttempts?: number;\n  numPredict?: number;\n}\n\nexport async function generateStructured<T>(\n  schema: z.ZodType<T>,\n  system: string,\n  user: string,\n  opts: StructuredOptions = {},\n): Promise<T> {\n  const {\n    model = \"qwen2.5-coder:7b\",\n    temperature = 0,\n    maxAttempts = 3,\n    numPredict = 2048,\n  } = opts;\n\n  const jsonSchema = zodToJsonSchema(schema, { target: \"openApi3\" });\n\n  const call = async (feedback: string | null): Promise<string> => {\n    const messages = [\n      { role: \"system\", content: system },\n      { role: \"user\", content: user },\n    ];\n    if (feedback) {\n      messages.push({\n        role: \"user\",\n        content: `Your last response failed validation:\\n${feedback}\\nReturn corrected JSON only.`,\n      });\n    }\n\n    const res = await fetch(\"http://localhost:11434/api/chat\", {\n      method: \"POST\",\n      headers: { \"Content-Type\": \"application/json\" },\n      body: JSON.stringify({\n        model,\n        messages,\n        format: jsonSchema,\n        stream: false,\n        options: { temperature, num_predict: numPredict },\n      }),\n    });\n\n    // Fail fast on HTTP errors (e.g. unknown model) instead of retrying on an empty string\n    if (!res.ok) {\n      throw new Error(`Ollama request failed with HTTP ${res.status}`);\n    }\n\n    const data = await res.json();\n    return data.message?.content ?? \"\";\n  };\n\n  const parse = (raw: string): z.SafeParseReturnType<unknown, T> => {\n    for (const candidate of [raw, repairJson(raw)]) {\n      try {\n        return schema.safeParse(JSON.parse(candidate));\n      } catch {\n        // not parseable, try the next candidate\n      }\n    }\n    // Force a Zod failure with a useful message\n    return schema.safeParse(undefined);\n  };\n\n  return withRetry(call, parse, maxAttempts);\n}\n```\n\nUsage is the part that makes the abstraction worth it.\n\n``` js\nconst CodeReview = z.object({\n  summary: z.string(),\n  issues: z.array(z.object({\n    severity: z.enum([\"high\", \"medium\", \"low\"]),\n    line: z.number().int().positive(),\n    note: z.string(),\n  })),\n  riskScore: z.coerce.number().min(0).max(100),\n});\n\nconst review = await generateStructured(\n  CodeReview,\n  \"You are a code reviewer. Output JSON only.\",\n  sourceCode,\n  { model: \"ollama-friendly\", maxAttempts: 3 },\n);\n\n// review is fully typed as z.infer<typeof CodeReview>, validated, never undefined\n```\n\n`review.issues[0].severity` is typed. Your editor autocompletes it. If the model returns `\"High\"`, the `z.enum` rejects it, the error flows back into the retry, and the next attempt fixes it. You wrote the schema once.\n\n`format: <schema>` and Zod are not redundant. The first reduces how often you fail; the second catches what slips through. `zodToJsonSchema` keeps the generation constraint and the runtime check from drifting apart.\n\nThis is the exact pattern [spectr-ai](https://github.com/pavelEspitia/spectr-ai) uses to run a smart-contract audit fully locally with `--model ollama:qwen2.5-coder:1.5b`. Every byte the model emits passes through `generateStructured`. The 1.5B model still fumbles the JSON. 
The user never sees it.", "url": "https://wpnews.pro/news/structured-output-from-local-llms-json-that-never-breaks-ollama-zod", "canonical_source": "https://dev.to/pavelespitia/structured-output-from-local-llms-json-that-never-breaks-ollama-zod-5d09", "published_at": "2026-06-14 14:53:43+00:00", "updated_at": "2026-06-14 15:11:05.986209+00:00", "lang": "en", "topics": ["large-language-models", "developer-tools", "ai-tools", "machine-learning"], "entities": ["Ollama", "Zod", "zod-to-json-schema", "qwen2.5-coder:7b"], "alternates": {"html": "https://wpnews.pro/news/structured-output-from-local-llms-json-that-never-breaks-ollama-zod", "markdown": "https://wpnews.pro/news/structured-output-from-local-llms-json-that-never-breaks-ollama-zod.md", "text": "https://wpnews.pro/news/structured-output-from-local-llms-json-that-never-breaks-ollama-zod.txt", "jsonld": "https://wpnews.pro/news/structured-output-from-local-llms-json-that-never-breaks-ollama-zod.jsonld"}}