Fixing JSON Output from GPT: A Pattern That Actually Works

A developer spent three days debugging malformed JSON output from GPT-4 before discovering that constrained decoding, rather than prompt engineering or post-generation parsing, reliably produces valid structured data. Using the Outlines library, the engineer forced the model to generate only tokens that conform to a predefined JSON schema, eliminating edge cases where the model returned invalid dates, extra fields, or nested objects. The technique works by compiling the schema into a finite state machine that masks disallowed tokens during generation, so the output is valid JSON every time without post-hoc validation or manual fixes.

I spent three days debugging why my GPT-4-powered app kept returning malformed JSON. It wasn't a prompt issue. I tried few-shot examples, system messages, even begged the model with 'PLEASE give me valid JSON'. And it still broke in production. This is the story of how I finally got reliable structured output from LLMs, without playing whack-a-mole with edge cases.

I was building a small internal tool that extracts meeting notes and turns them into structured data: action items, dates, assignees. The prompt was crystal clear:

Return a JSON array of objects with fields: action, due_date (ISO 8601), assignee.

I used `response_format: { "type": "json_object" }` in the OpenAI API call. In my local tests, everything worked fine. Then the first real user uploaded a transcript with a date like "next Thursday" and the model returned:

```json
{"action": "Review Q3 budget", "due_date": "next Thursday", "assignee": "Alice"}
```

Not even a real date. And sometimes I got extra fields, or nested objects, or the entire array wrapped in an extra object. Pure chaos. That is the catch with JSON mode: it promises output that parses, not output that matches your schema.

I kept rewriting the system prompt, adding stricter and stricter instructions. Did it help? Marginally. But a single edge-case transcript could still trigger a different output format. Model behavior on non-English text was even worse.
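To make the failure concrete, here is a minimal sketch of a check that would have flagged the "next Thursday" response. It is not code from my tool: the helper name `check_action_item` and the `REQUIRED_FIELDS` set are made up for illustration, and it uses only the standard library so it runs without extra installs.

```python
import json
from datetime import date

# Hypothetical field set, taken from the prompt described above.
REQUIRED_FIELDS = {"action", "due_date", "assignee"}


def check_action_item(raw: str) -> list[str]:
    """Return a list of problems found in one JSON-encoded action item."""
    try:
        item = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(item, dict):
        return ["top-level value is not an object"]

    problems = []
    missing = REQUIRED_FIELDS - set(item)
    extra = set(item) - REQUIRED_FIELDS
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected fields: {sorted(extra)}")

    due = item.get("due_date")
    if isinstance(due, str):
        try:
            date.fromisoformat(due)  # accepts YYYY-MM-DD
        except ValueError:
            problems.append(f"due_date is not an ISO 8601 date: {due!r}")
    elif "due_date" in item:
        problems.append("due_date is not a string")
    return problems


bad = '{"action": "Review Q3 budget", "due_date": "next Thursday", "assignee": "Alice"}'
print(check_action_item(bad))
```

The response parses without complaint, and the only thing wrong with it is the date, which is exactly the kind of error that slips through when you only test whether the output is parseable.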
I wrote Python code to capture anything between the first `{` and the last `}`. Then I tried `json.loads` inside a try-except. When it failed, I logged the raw output and manually fixed it. This was a terrible idea: I was basically patching a broken pipe.

I built a parser that looked for key-value pairs even without proper JSON delimiters. It worked for 80% of cases, but the edge cases multiplied. And every parser change required a full regression test cycle.

The breakthrough came when I stopped trying to fix the output after generation and instead forced the model to produce valid JSON during generation. This technique is called constrained decoding or structured generation.

Here's the core idea: instead of allowing the model to pick any token, we restrict the allowed next tokens based on a JSON schema. The model can only produce tokens that will result in valid JSON according to that schema.

I used a Python library called Outlines (https://github.com/outlines-dev/outlines), which is open source and works with OpenAI and local models. The pattern looks like this:

```python
# Written against the pre-1.0 Outlines API (outlines.generate.json);
# newer releases may expose a different interface.
import outlines
from outlines.generate import json as json_generator
from pydantic import BaseModel
from typing import List


class ActionItem(BaseModel):
    action: str
    due_date: str  # plain str: the schema alone does not enforce ISO 8601
    assignee: str


class MeetingNotes(BaseModel):
    items: List[ActionItem]


# The model object can be a local model or an OpenAI-compatible endpoint
model = outlines.models.openai("gpt-4o")
generator = json_generator(model, MeetingNotes)

# Additional prompt context, prepended to the transcript because the
# generator itself takes only the model and the schema
system_prompt = "Extract action items from the meeting transcript. Output structured data."

result = generator(
    system_prompt
    + "\n\nMeeting: Alice will review Q3 budget by next Thursday. "
    "Bob to update the dashboard by Friday."
)
print(result.model_dump_json(indent=2))
```

The output is ALWAYS valid JSON matching `MeetingNotes`. No more parsing hell.

Constrained decoding works by compiling the JSON schema into a finite state machine. At each generation step, the library masks out tokens that would break the schema.
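Here is a toy version of that masking step, as a sketch only. It is not how Outlines is implemented: the state machine is written by hand for the single schema `{"done": <boolean>}` instead of being compiled from a schema, the vocabulary is nine made-up tokens, and `SCORES` is a stand-in for model logits that deliberately prefers invalid output.

```python
import json

# Stand-in for model logits: this "model" would rather chat than emit JSON.
SCORES = {
    "Sure!": 5.0, '"maybe"': 4.0, "false": 3.0, "true": 2.0,
    "{": 1.0, "}": 1.0, '"done"': 1.0, ":": 1.0, "<eos>": 0.5,
}

# Hand-written state machine for the schema {"done": <boolean>}.
# Maps state -> {allowed token: next state}.
FSM = {
    "start": {"{": "key"},
    "key": {'"done"': "colon"},
    "colon": {":": "value"},
    "value": {"true": "close", "false": "close"},
    "close": {"}": "end"},
    "end": {"<eos>": "stop"},
}


def constrained_decode(scores: dict[str, float], max_steps: int = 20) -> str:
    """Greedy decoding where disallowed tokens are masked out at every step."""
    state, out = "start", []
    for _ in range(max_steps):
        allowed = FSM[state]
        # The mask: keep only tokens the state machine permits in this state.
        masked = {tok: s for tok, s in scores.items() if tok in allowed}
        token = max(masked, key=masked.get)
        state = allowed[token]
        if state == "stop":
            break
        out.append(token)
    return "".join(out)


unconstrained_pick = max(SCORES, key=SCORES.get)  # what plain greedy would emit
result = constrained_decode(SCORES)
print(unconstrained_pick)
print(result, json.loads(result))
```

Left alone, the fake model's top choice is always "Sure!". With the mask applied it never gets the chance, and at the value position it picks the best-scoring token among the ones the schema allows.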
For example, after generating a colon after a field name, the only allowed next token is a quote (for a string) or a digit (for a number), depending on the schema type. This is light-years ahead of post-generation validation.

It isn't free, though. Schemas that use `anyOf` or recursive definitions can make the state machine huge. I recommend keeping your JSON schema flat for reliability.

Set the response format to `text`, not `json_object`, and pass the schema as part of the generation pipeline. Some libraries handle this automatically.

Instead of relying on `json_object` mode, I'd start with constrained decoding from day one. The library I used (Outlines) is one option; there's also Guidance by Microsoft and lm-format-enforcer. If I were using a local model, I'd use the `.generate` method with a `logits_processor`. For cloud APIs that don't expose logit bias, I'd batch requests with a retry mechanism that includes a schema-aware error message.

Also, I should have invested more time in schema design earlier. A schema with optional fields and null values is much more robust than one that expects every field to be present.

I still use prompt engineering for creative tasks, but for any application where the output feeds into a database or automation pipeline, constrained decoding is the only sane choice.

My final setup looks like this: Pydantic models for the schema, Outlines for generation, and a simple FastAPI endpoint. I've reduced JSON errors from 25% to less than 0.1%. And the remaining errors are almost always because of a network timeout, not the model.

What's your setup for getting structured output from LLMs? I'd love to hear how others handle this, especially if you're working with APIs that don't support logit bias.
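P.S. For anyone stuck on an API without logit control, here is a minimal sketch of the retry idea with a schema-aware error message. Treat everything in it as an assumption: `call_model` is a hypothetical callable that takes a prompt and returns raw text, `find_error` and `generate_with_retries` are illustrative names, `fake_model` simulates one bad answer followed by a good one, and the batching part is left out. Validation uses only the standard library so the sketch runs as-is; in a real project a Pydantic model could play the same role.

```python
import json
from datetime import date
from typing import Callable, Optional

FIELDS = {"action", "due_date", "assignee"}


def find_error(raw: str) -> Optional[str]:
    """Return a readable description of the first schema violation, or None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"Output was not valid JSON ({exc.msg})."
    if not isinstance(data, dict) or not isinstance(data.get("items"), list):
        return 'Output must be an object with an "items" array.'
    for i, item in enumerate(data["items"]):
        if not isinstance(item, dict) or set(item) != FIELDS:
            return f"items[{i}] must have exactly the fields action, due_date, assignee."
        try:
            date.fromisoformat(str(item["due_date"]))
        except ValueError:
            return f"items[{i}].due_date must be an ISO 8601 date such as 2024-07-04."
    return None


def generate_with_retries(call_model: Callable[[str], str], prompt: str,
                          max_attempts: int = 3) -> dict:
    """Call the model, feeding each validation error back into the next attempt."""
    attempt_prompt, error = prompt, None
    for _ in range(max_attempts):
        raw = call_model(attempt_prompt)
        error = find_error(raw)
        if error is None:
            return json.loads(raw)
        # Tell the model exactly what was wrong instead of just asking again.
        attempt_prompt = (f"{prompt}\n\nYour previous answer was rejected: "
                          f"{error} Return corrected JSON only.")
    raise ValueError(f"no valid output after {max_attempts} attempts: {error}")


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an API call: wrong first, right after feedback."""
    due = "2024-07-04" if "rejected" in prompt else "next Thursday"
    return json.dumps({"items": [
        {"action": "Review Q3 budget", "due_date": due, "assignee": "Alice"}
    ]})


notes = generate_with_retries(fake_model, "Extract action items from the transcript.")
print(notes["items"][0]["due_date"])
```

The point of the specific error text is that the second attempt carries information the first one lacked. It still only reduces failures rather than ruling them out, which is why the loop has a hard attempt limit and raises when it runs out.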