# Fixing JSON Output from GPT: A Pattern That Actually Works

> Source: <https://dev.to/__c1b9e06dc90a7e0a676b/fixing-json-output-from-gpt-a-pattern-that-actually-works-284g>
> Published: 2026-06-07 01:01:53+00:00

I spent three days debugging why my GPT-4-powered app kept returning malformed JSON. It wasn't a prompt issue. I tried few-shot examples, system messages, even begged the model with 'PLEASE give me valid JSON'. And it still broke in production.

This is the story of how I finally got reliable structured output from LLMs — without playing whack-a-mole with edge cases.

I was building a small internal tool that extracts meeting notes and turns them into structured data: action items, dates, assignees. The prompt was crystal clear:

```
Return a JSON array of objects with fields: action, due_date (ISO 8601), assignee.
```

I used `response_format: { "type": "json_object" }` in the OpenAI API call. In my local tests, everything worked fine. Then the first real user uploaded a transcript with a date like "next Thursday" and the model returned:

```
{"action": "Review Q3 budget", "due_date": "next Thursday", "assignee": "Alice"}
```

Not even a real date. And sometimes I got extra fields, or nested objects, or the entire array wrapped in an extra object. Pure chaos.
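To make those failure modes concrete, here is a minimal sketch of a check that would have flagged them, using only the standard library. The helper name `check_item` and the exact-field rule are assumptions for illustration, not code from the original tool.

```python
import json
from datetime import date

EXPECTED_FIELDS = {"action", "due_date", "assignee"}

def check_item(item) -> list[str]:
    """Return a list of problems found in one action item (empty if it is fine)."""
    if not isinstance(item, dict):
        return ["item is not a JSON object"]
    problems = []
    missing = EXPECTED_FIELDS - item.keys()
    extra = item.keys() - EXPECTED_FIELDS
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected fields: {sorted(extra)}")
    if "due_date" in item:
        due = item["due_date"]
        try:
            # Accepts calendar dates such as "2026-06-11"; rejects free text.
            date.fromisoformat(due)
        except (TypeError, ValueError):
            problems.append(f"due_date is not an ISO 8601 date: {due!r}")
    return problems

raw = '{"action": "Review Q3 budget", "due_date": "next Thursday", "assignee": "Alice"}'
print(check_item(json.loads(raw)))
```

A check like this only detects the problem after the fact. It does nothing to stop the model from producing it.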

I kept rewriting the system prompt, adding stricter and stricter instructions.

Did it help? Marginally. But a single edge-case transcript could still trigger a different output format. Model behavior on non-English text was even worse.

I wrote Python code to capture anything between the first `[` and last `]`. Then I tried `json.loads()` inside a try-except. When it failed, I logged the raw output and manually fixed it. This was a terrible idea: I was basically patching a broken pipe.
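The bracket-capture approach probably looked something like the sketch below. This is a reconstruction under assumptions (the function name `extract_array` and the logging setup are hypothetical), shown mainly to illustrate why it is fragile.

```python
import json
import logging

logger = logging.getLogger(__name__)

def extract_array(raw: str):
    """Parse the text between the first '[' and the last ']' as JSON, or return None."""
    start = raw.find("[")
    end = raw.rfind("]")
    if start == -1 or end <= start:
        logger.warning("no JSON array found in model output: %r", raw)
        return None
    try:
        return json.loads(raw[start : end + 1])
    except json.JSONDecodeError:
        # Log the raw output so it can be inspected (and, back then, fixed by hand).
        logger.warning("could not parse model output: %r", raw)
        return None

# Works when the model wraps a valid array in chatter.
print(extract_array('Sure! Here you go: [{"action": "Ship it"}] Hope that helps.'))

# Fails on truncated output: the last ']' found is the one inside the string value.
print(extract_array('[{"action": "Ship it [v2]"}, {"action": "Review"'))
```

The second call returns `None`, which is exactly the kind of case that ended up in the manual-fix pile.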

I built a parser that looked for key-value pairs even without proper JSON delimiters. It worked for 80% of cases, but the edge cases multiplied. And every parser change required a full regression test cycle.

The breakthrough came when I stopped trying to fix the output after generation and instead forced the model to generate valid JSON during generation. This technique is called constrained decoding (or structured generation).

Here's the core idea: instead of allowing the model to pick any token, we restrict the allowed next tokens based on a JSON schema. The model can only produce tokens that will result in valid JSON according to that schema.

I used a Python library called [Outlines](https://github.com/outlines-dev/outlines) (open source, works with OpenAI and local models). The pattern looks like this:

``` python
import outlines
from outlines.generate import json as json_generator
from pydantic import BaseModel
from typing import List

class ActionItem(BaseModel):
    action: str
    due_date: str
    assignee: str

class MeetingNotes(BaseModel):
    items: List[ActionItem]

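# NOTE: this follows the pre-1.0 Outlines interface (outlines.generate.json).
# Newer releases reorganized the API, so check the docs for your installed version.
# The model object (can be a local model or an OpenAI-compatible endpoint)
model = outlines.models.openai("gpt-4o")

# Build a generator whose output is parsed into a MeetingNotes instance
generator = json_generator(model, MeetingNotes)

# Task instructions go in the prompt itself, followed by the transcript
prompt = (
    "Extract action items from the meeting transcript. Output structured data.\n\n"
    "Meeting: Alice will review Q3 budget by next Thursday. "
    "Bob to update the dashboard by Friday."
)

result = generator(prompt)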
print(result.model_dump_json(indent=2))
```

The output is ALWAYS valid JSON matching `MeetingNotes`. No more parsing hell.

Constrained decoding works by compiling the JSON schema into a finite state machine. At each generation step, the library masks out tokens that would break the schema. For example, after the colon that follows a field name, the allowed next tokens are limited to ones that can begin a value of the schema's type, such as a quote for a string or a digit or minus sign for a number. This is light-years ahead of post-generation validation, which can only reject bad output after the fact.
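The masking idea can be shown with a toy example. The sketch below is not how Outlines is implemented: it assumes a nine-token vocabulary, a hand-written state machine for the schema `{"done": <boolean>}`, and a fake "model" with fixed scores that prefers off-schema tokens. Real libraries derive the state machine from the schema and apply the mask to the tokenizer's full vocabulary.

```python
import json

# Finite state machine for the toy schema {"done": <boolean>}.
# Maps state -> {allowed token: next state}. ACCEPT means the JSON is complete.
ACCEPT = 5
FSM = {
    0: {"{": 1},
    1: {'"done"': 2},
    2: {":": 3},
    3: {"true": 4, "false": 4},
    4: {"}": ACCEPT},
}

# Stand-in for a language model: a fixed score per token, higher is more likely.
# Deliberately biased toward chatty and off-schema tokens.
FAKE_SCORES = {
    "Sure!": 9.0, '"maybe"': 8.0, ",": 7.0, "false": 6.0, "true": 5.0,
    "{": 4.0, '"done"': 3.0, ":": 2.0, "}": 1.0,
}

def constrained_decode(max_steps: int = 10) -> str:
    """Greedy decoding where only FSM-approved tokens can be picked."""
    state, pieces = 0, []
    while state != ACCEPT and len(pieces) < max_steps:
        allowed = FSM[state]
        # Masking step: tokens outside `allowed` are never considered.
        token = max(allowed, key=lambda t: FAKE_SCORES[t])
        pieces.append(token)
        state = allowed[token]
    return "".join(pieces)

# Without the mask, the top-scoring token is chatter.
print(max(FAKE_SCORES, key=FAKE_SCORES.get))

# With the mask, the same scores can only yield schema-valid JSON.
text = constrained_decode()
print(text)
print(json.loads(text))
```

Even though the fake model scores `Sure!` and `"maybe"` highest, neither can ever appear in the output, because no state in the machine allows them.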

- Schema features like `anyOf` or recursive definitions can make the state machine huge. I recommend keeping your JSON schema flat for reliability.
- Set the response format to `text` (not `json_object`) and pass the schema as part of the generation pipeline. Some libraries handle this automatically.
- `json_object` mode alone is not enough: it did not stop the wrong field values and shapes described earlier.

I'd start with constrained decoding from day one. The library I used (Outlines) is one option; there's also Guidance by Microsoft and lm-format-enforcer. If I were using a local model, I'd use the `.generate()` method with `logits_processor`. For cloud APIs that don't expose logit bias, I'd batch requests with a retry mechanism that feeds a schema-aware error message back to the model.
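For APIs without token-level control, the retry idea could look roughly like this. It is a sketch under assumptions: `call_model` is a hypothetical callable standing in for whatever client you use, and the validation rules are simplified to the three fields from the prompt.

```python
import json
from typing import Callable

REQUIRED = ("action", "due_date", "assignee")

def validate(text: str) -> list:
    """Parse text and check it is an array of objects with the required fields."""
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, list):
        raise ValueError("top-level value must be a JSON array")
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"item {i} must be an object")
        missing = [f for f in REQUIRED if f not in item]
        if missing:
            raise ValueError(f"item {i} is missing fields: {missing}")
    return data

def generate_with_retry(
    call_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> list:
    """Call the model and, on failure, re-prompt with the validation error."""
    current = prompt
    for _ in range(max_attempts):
        reply = call_model(current)
        try:
            return validate(reply)
        except ValueError as err:
            # Tell the model exactly what was wrong with its last reply.
            current = (
                f"{prompt}\n\nYour previous reply was rejected: {err}\n"
                f"Reply with only a JSON array of objects with fields {list(REQUIRED)}."
            )
    raise RuntimeError(f"no valid output after {max_attempts} attempts")

# Fake model for demonstration: wrong shape first, then a valid reply.
replies = iter([
    '{"items": []}',
    '[{"action": "Update dashboard", "due_date": "2026-06-12", "assignee": "Bob"}]',
])
result = generate_with_retry(lambda p: next(replies), "Extract action items.")
print(result)
```

Unlike constrained decoding, this only raises the odds of a valid reply, so the attempt limit and the final error still matter.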

Also, I should have invested more time in schema design earlier. A schema with optional fields and `null` values is much more robust than one that expects every field to be present.
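As an illustration of that schema shape, here is a minimal standard-library sketch. The field choices and the `parse_item` helper are assumptions, not the schema from the actual tool. The same idea maps onto `Optional[...] = None` fields on a Pydantic model. Allowing `null` gives the model an honest way out when a transcript says "next Thursday" and no concrete date can be resolved.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ActionItem:
    action: str
    due_date: Optional[date] = None  # null when no concrete date can be resolved
    assignee: Optional[str] = None   # null when nobody was named

def parse_item(obj: dict) -> ActionItem:
    """Build an ActionItem, treating missing and null fields the same way."""
    raw_due = obj.get("due_date")
    return ActionItem(
        action=obj["action"],
        due_date=date.fromisoformat(raw_due) if raw_due is not None else None,
        assignee=obj.get("assignee"),
    )

raw = '{"action": "Review Q3 budget", "due_date": null}'
print(parse_item(json.loads(raw)))
```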

I still use prompt engineering for creative tasks, but for any application where the output feeds into a database or automation pipeline, constrained decoding is the only sane choice.

My final setup looks like this: Pydantic models for the schema, Outlines for generation, and a simple FastAPI endpoint. I've reduced JSON errors from 25% to less than 0.1%. And the remaining errors are almost always because of a network timeout, not the model.

What's your setup for getting structured output from LLMs? I'd love to hear how others handle this — especially if you're working with APIs that don't support logit bias.
