{"slug": "structured-outputs-how-we-stopped-parsing-llm-responses-by-hand", "title": "Structured Outputs: How We Stopped Parsing LLM Responses by Hand", "summary": "A developer describes how their team uses OpenAI's structured outputs feature to enforce JSON schema compliance in LLM responses, eliminating parsing failures in production Django applications. By passing Pydantic models directly to the API, they obtain typed data objects instead of raw strings, which they use in a document processing pipeline for contract extraction.", "body_md": "Every team we talk to has a version of the same story. They built an LLM integration that works well in testing. Then, three weeks into production, something comes back slightly different: the model wraps the JSON in a code block, or uses `\"status\": \"Completed\"` instead of `\"status\": \"complete\"`, or includes an extra key that breaks the downstream parser. The whole pipeline falls over.\n\nThis post is about how we handle that problem: specifically, how we use structured outputs to get reliable, typed data from LLMs in production Django applications, and where the approach still has limits.\n\nWhen you ask an LLM to \"return JSON\", it usually does. Until it doesn't.\n\nThe failure modes are predictable once you've seen them enough times:\n\n- The JSON arrives wrapped in a markdown code fence (`` ```json ... ``` ``)\n- Field names drift between responses (`customer_id` vs `customerId` vs `customer id`)\n\nNone of this is surprising: the model is a text predictor, not a JSON serialiser. Treating its output as reliable structured data requires you to either enforce structure at generation time or write defensive parsing code that handles every variant. The second path is a maintenance problem that compounds over time.\n\nThe cleaner approach is to constrain what the model can generate. OpenAI's structured outputs feature (introduced in August 2024) lets you pass a JSON schema to the API, and the model is guaranteed to return output that conforms to it. 
No code fences, no stray fields, no type mismatches.\n\nWe define our schemas with Pydantic and pass them directly to the API:\n\n``` python\nfrom typing import Literal\n\nfrom openai import OpenAI\nfrom pydantic import BaseModel\n\nclient = OpenAI()\n\nclass ExtractionResult(BaseModel):\n    company_name: str\n    industry: str\n    annual_revenue_usd: int | None\n    employee_count: int | None\n    confidence: Literal[\"high\", \"medium\", \"low\"]\n    notes: str\n\ndef extract_company_info(raw_text: str) -> ExtractionResult | None:\n    response = client.beta.chat.completions.parse(\n        model=\"gpt-4o-2024-08-06\",\n        messages=[\n            {\n                \"role\": \"system\",\n                \"content\": (\n                    \"Extract structured company information from the provided text. \"\n                    \"Use null for fields you cannot determine with reasonable confidence.\"\n                ),\n            },\n            {\"role\": \"user\", \"content\": raw_text},\n        ],\n        response_format=ExtractionResult,\n    )\n    # parsed is None when the model refuses (see refusal handling below)\n    return response.choices[0].message.parsed\n```\n\nUnless the model refuses, the return value is a proper Pydantic model instance. 
You can access `result.company_name` directly, pass it to a Django serializer, or store its `model_dump()` in a JSONField: it is typed data, not a string you have to parse.\n\nWe use this pattern in a document processing pipeline where we extract key fields from uploaded contracts and business documents before routing them for human review.\n\n``` python\n# models.py\nfrom django.db import models\n\nclass Document(models.Model):\n    STATUS_CHOICES = [\n        (\"pending\", \"Pending\"),\n        (\"processing\", \"Processing\"),\n        (\"extracted\", \"Extracted\"),\n        (\"failed\", \"Failed\"),\n        (\"needs_review\", \"Needs Review\"),\n    ]\n\n    file = models.FileField(upload_to=\"documents/\")\n    raw_text = models.TextField(blank=True)\n    extracted_data = models.JSONField(null=True, blank=True)\n    extraction_confidence = models.CharField(max_length=10, blank=True)\n    status = models.CharField(max_length=20, choices=STATUS_CHOICES, default=\"pending\")\n    created_at = models.DateTimeField(auto_now_add=True)\n\n# tasks.py (Celery)\nimport logging\nfrom typing import Literal\n\nfrom celery import shared_task\nfrom openai import OpenAI\nfrom pydantic import BaseModel\n\nlogger = logging.getLogger(__name__)\nclient = OpenAI()\n\nclass ContractExtraction(BaseModel):\n    counterparty_name: str\n    contract_value_usd: int | None\n    start_date: str | None  # ISO 8601\n    end_date: str | None  # ISO 8601\n    auto_renewal: bool\n    governing_law: str | None\n    confidence: Literal[\"high\", \"medium\", \"low\"]\n\n@shared_task\ndef extract_document_fields(document_id: int):\n    from .models import Document\n\n    doc = Document.objects.get(id=document_id)\n    doc.status = \"processing\"\n    doc.save(update_fields=[\"status\"])\n\n    try:\n        response = client.beta.chat.completions.parse(\n            model=\"gpt-4o-2024-08-06\",\n            messages=[\n                {\n                    \"role\": \"system\",\n                    
\"content\": (\n                        \"Extract key fields from this contract. \"\n                        \"Use null for fields not present or unclear. \"\n                        \"Set confidence to 'low' if you are uncertain about any critical field.\"\n                    ),\n                },\n                {\"role\": \"user\", \"content\": doc.raw_text[:8000]},  # Crude length cap: 8,000 characters, not tokens\n            ],\n            response_format=ContractExtraction,\n        )\n\n        # parsed is None on a refusal; see the refusal handling below\n        result = response.choices[0].message.parsed\n\n        doc.extracted_data = result.model_dump()\n        doc.extraction_confidence = result.confidence\n        doc.status = \"needs_review\" if result.confidence == \"low\" else \"extracted\"\n\n    except Exception:\n        # logger.exception records the traceback along with the message\n        logger.exception(\"Extraction failed for document %s\", document_id)\n        doc.status = \"failed\"\n\n    doc.save()\n```\n\nThe key decision here: low-confidence extractions automatically route to human review. The confidence field is part of the schema: we instruct the model to self-report uncertainty, and we act on it. This is the same principle as our agent designs: the human review path is first-class, not a fallback.\n\nThe one case structured outputs cannot prevent is a model refusal. If the model decides the input violates its content policy, `response.choices[0].message.parsed` will be `None` and `response.choices[0].message.refusal` will contain the refusal message.\n\nThis needs explicit handling inside the task, immediately after the API call:\n\n``` python\nmessage = response.choices[0].message\n\nif message.refusal:\n    logger.warning(\"Model refused extraction for document %s: %s\", document_id, message.refusal)\n    doc.status = \"needs_review\"\n    doc.save(update_fields=[\"status\"])\n    return\n\nresult = message.parsed\n```\n\nIn practice, refusals are rare for document extraction tasks. 
Refusals are more common when you are doing classification or analysis on content that might be flagged: customer support tickets, forum posts, unmoderated user content. If your pipeline processes that kind of input, test refusal handling early.\n\nIf you are using Anthropic's Claude models (which we also use for some tasks), the closest equivalent is tool use. You define a tool with a JSON schema, force the model to call it, and read structured output from the tool call rather than from the message content.\n\n``` python\nimport anthropic\n\nclient = anthropic.Anthropic()\n\nextraction_tool = {\n    \"name\": \"extract_contract_fields\",\n    \"description\": \"Extract structured fields from the contract text.\",\n    \"input_schema\": {\n        \"type\": \"object\",\n        \"properties\": {\n            \"counterparty_name\": {\"type\": \"string\"},\n            \"contract_value_usd\": {\"type\": [\"integer\", \"null\"]},\n            \"start_date\": {\"type\": [\"string\", \"null\"]},\n            \"end_date\": {\"type\": [\"string\", \"null\"]},\n            \"auto_renewal\": {\"type\": \"boolean\"},\n            \"confidence\": {\"type\": \"string\", \"enum\": [\"high\", \"medium\", \"low\"]},\n        },\n        \"required\": [\"counterparty_name\", \"auto_renewal\", \"confidence\"],\n    },\n}\n\ndef extract_with_claude(raw_text: str) -> dict:\n    response = client.messages.create(\n        model=\"claude-opus-4-5\",\n        max_tokens=1024,\n        tools=[extraction_tool],\n        # Force a call to this specific tool instead of a prose answer\n        tool_choice={\"type\": \"tool\", \"name\": \"extract_contract_fields\"},\n        messages=[\n            {\"role\": \"user\", \"content\": f\"Extract fields from this contract:\\n\\n{raw_text}\"}\n        ],\n    )\n\n    tool_use_block = next(b for b in response.content if b.type == \"tool_use\")\n    # Already a parsed dict, but not guaranteed to match input_schema: validate before trusting it\n    return tool_use_block.input\n```\n\nThe `tool_choice` parameter forces the model to always call the specified tool 
rather than choosing to respond in prose. Without it, the model might sometimes call the tool and sometimes answer in text, which is not useful in a production pipeline.\n\nA few things worth being clear about structured outputs:\n\n**They do not fix bad prompts.** If your system prompt is vague about what a field should contain, you will get consistent structure but inconsistent semantics. `confidence: \"high\"` means whatever the model inferred it means, not whatever you intended. Schema design and prompt design go together.\n\n**They do not prevent hallucination.** The model can still make up a contract value or misattribute a date. You are getting reliably shaped data; its accuracy still depends on the model's reasoning and the quality of the source text. For high-stakes fields, add a verification step that cross-checks extracted values against the source text.\n\n**They can add latency.** Schema-constrained generation can be slower than unconstrained generation, and with OpenAI the first request that uses a new schema incurs extra latency while the schema is processed. For real-time user-facing features, measure this before committing to the pattern. For background processing pipelines, it generally does not matter.\n\nStructured outputs are not exotic; they are simply the right default when you need typed data from an LLM. Free-text parsing is a trap that costs you maintenance time and production incidents over the long run.\n\nIf you are building an LLM integration that outputs data to a database, an API, or another system: define a Pydantic schema, use `response_format`, handle refusals, and route low-confidence results to human review. That is the pattern. It is not complicated once you have seen it, but it makes a meaningful difference in how reliably the system runs.\n\n[Lycore builds production AI systems](https://www.lycore.com/ai-development-services/) for businesses: document intelligence, agents, RAG pipelines, and custom LLM integrations on Django, React, Flutter, and .NET. 
[Get in touch](https://www.lycore.com/contact-us/) if you want to talk through your use case.", "url": "https://wpnews.pro/news/structured-outputs-how-we-stopped-parsing-llm-responses-by-hand", "canonical_source": "https://dev.to/lycore/structured-outputs-how-we-stopped-parsing-llm-responses-by-hand-3lgb", "published_at": "2026-06-27 10:19:18+00:00", "updated_at": "2026-06-27 10:33:52.060987+00:00", "lang": "en", "topics": ["large-language-models", "developer-tools", "ai-products"], "entities": ["OpenAI", "Pydantic", "Django", "Celery", "GPT-4o"], "alternates": {"html": "https://wpnews.pro/news/structured-outputs-how-we-stopped-parsing-llm-responses-by-hand", "markdown": "https://wpnews.pro/news/structured-outputs-how-we-stopped-parsing-llm-responses-by-hand.md", "text": "https://wpnews.pro/news/structured-outputs-how-we-stopped-parsing-llm-responses-by-hand.txt", "jsonld": "https://wpnews.pro/news/structured-outputs-how-we-stopped-parsing-llm-responses-by-hand.jsonld"}}