Your extraction endpoint has run clean for weeks. Then a support ticket arrives in mixed German and English, the model decides to be helpful, and the response comes back as:
Here's the JSON you asked for:
{"name": "Müller GmbH", "role": "Lead"}
Let me know if you need anything else!
json.loads throws. Your parser hits the friendly preamble and dies on the first character; it never even reaches the sign-off. The data was right there. The format around it was wrong.
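The failure is easy to reproduce. The short sketch below feeds the chatty response from above straight into the standard-library parser and records where it gives up; the variable names are only illustrative.

```python
import json

# The model's reply, preamble and sign-off included.
response = (
    "Here's the JSON you asked for:\n"
    '{"name": "Müller GmbH", "role": "Lead"}\n'
    "Let me know if you need anything else!"
)

failure = None
try:
    json.loads(response)
except json.JSONDecodeError as err:
    # err.pos is the character offset where the decoder stopped.
    failure = (err.msg, err.pos)
    print(f"{err.msg} at character {err.pos}")
```

The reported offset is the very start of the string: the valid object on the second line is never examined.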
This is the recurring fight with structured LLM output, and it splits into two separate decisions people tend to blur together: what shape the data takes, and what wraps it so you can find it. JSON and XML tags answer different halves of that question.
JSON is a data format. It describes a payload: keys, values, arrays, nesting. XML tags, the way most people use them in prompts, are a delimiter. They mark where a span starts and ends so you can slice it out of a stream of text.
When someone asks "JSON or XML for LLM output," they are usually conflating those jobs. The honest answer is that you often want both: an XML tag as the envelope, JSON as the letter inside it.
<result>
{"name": "Müller GmbH", "role": "Lead", "start": "2026-01"}
</result>
Now the model can ramble before <result> and apologize after </result>, and your extraction is a regex away. The JSON inside stays strict and typed. The tag absorbs the model's urge to talk.
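import json
import re

def extract(text: str) -> dict:
    m = re.search(r"<result>(.*?)</result>", text, re.S)
    if not m:
        raise ValueError("no <result> block found")
    body = m.group(1).strip()
    # Drop a markdown fence (``` or ```json) if the model put one inside the tag.
    body = re.sub(r"^```[A-Za-z]*\s*|\s*```$", "", body)
    return json.loads(body)

The regex alone handles the preamble and the sign-off, and it skips code fences that sit outside the tag. The fence-stripping line covers the other case, where the model wraps the JSON in a markdown fence inside the tag. The tag gives you a landmark. JSON gives you the schema.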
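If you control the decoding, skip the wrapper. Many current APIs expose a structured-output mode that constrains generation to a schema you supply. (A plain JSON mode is weaker: it typically guarantees syntactically valid JSON, not your schema.) Under schema-constrained decoding the model cannot emit a stray sentence, because the sampler masks out any token that would break the grammar. When that mode is available, ask for raw JSON and still validate it against the same schema you sent, since a response cut off at the token limit can end mid-object.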
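from pydantic import BaseModel

class Candidate(BaseModel):
    name: str
    role: str
    start: str

# Stand-in for the raw text returned by the model.
raw_response = '{"name": "Müller GmbH", "role": "Lead", "start": "2026-01"}'

# Pydantic v2: parse and validate in one step; raises ValidationError on a mismatch.
parsed = Candidate.model_validate_json(raw_response)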
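Raw JSON is the safer bet when the API enforces your schema during decoding and the response is a single payload, with no reasoning or commentary riding along. In those cases an XML wrapper adds a parsing step that buys you nothing. The decoder already guarantees the shape.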
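If you would rather not add a dependency for the validation step, the same check can be sketched with the standard library. This is a minimal illustration under stated assumptions, not a replacement for a real schema validator: parse_candidate is a hypothetical helper, and it assumes every field is a required string, as in the Candidate example.

```python
import json
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class Candidate:
    name: str
    role: str
    start: str

def parse_candidate(raw: str) -> Candidate:
    """Parse raw JSON and check it has exactly the expected string fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name for f in fields(Candidate)}
    if set(data) != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    for key, value in data.items():
        if not isinstance(value, str):
            raise ValueError(f"{key!r} must be a string")
    return Candidate(**data)

candidate = parse_candidate(
    '{"name": "Müller GmbH", "role": "Lead", "start": "2026-01"}'
)
```

A truncated or off-schema response raises ValueError (json.JSONDecodeError is a subclass of it), so the caller has one exception type to handle.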
Tags earn their place the moment the response holds more than one kind of thing, or the model needs room to think before it answers.
A chain-of-thought task is the clean example. You want the reasoning and a clean payload, and you do not want the reasoning inside your JSON:
<scratchpad>
The street name contains a comma, so the naive split
would break the address into two fields. Keep it whole.
</scratchpad>
<result>
{"address": "Hauptstraße 4, Hinterhaus", "city": "Berlin"}
</result>
Parse <result>, ignore <scratchpad>. The model gets its thinking space, and you get a payload that never had prose mixed in.
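One way to do that split is a small helper that pulls each tagged section out by name. The sketch below assumes the two tag names used above and that each tag appears at most once; split_sections is a hypothetical name, not part of any library.

```python
import json
import re

def split_sections(text: str, tags=("scratchpad", "result")) -> dict:
    """Return the text inside each named tag, or None if the tag is absent."""
    sections = {}
    for tag in tags:
        name = re.escape(tag)
        m = re.search(rf"<{name}>(.*?)</{name}>", text, re.S)
        sections[tag] = m.group(1).strip() if m else None
    return sections

reply = """<scratchpad>
The street name contains a comma, so the naive split
would break the address into two fields. Keep it whole.
</scratchpad>
<result>
{"address": "Hauptstraße 4, Hinterhaus", "city": "Berlin"}
</result>"""

sections = split_sections(reply)
payload = json.loads(sections["result"])  # the only part the application acts on
reasoning = sections["scratchpad"]        # log it if useful, never parse it
```

Keeping the reasoning as an opaque string means a stray brace or quote in the model's thinking can never corrupt the payload.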
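Tags are the safer bet when the response carries reasoning or commentary alongside the payload, or when you cannot constrain decoding and the model is free to wrap the JSON in prose.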
There is a quieter reason too. Models have seen enormous amounts of tag-delimited text in training, and tags are forgiving. A missing closing brace breaks JSON. A missing closing tag still leaves you a recoverable opening landmark. The wrapper degrades more gracefully than the payload.
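A sketch of that recovery, assuming the payload is a JSON object: find the opening tag, then let the standard library's JSONDecoder.raw_decode read exactly one object from the first brace and ignore whatever follows. extract_lenient is a hypothetical helper name.

```python
import json

def extract_lenient(text: str, tag: str = "result") -> dict:
    """Recover a JSON object after <tag>, even if the closing tag never arrives."""
    opening = f"<{tag}>"
    start = text.find(opening)
    if start == -1:
        raise ValueError(f"no {opening} landmark found")
    brace = text.find("{", start + len(opening))
    if brace == -1:
        raise ValueError("no JSON object after the opening tag")
    # raw_decode parses one JSON value starting at `brace` and returns
    # (value, end_index); trailing text after the value is not an error.
    obj, _end = json.JSONDecoder().raw_decode(text, brace)
    return obj

reply = 'Sure!\n<result>\n{"name": "Müller GmbH", "role": "Lead"}\nHope that helps'
recovered = extract_lenient(reply)
```

This only helps when the JSON itself is complete. If the object is cut off, raw_decode raises json.JSONDecodeError, which is the right outcome: there is nothing trustworthy to return.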
For deep, repeating structure, JSON is the format that holds. Nested XML built by a language model gets unreliable fast: the model loses track of which tag it opened, closes them in the wrong order, or invents a tag name halfway down.
<order>
  <items>
    <item><sku>A1</sku><qty>2</qty></item>
    <item><sku>B7</sku><qty>1</item> <!-- missing </qty> -->
  </items>
</order>
That malformed block is a common failure mode for model-authored nested XML. The same data as JSON is flatter to generate and trivially validatable:
{
  "order": {
    "items": [
      {"sku": "A1", "qty": 2},
      {"sku": "B7", "qty": 1}
    ]
  }
}
Rule of thumb: tags for the outer envelope, JSON for anything nested or repeating. One level of tags is a landmark. Five levels of tags is a parser bug waiting to happen.
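To see both halves of that comparison in code, the sketch below runs the malformed XML from above through the standard-library parser, then validates the JSON version with a few lines of checks. validate_order is a hypothetical helper, and the checks (string sku, integer qty) are assumptions about what the order schema requires.

```python
import json
import xml.etree.ElementTree as ET

# The malformed block from above: the second <qty> is never closed.
broken_xml = (
    "<order><items>"
    "<item><sku>A1</sku><qty>2</qty></item>"
    "<item><sku>B7</sku><qty>1</item>"
    "</items></order>"
)

try:
    ET.fromstring(broken_xml)
    xml_ok = True
except ET.ParseError:
    xml_ok = False  # one mismatched tag rejects the whole document

order_json = '{"order": {"items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}}'

def validate_order(raw: str) -> list:
    """Return the order items after checking each one's shape."""
    items = json.loads(raw)["order"]["items"]
    for item in items:
        if not isinstance(item.get("sku"), str) or not isinstance(item.get("qty"), int):
            raise ValueError(f"bad item: {item!r}")
    return items

items = validate_order(order_json)
```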
Streaming is where the two formats behave least alike, and where most people get surprised.
Stream raw JSON token by token and every intermediate state is invalid: a prefix such as {"name": "Mül is not parseable. You either buffer the whole response and parse once at the end (losing the point of streaming), or you reach for a tolerant incremental JSON parser that reads partial objects and emits keys as they complete. Those parsers exist and work, but they are an extra dependency and need extra care.
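To make the idea of a tolerant parser concrete, here is a deliberately crude sketch: it closes whatever string, object, or array is still open in the prefix and then tries a normal parse. parse_partial is a hypothetical helper, not the API of any library, and real incremental parsers cover many more cases (half-written keys, numbers, escapes).

```python
import json

def parse_partial(fragment: str):
    """Best-effort parse of a JSON prefix; returns None if it cannot be repaired."""
    closers = []        # closing brackets still owed, innermost last
    in_string = False
    escaped = False
    for ch in fragment:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()
    repaired = fragment
    if in_string:
        if escaped:
            repaired = repaired[:-1]  # drop a dangling backslash
        repaired += '"'
    repaired += "".join(reversed(closers))
    try:
        return json.loads(repaired)
    except json.JSONDecodeError:
        return None  # e.g. the prefix ends in a half-written key

snapshot = parse_partial('{"name": "Mül')
```

Many prefixes still come back as None, for example one that stops right after a comma or in the middle of a key. That gap is the extra care the paragraph above refers to.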
XML tags stream differently. You can watch the byte stream for <result> and </result> and know exactly when the payload is complete, without parsing anything mid-flight. A common production shape combines both: tags tell you the boundary, then you parse the JSON once the closing tag arrives.
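import json
import re

async def read_result(stream):
    # Accumulate chunks until the closing tag shows up, then parse once.
    buf = ""
    async for chunk in stream:
        buf += chunk
        if "</result>" in buf:
            m = re.search(r"<result>(.*?)</result>", buf, re.S)
            if not m:
                raise ValueError("found </result> without a matching <result>")
            return json.loads(m.group(1).strip())
    raise ValueError("stream ended without </result>")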
If you want to render reasoning live and only commit the payload at the end, tags give you a clean seam. Open <scratchpad>, stream its contents to the UI, and hold rendering of <result> until you have parsed valid JSON. The user sees the model think; your application only ever acts on a complete, validated object.
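A minimal sketch of that seam, written as a synchronous generator for brevity (an async version would follow the same steps). It assumes text chunks, one scratchpad block followed by one result block, and a hypothetical stream_events name. The one subtle part is holding back a short tail of the buffer, so that a closing tag split across two chunks never leaks into the UI.

```python
import json
import re

OPEN, CLOSE = "<scratchpad>", "</scratchpad>"

def stream_events(chunks):
    """Yield ("thinking", text) as scratchpad text arrives, then ("result", obj)."""
    buf = ""
    sent = None   # index in buf up to which scratchpad text has been emitted
    done = False  # True once </scratchpad> has been seen
    for chunk in chunks:
        buf += chunk
        if done:
            continue  # keep buffering; the result is parsed after the stream ends
        if sent is None:
            start = buf.find(OPEN)
            if start == -1:
                continue
            sent = start + len(OPEN)
        end = buf.find(CLOSE, sent)
        if end == -1:
            # Hold back a tail that could be the start of the closing tag.
            limit = max(sent, len(buf) - (len(CLOSE) - 1))
        else:
            limit, done = end, True
        if limit > sent:
            yield ("thinking", buf[sent:limit])
            sent = limit
    m = re.search(r"<result>(.*?)</result>", buf, re.S)
    if not m:
        raise ValueError("no complete <result> block in stream")
    yield ("result", json.loads(m.group(1).strip()))

chunks = [
    "Sure. <scratch", "pad>Keep the add", "ress whole.</scr",
    'atchpad>\n<result>{"city": ', '"Berlin"}</result> Bye!',
]
events = list(stream_events(chunks))
```

The UI renders each "thinking" event as it arrives; application logic waits for the single "result" event, which only exists if the JSON parsed.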
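Boil it down to three questions. Can you constrain decoding to a schema? If so, ask for raw JSON. Does the response carry reasoning or prose next to the payload, or do you need a clear end-of-payload signal while streaming? If so, wrap the payload in a tag. Is the data nested or repeating? If so, keep that structure in JSON, not in tags.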
The combination that survives the widest range of model behavior is an XML envelope around a JSON payload, parsed by finding the tag first and decoding the JSON second. It tolerates preambles, sign-offs, code fences, and a model that wandered off before it found the schema. It costs you one regex.
The mistake is treating this as a single either/or. JSON describes data. Tags mark territory. Reach for the one that matches the job in front of you, and reach for both when the model needs room to talk and you need a payload that parses.
Output formatting is one of those decisions that feels trivial until a multilingual edge case or a streaming UI turns a clean parser into a 2 a.m. page. The Prompt Engineering Pocket Guide has a chapter on structured output that goes deeper on schema design, when constrained decoding is worth the latency, and how to keep the model inside the lines without burning tokens on instructions it ignores.