[ARTICLE · art-30357] src=dev.to pub= topic=artificial-intelligence verified=true sentiment=neutral

The Data Refinery: How JSON Quietly Became the Language AI Agents Speak

A developer warns that JSON serialization bugs are a growing pain point in AI agent pipelines, where LLM tool calls and structured outputs rely on JSON. The post explains how JavaScript's JSON.stringify silently drops functions, undefined, and symbols, and how cyclic references cause errors, urging engineers to understand serialization deeply.

12 min read · 2 views · published Jun 17, 2026

Every tool call, every structured output, every agent decision travels as JSON. Here is the serialization knowledge that separates the amateur from the architect, now that the stakes have never been higher.

A developer ships an AI agent on a Friday. In the demo it's flawless: the model reads a request, calls a tool, returns a clean answer the app renders perfectly.

A week later, production dashboards are full of garbage. A date is showing up as raw text. A field that was definitely there is silently gone. Under one big payload, the whole server froze for two seconds. And here's the maddening part: nothing threw an error. The model returned JSON. The code parsed it. Everything "worked."

The bug wasn't in the model, and it wasn't in the parser. It lived in the narrow gap between text and data, the place every JSON value has to cross twice. That gap is serialization, and in 2026 it has quietly become one of the most important things a JavaScript engineer can actually understand.

Why now? Because the most important conversations in modern software aren't between humans anymore. They're between models and machines: an LLM deciding which tool to call, a server answering, an agent chaining ten steps together. And every one of those conversations happens in the same format: JSON.

So let's open up the refinery and see how raw structure becomes a clean stream of bytes, and back again, without losing anything precious on the way.

This is the misunderstanding that creates most JSON bugs, so it's worth saying plainly: JSON only looks like a JavaScript object. It isn't one.

JSON is a transport format: flat, inert text meant to travel across a network or sit on a disk. A JavaScript object is a live structure in memory that your application can read, mutate, and call methods on. They resemble each other the way a flat-packed cardboard box resembles assembled furniture: same thing in spirit, completely different states.

const user = { name: "Joao" };   // a live object in memory
typeof user;                     // "object"

const text = JSON.stringify(user); // '{"name":"Joao"}' (just characters)
typeof text;                       // "string"

The V8 engine has to do active work to move between these two worlds. Until you parse it, {"name":"Joao"} is no more "an object" than the word cake is something you can eat. Hold on to that mental model: everything below is just the two machines that cross the gap, one that packs and one that unpacks.

JSON.stringify and the serialization funnel

JSON.stringify walks the enumerable properties of a value and compresses them into a single JSON string for travel over the network or to disk. But it is not a neutral photocopier. Think of it as a funnel with three filters, and knowing what each filter does is what saves you at 2am.

Filter 1: types that pass through cleanly. Strings, finite numbers, booleans, null, arrays, and plain objects survive untouched.

Filter 2: types that get quietly transformed. A Date is converted to an ISO 8601 string; NaN and Infinity are turned into null.

Filter 3: types that are dropped entirely. Functions, undefined, and symbols simply vanish when they are object property values (inside an array they become null instead).

const data = {
  name: "Ana",
  createdAt: new Date(), // becomes an ISO string
  balance: Infinity,     // becomes null
  greet: () => "hi",     // dropped (function)
  nickname: undefined    // dropped (undefined)
};

JSON.stringify(data);
// '{"name":"Ana","createdAt":"2026-06-16T...Z","balance":null}'

Read that output again. Four of the five fields changed or disappeared (only name survived intact), and the engine didn't say a word. That silence is the whole danger.
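Because the funnel is silent, it can help to audit a payload before it is serialized. The following is a minimal sketch, not a library API: findLossyPaths is a hypothetical helper that walks a value and reports every path the three filters would change, drop, or reject.

```javascript
// Hypothetical helper: list the paths JSON.stringify would alter, drop, or reject.
function findLossyPaths(value, path = "$", found = [], ancestors = []) {
  const type = typeof value;
  if (type === "function" || type === "symbol" || type === "undefined") {
    found.push(`${path}: ${type} (dropped, or null inside an array)`);
  } else if (type === "bigint") {
    found.push(`${path}: bigint (stringify throws TypeError)`);
  } else if (type === "number" && !Number.isFinite(value)) {
    found.push(`${path}: ${value} (becomes null)`);
  } else if (value instanceof Date) {
    found.push(`${path}: Date (becomes an ISO string)`);
  } else if (value instanceof Map || value instanceof Set) {
    found.push(`${path}: ${value.constructor.name} (becomes {})`);
  } else if (type === "object" && value !== null) {
    if (ancestors.includes(value)) {
      // Stop here: recursing into a cycle would never terminate.
      found.push(`${path}: circular reference (stringify throws TypeError)`);
      return found;
    }
    for (const [key, child] of Object.entries(value)) {
      findLossyPaths(child, `${path}.${key}`, found, [...ancestors, value]);
    }
  }
  return found;
}

const payload = {
  name: "Ana",
  createdAt: new Date(0),
  balance: Infinity,
  greet: () => "hi",
  nickname: undefined
};

const report = findLossyPaths(payload);
console.log(report.join("\n")); // one line per affected path
```

Running a check like this in a test suite is one way to turn the silent filters into a visible failure; whether a Date becoming a string counts as a problem depends on what the receiver expects.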

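A JSON structure can nest as deeply as you like, but it must be strictly acyclic. The engine tracks the stack of objects it's walking; the moment it meets an object that is already on that stack (one of its own ancestors), it aborts hard. An object that is merely referenced from two places, without forming a cycle, is fine: it is simply serialized twice.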

const a = {};
a.self = a;            // a points back at itself
JSON.stringify(a);
// TypeError: Converting circular structure to JSON

This is one of the rare cases where JSON fails loudly instead of silently, and you should be grateful for it.
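When a cyclic structure has to be logged or inspected anyway, one common workaround is a replacer function that tracks the current chain of ancestors and substitutes a marker for any back-reference. The sketch below follows that pattern; the name makeCycleSafeReplacer and the "[Circular]" marker are illustrative choices, and the output is no longer a faithful copy of the data.

```javascript
// Returns a replacer that swaps back-references to an ancestor for a marker string.
function makeCycleSafeReplacer() {
  const ancestors = [];
  // A regular function is required: JSON.stringify sets `this` to the holder object.
  return function (key, value) {
    if (typeof value !== "object" || value === null) return value;
    // Unwind the stack until its top is the object that holds this key.
    while (ancestors.length > 0 && ancestors[ancestors.length - 1] !== this) {
      ancestors.pop();
    }
    if (ancestors.includes(value)) return "[Circular]";
    ancestors.push(value);
    return value;
  };
}

const a = { name: "a" };
a.self = a; // a points back at itself
const cyclicJson = JSON.stringify(a, makeCycleSafeReplacer());
// '{"name":"a","self":"[Circular]"}'

const shared = { x: 1 };
const sharedJson = JSON.stringify({ p: shared, q: shared }, makeCycleSafeReplacer());
// '{"p":{"x":1},"q":{"x":1}}' (a repeated, non-cyclic reference is left alone)
```

A fresh replacer is created per call because it carries state between invocations.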

replacer

The second argument to JSON.stringify is a replacer: a surgical interception that runs during packing. It lets you mutate values or strip sensitive data before it ever reaches the wire. The classic use is redacting secrets:

const user = { name: "Joao", password: "123", admin: true };

JSON.stringify(user, (key, value) =>
  key === "password" ? undefined : value
);
// '{"name":"Joao","admin":true}'

Return undefined from the replacer and the key is deleted from the payload. It's the cleanest place to make sure a password never leaves the building.
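The replacer can also be an array of property names, which acts as an allowlist instead of a denylist. A short sketch with made-up field names; note that the same list is applied at every nesting level, so nested objects keep only the keys that appear in it.

```javascript
const account = {
  name: "Joao",
  password: "123",
  admin: true,
  profile: { name: "J", age: 30 }
};

// Only the listed keys survive, at every depth, in the order given by the array.
const allowlisted = JSON.stringify(account, ["name", "admin", "profile"]);
console.log(allowlisted);
// '{"name":"Joao","admin":true,"profile":{"name":"J"}}'
```

An allowlist fails closed: a field added to the object later stays out of the payload until someone lists it, which is usually the safer default for anything that crosses a trust boundary.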

space and toJSON

Two more levers are worth knowing. The third argument, space, injects whitespace, trading network efficiency for human readability when you're debugging. And any object can define a toJSON() method to dictate its own serialization; the engine always delegates to it when present.

const account = {
  id: 42,
  secret: "s3cr3t",
  toJSON() { return { id: this.id }; } // dictate your own shape
};

JSON.stringify(account); // '{"id":42}' (secret never serialized)
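Two details of these levers are easy to miss, and a small sketch makes them concrete. The first call shows what the space argument does to the output. The second shows the order of operations: toJSON runs before the replacer, so a replacer never sees a Date instance as its value, only the string that Date.prototype.toJSON produced.

```javascript
// 1. `space` pretty-prints with the given indentation.
const pretty = JSON.stringify({ id: 42, tags: ["a", "b"] }, null, 2);
console.log(pretty);

// 2. toJSON runs first; the replacer receives its result.
const observed = [];
const json = JSON.stringify({ when: new Date(0) }, function (key, value) {
  if (key === "when") {
    observed.push(typeof value);              // "string": toJSON already ran
    observed.push(this[key] instanceof Date); // true: the holder still has the Date
  }
  return value;
});
// json is '{"when":"1970-01-01T00:00:00.000Z"}'
```

If a replacer needs the original instance, it can read it from the holder via this[key], which is why the example uses a regular function and not an arrow function.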

JSON.parse and rehydration

On the way back, JSON.parse reconstructs ECMAScript values from the text, rebuilding the hierarchy strictly from the syntax in the string. But remember Filter 2: serialization erased types. That Date you sent is now just a string, and parsing alone won't bring it back to life.

That's what the reviver, the second argument to parse, is for. It intercepts parsing node by node, letting you rehydrate flat strings back into rich instances.

const text = '{"event":"deploy","when":"2026-06-16T10:30:00Z"}';

const obj = JSON.parse(text, (key, value) =>
  key === "when" ? new Date(value) : value
);

obj.when instanceof Date; // true (revived)

Serialization is lossy by design; the reviver is how you choose what to restore on the other side.
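Matching on a key name works when you know the payload's shape. When you don't, one alternative is to revive by value pattern. The sketch below is an assumption-laden example: reviveDates and the ISO_UTC pattern are hypothetical, and the pattern only accepts full UTC timestamps ending in Z.

```javascript
// Matches timestamps like 2026-06-16T10:30:00Z or 2026-06-16T10:30:00.123Z.
const ISO_UTC = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z$/;

// Reviver that turns any matching string into a Date, whatever its key.
function reviveDates(key, value) {
  if (typeof value === "string" && ISO_UTC.test(value)) {
    const date = new Date(value);
    if (!Number.isNaN(date.getTime())) return date; // skip strings that match but are not valid dates
  }
  return value;
}

const revived = JSON.parse(
  '{"event":"deploy","when":"2026-06-16T10:30:00Z","note":"2026-06-16"}',
  reviveDates
);

revived.when instanceof Date; // true
typeof revived.note;          // "string": a bare date does not match the pattern
```

The trade-off is that any string which happens to look like a timestamp gets converted, including user-supplied text, so the key-based form is the safer choice when the schema is known.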

replacer vs. reviver

These two hooks are mirror images, and confusing them is a common source of bugs. Here's the clean comparison:

| | replacer | reviver |
|---|---|---|
| Runs during | Serialization (stringify) | After parsing (parse) |
| Receives | The original in-memory value | The freshly parsed string/literal |
| Main use | Omit secrets, filter payloads | Restore classes (e.g. Date) |
| Delete a value by | Returning undefined | Returning undefined |

Here's a trick almost every JavaScript developer has reached for: deep-cloning an object with JSON.parse(JSON.stringify(obj)). It's clever, it's one line, and it's a silent killer, because it runs your data through the entire funnel above.

const original = {
  date: new Date(),
  tags: new Set(["a", "b"]),
  meta: { level: 42 }
};

// The "classic" hack: loses the Date, destroys the Set
const bad = JSON.parse(JSON.stringify(original));
bad.date;  // "2026-..." (a string!)
bad.tags;  // {} (empty object!)

Dates become strings, undefined disappears, Map and Set collapse into empty objects, functions are gone, and a circular reference throws. The fix has been native since 2022: structuredClone(), built on the same Structured Clone Algorithm the platform already uses internally for postMessage and IndexedDB.

const good = structuredClone(original);
good.date; // a real Date
good.tags; // Set(2) { "a", "b" }

structuredClone preserves circular references, Map, Set, typed arrays, and Date; it keeps undefined; it's roughly 20 to 30% slower but trades that for data integrity; and it adds zero bytes to your bundle (goodbye, Lodash's cloneDeep). It throws on functions and DOM nodes, which, honestly, is a feature. If you're cloning a function, your data model is trying to tell you something.
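A short sketch of those behaviors, runnable in any runtime that ships structuredClone (Node 17 or later, current browsers). It also shows one limit worth knowing that the list above does not mention: class instances come back as plain objects, because prototypes are not cloned. The Point class is only an illustration.

```javascript
// Circular references and Sets survive the clone.
const graph = { name: "root", seen: new Set([1, 2]) };
graph.self = graph;

const copy = structuredClone(graph);
copy.self === copy;       // true: the cycle points at the copy, not the original
copy.seen instanceof Set; // true
copy.seen !== graph.seen; // true: a deep copy, not a shared reference

// Prototypes are not preserved: own data is copied, the class identity is not.
class Point {
  constructor(x) { this.x = x; }
}
const clonedPoint = structuredClone(new Point(3));
clonedPoint.x;                // 3
clonedPoint instanceof Point; // false

// Functions are rejected loudly.
let errorName = null;
try {
  structuredClone({ greet() {} });
} catch (err) {
  errorName = err.name; // "DataCloneError"
}
```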

Step back from the two functions and you'll notice something: JSON isn't just data flowing through your app. In the Node ecosystem, it's the declarative blueprint the whole architecture is built on.

Open any package.json and you're reading a JSON object that controls everything: main is the entry point, scripts are your automation triggers (start, test, build), dependencies define the module tree npm assembles, and private: true is a safety lock against accidental publishing. Configuration follows the same instinct: critical values like passwords and URLs don't live in source code; the common pattern is to unify process.env into centralized config objects that switch behavior between development and production.
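As a rough illustration of that pattern, here is one way a centralized config object might be built from process.env. Everything specific in it is an assumption made for the example: the loadConfig name, the API_URL, PORT, and DB_PASSWORD variables, and the default values.

```javascript
// Hypothetical config loader: reads environment variables once and returns a frozen object.
function loadConfig(env = process.env) {
  const mode = env.NODE_ENV === "production" ? "production" : "development";

  const config = {
    mode,
    apiUrl: env.API_URL ?? "http://localhost:3000", // assumed development default
    port: Number(env.PORT ?? 3000),
    dbPassword: env.DB_PASSWORD                      // never hard-coded in source
  };

  // Fail at startup, not at the first query, when a production secret is missing.
  if (mode === "production" && !config.dbPassword) {
    throw new Error("DB_PASSWORD is required in production");
  }
  return Object.freeze(config);
}

// Passing a plain object keeps the function easy to test.
const devConfig = loadConfig({});
const prodConfig = loadConfig({
  NODE_ENV: "production",
  PORT: "8080",
  DB_PASSWORD: "example-only"
});
```

Taking the environment as a parameter, with process.env as the default, is a small design choice that lets tests exercise both modes without touching the real environment.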

And this is where a genuinely modern upgrade lands. For years, importing a JSON config meant a bundler or a fetch(). As of ES2025 (baseline across modern runtimes since April 2025), you can import JSON natively with an import attribute:

// Native JSON import: no bundler, no fetch
import config from "./config.json" with { type: "json" };

console.log(config.apiUrl);

That with { type: "json" } is not decoration; it's a security contract. It forces the runtime to verify the file is genuinely JSON (via its MIME type) before processing it, which prevents a server from sneaking executable JavaScript in through a file that merely looks like data. JSON modules can't run code; they're pure data, and only ever expose a default export. The platform turned a workaround into a guarantee.

Now the hard part. Real-time applications don't receive tidy, complete JSON documents; they receive data flowing in streams over HTTP, arriving in fragments. Call the native JSON.parse naively on a half-arrived network buffer and you get one of two bad outcomes: a syntax error on incomplete data, or, worse, a blocked single-threaded event loop while a huge payload is parsed synchronously, freezing the entire server for every other user.
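To make the fragment problem concrete, here is a minimal sketch of the buffering idea for one specific case: a stream of newline-delimited JSON, where each line is a complete document. That framing is an assumption of the example, as is the createLineParser name. Chunks are taken as strings; real network chunks are bytes and may split a multi-byte character, which this sketch does not handle.

```javascript
// Buffers incoming text and parses only complete lines.
function createLineParser(onValue) {
  let buffered = "";
  return {
    push(chunk) {
      buffered += chunk;
      let newline;
      while ((newline = buffered.indexOf("\n")) !== -1) {
        const line = buffered.slice(0, newline).trim();
        buffered = buffered.slice(newline + 1);
        if (line !== "") onValue(JSON.parse(line)); // a complete line is safe to parse
      }
    },
    // Whatever has arrived but is not yet a complete line.
    pending() {
      return buffered;
    }
  };
}

const received = [];
const parser = createLineParser((value) => received.push(value));

// Chunk boundaries fall in the middle of documents.
parser.push('{"step":1}\n{"st');
parser.push('ep":2}\n{"step"');

// received now holds { step: 1 } and { step: 2 }; '{"step"' is still pending
```

Each parse is small and bounded by the line length, so no single call holds the event loop for long, and an incomplete tail is kept as pending text and never handed to JSON.parse.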

The architecture demands a specialized intermediary. In Express, that's the express.json() middleware, the inspection conveyor on the assembly line. It buffers the incoming stream safely, checks the Content-Type: application/json header, parses the result, and hands your route a ready-to-use req.body.

const express = require("express");
const app = express();

app.use(express.json()); // the inspection conveyor

app.post("/api/users", (req, res) => {
  // req.body is already an object: stream buffered, validated, parsed
  console.log(req.body.name);
  res.status(201).json({ ok: true });
});

The distinction between the native function and the middleware is the distinction between a script and a system:

| | JSON.parse() | express.json() |
|---|---|---|
| Execution context | Synchronous memory (data already in V8) | HTTP network layer (buffers/streams) |
| Invalid data | Throws SyntaxError, aborts execution | Returns a clean HTTP 400, keeps running |
| Scalability | Low: blocks the event loop on huge payloads | High: manages payload limits and concurrency |
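The right-hand column can be approximated in a few lines to show what the extra layer buys. This is a dependency-free sketch of the idea and not Express's implementation: parseJsonBody, its result shape, and the 100 KB default are assumptions made for the example.

```javascript
// Hypothetical body parser: enforces a size limit and converts failures into status codes.
function parseJsonBody(raw, { limitBytes = 100 * 1024 } = {}) {
  const size = Buffer.byteLength(raw, "utf8");
  if (size > limitBytes) {
    // Reject before parsing, so an oversized payload never reaches JSON.parse.
    return { ok: false, status: 413, error: "payload too large" };
  }
  try {
    return { ok: true, status: 200, value: JSON.parse(raw) };
  } catch (err) {
    // A SyntaxError becomes a client error and does not crash the process.
    return { ok: false, status: 400, error: err.message };
  }
}

const good = parseJsonBody('{"name":"Ana"}');
const malformed = parseJsonBody('{"name":');
const oversized = parseJsonBody(JSON.stringify({ blob: "x".repeat(200) }), { limitBytes: 64 });
// good.status is 200, malformed.status is 400, oversized.status is 413
```

The point of the wrapper is that bad input produces a value the caller can turn into a response, where a bare JSON.parse would throw into whatever code happened to call it.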

Everything above used to be "good Node hygiene." In 2026 it's something bigger, because of one structural fact: LLMs are text generators, and your systems need data structures. JSON is the bridge between them, and, as we've seen, the bridge is exactly where bugs live.

That gap is now formalized into three levels of reliability, and knowing which one you're on is the difference between a demo and production:

This isn't fringe tooling. Native structured output now ships across OpenAI (since August 2024), Google Gemini (2024, expanded through 2026), Anthropic (beta in late 2025, GA in early 2026), Cohere, and xAI's Grok, plus local runtimes like Ollama, vLLM, and SGLang. The schema has become the contract between the model and the rest of your system, and the advice from teams running this in production is blunt: design the schema first, the same way you'd design a database schema before writing application code. Tools like Pydantic and Zod exist to make that contract executable, and the real prize is testability: once output is typed and schema-valid, you can write unit tests and regression suites against it and catch the day a model update quietly changes its behavior.
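To show what "executable contract" means without pulling in a dependency, here is a deliberately tiny hand-rolled check. In practice a library such as Zod or a JSON Schema validator would do this job; the schema format below (field name mapped to an expected typeof) and the validateAgainst and orderArgsSchema names are invented for the illustration.

```javascript
// Hypothetical schema for a tool's arguments: field name -> expected typeof result.
const orderArgsSchema = { orderId: "string", quantity: "number" };

// Parses model output and checks it against the schema; never throws.
function validateAgainst(schema, text) {
  let value;
  try {
    value = JSON.parse(text);
  } catch (err) {
    return { ok: false, errors: [`invalid JSON: ${err.message}`] };
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return { ok: false, errors: ["expected a JSON object"] };
  }

  const errors = [];
  for (const [field, expected] of Object.entries(schema)) {
    const actual = typeof value[field];
    if (actual !== expected) errors.push(`${field}: expected ${expected}, got ${actual}`);
  }
  for (const field of Object.keys(value)) {
    if (!Object.hasOwn(schema, field)) errors.push(`${field}: unexpected field`);
  }
  return errors.length === 0 ? { ok: true, value } : { ok: false, errors };
}

const valid = validateAgainst(orderArgsSchema, '{"orderId":"A-1042","quantity":2}');
const drifted = validateAgainst(orderArgsSchema, '{"orderId":1042}');
// valid.ok is true; drifted.errors lists the wrong type and the missing field
```

Because the result is plain data, a regression suite can assert on it directly, which is the testability benefit described above.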

Go one layer deeper, to the wire itself, and JSON is there too. The Model Context Protocol, introduced by Anthropic in November 2024 and now supported across Claude, Cursor, Gemini, and the major clouds, runs on JSON-RPC 2.0. Every tool an agent invokes, every resource it reads, is a JSON-RPC message:

{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "get_order",
    "arguments": { "orderId": "A-1042" }
  }
}

JSON Schema tells the model what arguments a tool accepts before it calls; one-way notifications carry progress updates; and batching lets an agent fan out several tool calls at once. MCP exists to solve the N×M problem (connecting N models to M tools without writing N×M custom adapters), and it solves it by making JSON the universal language every agent and every tool already speaks.
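The message above can be traced through a toy dispatcher to see the JSON-RPC 2.0 envelope rules in action: a response echoes the request id, a message without an id is a notification and gets no reply, and failures use the error codes the JSON-RPC specification reserves. This is a sketch of the envelope only. The handleRpc name and the result returned by the handler are made up, and it is not MCP's actual result format.

```javascript
// Minimal JSON-RPC 2.0 dispatcher: takes request text, returns response text (or null).
function handleRpc(text, methods) {
  const reply = (id, body) => JSON.stringify({ jsonrpc: "2.0", id, ...body });

  let msg;
  try {
    msg = JSON.parse(text);
  } catch {
    return reply(null, { error: { code: -32700, message: "Parse error" } });
  }
  if (typeof msg !== "object" || msg === null || msg.jsonrpc !== "2.0" || typeof msg.method !== "string") {
    return reply(null, { error: { code: -32600, message: "Invalid Request" } });
  }

  const isNotification = msg.id === undefined; // no id means no response is expected
  if (!Object.hasOwn(methods, msg.method)) {
    return isNotification ? null : reply(msg.id, { error: { code: -32601, message: "Method not found" } });
  }

  const result = methods[msg.method](msg.params);
  return isNotification ? null : reply(msg.id, { result });
}

// Illustrative handler table; the returned shape is invented for this example.
const methods = {
  "tools/call": (params) => ({ tool: params.name, orderId: params.arguments.orderId })
};

const request = JSON.stringify({
  jsonrpc: "2.0",
  id: 7,
  method: "tools/call",
  params: { name: "get_order", arguments: { orderId: "A-1042" } }
});

const response = handleRpc(request, methods);
// '{"jsonrpc":"2.0","id":7,"result":{"tool":"get_order","orderId":"A-1042"}}'
```

Every arrow in this flow is a stringify or a parse, so each filter described earlier applies to tool arguments and results exactly as it does to any other payload.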

Now connect the two halves of this article. Every serialization gotcha we covered (the silently dropped field, the Date flattened into a string, the circular reference, the event loop frozen by a fat payload) now happens inside agent pipelines, where a non-deterministic model's output becomes your system's input. The silent bug was always dangerous. With a model on one end of the pipe, it's more dangerous than ever. Understanding the refinery stopped being optional the moment your software started talking to itself.

From rigorous lexical validation in the ECMAScript spec, to stream orchestration at scale in Node, to the contract language of autonomous agents, JSON has quietly become the connective tissue of the entire stack. It is one of the simplest formats ever designed, and that simplicity is exactly why it won.

Mastering the transformation agents (replacer, reviver, structuredClone, the schema) and the network traffic that carries them is what separates the programmer who uses JSON from the architect who commands it. A technical article, after all, isn't made of words alone; it's made of the small, exact decisions that survive contact with production.

So the next time an agent calls a tool and an answer comes back clean, you'll know what really happened in that fraction of a second. The Data Refinery is operational, and now you know how to run it.


Written in June 2026. The platform features referenced (import attributes with { type: "json" }, structuredClone, native LLM structured output via constrained decoding, and the Model Context Protocol over JSON-RPC 2.0) reflect the state of JavaScript and the AI tooling ecosystem at that date.
