{"slug": "cut-llm-prompt-tokens-on-structured-data-losslessly", "title": "Cut LLM prompt tokens on structured data — losslessly", "summary": "A developer released ctxfold, a dependency-free tool that losslessly compresses structured data like logs, JSON, and CSV in LLM prompts by re-encoding repetitive structure into a compact table format. It achieves ~35-40% token reduction on templated data without dropping any information, verified by built-in encode-decode checks. The tool is available on npm and works with any LLM as a pure text transform.", "body_md": "*A small, dependency-free tool for shrinking logs, JSON, and CSV in prompts — without dropping a single byte.*\n\nLogs, JSON, and CSV are some of the bulkiest, most repetitive things we feed into LLMs. They're also where prompt-token costs quietly pile up.\n\nThe usual fix is **semantic compression**: have a model summarize the input and drop \"low-information\" tokens. It works — until the question needs the data that got dropped.\n\nAsk:\n\n\"How many errors are in this log?\"\n\n\"What's the total across these 400 rows?\"\n\n…and a lossy compressor can hand back a **confident, wrong answer** — because the rows it discarded were exactly the ones you needed. The compression looks great. The answer is broken.\n\n**ctxfold** takes the opposite approach. Its single rule:\n\nLossless or no-op. Never lossy.\n\nInstead of summarizing, it re-encodes *structure*. Logs, JSON arrays, and CSV are tables in disguise — the same keys, prefixes, and templates repeat on every line. ctxfold lifts those repeated parts into a one-time header and keeps only what varies per row, producing a compact, self-labeling table the model reads directly. **Nothing is dropped.**\n\nThe guarantee is enforced in code: every encoder ships with a decoder, and `compress()` verifies that decoding its output reproduces the input *before* returning it. If it can't, you get your original text back, untouched. 
It **can't corrupt your data** — worst case, it does nothing.\n\nOn real data, ctxfold cuts **~35–40% of tokens** on templated logs and JSON arrays, fully losslessly. And because the output is plain, labeled text, the model reads it as well as the raw input — in lookup tests against GPT-4o-mini, answers from the compressed form matched answers from the raw data, field for field.\n\n*(Readability is validated on GPT-4o-mini; the lossless guarantee is model-independent.)*\n\n```\nnpm install ctxfold\n```\n\n```js\nconst { compress } = require(\"ctxfold\");\n\nconst { text, stats } = compress(bigLogOrJsonOrCsv);\n// send `text` instead of the original\nconsole.log(`${(stats.tokenRatio * 100).toFixed(0)}% fewer tokens, lossless: ${stats.lossless}`);\n```\n\nIt's a pure text transform — no API calls, no model, zero dependencies — so it works with any LLM.\n\nctxfold isn't a competitor to semantic compression; it's the complement. **Summarize to extract a subset; ctxfold to shrink repetition without losing anything.** It shines on structured data, not prose.\n\nThis started from a simple frustration: lossy prompt compressors gave impressive token savings, but on aggregate questions — counts, totals, \"find this record\" — the answers came back wrong, because the data needed to answer had been summarized away. Great compression, broken results. The fix wasn't a smarter summarizer; it was to stop dropping data at all. Repetitive structured text is compressible *losslessly* — you just have to treat it as structure instead of prose.\n\nIf you push a lot of logs, JSON, or CSV into prompts, I'd genuinely like to know what your payloads look like and whether the lossless tradeoff fits your use case. 
**What's eating the most tokens in your prompts right now?** Questions, critique, and edge cases that break it are all welcome in the comments.\n\n**Repo & docs:** [https://github.com/antrixy/ctxfold](https://github.com/antrixy/ctxfold) · **npm:** `npm install ctxfold` · MIT licensed.", "url": "https://wpnews.pro/news/cut-llm-prompt-tokens-on-structured-data-losslessly", "canonical_source": "https://dev.to/maverick_y_4e3300c63f2285/cut-llm-prompt-tokens-on-structured-data-losslessly-op5", "published_at": "2026-06-27 12:01:37+00:00", "updated_at": "2026-06-27 12:03:32.177268+00:00", "lang": "en", "topics": ["large-language-models", "developer-tools"], "entities": ["ctxfold", "GPT-4o-mini", "npm", "GitHub"], "alternates": {"html": "https://wpnews.pro/news/cut-llm-prompt-tokens-on-structured-data-losslessly", "markdown": "https://wpnews.pro/news/cut-llm-prompt-tokens-on-structured-data-losslessly.md", "text": "https://wpnews.pro/news/cut-llm-prompt-tokens-on-structured-data-losslessly.txt", "jsonld": "https://wpnews.pro/news/cut-llm-prompt-tokens-on-structured-data-losslessly.jsonld"}}