I Built a Prompt Compressor That Saves 65% on LLM Costs — Here's the Story

Developer Arjun Shah built SuperCompress, an intelligent prompt compression system for LLMs that saves 65% on token costs while keeping 100% oracle recall in his benchmark, versus 25% for standard head-and-tail truncation at the same savings. The system uses a tiny CPU model to score context lines for relevance before GPU processing. At an estimated 50M agent turns per day across the industry, that could save 24K GPU hours and 1,526 tons of CO₂ daily. SuperCompress is available on PyPI and GitHub.
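The core mechanism described here is to score each context line on the CPU and forward only the relevant ones. The sketch below shows that idea in Python. It is a hypothetical illustration, not SuperCompress's code or API. The names `score_line`, `compress` and `keep_ratio` are invented for this example, and the scorer uses plain word overlap with the query where the real system uses a small trained model.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric/underscore tokens: a crude stand-in for real tokenization."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def score_line(line: str, query: str) -> float:
    """Hypothetical relevance score: the fraction of query words that appear in the line.

    SuperCompress uses a tiny CPU model for this step; word overlap is only a placeholder.
    """
    query_words = _words(query)
    if not query_words:
        return 0.0
    return len(query_words & _words(line)) / len(query_words)


def compress(context: str, query: str, keep_ratio: float = 0.35) -> str:
    """Keep roughly keep_ratio of the lines with the highest scores, in their original order.

    The budget is counted in lines for simplicity; a real system would budget tokens.
    """
    lines = context.splitlines()
    if not lines:
        return ""
    budget = max(1, round(len(lines) * keep_ratio))
    # Highest score first; sorted() stays stable with reverse=True, so ties keep earlier lines first.
    ranked = sorted(range(len(lines)), key=lambda i: score_line(lines[i], query), reverse=True)
    kept = sorted(ranked[:budget])  # restore document order so the prompt still reads naturally
    return "\n".join(lines[i] for i in kept)


context = "\n".join([
    "import os",
    "# boilerplate header",
    "def load_config(path):",
    "    return json.load(open(path))",
    "LICENSE = 'MIT'",
    "def retry_request(url, attempts=3):",
])
print(compress(context, "how does load_config read the path"))
```

With the default `keep_ratio=0.35` (mirroring the 65% savings figure, though measured in lines here), only the two `load_config` lines survive. Keeping survivors in document order rather than score order is a choice made in this sketch so the compressed prompt still reads like the original. The post doesn't say whether SuperCompress does the same.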

I've been working on a side project called SuperCompress, an intelligent prompt compression system for LLMs. The idea is simple: most tokens you send to an LLM never need to be processed. They're padding, boilerplate, irrelevant context. But they still burn GPU cycles. I wanted to fix that.

Working with LLM agents, I noticed that every agent loop was sending massive context through the GPU. 10K tokens. 50K tokens. Sometimes more. Most of it was irrelevant to the specific task. The standard approach was truncation, keeping the head and tail of the context, but it regularly dropped critical information from the middle.

I thought: what if we could score each line of context for relevance BEFORE sending it to the GPU? A tiny CPU model that decides what matters.

The technical challenge was:

After a lot of iteration, the results surprised even me:

| Policy | KV Cache Saved | Oracle Recall |
|---|---|---|
| Truncation | 65% | 25% |
| H2O | 65% | 98% |
| SuperCompress | 65% | 100% |

100% oracle recall at the same token savings: the policy never dropped a line the answer depended on.

Here's what hit me hardest: at 50M agent turns per day (a conservative estimate for the industry), we're wasting 100B tokens daily. That's 24K GPU hours, 1,526 tons of CO₂, and 6.5M liters of cooling water. Every day.

Per 1 million compressions, SuperCompress saves:

It's tiny per call. It's enormous at scale.

Currently looking for:

- Live demo: https://supercompress.vercel.app
- GitHub: https://github.com/arjunkshah/supercompress
- Docs: https://arjunkshah-supercompress-55.mintlify.app

The ask: if you're building with LLMs, try compressing your next prompt and see if the answers stay the same. I'd love to hear what you think.

Now available on PyPI: `pip install supercompress`
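The results table hinges on oracle recall. Here it is read as the fraction of answer-critical lines that survive compression, which matches how the post describes the 100% result. The sketch below shows one way to measure it and why head-plus-tail truncation fails when the key lines sit mid-context. The helpers `head_tail_truncate` and `oracle_recall`, the toy context, and the oracle set are all assumptions for illustration. They are not the benchmark behind the 25%/98%/100% figures, and the keyword filter merely stands in for a real relevance scorer.

```python
def head_tail_truncate(lines: list[str], keep: int) -> list[int]:
    """Indices kept by head+tail truncation: half the budget from the start, the rest from the end."""
    if keep >= len(lines):
        return list(range(len(lines)))
    head = keep // 2
    tail = keep - head
    return list(range(head)) + list(range(len(lines) - tail, len(lines)))


def oracle_recall(kept: list[int], oracle: set[int]) -> float:
    """Fraction of answer-critical line indices (the oracle set) that survived compression."""
    if not oracle:
        return 1.0
    return len(oracle & set(kept)) / len(oracle)


# Toy context: 20 lines, with the two lines the answer needs in the middle.
lines = [f"filler line {i}" for i in range(20)]
lines[9] = "DB_TIMEOUT = 30"
lines[10] = "the client reads its timeout from DB_TIMEOUT"
oracle = {9, 10}  # hypothetical ground truth: lines the answer depends on
keep = 7          # keep 35% of the lines, i.e. save 65%

truncated = head_tail_truncate(lines, keep)
# Stand-in for a relevance scorer: a keyword filter, capped at the same budget.
selected = [i for i, line in enumerate(lines) if "timeout" in line.lower()][:keep]

print("truncation recall:", oracle_recall(truncated, oracle))
print("relevance recall:", oracle_recall(selected, oracle))
```

Truncation keeps lines 0-2 and 16-19, so both answer-critical lines are lost (recall 0.0). The relevance-based pick keeps both (recall 1.0) while using fewer lines than the budget allows.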
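The daily industry figures are easier to interpret per turn. Dividing the post's totals by its 50M-turns-per-day estimate gives the implied savings for a single compression, assuming one compression per agent turn. This is plain arithmetic on the quoted numbers, treating "tons" as metric tonnes (an assumption). It is not the author's per-million-compressions breakdown, and it inherits whatever assumptions went into those totals.

```python
# Industry-scale daily figures quoted in the post.
TURNS_PER_DAY = 50_000_000
TOKENS_WASTED = 100_000_000_000
GPU_HOURS = 24_000
CO2_TONNES = 1_526        # assumption: "tons" read as metric tonnes
WATER_LITRES = 6_500_000

# Implied savings per turn, assuming one compression per agent turn.
per_turn = {
    "tokens": TOKENS_WASTED / TURNS_PER_DAY,               # 2,000 tokens
    "gpu_seconds": GPU_HOURS * 3600 / TURNS_PER_DAY,       # about 1.73 GPU-seconds
    "co2_grams": CO2_TONNES * 1_000_000 / TURNS_PER_DAY,   # about 30.5 g
    "water_ml": WATER_LITRES * 1000 / TURNS_PER_DAY,       # 130 mL
}

for name, value in per_turn.items():
    print(f"{name}: {value:,.2f}")
```

Roughly 2,000 tokens and under two GPU-seconds per turn is what "tiny per call, enormous at scale" looks like in numbers.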