I Built a Prompt Compressor That Saves 65% on LLM Costs — Here's the Story


Source: dev.to · Published: 2026-06-26T19:45Z · Topic: large-language-models


Developer Arjun Shah built SuperCompress, an intelligent prompt compression system for LLMs that saves 65% on token costs while achieving 100% oracle recall, outperforming standard truncation. The system uses a tiny CPU model to score context lines for relevance before GPU processing, potentially saving 24K GPU hours and 1,526 tons of CO₂ daily at industry scale. SuperCompress is available on PyPI and GitHub.

1 min read · Published Jun 26, 2026

I've been working on a side project called SuperCompress — an intelligent prompt compression system for LLMs. The idea is simple: most tokens you send to an LLM never need to be processed. They're padding, boilerplate, irrelevant context. But they still burn GPU cycles.

I wanted to fix that.

Working with LLM agents, I noticed something: every agent loop was sending massive context through the GPU. 10K tokens. 50K tokens. Sometimes more. Most of it was irrelevant to the specific task.

Truncation (keeping head + tail) was the standard approach, but it regularly dropped critical information from the middle of the context.
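To make the failure mode concrete, here is a minimal sketch of head + tail truncation. The function name `truncate_head_tail` and the sample context lines are made up for illustration; they are not taken from SuperCompress.

```python
def truncate_head_tail(lines, keep_head, keep_tail):
    """Keep the first keep_head and last keep_tail lines; drop everything between."""
    if keep_head + keep_tail >= len(lines):
        return list(lines)  # nothing to drop
    tail = lines[len(lines) - keep_tail:] if keep_tail > 0 else []
    return lines[:keep_head] + tail


# Hypothetical agent context: the one line that explains the failure sits in the middle.
context = [
    "system: you are a coding agent",
    "log: build started",
    "error: missing import in utils.py",
    "log: build finished",
    "user: why did the build fail?",
]

kept = truncate_head_tail(context, keep_head=1, keep_tail=1)
for line in kept:
    print(line)
```

With a budget of one head line and one tail line, the `error:` line in the middle is discarded even though the question depends on it.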

I thought: what if we could score each line of context for relevance BEFORE sending it to the GPU? A tiny CPU model that decides what matters.
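A rough sketch of the idea, under loud assumptions: the "tiny CPU model" is replaced here by a toy word-overlap score, and `tokenize`, `score_line`, and `compress` are hypothetical names, not the SuperCompress API. It only shows the shape of the approach: score every line against the task, keep the best ones within a budget, and preserve their original order.

```python
import re


def tokenize(text):
    """Lowercase word set; a crude stand-in for real features."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def score_line(line, query_tokens):
    """Toy relevance score: fraction of query tokens that appear in the line."""
    if not query_tokens:
        return 0.0
    return len(tokenize(line) & query_tokens) / len(query_tokens)


def compress(lines, query, budget):
    """Keep the `budget` highest-scoring lines, returned in original order."""
    query_tokens = tokenize(query)
    ranked = sorted(
        range(len(lines)),
        key=lambda i: score_line(lines[i], query_tokens),
        reverse=True,
    )
    keep = sorted(ranked[:budget])  # restore document order
    return [lines[i] for i in keep]


context = [
    "system: you are a coding agent",
    "log: build started",
    "error: missing import in utils.py",
    "log: build finished",
    "user: why did the build fail?",
]
query = "why did the build fail with a missing import?"

compressed = compress(context, query, budget=2)
for line in compressed:
    print(line)
```

Unlike head + tail truncation, the position of a line does not matter here; only its score does, so a relevant line in the middle of the context can survive.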

The technical challenge was:

After a lot of iteration, the results surprised even me:

| Policy | KV Saved | Oracle Recall |
| --- | --- | --- |
| Truncation | 65% | 25% |
| H2O | 65% | 98% |
| SuperCompress | 65% | 100% |

100% oracle recall at the same token savings. The policy never dropped a line the answer depended on.
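One way to read "oracle recall", assuming it means the share of answer-critical lines that survive compression (the post does not spell out the exact definition), is sketched below. `oracle_recall` and the sample data are hypothetical.

```python
def oracle_recall(kept_lines, oracle_lines):
    """Fraction of answer-critical (oracle) lines still present after compression."""
    needed = set(oracle_lines)
    if not needed:
        return 1.0  # nothing was required, so nothing was lost
    return len(needed & set(kept_lines)) / len(needed)


# Lines an oracle says the answer depends on (made-up example).
oracle = [
    "error: missing import in utils.py",
    "user: why did the build fail?",
]

truncated = ["system: you are a coding agent", "user: why did the build fail?"]
scored = ["error: missing import in utils.py", "user: why did the build fail?"]

recall_truncated = oracle_recall(truncated, oracle)
recall_scored = oracle_recall(scored, oracle)
print(recall_truncated, recall_scored)
```

Under this reading, 100% recall means every oracle line was kept, which matches the claim that no line the answer depended on was dropped.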

Here's what hit me hardest: at 50M agent turns per day (a conservative estimate for the industry), we're wasting 100B tokens daily. That's 24K GPU hours, 1,526 tons of CO₂, 6.5M liters of cooling water. Every day.
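Taking those figures at face value, a quick back-of-the-envelope check shows what they imply per turn and per GPU. The only inputs are the numbers quoted above; treating "24K" as exactly 24,000 is an assumption, and the derived values are implied ratios, not measurements.

```python
# Figures quoted in the post (daily, industry-wide estimates).
turns_per_day = 50_000_000
wasted_tokens_per_day = 100_000_000_000
gpu_hours_per_day = 24_000  # assumption: "24K" read as exactly 24,000

# Implied wasted tokens per agent turn.
tokens_per_turn = wasted_tokens_per_day / turns_per_day

# Implied processing rate behind the GPU-hours figure.
tokens_per_gpu_hour = wasted_tokens_per_day / gpu_hours_per_day
tokens_per_gpu_second = tokens_per_gpu_hour / 3600

print(f"wasted tokens per turn: {tokens_per_turn:,.0f}")
print(f"tokens per GPU-hour:    {tokens_per_gpu_hour:,.0f}")
print(f"tokens per GPU-second:  {tokens_per_gpu_second:,.0f}")
```

So the estimate corresponds to about 2,000 avoidable tokens per turn, processed at roughly 1,157 tokens per GPU-second.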

Per 1 million compressions, SuperCompress saves:

It's tiny per call. It's enormous at scale.

Currently looking for:

Live demo: [https://supercompress.vercel.app](https://supercompress.vercel.app)

GitHub: [https://github.com/arjunkshah/supercompress](https://github.com/arjunkshah/supercompress)

Docs: [https://arjunkshah-supercompress-55.mintlify.app](https://arjunkshah-supercompress-55.mintlify.app)

The ask: If you're building with LLMs, try compressing your next prompt. See if the answers stay the same. I'd love to hear what you think.

Now available on PyPI: `pip install supercompress`


Read original on dev.to → dev.to/arjunkshah/i-built-a-prompt-compressor-th…
