# SuperCompress: Cut LLM Costs by 65% Without Losing Answers

> Source: <https://dev.to/arjunkshah/supercompress-cut-llm-costs-by-65-without-losing-answers-2c8n>
> Published: 2026-06-26 19:23:33+00:00

Every LLM call burns GPU cycles on tokens that never needed to run.

Padding. Boilerplate. Irrelevant context.

I built SuperCompress: a tiny policy model that runs on the CPU and cuts 65% of context tokens before inference.

Open source. MIT. Free tier.

supercompress.vercel.app

The problem is worse than most people realize.

At roughly 50M agent turns per day, the estimated waste is:

→ 100B tokens wasted daily

→ 24K GPU hours

→ 1,526 tons CO₂

→ 6.5M L cooling water

We're burning through resources on tokens that don't matter.

How it works:

1️⃣ Context + question → CPU policy (5K params)

2️⃣ Every line scored for relevance to the question

3️⃣ Low-scoring lines evicted

4️⃣ Only essential tokens reach the GPU

CPU first. GPU for what matters.
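Here's a minimal sketch of that score-and-evict loop. It is not the SuperCompress policy itself: the real scorer is a learned 5K-parameter model, while this stand-in uses plain word overlap with the question. The function names (`tokenize`, `score_line`, `compress`) and the word-count budget are assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score_line(line, question_terms):
    """Assumed scorer: share of the question's terms that appear in the line."""
    if not question_terms:
        return 0.0
    return len(set(tokenize(line)) & question_terms) / len(question_terms)

def compress(context, question, budget=0.35):
    """Keep the best-scoring lines until about `budget` of the tokens are used."""
    lines = [ln for ln in context.splitlines() if ln.strip()]
    q_terms = set(tokenize(question))
    limit = budget * sum(len(tokenize(ln)) for ln in lines)
    # Highest score first; sorted() is stable, so ties keep document order.
    ranked = sorted(range(len(lines)), key=lambda i: -score_line(lines[i], q_terms))
    kept, used = set(), 0
    for i in ranked:
        cost = len(tokenize(lines[i]))
        if used + cost <= limit:  # evict anything that would exceed the budget
            kept.add(i)
            used += cost
    # Survivors go out in their original order so the context still reads naturally.
    return "\n".join(lines[i] for i in sorted(kept))

context = "\n".join([
    "The build uses Python 3.11.",
    "Copyright 2024 Example Corp. All rights reserved.",
    "The database password rotates every 30 days.",
    "Lorem ipsum dolor sit amet filler text.",
    "Contact support for billing questions.",
])
question = "How often does the database password rotate?"
result = compress(context, question)
print(result)  # The database password rotates every 30 days.
```

Only the compressed `result` would then be sent to the model. Swapping `score_line` for a learned scorer leaves the rest of the loop unchanged.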

The numbers at a 35% budget (keeping 35% of the context tokens):

• 65% KV cache saved

• 100% oracle recall (vs 25% for truncation)

• ~60ms CPU latency

Same answers. Roughly ⅓ the compute.
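One plausible reading of "oracle recall", and it is an assumption since the metric isn't defined here, is the share of answer-bearing lines that survive compression. The sketch below compares head truncation with a selective keep-set under that reading. The line indices are hand-picked toy values, not the benchmark behind the figures above, and `oracle_recall` and `truncate` are hypothetical helper names.

```python
def oracle_recall(kept_indices, oracle_indices):
    """Share of answer-bearing (oracle) lines that survived compression."""
    oracle = set(oracle_indices)
    if not oracle:
        return 1.0
    return len(oracle & set(kept_indices)) / len(oracle)

def truncate(num_lines, budget):
    """Baseline: keep only the first `budget` fraction of the lines."""
    return list(range(round(num_lines * budget)))

# Toy setup: 20 lines, with the answer-bearing lines scattered through the document.
num_lines = 20
oracle = [2, 9, 13, 18]
head = truncate(num_lines, 0.35)        # lines 0..6
selective = [2, 5, 9, 11, 13, 17, 18]   # hypothetical policy output, also 7 lines

print(oracle_recall(head, oracle))       # 0.25
print(oracle_recall(selective, oracle))  # 1.0
```

Both strategies keep the same number of lines. Truncation loses whatever sits past the cut, while a relevance-based selection can reach lines anywhere in the context.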

Per 1 million compressions:

→ 800M tokens avoided

→ 29 kWh saved

→ 12 kg CO₂ avoided

→ 52 L cooling water saved

Scale that across the industry and it's enormous.
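To see what those totals mean for a single call, divide each by one million. The snippet below does only that arithmetic on the figures listed above. The "implied input size" line rests on an extra assumption: that every compression removes exactly 65% of its tokens.

```python
COMPRESSIONS = 1_000_000

# Totals per 1 million compressions, as listed above.
totals = {"tokens": 800_000_000, "kwh": 29, "kg_co2": 12, "litres_water": 52}

# Per-compression savings.
per_call = {name: value / COMPRESSIONS for name, value in totals.items()}

print(per_call["tokens"])                         # 800.0 tokens avoided per call
print(f"{per_call['kwh'] * 1000:.3f} Wh")         # 0.029 Wh
print(f"{per_call['kg_co2'] * 1e6:.0f} mg CO2")   # 12 mg CO2
print(f"{per_call['litres_water'] * 1000:.3f} mL water")  # 0.052 mL water

# Assumption: each call removes exactly 65% of its input tokens.
implied_input = per_call["tokens"] / 0.65
print(round(implied_input))                       # 1231 tokens of context per call
```

The per-call numbers are small. The case for compression comes from multiplying them by call volume.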

SuperCompress is:

✅ Open source (MIT)

✅ Free API tier

✅ Python library

✅ Browser demo (no install)

✅ Integration guides for OpenAI/LangChain

Try it: supercompress.vercel.app

GitHub: github.com/arjunkshah/supercompress

Built this because I believe we can't scale AI by burning through what we have left.

Smarter compute means more AI for everyone — without the environmental cost.

Would love feedback from the community 🙏

**Links:** [GitHub](https://github.com/arjunkshah/supercompress) | [Live Demo](https://supercompress.vercel.app) | [Interactive Tool](https://supercompress.vercel.app/compare)
