Source: parcle.ai · Topic: ai-agents

Show HN: We cut >60% of tokens from agentic tasks by removing repeated context

Parcle, a shared memory layer for AI agents, reports cutting token consumption on agentic tasks by more than 60% in its best cases by eliminating repeated context retrieval. The system indexes operational context and lets agents retrieve only relevant memories. In early deployments this cut token spend by up to 70% (median about 30%) and roughly doubled task completion speed.

Published Jun 18, 2026 · 2 min read

Every agentic system I see has the same hidden tax: the model keeps rereading the same context.

Tickets, Slack threads, docs, customer history, database notes, runbooks, logs, prior decisions. You can cache static prefixes, route to cheaper models, or set team budgets, but none of those fixes the underlying behavior: agents start most tasks trying to re-explore everything.

We built Parcle as a shared memory layer for AI agents. It ingests operational context, indexes what happened, and lets agents retrieve a small, relevant memory set for the next step instead of pasting everything back into the prompt (or worse, letting the agent explore on its own and burn tokens).
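
The post doesn't describe how Parcle ingests or retrieves, so the following is only a minimal sketch of the general idea under stated assumptions: context is ingested once into an inverted index, and each step asks for the top-k most relevant memories instead of re-reading every source. Keyword overlap stands in for whatever scoring a real memory layer uses (likely embeddings), and `MemoryStore`, `ingest`, and `retrieve` are hypothetical names, not Parcle's API.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Memory:
    source: str  # hypothetical label such as "ticket", "slack", or "runbook"
    text: str


def tokenize(text: str) -> set[str]:
    # Lowercase word tokens; a production system would likely use embeddings instead.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class MemoryStore:
    """Hypothetical shared memory: ingest context once, retrieve a small relevant set."""

    def __init__(self) -> None:
        self.memories: list[Memory] = []
        self.index: dict[str, set[int]] = defaultdict(set)  # term -> ids of memories containing it

    def ingest(self, source: str, text: str) -> None:
        mem_id = len(self.memories)
        self.memories.append(Memory(source, text))
        for term in tokenize(text):
            self.index[term].add(mem_id)

    def retrieve(self, query: str, k: int = 3) -> list[Memory]:
        # Score each memory by how many query terms it shares, via inverted-index lookups.
        scores: dict[int, int] = defaultdict(int)
        for term in tokenize(query):
            for mem_id in self.index.get(term, ()):
                scores[mem_id] += 1
        # Highest score first; ties broken by ingestion order. Memories with no overlap are never returned.
        ranked = sorted(scores, key=lambda m: (-scores[m], m))
        return [self.memories[m] for m in ranked[:k]]


store = MemoryStore()
store.ingest("runbook", "Restart the billing worker if invoices stall")
store.ingest("ticket", "Login page returns 500 for SSO users")
for mem in store.retrieve("billing invoices stall", k=2):
    print(mem.source, "-", mem.text)
```

The point of the design is that the prompt only grows by the k memories returned, not by the size of everything ingested.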

We started tracking the tokens consumed on tasks with and without our memory layer, using only indexing of local files. In our deployments and evals, the biggest reduction we’ve seen is up to 70% lower token spend on agentic tasks, with roughly 2x faster task completion. The median was about 30% fewer tokens. The biggest savings often come from data- and context-heavy workflows, where the agent needs to retrieve data and context from multiple locations and sources. The best cases so far are support, ops, research, sales, and finance workflows, where the agent would otherwise reload the same account, workflow, or history context again and again.
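
The comparison described above (the same tasks, run with and without memory) reduces to simple arithmetic: per-task reduction is 1 - tokens_with_memory / tokens_without_memory, and the headline figures are the best case and the median. A sketch, assuming you log total tokens per task for both runs; `token_reduction` and `summarize` are hypothetical helpers, not part of Parcle.

```python
from statistics import median


def token_reduction(baseline_tokens: int, memory_tokens: int) -> float:
    """Fractional reduction for one task: 0.7 means 70% fewer tokens with memory."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 1 - memory_tokens / baseline_tokens


def summarize(runs: list[tuple[int, int]]) -> dict[str, float]:
    """runs holds (tokens_without_memory, tokens_with_memory) pairs for the same tasks."""
    reductions = [token_reduction(base, mem) for base, mem in runs]
    return {
        "best": max(reductions),  # the "up to" number
        "median": median(reductions),  # the typical task
    }
```

Reporting the median alongside the maximum matters because a few very context-heavy tasks can show large savings while the typical task sees much less, which is how "up to 70%" and "median about 30%" can both hold.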

Why I think this matters now:

Pylon’s AI cost post made us ask the question:

How much are companies paying because their agents keep looking for the same context? Is this a hidden tax that memory could solve?

We built Parcle to make agents remember. The surprise was that memory does not just make agents more useful; it also cuts token consumption. Fewer tokens go to figuring out where things are, and more time goes to actually productive work.

  • Anthropic says agents use about 4x more tokens than chat. We think this is an understatement.
  • OpenAI and Anthropic both offer prompt caching because repeated prompt context is expensive, but caching mostly helps when the reusable content is stable enough to hit the cache. It also doesn't change the fact that a cached prompt is forfeited after roughly 5 to 15 minutes of inactivity.
  • “Lost in the Middle” and Chroma’s “context rot” work both point at the same issue: more context is not the same thing as usable memory.
  • The context-engineering crowd seems to be converging on this: the hard part is deciding what the model should see at each step.
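
A toy model makes the inactivity point concrete. The sketch below assumes a cached prefix expires once it has gone `ttl_s` seconds without a hit, and that each hit refreshes the timer. It is not either provider's implementation (exact lifetimes and refresh rules are provider-specific), and `PrefixCache` and `lookup` are hypothetical names. The default of 300 seconds is the 5-minute low end of the range cited above.

```python
from dataclasses import dataclass, field


@dataclass
class PrefixCache:
    """Toy model of prompt caching: a prefix expires after ttl_s seconds without a hit."""

    ttl_s: float = 300.0  # 5 minutes, the low end of the range in the post
    last_hit: dict[str, float] = field(default_factory=dict)  # prefix -> time it was last used

    def lookup(self, prefix: str, now: float) -> bool:
        """Return True on a cache hit. Either way, record this use so the timer restarts."""
        seen = self.last_hit.get(prefix)
        hit = seen is not None and now - seen <= self.ttl_s
        self.last_hit[prefix] = now  # a hit refreshes the entry; a miss writes a fresh one
        return hit


cache = PrefixCache()
print(cache.lookup("system+account-history", now=0))    # first call: miss, prefix gets cached
print(cache.lookup("system+account-history", now=120))  # 2 minutes later: hit
print(cache.lookup("system+account-history", now=600))  # 8-minute gap: expired, pay full price again
```

An agent that pauses on a tool call, a human approval, or a long-running job can easily exceed the window, so the same context is paid for again; a memory layer sidesteps this by keeping the context small rather than keeping it cached.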

Parcle is our attempt at making that operational: memory outside the model, selected into context only when useful.
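
"Selected into context only when useful" can be read as two rules: skip memories whose relevance is too low, and cap what gets pasted in at a token budget. The sketch below assumes relevance scores already come from some retriever, approximates tokens by word count, and packs greedily by score (simple, not optimal). `Candidate`, `approx_tokens`, and `select_context` are hypothetical names, not Parcle's API.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    score: float  # relevance to the current step, from whatever retriever is in use


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def select_context(candidates: list[Candidate], budget: int, min_score: float = 0.0) -> list[str]:
    """Greedily pack the most relevant memories into a fixed token budget,
    stopping once candidates fall to min_score or below."""
    chosen: list[str] = []
    used = 0
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if cand.score <= min_score:
            break  # everything after this is no more relevant, so stop
        cost = approx_tokens(cand.text)
        if used + cost <= budget:  # skip items that don't fit, but keep trying smaller ones
            chosen.append(cand.text)
            used += cost
    return chosen
```

The relevance floor is what keeps the prompt from being padded with marginal memories, which is the "more context is not usable memory" point; the budget bounds per-step cost no matter how much has been ingested.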

I’d love feedback from people running real agents in production:

  1. Where are your tokens actually going: repeated input context, tool traces, retries, output, evals, or something else?
  2. Have prompt caching and model routing been enough?
  3. What would you need to trust an external memory layer inside an agent loop?

Comments URL: [https://news.ycombinator.com/item?id=48580512](https://news.ycombinator.com/item?id=48580512)

