Show HN: We cut >60% of tokens from agentic tasks by removing repeated context

Source: parcle.ai · Published: 2026-06-18T03:39Z · Topic: ai-agents

Parcle, a shared memory layer for AI agents, targets the tokens that agentic tasks spend on repeated context retrieval. The system indexes operational context and lets agents retrieve only the relevant memories. In early deployments and evals, the best case was up to 70% lower token spend with roughly 2x faster task completion, and the median reduction was about 30%.

Every agentic system I see has the same hidden tax: the model keeps rereading the same context.

Tickets, Slack threads, docs, customer history, database notes, runbooks, logs, prior decisions. You can cache static prefixes, route to cheaper models, or set team budgets, but none of those fixes the underlying behavior: agents start most tasks by re-exploring everything.

We built Parcle as a shared memory layer for AI agents. It ingests operational context, indexes what happened, and lets agents retrieve a small, relevant memory set for the next step instead of pasting everything back into the prompt - or worse, letting the agent go explore on its own and burn tokens.
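To make the "small, relevant memory set" idea concrete, here is a minimal sketch. It is not Parcle's implementation: the `Memory` type, the `retrieve` and `build_prompt` helpers, the sample store, and the word-overlap scoring are all assumptions made up for illustration, and a real system would likely use embeddings or a proper index instead.

```python
import re
from dataclasses import dataclass


@dataclass
class Memory:
    source: str  # hypothetical label such as "ticket" or "runbook"
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(memories: list[Memory], query: str, k: int = 2) -> list[Memory]:
    """Return up to k memories that share the most words with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(m.text)), m) for m in memories]
    relevant = [(score, m) for score, m in scored if score > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in relevant[:k]]


def build_prompt(task: str, memories: list[Memory]) -> str:
    """Prepend the given memories to the task as context lines."""
    context = "\n".join(f"[{m.source}] {m.text}" for m in memories)
    return f"{context}\n\nTask: {task}"


# Hypothetical store; the entries are invented for this example.
STORE = [
    Memory("ticket", "Customer Acme reported login failures after the SSO migration"),
    Memory("runbook", "To rotate database credentials run the rotate script and restart workers"),
    Memory("slack", "Lunch order for Friday is pizza"),
    Memory("decision", "We decided Acme stays on the legacy SSO provider until Q3"),
]

task = "Why is Acme seeing SSO login failures?"
selected = retrieve(STORE, task, k=2)
full_prompt = build_prompt(task, STORE)       # paste everything back in
small_prompt = build_prompt(task, selected)   # paste only what matched
```

The point of the sketch is the shape of the loop: selection happens outside the model, and only the selected entries reach the prompt, so the prompt stays small as the store grows.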

We started tracking tokens consumed on tasks with and without our memory layer, with only local-file indexing enabled. In our deployments and evals, the biggest reduction we’ve seen is up to 70% lower token spend on agentic tasks, with roughly 2x faster task completion. The median was ~30% fewer tokens spent. The biggest savings tend to come from data- and context-heavy workflows, where the agent needs to retrieve data and context from multiple locations and sources. The best cases so far are support, ops, research, sales, and finance workflows where the agent otherwise reloads the same account/workflow/history context again and again.
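For anyone who wants to run the same with/without comparison on their own agents, the arithmetic looks roughly like this. The task names and token counts below are hypothetical placeholders chosen to make the math easy to follow; they are not Parcle's measurements.

```python
from statistics import median


def reduction(baseline_tokens: int, with_memory_tokens: int) -> float:
    """Fraction of tokens saved relative to the baseline run."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return (baseline_tokens - with_memory_tokens) / baseline_tokens


# Hypothetical (baseline, with-memory) token counts per task.
RUNS = {
    "support-triage": (100_000, 30_000),
    "ops-runbook": (80_000, 56_000),
    "research-summary": (50_000, 45_000),
}

per_task = {name: reduction(b, m) for name, (b, m) in RUNS.items()}
best = max(per_task.values())        # the "up to" number
typical = median(per_task.values())  # the median number

for name, saved in per_task.items():
    print(f"{name}: {saved:.0%} fewer tokens")
print(f"best: {best:.0%}, median: {typical:.0%}")
```

Reporting both the best case and the median matters because a single headline maximum can sit far from what a typical task sees.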

Why I think this matters now:

Pylon’s AI cost post made us ask the question:

How much are companies paying because their agents keep looking for the same context? Is this a hidden tax that memory could solve?

We built Parcle to make agents remember. The surprise was that memory does not just make agents more useful. It also cuts the number of tokens consumed: fewer tokens spent figuring out where things are, and more spent doing productive work.

- Anthropic says agents use about 4x more tokens than chat. We think this is an understatement.
- OpenAI and Anthropic both have prompt caching because repeated prompt context is expensive, but caching mostly helps when the reusable content is stable enough to hit the cache. It also does not change the fact that a cached prompt is forfeited after 5 to 15 minutes of inactivity.
- “Lost in the Middle” and Chroma’s “context rot” work both point at the same issue: more context is not the same thing as usable memory.
- The context-engineering crowd seems to be converging on this: the hard part is deciding what the model should see at each step.
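The cache-expiry point can be illustrated with a toy simulation. This is a simplified model and not any provider's actual API or billing: `PrefixCache`, `billed_tokens`, the 300-second TTL, and the call schedules are all assumptions, and it treats cache hits as free even though real providers bill cached tokens at a reduced rate.

```python
class PrefixCache:
    """Toy prefix cache whose entries expire after ttl_seconds without use."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._last_used: dict[str, float] = {}

    def lookup(self, prefix: str, now: float) -> bool:
        """Return True on a hit; refresh the entry's timestamp either way."""
        last = self._last_used.get(prefix)
        hit = last is not None and (now - last) <= self.ttl_seconds
        self._last_used[prefix] = now
        return hit


def billed_tokens(prefix_tokens: int, call_times: list[float], ttl_seconds: float) -> int:
    """Count prefix tokens processed on cache misses (hits assumed free here)."""
    cache = PrefixCache(ttl_seconds)
    total = 0
    for t in call_times:
        if not cache.lookup("shared-context", t):
            total += prefix_tokens
    return total


# Same 10k-token prefix, four calls each; only the gap between calls differs.
busy = billed_tokens(10_000, [0, 60, 120, 180], ttl_seconds=300)     # 1-minute gaps
idle = billed_tokens(10_000, [0, 600, 1200, 1800], ttl_seconds=300)  # 10-minute gaps
```

Under these assumptions the busy agent pays for the prefix once, while the agent with long pauses between steps pays for it on every call, which is the situation where selecting a smaller context helps regardless of caching.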

Parcle is our attempt at making that operational: memory outside the model, selected into context only when useful.

I’d love feedback from people running real agents in production:

1. Where are your tokens actually going: repeated input context, tool traces, retries, output, evals, or something else?
2. Have prompt caching and model routing been enough?
3. What would you need to trust an external memory layer inside an agent loop?

Comments URL: [https://news.ycombinator.com/item?id=48580512](https://news.ycombinator.com/item?id=48580512)

Points: 1
