cd /news/large-language-models/show-hn-llm-memory-without-context-b… Β· home β€Ί topics β€Ί large-language-models β€Ί article
[ARTICLE Β· art-22332] src=tenureai.dev pub= topic=large-language-models verified=true sentiment=↑ positive

Show HN: LLM memory without context bleed; 100% precision vs. <10% vector search

Tenure has released a new memory system for large language models that claims 100% precision in retrieving relevant context, compared to under 10% for vector search methods. The tool, which operates as a local LLM provider within VS Code, automatically injects structured beliefs into every request without requiring the model to call a separate memory function. The company argues that existing memory tools waste tokens and increase latency by dumping entire chat histories into context windows, whereas Tenure extracts and injects only relevant information.

read7 min publishedJun 5, 2026

Not RAG. Not another memory wrapper. Watch what it learns before it ever changes a response.

Persistent memory that follows you across every tool and every session. Turn injection on when you're ready, not before.

// install from your existing tools

Prefer the terminal?

curl -fsSL
https://raw.githubusercontent.com/tenurehq/tenure/main/scripts/install.sh
| bash

1.0 precision. Because a smart model shouldn't have to clean up messy retrieval. #

Other memory tools cheat. They dump your entire chat history or loose vector clusters into the context window and let a capable downstream model find the needle in the haystack. They claim SOTA on answer accuracy but you pay the latency and context tax.

It's the difference between querying your database and filtering in application code. Everyone knows which one belongs in production.

Every irrelevant belief in context is tokens you're paying for and latency you're waiting on.

Results are reproducible. Dataset on HuggingFace. Run it yourself

You can see exactly what it's costing you. #

Most tools hide the waste. Tenure measures it and shows you it decreasing.

Numbers shown are illustrative from a representative session. Your dashboard shows your actual numbers from day one.

Try it before you commit to it. #

Run with extraction on and injection off for a week or two. See exactly what Tenure learned about how you work before it ever changes a single response. No risk. No behavior change. No surprises.

Extraction on. Injection off.

Tenure watches your sessions silently. It extracts decisions, preferences, facts, questions, and blockers into a structured belief store but injects nothing. Your AI responses are completely unchanged.

Open the panel. See what it knows.

Before you ever turn injection on, open the VS Code side panel and read every belief Tenure captured. Edit anything. Delete anything. Pin what matters most. You decide what the model gets to know.

Memory tools that rely on MCP have a fundamental flaw. #

The model has to decide to call them. That means your context is only retrieved when the model thinks to reach for it and it often does not. You end up with memory that works sometimes, unpredictably, on certain questions.

Retrieval on request. Sometimes.

When memory is a tool call, the model decides when to fetch context. If it does not recognize that it needs prior information, it does not ask. Your memory sits idle while the model hallucinates decisions you already made.

Memory injected automatically. Always.

Tenure sits between your client and your provider. Every request is enriched before it reaches the model. No tool call to trigger. No prompt engineering required. Your beliefs are in context on every single request.

Capability Tenure MCP Memory
Memory in context on every request Always Only if model calls the tool
Works across every client Yes Client must support MCP
Zero prompt engineering Yes Requires tool-use config
Retrieval latency <15ms Varies, often 500ms+
Fully local, no cloud Yes Depends on implementation
Per-turn injection audit Yes β€” every turn logged No
Within-session availability (turn N+1) Yes β€” extracted during stream No

Registered as your LLM provider. #

You don't change a thing.

Tenure registers directly inside VS Code as a native LLM provider. Bring your own API key. Your existing tools β€” Copilot, Continue, any OpenAI-compatible client just works, with memory.

Registered at the provider level, not a plugin wrapping someone else's call. Tenure is in the chain. Your key, your model, your memory. Onboarding happens inside the IDE.

BYOKβ€” Anthropic, OpenAI, or any OpenAI-compatible endpoint

Native copilot features with memory β€” no workflow change

Beliefs panel in the sidebar β€” inspect, edit, pin from inside VS Code

Onboarding inside the IDEβ€” no browser tab, no separate dashboard

Microsoft vs. Cursor is heating up. You are already in VS Code. Now it remembers everything.

You're on a walk. The auth flow clicks. You tell OpenClaw on WhatsApp: "Redis for sessions, not Postgres. TTL-based expiry." Tomorrow morning, your IDE already knows. No copy-paste. No re-explanation. The decision landed the moment you said it.

No priming. No copy-pasting context. Exactly where you left off, in a completely different client.

Your beliefs. Right in the sidebar. #

The Tenure side panel shows you exactly what the model knows, per file and globally, directly inside VS Code. Click any belief to see every turn it was injected and the exact query that surfaced it.

● Click any belief to see every turn it was injected. ● The exact query that surfaced it is persisted alongside it. ● Edit, pin, or delete directly from the panel. ● Per-file and global beliefs stay scoped β€” project A never bleeds into project B.

Something went wrong. #

Here's how you find out why.

Every other system shows you a list of beliefs. Tenure shows you a history. Click a turn, see exactly what was injected. Click a belief, see every turn it was active. The query that triggered it is persisted at the time it happened, not reconstructed from logs.

Per-turn injection log

See exactly which beliefs were in context for every turn. Not inferred. Recorded as it happened.

Belief provenance

Click any belief to see every session it was injected, and the query that surfaced it each time.

Persisted query context

The exact query you gave the model is stored alongside the injection event. The record is complete.

No other system tracks injection per turn. Every competitor shows you beliefs as a static list. Tenure gives you a complete paper trail β€” which beliefs were present, when, against which query. For developers who care about why the model responded the way it did, that difference is everything.

Control when you want it. #

Invisible when you do not.

Tenure learns automatically. But everything it knows is visible, editable, and correctable.

Structured beliefs

Not raw chat history. Typed, scoped, versioned conclusions the model can act on directly without re-deriving them from a transcript.

Scope isolation

Engineering beliefs stay in engineering sessions. Project A never bleeds into Project B. Every scope is a hard structural boundary not a probabilistic filter.

Supersession

When you change a decision, the old one is retired β€” not deleted. It never gets suggested again, but the record stays for audit.

anytime

Type !extract off

in any session. Your existing memory still works. Tenure just stops taking notes.

Editable memory

Open /beliefs

or use the VS Code side panel to see everything Tenure knows. Edit anything. Pin what matters. Delete what is wrong.

Sub-15ms retrieval

Alias-weighted BM25 with hard scope filtering. No embedding model, no reranking pass, no waiting. Precision improves as the store grows β€” not degrades.

Sits between you and your provider. #

Point your client at localhost:5757. Tenure routes to your provider, learns from the conversation, and injects what it knows on every request.

One click. That's the only manual step.

Install directly from VS Code, Windsurf, or Continue. Or run the install script. Runs as a local Docker container. Your bearer token is printed at the end. Takes thirty seconds.

The daemon starts itself.

The extension securely spins up the local daemon on port 5757 and hooks up the proxy layer. The daemon starts itself. No terminal required.

Your key goes into the IDE's secret store. Not a config file.

Tenure generates your local bearer token and automatically injects it directly into your IDE's native secret store. Zero plain-text keys left floating around in your global config files.

There is no step four.

Tenure learns in the background. The first session is good. The tenth is noticeably better. By the fiftieth, it knows you. Every session compounds. The store gets more precise, not less, as it grows.

Your data never leaves your machine. #

No cloud. No accounts. No telemetry. Tenure runs entirely on localhost. Beliefs are encrypted at rest. Export your entire memory as an encrypted archive and restore it anywhere. You own your memory. All of it. Forever.

| Network calls home | Zero | | Account required | No | | Storage | Encrypted at rest | | Portable | Export / import anytime | | License | MIT |

Stop re-explaining yourself. #

Thirty seconds to install. First session already better. Turn injection on when you are ready, not before.

// install from your existing tools

Prefer the terminal?

curl -fsSL
https://raw.githubusercontent.com/tenurehq/tenure/main/scripts/install.sh
| bash
── more in #large-language-models 4 stories Β· sorted by recency
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain β€” perfect for shipping the agent you just read about.

$git push zahid main
β†’ Live at https://your-agent.zahid.host βœ“
Get free account β†’ Pricing
from €0/mo Β· no card required
LIVE [news/show-hn-llm-memory-w…] indexed:0 read:7min 2026-06-05 Β· β€”