The self-improving prompt engine that learns from your codebase history


Via v0.4.0: We Built a CLI That Gets Smarter Every Time You Use It

We shipped Via v0.4.0 today. It is another weekend project, this one built around a different approach to prompt development. As far as we are aware, the headline feature is a methodology that has not yet appeared in the AI tooling space.

Every prompt you run teaches the next one. Every correction gets stored. Every success becomes a reusable pattern. After a month of daily use, the prompts Via generates know more about what works in your codebase than you consciously remember.

Here is how we built it, why it works, and what it took to get there.

The Problem With Every Other Prompt Tool

The AI coding tool space has a prompt problem. Not a model problem. Not a context window problem. A prompt problem.

73% of engineering teams now use AI coding tools daily. The developers pulling ahead are not using better models. They are using better prompts. Specific, structured, historically-informed prompts that give the model enough context to produce quality output on the first try.

The problem is that every prompt tool on the market is static. Someone writes a template, ships it, and it never changes. You get the same generic structure whether it is your first day using the tool or your hundredth. The tool has no memory of what worked for you last week, what you tried and abandoned last month, or what your team’s specific patterns look like.

The biggest frustration cited by 66% of developers is dealing with AI solutions that are almost right but not quite. The second biggest is that debugging AI-generated code takes longer than debugging code they wrote themselves.

Both problems trace back to the same root cause. The AI does not know your codebase. It does not know what you tried before. It does not know what your team considers a good solution. Every session starts from scratch.

Via v0.4.0 fixes that.

The Research Behind the Design

Via prompt did not come from thin air. While building it we found a paper published in March 2026 that described almost exactly what we were trying to build.

MemAPO, from a team at Zhejiang University, reconceptualises prompt optimization as generalizable and self-evolving experience accumulation.

It maintains a dual-memory mechanism that distills successful reasoning trajectories into reusable strategy templates while organising incorrect generations into structured error patterns that capture recurrent failure modes.

Given a new prompt, the framework retrieves both relevant strategies and failure patterns to compose prompts that promote effective reasoning while discouraging known mistakes.

That is precisely the architecture Via prompt takes inspiration from. Success patterns on one side, failure patterns on the other, both retrieved to inform the next prompt. MemAPO achieves the best average performance across all datasets while reducing cost by approximately 57.2% compared to the strong baseline TextGrad.

The difference between MemAPO and Via prompt is deployment. MemAPO is a research system evaluated on controlled benchmarks. Via prompt is a production CLI that runs locally, requires zero external dependencies at the base tier, integrates with real coding agent workflows, and stores everything on your machine. The research proved the pattern works. Via ships it.

The full paper is at arxiv.org/abs/2603.21520 and is worth reading if you want to understand the theoretical foundations behind the approach.

What Via Prompt Actually Does

The core idea is simple. Via keeps a local history of every prompt you generate, every outcome you record, and every constraint you add. When you ask for a new prompt, it retrieves the most relevant past patterns and injects them into the generated prompt before you see it.

The terminal output looks like this:

┌─ VIA PROMPT ENGINE ───────────────────────────
│
│ Confidence High 🟢 (12 past tasks, 91% success rate)
│
│ + Context injected:
│   + "add JWT authentication to the Express API..." → success
│   + "implement token refresh middleware..." → success
│
│ - AVOID injected:
│   ⚠ never use Passport.js [global]
│   ⚠ avoid localStorage for auth tokens [global]
│
└───────────────────────────────────────────────

[Generated Prompt — ready for Claude / Codex / Gemini]

You are implementing a feature. Match the existing architecture and code style exactly. No new dependencies unless explicitly requested.

implement OAuth login for the API


The Architecture — Five Components, Zero Required Dependencies

Via prompt is built around five components. Everything works out of the box with no external dependencies. Everything gets better as you add VEKTOR or an LLM API key.

Storage is a JSON file at ~/.via/prompts.json by default. It upgrades to SQLite automatically when your history exceeds 500 records. It upgrades to VEKTOR Slipstream if you have it installed. The user never configures this — Via detects and upgrades silently.
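
The tier selection described above can be sketched roughly as follows. This is an illustration, not Via's actual code: the function name, the backend labels, and the assumption that VEKTOR takes priority over SQLite when both conditions hold are all hypothetical. Only the 500-record threshold and the default path come from the article.

```javascript
// Sketch only: backend names, the function, and tier precedence are assumptions.
const SQLITE_THRESHOLD = 500; // from the article: upgrade once history exceeds 500 records

function pickStorageBackend({ recordCount, vektorInstalled }) {
  if (vektorInstalled) return "vektor"; // assumed to win when VEKTOR is present
  if (recordCount > SQLITE_THRESHOLD) return "sqlite";
  return "json"; // default tier: ~/.via/prompts.json
}

console.log(pickStorageBackend({ recordCount: 120, vektorInstalled: false })); // json
console.log(pickStorageBackend({ recordCount: 501, vektorInstalled: false })); // sqlite
console.log(pickStorageBackend({ recordCount: 120, vektorInstalled: true })); // vektor
```

Keeping the decision in one pure function is what would let the upgrade stay invisible to the user: the caller only reports what it observes, and the tier falls out of that.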

Retrieval uses a pure JavaScript BM25 implementation with Porter stemming. No native binaries, no external packages, no installation friction. BM25 is meaningfully better than keyword matching — it handles partial matches, handles stemming (so “authentication” finds “authenticate” and “auth”), and scores by term frequency weighted against document length. If VEKTOR is installed, retrieval upgrades to BM25 plus semantic vector search fused via Reciprocal Rank Fusion.
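
For readers who want to see the scoring idea concretely, here is a minimal BM25 sketch in plain JavaScript. It is not Via's implementation. The `stem` function is a crude suffix stripper standing in for a real Porter stemmer, and the function names and the `k1` and `b` defaults are assumptions chosen for illustration.

```javascript
// Crude stand-in for Porter stemming (assumption): strips a few common suffixes.
function stem(word) {
  return word.replace(/(ation|ating|ated|ate|ing|ed|s)$/, "");
}

function tokenize(text) {
  return (text.toLowerCase().match(/[a-z0-9]+/g) || []).map(stem);
}

// Rank docs against a query with BM25; returns [{ index, score }] sorted best first.
function bm25Rank(query, docs, k1 = 1.2, b = 0.75) {
  const tokenized = docs.map(tokenize);
  const avgLen = tokenized.reduce((sum, t) => sum + t.length, 0) / (docs.length || 1);

  // Document frequency: in how many docs does each term appear?
  const df = new Map();
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) df.set(term, (df.get(term) || 0) + 1);
  }

  const queryTerms = [...new Set(tokenize(query))];
  return tokenized
    .map((tokens, index) => {
      let score = 0;
      for (const term of queryTerms) {
        const tf = tokens.filter((t) => t === term).length;
        if (tf === 0) continue;
        const n = df.get(term);
        const idf = Math.log(1 + (docs.length - n + 0.5) / (n + 0.5));
        // Term frequency saturates via k1 and is normalized by document length via b.
        score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * tokens.length) / avgLen));
      }
      return { index, score };
    })
    .filter((result) => result.score > 0)
    .sort((x, y) => y.score - x.score);
}

const history = [
  "add JWT authentication to the Express API",
  "implement token refresh middleware",
  "convert images with ImageMagick",
];
console.log(bm25Rank("authenticate the API", history));
```

In this sketch the query word "authenticate" and the stored word "authentication" reduce to the same stem, so the first record matches even though the surface forms differ.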

Assembly takes retrieved success patterns, retrieved failure patterns, the AVOID store, and the new task, and builds a structured prompt with six sections: SYSTEM, GOAL, CONTEXT, PATTERNS THAT WORKED, AVOID, and SUCCESS CRITERIA. Without an LLM API key, assembly is template-based and deterministic. With a key (any of Anthropic, OpenAI, Groq, or local Ollama), the LLM refines the assembled template into a coherent, well-written prompt.
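
A template-based assembler along these lines might look like the sketch below. The six section names are the ones listed above; everything else is assumed, including the `assemblePrompt` name, the `## ` heading format, the bullet style, and the choice to fold retrieved failure patterns into the AVOID section.

```javascript
// Sketch only: field names, heading format, and failure handling are assumptions.
function assemblePrompt({ system, goal, context = [], successes = [], failures = [], avoid = [], criteria = [] }) {
  const bullets = (items) =>
    items.length ? items.map((item) => `- ${item}`).join("\n") : "- (none)";

  // Assumption: past failures are surfaced alongside the AVOID store entries.
  const avoidItems = [...avoid, ...failures.map((f) => `previously failed: ${f}`)];

  const sections = [
    ["SYSTEM", system],
    ["GOAL", goal],
    ["CONTEXT", bullets(context)],
    ["PATTERNS THAT WORKED", bullets(successes)],
    ["AVOID", bullets(avoidItems)],
    ["SUCCESS CRITERIA", bullets(criteria)],
  ];
  return sections.map(([title, body]) => `## ${title}\n${body}`).join("\n\n");
}

console.log(
  assemblePrompt({
    system: "You are implementing a feature. Match the existing architecture and code style exactly.",
    goal: "implement OAuth login for the API",
    successes: ["add JWT authentication to the Express API"],
    avoid: ["never use Passport.js", "avoid localStorage for auth tokens"],
  })
);
```

Because the function has no randomness and no network call, the same inputs always yield the same prompt, which is the property the keyless tier relies on.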

Feedback capture is a single command: via prompt --learn success, via prompt --learn correction --note "what was wrong", or via prompt --learn revert. Optional git hooks capture success automatically when you commit and prompt you on revert.

Export writes the accumulated intelligence into whatever format your tools need. via prompt --export claude writes a CLAUDE.md block that every Claude Code session loads automatically. via prompt --export yaml produces a diffable YAML file you can commit to your project repo so the whole team starts from your learned patterns.

The AVOID Store: The Feature Nobody Else Has

Every AI tool helps you do things. None of them remember what you tried and abandoned.

Via’s AVOID store is a persistent list of constraints that gets injected into every generated prompt automatically. Each entry has a constraint, a reason, a scope, and a decay counter.

Scope matters. A global constraint like “never use Passport.js” applies to every auth-related task forever. A file-scope constraint like “do not use callbacks in user.js” only injects when the current task involves that file. A directory-scope constraint applies to a specific module.

Decay prevents the AVOID store from growing forever. If a constraint has not been relevant in 30 tasks, it gets archived — still searchable but no longer auto-injected. Global constraints never decay. This prevents what the research literature calls attention collapse, where an over-constrained LLM gets so focused on what not to do that it fails to write the actual feature.

The AVOID store is the most defensible part of Via prompt. Generic skills packages can copy the template structure. They cannot copy six months of your team’s specific failures.

JIT Abstraction — Rules That Write Themselves

One of the hardest problems in prompt memory systems is that raw records do not scale. Injecting “fix null pointer in user.js” verbatim into a prompt about a different bug is more distracting than helpful. You want the general rule, not the specific instance.

Via solves this with Just-In-Time abstraction. When retrieval pulls five similar past records, Via sends them to the LLM with a simple instruction: extract one general rule that would improve performance on the current task. The abstraction is ephemeral — it exists only for this prompt session. If the user records a success outcome, the abstraction gets permanently promoted to the generic patterns store. If the outcome is correction or revert, the abstraction is discarded and the raw records remain untouched.

This prevents hallucinated rules from polluting the system permanently. Bad abstractions get discarded. Good ones compound. After a few months the generic patterns store contains real distilled knowledge from real task history, not guesses.
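
The promote-or-discard lifecycle is easy to state in code. The sketch below leaves out the LLM call entirely and takes the extracted rule as a plain string. The session shape and both function names are assumptions made for illustration; the three outcome names match the --learn values shown earlier.

```javascript
// Sketch only: the rule would come from an LLM; here it is passed in as a string.
function createSession(task, rule) {
  return { task, rule, status: "pending" };
}

// Promote the ephemeral rule on success; discard it on correction or revert.
function recordOutcome(session, outcome, genericPatterns) {
  if (session.status !== "pending") throw new Error("outcome already recorded");
  if (outcome === "success") {
    genericPatterns.push(session.rule);
    session.status = "promoted";
  } else if (outcome === "correction" || outcome === "revert") {
    session.status = "discarded"; // raw records are untouched; only the rule is dropped
  } else {
    throw new Error(`unknown outcome: ${outcome}`);
  }
  return session.status;
}

const genericPatterns = [];
const session = createSession("fix crash in profile page", "guard against missing user before reading fields");
console.log(recordOutcome(session, "success", genericPatterns), genericPatterns);
```

The important property is that nothing reaches the permanent store without a recorded success, so a bad abstraction costs at most one prompt.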

Task-Type Aware Token Budgets

Context windows are finite. If Via injects five success patterns, three failure patterns, architecture context, and an AVOID list without any budget management, it blows up the agent’s context window before the actual task gets enough space.

Via allocates tokens differently depending on the task type.

For debug tasks, 40% of the token budget goes to failure patterns and AVOID constraints. That is where the signal is when you are fixing a bug — you want to know what was tried and failed, not success stories from unrelated features.

For implement tasks, 40% goes to success patterns. You want the model to see what good implementation looks like in this codebase and match it.

For review tasks, 50% goes to context and standards. The model needs to know your team’s conventions, not just past task outcomes.

The allocations ship with Via and get refined automatically as the system learns which budget splits produce the best outcomes for your specific workflow.
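
As a rough sketch, the per-task split could be a lookup table plus integer arithmetic. Only three numbers below come from the article (40% to failures and AVOID for debug, 40% to successes for implement, 50% to context for review). The remaining percentages, the bucket names, and the function name are placeholder assumptions, not Via's shipped defaults.

```javascript
// Percentages per bucket; each row sums to 100. Values not stated in the article are placeholders.
const BUDGET_SPLITS = {
  debug: { failuresAndAvoid: 40, successes: 20, context: 40 },
  implement: { failuresAndAvoid: 20, successes: 40, context: 40 },
  review: { failuresAndAvoid: 25, successes: 25, context: 50 },
};

function allocateTokens(taskType, totalTokens) {
  const split = BUDGET_SPLITS[taskType];
  if (!split) throw new Error(`unknown task type: ${taskType}`);
  const allocation = {};
  for (const [bucket, percent] of Object.entries(split)) {
    // Integer percentages avoid floating-point drift; rounding down never overshoots the budget.
    allocation[bucket] = Math.floor((totalTokens * percent) / 100);
  }
  return allocation;
}

console.log(allocateTokens("debug", 2000));
console.log(allocateTokens("review", 2000));
```

A table like this is also a natural thing to tune over time, since adjusting a split is a data change rather than a code change.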

Everything Else in v0.4.0

Via prompt is the flagship feature, but v0.4.0 also ships four other significant upgrades.

via memory now supports hybrid search. via memory search "query" --hybrid fuses BM25 keyword search with VEKTOR semantic search using Reciprocal Rank Fusion. via memory sync pushes all stored facts to VEKTOR for semantic recall. Neither requires VEKTOR to be installed — both degrade gracefully to BM25-only if it is not present.
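
Reciprocal Rank Fusion itself is a small algorithm: each result earns 1 / (k + rank) from every ranked list it appears in, and the sums decide the final order. The sketch below is a generic version rather than Via's code. The function name is hypothetical, and k = 60 is the constant commonly used in the literature, not a value the article states.

```javascript
// Fuse several ranked lists of ids into one ordering (best first).
function reciprocalRankFusion(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((id, position) => {
      const rank = position + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((x, y) => y[1] - x[1])
    .map(([id]) => id);
}

const keywordHits = ["a", "b", "c"]; // e.g. BM25 order
const semanticHits = ["b", "d", "a"]; // e.g. vector search order
console.log(reciprocalRankFusion([keywordHits, semanticHits])); // [ 'b', 'a', 'd', 'c' ]
console.log(reciprocalRankFusion([keywordHits])); // [ 'a', 'b', 'c' ]
```

The second call shows why BM25-only degradation is cheap in this design: fusing a single list returns that list unchanged.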

via task now has a team-shared board. via task board shows a kanban view with OPEN, IN PROGRESS, and DONE columns. via task share exports the board to .via-board.json in the project root. Commit that file to Git and teammates run via task sync to pull the latest board into their local SQLite. Zero infrastructure. Zero cost. File-based team coordination that works with any Git workflow.

via diff --live streams two AI tool responses simultaneously in the terminal. Run via diff --live "explain async/await" --tools claude,openai and both responses stream side by side. Results save to the local database for comparison history.

via convert --batch converts entire folder trees. via convert --batch ./docs --to md walks the directory recursively, shows a progress bar, skips already-converted files by default, and routes each file to the right converter — ImageMagick for images, FFmpeg for audio and video, Pandoc or LibreOffice for documents.
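
The routing step can be pictured as an extension lookup. In the sketch below the converter names are the ones mentioned above, while the extension lists and the `routeFile` function are assumptions chosen for illustration. The real mapping in Via may differ.

```javascript
// Sketch only: the extension lists are illustrative, not Via's actual mapping.
const ROUTES = [
  { converter: "ImageMagick", extensions: ["png", "jpg", "jpeg", "gif", "webp"] },
  { converter: "FFmpeg", extensions: ["mp3", "wav", "mp4", "mov", "mkv"] },
  { converter: "Pandoc or LibreOffice", extensions: ["docx", "odt", "html", "rtf"] },
];

// Returns the converter for a path, or null when no route applies.
function routeFile(filePath) {
  const name = filePath.split(/[\\/]/).pop();
  const dot = name.lastIndexOf(".");
  if (dot <= 0) return null; // no extension, or a dotfile such as ".gitignore"
  const ext = name.slice(dot + 1).toLowerCase();
  const route = ROUTES.find((r) => r.extensions.includes(ext));
  return route ? route.converter : null;
}

for (const file of ["docs/logo.PNG", "docs/talk.mp4", "docs/notes.docx", "docs/Makefile"]) {
  console.log(file, "->", routeFile(file));
}
```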

The Compounding Effect

The reason Via prompt works is not any individual feature. It is the flywheel.

Week one: Via generates context-enriched prompts. Useful. Not dramatically different from a well-written manual prompt.

Week four: Via has seen 50 tasks. It knows which prompt structures produced clean first-pass results. The prompts it generates are noticeably more precise. The AVOID store has real entries from real failures.

Month three: Via has failure patterns for every major subsystem. Success templates for the task types you run most often. Generated prompts rarely need correction because the system has learned your specific patterns.

Month six: The learned patterns live in CLAUDE.md. Every session in every tool starts with this context automatically. Via has encoded six months of institutional knowledge into a file that any agent reads on startup.

Static skills packages cannot replicate this. Skills are fixed at the moment someone writes them. Via grows every session.

The research backing this pattern is solid. MemAPO, published March 2026, showed that reconceptualising prompt optimization as self-evolving experience accumulation outperforms static prompt templates across every task category they tested. SEW, published April 2026, showed that self-evolving workflows produce up to 12% improvement on coding benchmarks versus using the backbone LLM alone.

Via is the production implementation of those research insights. Local-first, zero-dependency at the base tier, no cloud required, and getting smarter every time you use it.

Via v0.4.0 is available now.

npm install -g @vektormemory/via

via prompt "your first task here"

Source and documentation at vektormemory.com.

The Vektor Memory Ecosystem

Via is one part of a broader set of tools built around the same principle — your AI tools should remember things, and that memory should belong to you.

VEKTOR Slipstream — Persistent memory SDK for AI agents. Local SQLite, 8ms recall, 79.0% on LongMemEval (12 points above full-context GPT-4). npm install -g vektor-slipstream

Give any AI agent persistent memory that survives across sessions, restarts, and model switches

Drop it into Claude Desktop, Claude Code, or any MCP-compatible tool as a zero-config MCP server

Store decisions, facts, code patterns, and architectural choices that your agent recalls automatically next session

Search your memory with BM25 plus semantic vector search fused via RRF — finds what you need even when the vocabulary differs

Build a temporal index of your project history — what changed when, what was decided and why

Extract named entities and traverse the knowledge graph across related memories

Run vektor_store after each session, vektor_recall at the start of the next — your agent picks up exactly where it left off

Benchmark-verified at 79.0% on LongMemEval across 105 questions averaging 344 stored memories each — beats full-context GPT-4, Mem0, ReadAgent, and MemGPT

Works entirely offline, zero cloud dependency, your data never leaves your machine

VEX — Cross-standard vector database migration and memory portability. 12 connectors, Apache 2.0. npm install -g @vektormemory/vex

Migrate your entire memory between any two vector stores in one command — Pinecone, Qdrant, ChromaDB, Weaviate, pgvector, Redis, Milvus, Neo4j, VEKTOR

Import your full Claude conversation history with LLM fact extraction — turns chat logs into structured, searchable memories

Import ChatGPT conversation exports the same way — bring your history with you when you switch models

Extract facts with importance scoring, deduplication, and tag classification using any LLM provider (Groq, OpenAI, Anthropic, Ollama, Mistral)

Convert memory exports to OpenAI fine-tuning format, Anthropic Messages format, or plain text transcripts

Sign exports with BLAKE3 plus Ed25519 for tamper-evident transfer between systems

Back up your entire memory to any Git host with vex sync — GitHub, Codeberg, or self-hosted Gitea

Encryption is AES-256-GCM client-side before anything leaves your machine — the Git host sees opaque ciphertext only

The key is derived from your machine ID plus token hash and never transmitted — you own it completely

Restore your full memory on a new machine in under a minute with vex sync pull

Via — Universal AI tool integration layer. Works everywhere your agents work. npm install -g @vektormemory/via

Generate historically-informed prompts that get smarter every session with via prompt

Store and search facts across all your AI tools with relationship-aware codebase indexing

Run a team-shared task board backed by SQLite, shareable via a single Git-committed JSON file

Convert any file locally — images, audio, video, documents — with via convert, nothing uploaded anywhere

Convert entire folder trees recursively with via convert --batch, progress bar included

Compare two AI tools side by side in real time with via diff --live, both responses streaming simultaneously

Export your accumulated prompt intelligence to CLAUDE.md, YAML, Codex config, or Gemini TOML: one source, every surface

Install optional git hooks that capture prompt outcomes automatically on commit and revert

Wire Via into Claude Desktop, Cursor, and Windsurf in one command with via init

Run Via as an MCP server so any MCP-compatible agent can access your memory, tasks, and prompt history

Vek-Sync — MCP configuration sync. Keeps your MCP server setup in sync across every AI editor from a single source of truth. Open source. github.com/Vektor-Memory/Vek-Sync

Define your MCP servers once in a single config file and sync to Claude Desktop, Cursor, Windsurf, VS Code, Cline, and any other MCP-compatible editor automatically

Stop maintaining twelve separate config files for three MCP servers across four editors — one file, one command, everywhere updated

Version control your MCP configuration in Git alongside your project — config changes are diffable, reviewable, and rollbackable

New team member joins: clone the repo, run Vek-Sync, every MCP server appears in every tool instantly

Switch editors without losing your MCP setup — your memory tools, filesystem access, and API connections follow you

Works with any MCP server including VEKTOR Slipstream, GitHub, filesystem, and any custom server you have configured

Treats MCP configuration as infrastructure — the same discipline you apply to .env files and docker-compose.yml, applied to your AI tool layer

Zero cloud, zero account, plain JSON files synced by a local script

VEKTOR Notes — Local-first note-taking app with persistent AI memory built in. Available on Android (Google Play internal testing, iOS coming).

Write notes that your AI agent can recall across sessions — every note stored in the VEKTOR memory graph automatically

Search your notes with the same BM25 plus semantic recall that powers VEKTOR Slipstream — finds what you meant, not just what you typed

JOT Collab built in — four seconds after you stop typing, it surfaces a relevant insight, a gap suggestion, and four arXiv papers from the literature

Cross-session memory — start a new writing session on the same topic and it surfaces what you noticed last time

Export any note as structured markdown with APA citations, ready to paste into a Medium draft

Build article drafts from your notes with one tap: an eight-section structure generated from your accumulated thoughts and research

Runs entirely on your device, zero cloud dependency, your notes and memories stay local

Connects to your VEKTOR Slipstream memory graph — notes you take on mobile are recalled by your desktop agents automatically

All tools are local-first. No cloud required. The core functionality in Vektor Memory Slipstream with Cloak tools costs $9 per month. Your data stays on your machine.

Via, Vek-Sync, and VEX are all open source and built by Vektor Memory. vektormemory.com


source & further reading

dev.to (original article)
