A coding agent that plans with one model, executes with another, and remembers what it learns.
Most coding agents are one model in a tool loop. Raidho splits the work: a smart, expensive model reasons and plans, a cheap, fast model executes, and a durable memory carries facts across runs. All of it is provider-agnostic and runs on your own API key.
The name is the rune Raidho (ᚱ), meaning "journey / movement".
Status: alpha. Tested end-to-end live against both backends (DeepSeek, and Claude through the official Anthropic SDK, including the agentic tool loop). A reproducible real-API benchmark ships in `benchmarks/real_task_opus.py` with full evidence (`evidence/2026-06-11_opus_vs_raidho/`). On the same task with the same model, a run cost $0.05 with the deterministic procedure, $0.116 with the context-first hybrid and $0.301 with the pure tool loop; the hybrid matched the loop's report quality at about 2.6× lower cost. APIs may change before 1.0.
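The ratios behind that summary can be rechecked from the three quoted costs. This is only arithmetic on the figures above; the dictionary keys are labels chosen for this sketch, not identifiers from the benchmark script.

```python
# Per-run costs in USD, copied from the status paragraph above.
costs = {
    "deterministic procedure": 0.05,
    "context-first hybrid": 0.116,
    "pure tool loop": 0.301,
}

# Express every variant as "how many times cheaper than the pure tool loop".
baseline = costs["pure tool loop"]
ratios = {name: baseline / cost for name, cost in costs.items()}

for name, cost in costs.items():
    print(f"{name:<24} ${cost:.3f}   pure-loop cost / this cost = {ratios[name]:.1f}")
```

The hybrid comes out at roughly 2.6 and the deterministic procedure at roughly 6.0, which matches the quoted claim.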
- **Reasoning ≠ execution.** `text` mode (reasoning, no tools) and `code` mode (agentic tool loop) can run on different providers. Plan on Claude, grind on DeepSeek: you choose where the expensive thinking happens and where the cheap doing happens.
- **Council mode.** Have two providers debate a question and a neutral pass distill the consensus (points of agreement, residual disagreements, recommendation), e.g. Claude vs DeepSeek. Depersonalized and provider-pluggable; no built-in personas.
- **Durable, structural memory** that persists across runs. The agent remembers `(subject, relation, object)` facts and recalls the relevant ones into its prompt each turn; it saves new ones itself via a `remember` tool, and council verdicts are distilled into facts automatically. Memory is written to disk per project (`<workdir>/.raidho/memory`) and reloaded on the next run, so a decision reached today resurfaces tomorrow, recalled only when relevant (cheap; no history bloat) and across languages (a Russian query finds an English fact). It's a Vector Symbolic Architecture (VSA), not RAG: facts are composed algebraically, and similarity is computed on bit-packed vectors (32× less RAM than float, identical ranking). You don't need to know any of that to use it; see `docs/MEMORY.md` if you want to.
- **Gets cheaper with repetition (opt-in).** Turn on auto-distillation and a successful read-only tool loop is captured as a deterministic procedure: the next similar task replaces the multi-iteration LLM loop with deterministic data collection plus one synthesis call. Heavily gated for safety (read-only commands and pipelines only, a safety-verify pass, neutral fitness that sinks a bad procedure; writes always stay on the LLM path). Measured live (deepseek-chat, `evidence/2026-06-12_autodistill_curve/`): the win scales with iteration overhead, not task size. A repeated multi-step task over small data became 9.6× cheaper per repeat (70% over 5 runs), while a data-heavy audit (cost dominated by file contents, few iterations to cut) saved almost nothing. Honest rule: it removes repeated per-iteration context cost, not the cost of the data itself.
- **Tiny and hackable.** The memory core depends only on `numpy`; the whole agent is a handful of files. Swap providers, tools, or the embedder without fighting a framework.
- **Bring your own key.** Claude (default), DeepSeek, OpenAI, or any OpenAI-compatible endpoint.
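To make the memory bullet concrete, here is a minimal sketch of the general VSA idea: bind each slot of a `(subject, relation, object)` triple to a role vector, bundle the three, store the result bit-packed, and rank by Hamming distance. It is not Raidho's actual code. The vector width, XOR binding, majority bundling and every name below are assumptions made for illustration, and the hash-seeded symbols do not reproduce the semantic (cross-language) recall that the real embedder provides.

```python
import hashlib
import numpy as np

D = 4096  # hypervector width in bits (assumed value for this sketch)

def symbol(name: str) -> np.ndarray:
    """Deterministic pseudo-random 0/1 vector of length D for a symbol."""
    seed = int.from_bytes(hashlib.sha256(name.encode()).digest()[:8], "little")
    return np.random.default_rng(seed).integers(0, 2, size=D, dtype=np.uint8)

ROLES = {role: symbol(f"role:{role}") for role in ("subject", "relation", "object")}

def encode_fact(subject: str, relation: str, obj: str) -> np.ndarray:
    """XOR-bind each filler to its role, bundle by majority vote, pack to bytes."""
    bound = [
        ROLES["subject"] ^ symbol(subject),
        ROLES["relation"] ^ symbol(relation),
        ROLES["object"] ^ symbol(obj),
    ]
    bits = (np.sum(bound, axis=0) >= 2).astype(np.uint8)  # majority of three
    return np.packbits(bits)  # D / 8 bytes per fact

# Lookup table: number of set bits in each possible byte value.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint16)

facts = [
    ("dependencies", "pinned", "exact"),
    ("auth", "uses", "jwt"),
    ("tests", "run_with", "pytest"),
]
memory = np.stack([encode_fact(*fact) for fact in facts])  # shape (3, D / 8)

# A query that shares two of its three slots with the first fact.
query = encode_fact("dependencies", "pinned", "ranges")

# Hamming distance on packed bytes: XOR, then count set bits per row.
dist = POPCOUNT[memory ^ query].sum(axis=1)
best = facts[int(np.argmin(dist))]
print("closest fact:", best)

# The same ranking computed on float32 +1/-1 vectors, for comparison.
bipolar = (1 - 2 * np.unpackbits(memory, axis=1).astype(np.int8)).astype(np.float32)
q_bipolar = (1 - 2 * np.unpackbits(query).astype(np.int8)).astype(np.float32)
dots = bipolar @ q_bipolar  # equals D - 2 * Hamming distance

rank_bits = np.argsort(dist, kind="stable")
rank_float = np.argsort(-dots, kind="stable")
print("same ranking:", np.array_equal(rank_bits, rank_float))
print("float bytes / packed bytes:", bipolar.nbytes // memory.nbytes)
```

For ±1 vectors the dot product is `D - 2 * hamming`, so sorting by distance on packed bits gives the same order as sorting by float similarity, while each dimension takes 1 bit instead of 32.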
Guided (recommended). One interactive script explains every step, verifies your API key live, runs a real smoke test and shows how to use the agent (concept: MavKa by MozgAI):
```bash
bash install.sh
```

Manual:

```bash
pip install -e '.[anthropic]'      # Claude backend (official Anthropic SDK)
pip install -e '.[openai-compat]'  # DeepSeek / OpenAI-compatible (httpx)
pip install -e '.[embed]'          # semantic memory (sentence-transformers)
pip install -e '.[dev]'            # + pytest
```

Python ≥ 3.11.
Single provider:

```bash
export CODER_PROVIDER=deepseek
export DEEPSEEK_API_KEY=sk-...
coder "create a FastAPI hello-world app and run it"
```

Reasoning and execution on different providers:

```bash
export CODER_PROVIDER=deepseek          # execution (code mode, tool loop)
export DEEPSEEK_API_KEY=sk-...
export CODER_REASON_PROVIDER=anthropic  # reasoning (text mode)
export ANTHROPIC_API_KEY=sk-ant-...
coder   # REPL: /text plans on Claude, /code executes on DeepSeek
```
The expensive model is used only where it earns its keep; the token-heavy tool loop runs on the cheap one.
```bash
coder            # interactive REPL (default mode: code)
coder "<task>"   # headless: run one task, print result, exit
```
In the REPL: `/code` for agentic coding, `/text` for reasoning chat, `/ctx` to toggle context-first, `/learn` to toggle auto-distill, `/council <question>` for a two-provider debate that ends in a consensus, and `/quit` to exit. Memory persists per project at `<workdir>/.raidho/memory`; the REPL shows how many facts it loaded on start.
```python
import asyncio

from agent.providers import get_provider
from agent.loop import Session
from agent.memory import AgentMemory

reason = get_provider({"provider": "anthropic", "api_key": "sk-ant-..."})  # smart
execute = get_provider({"provider": "deepseek", "api_key": "sk-...",       # cheap
                        "model": "deepseek-chat"})
memory = AgentMemory(path=".raidho/memory")
session = Session(execute, workdir=".", memory=memory, reason_provider=reason)

asyncio.run(session.chat("plan how to add auth to this app"))    # → reason provider
asyncio.run(session.code("implement the plan and add a test"))   # → execution provider
```
Omit `reason_provider` and both modes use the single provider.
```python
import asyncio

from agent.council import Council

async def main():
    # `reason`, `execute` and `session` come from the previous example.
    council = Council(reason, execute, name_a="claude", name_b="deepseek")
    result = await council.consensus("pin exact deps or use ranges?", rounds=2)
    print(result["verdict"])  # points of agreement / residual disagreements / recommendation

    res = await session.council("pin exact deps or use ranges?")
    print(res["remembered"])  # e.g. [("dependencies", "pinned", "exact")], recalled later

asyncio.run(main())
```
`Session(...).council("...")` is the shortcut: it seats `reason_provider` against `provider`.
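The debate-then-distill flow can be pictured with stub providers. The sketch below is not the `agent.council` implementation: the `Ask` callable type, the `debate` function, the prompt wording and the canned replies are all hypothetical, chosen only to show two sides alternating for a fixed number of rounds before a neutral pass summarises the transcript.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical provider interface for this sketch: prompt in, text out.
Ask = Callable[[str], Awaitable[str]]

async def debate(question: str, a: Ask, b: Ask, judge: Ask, rounds: int = 2) -> dict:
    """Alternate two speakers for `rounds` rounds, then ask a neutral judge to distill."""
    transcript: list[tuple[str, str]] = []
    for _ in range(rounds):
        # Speakers are labelled A and B only; no personas are attached.
        for label, ask in (("A", a), ("B", b)):
            so_far = "\n".join(f"{who}: {text}" for who, text in transcript)
            reply = await ask(f"Question: {question}\nDebate so far:\n{so_far}\nYour position:")
            transcript.append((label, reply))
    summary = "\n".join(f"{who}: {text}" for who, text in transcript)
    verdict = await judge(
        "List points of agreement, residual disagreements and a recommendation:\n" + summary
    )
    return {"transcript": transcript, "verdict": verdict}

def canned(text: str) -> Ask:
    """Stub provider that always returns the same reply (stands in for a real model)."""
    async def ask(prompt: str) -> str:
        return text
    return ask

result = asyncio.run(debate(
    "pin exact deps or use ranges?",
    a=canned("Pin exact versions for reproducible builds."),
    b=canned("Use ranges so security fixes arrive without manual bumps."),
    judge=canned("Recommendation: ranges in the manifest, exact pins in a lockfile."),
))
print(result["verdict"])
```

Swapping the stubs for real provider calls keeps the control flow unchanged, which is the sense in which the mode is provider-pluggable.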
| Variable | Meaning | Default |
|---|---|---|
| `CODER_PROVIDER` | execution provider: `anthropic` / `deepseek` / `openai` | `anthropic` |
| `CODER_MODEL` | override execution model | provider default |
| `CODER_REASON_PROVIDER` | optional separate provider for `text` / reasoning | = `CODER_PROVIDER` |
| `CODER_REASON_MODEL` | reasoning model | provider default |
| `CODER_BASE_URL` | endpoint URL for openai-compat | (none) |
| `CODER_CONTEXT_FIRST` | `1` packs the workspace into the first call (fewer tool iterations) | off |
| `CODER_AUTODISTILL` | `1` learns read-only procedures from successful runs (gets cheaper with repetition) | off |
| `CODER_MEMORY` | memory file path; `off` disables persistence | `<workdir>/.raidho/memory` |
| `ANTHROPIC_API_KEY` / `DEEPSEEK_API_KEY` / `OPENAI_API_KEY` / `CODER_API_KEY` | API keys (provider-specific first, then `CODER_API_KEY`) | (none) |
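The defaults and the key lookup order in the table can be read as the following resolution logic. This is a sketch of the documented behaviour, not Raidho's own parser: `resolve_settings`, `PROVIDER_KEY_VARS` and the returned dictionary shape are hypothetical names introduced here.

```python
import os
from pathlib import Path

# Provider-specific key variables listed in the table above.
PROVIDER_KEY_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
    "openai": "OPENAI_API_KEY",
}

def resolve_settings(env=None, workdir: str = ".") -> dict:
    """Apply the documented defaults to a mapping of environment variables."""
    env = os.environ if env is None else env
    provider = env.get("CODER_PROVIDER", "anthropic")

    # Provider-specific key first, then the generic CODER_API_KEY.
    specific_var = PROVIDER_KEY_VARS.get(provider)
    api_key = (env.get(specific_var) if specific_var else None) or env.get("CODER_API_KEY")

    # "off" disables persistence; otherwise fall back to the per-project path.
    memory = env.get("CODER_MEMORY", str(Path(workdir) / ".raidho" / "memory"))

    return {
        "provider": provider,
        "reason_provider": env.get("CODER_REASON_PROVIDER", provider),
        "api_key": api_key,
        "base_url": env.get("CODER_BASE_URL"),
        "context_first": env.get("CODER_CONTEXT_FIRST") == "1",
        "autodistill": env.get("CODER_AUTODISTILL") == "1",
        "memory_path": None if memory == "off" else memory,
    }

settings = resolve_settings({
    "CODER_PROVIDER": "deepseek",
    "DEEPSEEK_API_KEY": "sk-specific",
    "CODER_API_KEY": "sk-generic",
    "CODER_AUTODISTILL": "1",
})
print(settings["provider"], settings["reason_provider"], settings["autodistill"])
```

Passing a plain dictionary instead of `os.environ` keeps the sketch testable without touching the real environment.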
See docs/PROVIDERS.md for adding a provider and the auth hook.
- `docs/ARCHITECTURE.md`: components and data flow.
- `docs/MEMORY.md`: the VSA memory model and bit-packing.
- `docs/OPENWEBUI.md`: drive Raidho from Open WebUI (chat / council / code as selectable models).
- Broader benchmark coverage (success rate on a task set vs. a single-model baseline; SWE-bench-style eval). A first real-API cost benchmark with evidence is already in `benchmarks/` and `evidence/`.
- Streaming responses in the Open WebUI plugin (currently the reply lands all at once).
Recently shipped: persistent memory across runs · council verdicts saved as facts · context-first mode · auto-picked semantic embedder · automatic Open WebUI setup.
The `bash` tool runs unsandboxed in the working directory; in `code` mode the model decides which commands to run. Use Raidho only on code and tasks you trust, ideally inside a container or a throwaway directory. See SECURITY.md.
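One way to get a throwaway directory is to copy the project into a fresh temporary location and point the agent at the copy. The helper below is a hypothetical convenience, not part of Raidho, and a temporary copy is not a sandbox: an unsandboxed shell can still reach paths outside it, so a container remains the stronger option.

```python
import shutil
import tempfile
from pathlib import Path

def make_scratch_copy(project: Path) -> Path:
    """Copy `project` into a new temporary directory and return the copy's path."""
    scratch_root = Path(tempfile.mkdtemp(prefix="raidho-scratch-"))
    target = scratch_root / project.name
    shutil.copytree(project, target)
    return target

# Demo on a tiny stand-in project so the sketch runs anywhere.
demo = Path(tempfile.mkdtemp(prefix="demo-project-"))
(demo / "app.py").write_text("print('hello')\n")

scratch = make_scratch_copy(demo)
(scratch / "app.py").write_text("print('changed inside the scratch copy')\n")

print("original untouched:", (demo / "app.py").read_text() == "print('hello')\n")
print("work in:", scratch)  # cd here before starting the agent
```

Changes made in the copy can be reviewed and moved back by hand once you are satisfied with them.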
Dual-licensed: AGPL-3.0-or-later for open-source / research / non-commercial use, or a commercial license. See COMMERCIAL.md.
See CONTRIBUTING.md. Issues and pull requests welcome.
- Open WebUI: the web interface Raidho plugs into. It's an excellent, polished chat UI and a perfect fit for this agent; rather than reinvent it, Raidho ships a Pipe plugin and the installer can wire itself in automatically. Thanks to the Open WebUI team.
- Oles Lytvyn (MozgAI): this project's critic throughout its path. His reviews shaped the retry layer, the embedder honesty, the history budget and more. The guided installer (`install.sh`) follows the concept he pioneered in MavKa, an installer that explains everything out of the box ("AI installs itself").