Governed output gateway for agentic coding tools.
BEAST sits between your AI coding agent (Cursor, Claude Code, VS Code Copilot) and any LLM provider. It governs what goes in and what comes out — enforcing output contracts, repairing non-compliant patches before they touch your filesystem, and learning which tool calls are worth making.
AI coding agents are not careful. They read entire files when they need three lines. They write to paths they shouldn't. They spend your token budget on redundant lookups. When a provider returns malformed JSON, they fail silently or corrupt your code.
BEAST intercepts both sides:
Input governance— context compression, tool laziness learning, budget enforcement, circuit breakers** Output governance**— every model response is parsed against a typed output contract (beast.action_intent.v1
) before anything touches disk. Non-compliant patches are repaired locally and verified. If verification fails, nothing is written.
| Lane | Completed | Median tokens | vs raw |
|---|---|---|---|
| Raw (no BEAST) | 0 / 10 | 47,661 | — |
| Context only | 0 / 10 | 44 | −99.9% |
| RAG | 8 / 10 | 296 | −99.4% |
| RAG + Tools | 10 / 10 | 326 | −99.3% |
| Full BEAST | |||
| 10 / 10 | |||
| 390 | |||
| −99.2% |
Raw context hits the token budget before the model can reason about the scoped problem. BEAST completes 100% of tasks at under 400 tokens, verified by passing pytest suites.
| Result | Count |
|---|---|
| BEAST end-to-end completions | 192 / 192 |
| Clean provider completions | 36 / 192 |
| BEAST-rescued completions | 156 / 192 |
79% of raw provider outputs were non-compliant, malformed, or incomplete. BEAST rescued every one of them. Without output governance, those 156 tasks would have silently failed or written corrupted patches.
| Rank | Provider | Role | Clean | Fitness | Latency |
|---|---|---|---|---|---|
| 1 | ovhcloud |
||||
| candidate patch provider | 5/10 | 0.663 | 14s | ||
| 2 | puter_deepseek |
||||
| candidate patch (high latency) | 4/10 | 0.619 | 13s | ||
| 3 | cohere |
||||
| candidate patch provider | 4/10 | 0.614 | 6.7s | ||
| 4 | deepinfra |
||||
| candidate patch (high latency) | 4/10 | 0.612 | 32s | ||
| 5 | huggingface |
||||
| rescue-backed action IR | 3/10 | 0.583 | 1.6s | ||
| 6 | nscale |
||||
| rescue-backed action IR | 3/10 | 0.581 | 7.8s | ||
| 7 | mistral |
||||
| rescue-backed (Codestral) | 2/10 | 0.545 | 4.1s | ||
| 8 | openrouter |
||||
| fast rescue-backed action IR | 2/10 | 0.544 | 3.8s | ||
| 9 | sambanova |
||||
| fast rescue-backed action IR | 1/10 | 0.512 | 3.0s | ||
| 10 | cloudflare |
||||
| edge / microtask | 1/10 | 0.483 | 2.1s | ||
| 11–14 | cerebras , featherless , nvidia_nim , gemini |
||||
| scout / selector | 0–2/10 | 0.33–0.42 | varies | ||
| 15–16 | groq , llm7 |
||||
| scout only | 0/10 | 0.23 | fast | ||
| 17–18 | aion_labs , novita |
||||
| rate-limited / rescue | 1/10 | 0.39–0.51 | varies | ||
| 19–20 | hyperbolic , fal |
||||
| do not use (auth/billing) | 0/10 | — | — |
Notable findings:
Puter-routed DeepSeek achieved 4 clean passes on a free proxied route — matching paid providers. BEAST can make unconventional free routes production-viable through governance.LLM7 returned valid JSON on 100% of tasks but passed the output schema on only 10%. Without an output governor, it looks like it's working. It isn't.NVIDIA NIM failed the output contract on every task. BEAST repaired and rescued both targeted tasks. Zero silent failures.DeepInfra observed cost: ~$0.000332 per verified, governed code fix.
Coding agent (Cursor / Claude Code / VS Code)
│
▼
┌─────────────────────────────────────────┐
│ BEAST Gateway │
│ │
│ Input side Output side │
│ ───────── ─────────── │
│ Context economy Output contract │
│ Tool laziness Local verifier │
│ Budget ledger Patch compiler │
│ Circuit breakers Anchor resolver │
│ Workspace graph Repair engine │
│ MCP broker Sandbox validator │
│ │
│ Memory: L0 policy → L4 forensic archive│
└─────────────────────────────────────────┘
│
▼
Any LLM provider (20+ tested)
Every model response passes through:
Contract parse— response must conform tobeast.action_intent.v1
Anchor resolution—anchor_ref
fields resolve to exact code locations; no copy-paste writesPath validation— writes outside allowed paths are rejected before compilation** Local patch compile**—ActionIR
→ResolvedAction
→ staged file writesSandbox verification— compiled patches run against pytest before disk commit** Repair**— if verification fails, the local verifier attempts repair before giving up** Forensic record**— every outcome (clean, repaired, rejected) is written to the Chronicle
Provider-specific output profiles handle model quirks: NVIDIA NIM gets refs_only=True
; HuggingFace gets repair_attempts=2
.
| Layer | Name | Contents |
|---|---|---|
| L0 | Meta Rules | Spend caps, shell allowlists, blocked paths — immutable |
| L1 | Insight Index | Session state, cache handles, circuit state |
| L2 | Workspace Graph | Symbol maps, dependency edges, semantic chunks |
| L3 | Skill Tree | Promoted, verified workflows and route cards |
| L4 | Forensic Archive | Append-only Chronicle — every request, every outcome |
git clone https://github.com/Byron2306/EdgeK-BEAST
cd EdgeK-BEAST
pip install -r requirements.txt
Optional (semantic RAG, large ML wheels):
pip install -r requirements-semantic.txt
Optional (LiteLLM proxy support):
pip install -r requirements-litellm.txt
Start the gateway:
uvicorn app.main:app --host 0.0.0.0 --port 8005
Point your coding agent at BEAST instead of your provider directly:
export OPENAI_BASE_URL=http://localhost:8005/v1
export ANTHROPIC_BASE_URL=http://localhost:8005
Set whichever providers you use:
export HF_TOKEN='...'
export HF_INFERENCE_BASE_URL='https://router.huggingface.co/v1'
export OPENROUTER_API_KEY='...'
export GEMINI_API_KEY='...'
export NVIDIA_API_KEY='...'
export COHERE_API_KEY='...'
export MISTRAL_API_KEY='...'
export LOCAL_NIM_BASE_URL='http://localhost:8000/v1'
BEAST will route, govern, and fall back across providers according to the fitness map. Providers you haven't configured are skipped cleanly.
GET /health
GET /edgek/state
GET /ui
POST /v1/chat/completions # OpenAI-compatible
POST /v1/messages # Anthropic-compatible
POST /hf/v1/chat/completions # HuggingFace router
POST /litellm/v1/chat/completions # LiteLLM proxy
POST /edgek/tools/intercept # Semantic tool-call interception
GET /edgek/workspace # Workspace graph state
POST /edgek/workspace/index # Index a repository
GET /edgek/runtime/state
GET /edgek/runtime/attempts
POST /edgek/runtime/circuit-breakers/{provider}/reset
POST /edgek/mcp/evaluate
POST /edgek/mcp/execute
GET /edgek/mcp/audit
GET /edgek/skills/promotion-candidates
POST /edgek/skills/promote
POST /edgek/enterprise/teams
POST /edgek/enterprise/virtual-keys
GET /edgek/enterprise/observability
Full endpoint reference in the API docs.
policies/default.yaml
controls everything:
- Spend caps and token budgets per provider and per team
- Shell command allowlists and blocklists
- File path write restrictions
- MCP server trust levels
- Circuit breaker thresholds
- Tool laziness learning parameters
PYTHONPATH=. python3 benchmarks/run_benchmark.py --lanes all --tasks 10
PYTHONPATH=. python3 benchmarks/run_live_benchmark.py --providers hf,openrouter,cohere
PYTHONPATH=. python3 benchmarks/provider_edge_compare.py --repeats 3
Results are written to benchmarks/results/
.
BEAST generates LiteLLM and Nginx configs directly from your active policy:
PYTHONPATH=. python3 scripts/generate_deploy_configs.py --out deploy/generated
Nginx routes /tool-calls/*
into BEAST's semantic interceptor — file read requests return the top 3 relevant snippets instead of full source files.
See deployment_integrations.md for the full runbook including GitHub tool calls, Postgres integration, and prompt-cache keepalive setup.
- It does not replace your LLM provider. It governs the traffic between your agent and your provider.
- It does not add latency you'll notice for most tasks. Output governance adds microseconds locally; provider latency dominates.
- It does not require a GPU. The entire governance and compilation pipeline runs on CPU.
- It does not phone home. Everything — workspace graph, budget ledger, forensic archive, skill tree — is local SQLite and append-only files.
MIT — see LICENSE.
Active development. Core governance pipeline (input economy + output contracts + local verification) is stable and benchmarked. V2 roadmap focuses on the Chronicle engine, route cards, and skill promotion loop. See BEAST_V2_ROADMAP.md.
Contributions, issues, and provider benchmark results welcome.