[ARTICLE · art-25410] src=github.com pub= topic=ai-agents verified=true sentiment=↑ positive

An AI agent that must produce evidence before it can say "done"

Distill, a self-hosted AI agent, now requires physical evidence—such as a written file, a passing test, or a running service—before it can mark a task as complete, using a task contract system to prevent unsubstantiated claims. The agent learns across projects by distilling reusable skills from its own experience, versioning each skill and automatically rolling back any that cause measurable regression. It maintains persistent hybrid memory and communicates through Telegram, Discord, Slack, email, a streaming control panel, or the terminal.

7 min read · published Jun 12, 2026

The self-hosted AI agent that can't say "done" without proof.

Every Distill run is governed by a task contract: before finishing, the agent must produce physical evidence — a file written, a test passing, a service answering on its port. No evidence, no "done".
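As a rough illustration of the evidence gate described above, here is a minimal sketch. It is not Distill's actual `contract.py`: the names `Evidence`, `TaskContract`, `file_written`, `port_answering`, and `command_passes` are hypothetical, and the checks are deliberately simple.

```python
from __future__ import annotations

import socket
import subprocess
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable


@dataclass
class Evidence:
    """One verifiable requirement (hypothetical type, for illustration only)."""
    description: str
    check: Callable[[], bool]


def file_written(path: str) -> Evidence:
    # Satisfied when the file exists and is non-empty.
    p = Path(path)
    return Evidence(f"file written: {path}",
                    lambda: p.is_file() and p.stat().st_size > 0)


def port_answering(host: str, port: int, timeout: float = 1.0) -> Evidence:
    # Satisfied when a TCP connection to host:port succeeds.
    def check() -> bool:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False
    return Evidence(f"service answering on {host}:{port}", check)


def command_passes(argv: list[str]) -> Evidence:
    # Satisfied when the command exits with status 0, e.g. ["pytest", "-q"].
    def check() -> bool:
        try:
            return subprocess.run(argv, capture_output=True).returncode == 0
        except OSError:  # command not found, not executable, ...
            return False
    return Evidence(f"command exits 0: {' '.join(argv)}", check)


@dataclass
class TaskContract:
    """Evidence declared up front; the final answer is gated on all of it."""
    required: list[Evidence] = field(default_factory=list)

    def missing(self) -> list[str]:
        return [e.description for e in self.required if not e.check()]

    def finish(self, answer: str) -> str:
        missing = self.missing()
        if missing:
            raise RuntimeError("cannot report done; missing evidence: "
                               + "; ".join(missing))
        return answer
```

In this sketch the agent loop would call `finish()` instead of returning its answer directly, so a claim of completion fails loudly while any declared evidence is still missing.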

Distill also learns your projects across sessions, distills its own reusable skills from experience (each one versioned and rolled back automatically if a newer version measurably regresses), keeps persistent hybrid memory, and reaches you over Telegram, Discord, Slack, email, a real-time streaming control panel, or the terminal.
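The versioning-and-rollback behaviour can be pictured with a small sketch. Everything here is an assumption made for illustration: the `Skill` and `SkillVersion` classes, the `min_runs` and `tolerance` thresholds, and the use of raw success rate as the regression metric are not taken from Distill's evaluator.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SkillVersion:
    version: int
    source: str          # the synthesized tool code
    successes: int = 0
    failures: int = 0

    @property
    def runs(self) -> int:
        return self.successes + self.failures

    @property
    def success_rate(self) -> float:
        return self.successes / self.runs if self.runs else 0.0


@dataclass
class Skill:
    name: str
    versions: list[SkillVersion] = field(default_factory=list)
    active: int = -1     # index into versions; -1 means nothing published yet

    def publish(self, source: str) -> SkillVersion:
        """Append a new version and make it the active one."""
        v = SkillVersion(version=len(self.versions) + 1, source=source)
        self.versions.append(v)
        self.active = len(self.versions) - 1
        return v

    def record(self, ok: bool, min_runs: int = 5, tolerance: float = 0.1) -> bool:
        """Record one outcome for the active version.

        Returns True if this outcome triggered a rollback.
        """
        current = self.versions[self.active]
        if ok:
            current.successes += 1
        else:
            current.failures += 1
        return self._maybe_rollback(min_runs, tolerance)

    def _maybe_rollback(self, min_runs: int, tolerance: float) -> bool:
        if self.active <= 0:
            return False                      # nothing older to fall back to
        current = self.versions[self.active]
        previous = self.versions[self.active - 1]
        if current.runs < min_runs:
            return False                      # not enough data to judge yet
        if current.success_rate + tolerance < previous.success_rate:
            self.active -= 1                  # measurable regression: revert
            return True
        return False
```

Waiting for a minimum number of runs before comparing avoids reverting a new version on the strength of a single unlucky failure.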

Launch the interactive terminal interface with:

distill
distill tui

graph TD
    %% External Interfaces
    UI[Control Panel UI <br/>React + Tailwind]
    Adapters[Messaging Adapters <br/>Telegram, Discord, Slack, Email]
    TUI[Terminal UI]

    %% Gateway & Concurrency
    subgraph Gateway Layer
        API[FastAPI Gateway]
        WS[WebSocket Stream]
        Queue[Session FIFO Queue]
        
        API --- WS
        WS --> Queue
    end

    UI --> API
    Adapters --> API
    TUI --> WS

    %% Core Agent Engine
    subgraph Core Agent Engine
        Agent[Agent Engine <br/>ReAct Loop]
        Contract[Task Contract System <br/>Evidence Gating]
        Plan[Plan Management <br/>Action Ledger]
        Eval[Skill Distiller <br/>Evaluator]
        
        Agent <--> Contract
        Agent <--> Plan
        Agent --> Eval
    end

    Queue --> Agent

    %% State & Memory
    subgraph State & Memory
        Checkpoint[(SQLite Checkpointer <br/>State & History)]
        Chroma[(ChromaDB <br/>Semantic Memory)]
        Neo4j[(Neo4j <br/>Graph Memory)]
        
        Agent <--> Checkpoint
        Agent <--> Chroma
        Agent <--> Neo4j
    end

    %% External Services
    LLM((LLM Provider <br/>LiteLLM))
    Agent <--> LLM

    %% Tooling & Execution
    subgraph Tooling & Execution Sandbox
        Tools[Tool Manager]
        MCP[MCP Servers]
        Sandbox[Terminal Sandbox <br/>Docker / Host / Serverless]
        Skills[Evolved Skills <br/>Auto-Maker]
        
        Agent --> Tools
        Tools --> MCP
        Tools --> Sandbox
        Tools --> Skills
        Eval -.->|Synthesizes & Validates| Skills
    end

src/
  agent.py          -- ReAct loop, session management, checkpointing
  contract.py       -- Task-contract system (evidence tracking)
  evaluator.py      -- Skill distiller: trajectory -> reusable MCP skill
  memory.py         -- HybridMemory: ChromaDB (semantic) + Neo4j (graph)
  gateway.py        -- FastAPI app, WebSocket stream, session-per-FIFO-lane
  tools.py          -- MCP servers, terminal, file, process, port tools
control-panel/      -- React + Tailwind chat UI with live token streaming

The reasoning behind these decisions — and the alternatives that were rejected — is documented in docs/DESIGN.md.

- Task-Contract Execution: The agent must declare the required execution evidence (files created, services running) before starting a task. The final response is gated on this physical evidence, eliminating "I'll do it now" hallucinations.
- Skill Distillation: After a successful complex task, an LLM evaluates the trajectory and synthesizes a parameterized Python tool. New skills are versioned, validated, and automatically rolled back if their success rate drops.
- Session-per-FIFO-Lane Concurrency: Every user session gets a dedicated queue and worker task, allowing high concurrency with strict message ordering.
- Hybrid Memory: Combines SQLite full-text search, ChromaDB semantic embeddings, and Neo4j graph relationships to recall cross-session context.
- Universal Sandboxing: Run shell operations locally, in Docker, or remotely through a single HTTP exec shim that can front any serverless sandbox (Daytona, E2B, Modal, …).
- Shareable Skills: Export and import distilled skills via the open SKILL.md format.
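The session-per-FIFO-lane idea (one queue and one worker task per session) can be sketched with `asyncio`. This is an illustrative assumption, not the code in `gateway.py`; the `SessionLanes` class and its method names are hypothetical.

```python
from __future__ import annotations

import asyncio
import logging
from typing import Awaitable, Callable

Handler = Callable[[str, str], Awaitable[None]]


class SessionLanes:
    """One FIFO queue plus one worker task per session (hypothetical sketch)."""

    def __init__(self, handler: Handler) -> None:
        self._handler = handler
        self._queues: dict[str, asyncio.Queue] = {}
        self._workers: dict[str, asyncio.Task] = {}

    async def submit(self, session_id: str, message: str) -> None:
        queue = self._queues.get(session_id)
        if queue is None:
            # First message for this session: create its lane and worker.
            queue = asyncio.Queue()
            self._queues[session_id] = queue
            self._workers[session_id] = asyncio.create_task(
                self._drain(session_id, queue))
        await queue.put(message)

    async def _drain(self, session_id: str, queue: asyncio.Queue) -> None:
        # Messages within one session are handled strictly one at a time,
        # in arrival order; different sessions run concurrently.
        while True:
            message = await queue.get()
            try:
                await self._handler(session_id, message)
            except Exception:
                logging.exception("handler failed for session %s", session_id)
            finally:
                queue.task_done()

    async def join(self) -> None:
        """Wait until every queued message has been handled."""
        await asyncio.gather(*(q.join() for q in self._queues.values()))

    async def close(self) -> None:
        for task in self._workers.values():
            task.cancel()
        await asyncio.gather(*self._workers.values(), return_exceptions=True)
```

Because each lane has a single consumer, ordering inside a session needs no locks, while a slow session never blocks the others.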

Distill is a research framework with a stable core and a set of more experimental capabilities around it. This matrix shows where each subsystem stands — treat anything marked Experimental as subject to change.

| Subsystem | Maturity | Notes |
|---|---|---|
| FastAPI gateway (auth, rate-limit, session FIFO lanes) | Stable | Token-gated; returns 503 until AGENT_API_TOKEN is set. |
| Contract / evidence-gated ReAct loop | Stable | Core differentiator; final answer gated on physical evidence. |
| SQLite checkpointer + session store | Stable | Durable per-session state and history. |
| Local & Docker sandbox | Stable | Default execution paths. |
| Serverless sandbox (HTTP exec shim) | Experimental | Opt-in via AGENT_SANDBOX=http; one endpoint contract fronts any provider (Daytona, E2B, Modal, …). |
| ChromaDB semantic + Neo4j graph memory | Optional | Degrade gracefully when the backends are absent. |
| Skill distillation & evolution | Experimental | Auto-synthesised skills are versioned and auto-rolled-back on regression. |
| Sub-agent delegation (delegate_task) | Experimental | Bounded delegated execution; see … |

Adapter-tested

Spin up the agent gateway in the cloud, add your LLM API key (and optionally a Telegram bot token), and text your agent. It runs lean: SQLite-backed memory, no database to provision.

Render reads render.yaml, generates a gateway token for you, and prompts for your provider key and models.

Railway uses railway.json: create a project from this repo, then set AGENT_API_TOKEN, AGENT_MODEL, and your provider key.

Fly.io: run fly launch --copy-config against fly.toml, then fly secrets set … (commands are in the file header).

./scripts/quickstart.sh

Generates strong secrets, creates your env file, and brings the full stack up with Docker Compose. Add your LLM API key to an-api.env and re-run.

Manual Docker Compose

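# Create the env file from the template, then add your LLM API key to it
cp an-api.env.example an-api.env

# Generate a random Neo4j password and append it to .env
echo "NEO4J_PASSWORD=$(openssl rand -hex 24)" >> .env

# Build the images and start the full stack in the background
docker compose up -d --build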

Control Panel (UI): http://localhost:5173

API / Docs: http://localhost:8000/docs

The bootstrap scripts install the prerequisites for you, then hand off to the interactive installer. They are the recommended path on a fresh PC.

Windows (PowerShell):

irm https://raw.githubusercontent.com/Aspct3434/Distill-Agent/master/scripts/bootstrap.ps1 | iex

macOS / Linux (bash):

curl -fsSL https://raw.githubusercontent.com/Aspct3434/Distill-Agent/master/scripts/bootstrap.sh | bash

You can also clone the repo first and run scripts/bootstrap.ps1 (Windows) or scripts/bootstrap.sh (macOS/Linux) directly.

Run the interactive installer with npx (no global install required):

npx @aspct/distill-agent install

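Prefer to run straight from the GitHub repository rather than the npm registry? Use the repo form (npm also accepts a #<commit-ish> suffix on this spec if you want to pin an exact revision):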

npx --yes github:Aspct3434/Distill-Agent install

After installation, use the CLI to manage the agent:

npm i -g @aspct/distill-agent     # Install the CLI globally
distill                           # Open the interactive terminal UI
distill start                     # Start the backend and control panel
distill logs                      # View running logs
distill update                    # Pull the latest changes
distill doctor                    # Diagnose the install and environment

# Or run the same commands through npx, without a global install:
npx @aspct/distill-agent start    # Start the backend and control panel
npx @aspct/distill-agent logs     # View running logs
npx @aspct/distill-agent update   # Pull the latest changes
npx @aspct/distill-agent doctor   # Diagnose the install and environment

Distill is configured via environment variables. Key settings in an-api.env:

AGENT_MODEL accepts any LiteLLM provider string. The installers can preconfigure Kimi/Moonshot, Ollama, OpenRouter, OpenAI, Anthropic, Gemini, DeepSeek, Groq, xAI Grok, Mistral, or any OpenAI-compatible endpoint (such as vLLM); set the matching *_API_KEY in an-api.env.

For OpenAI you can also sign in with your ChatGPT account (Codex OAuth) instead of pasting a key: choose it in the installer, or run PYTHONPATH=src python -m auth login (also available in the control panel under Settings → Authentication). The token is stored locally, injected as OPENAI_API_KEY, and refreshed automatically before LLM calls.

| Variable | Default | Description |
|---|---|---|
| AGENT_MODEL | moonshot/kimi-k2.5 | The primary LLM to use (supports any LiteLLM provider string). |
| AGENT_SANDBOX | (blank) | Leave blank for Docker Compose. Set to docker to run Docker-in-Docker, or http for serverless. |
| AGENT_REQUIRE_APPROVAL | off | Set to risky or all to require human-in-the-loop approval before executing shell commands. |
| AGENT_API_TOKEN | (required) | Bearer token for API/WebSocket access; agent endpoints return 503 until set. |
| AGENT_ALLOW_INSECURE_NO_AUTH | false | Explicit local-only override for running without API auth. |
| GATEWAY_RATE_LIMIT_RPM | 60 | Per-client request limit, keyed by API token or client IP. |
| AGENT_LOG_DB_PATH | ./data/gateway_logs.db | Persistent SQLite log store used by /api/logs. |
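To make the interaction between AGENT_API_TOKEN, AGENT_ALLOW_INSECURE_NO_AUTH, and GATEWAY_RATE_LIMIT_RPM concrete, here is a hedged sketch of the decision logic. It is not the gateway's code: the function and class names are hypothetical, the 401 response for a wrong token is an assumption (the source only documents the 503 case), and the sliding-window algorithm is one plausible choice among several.

```python
from __future__ import annotations

import hmac
import time
from collections import defaultdict, deque
from typing import Callable, Optional


def auth_status(configured_token: str,
                presented_token: Optional[str],
                allow_insecure: bool = False) -> int:
    """Return the HTTP status a token gate like this might answer with."""
    if not configured_token:
        # No AGENT_API_TOKEN set: refuse service unless explicitly overridden.
        return 200 if allow_insecure else 503
    if presented_token is None:
        return 401  # assumed status; not documented in the source
    # Constant-time comparison avoids leaking the token through timing.
    ok = hmac.compare_digest(configured_token.encode(), presented_token.encode())
    return 200 if ok else 401


class SlidingWindowLimiter:
    """Allow at most `limit_per_minute` requests per client key per 60 s."""

    def __init__(self, limit_per_minute: int = 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._limit = limit_per_minute
        self._clock = clock
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, client_key: str) -> bool:
        # client_key would be the API token if present, else the client IP.
        now = self._clock()
        hits = self._hits[client_key]
        while hits and now - hits[0] >= 60.0:
            hits.popleft()                 # drop requests older than the window
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True
```

Injecting the clock keeps the limiter easy to test without sleeping.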

Adapters are disabled by default and activate when you provide a bot token in an-api.env:

- Telegram: Set TELEGRAM_BOT_TOKEN. Supports voice-note transcription via Whisper.
- Discord: Set DISCORD_BOT_TOKEN.
- Slack: Set SLACK_BOT_TOKEN and SLACK_APP_TOKEN (Socket Mode).
- Email: Set EMAIL_ADDRESS, EMAIL_PASSWORD, EMAIL_IMAP_HOST, and EMAIL_SMTP_HOST.

Each channel provides a live typing indicator, streams real-time tool execution logs, and isolates conversations.
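The activation rule for adapters (off by default, on once credentials are present) could look roughly like the following. The `enabled_adapters` helper is hypothetical, and it assumes an adapter needs every variable listed for it; only the variable names come from the text above.

```python
from __future__ import annotations

import os
from typing import Mapping

# Environment variables each adapter needs (names as documented above).
REQUIRED_VARS = {
    "telegram": ("TELEGRAM_BOT_TOKEN",),
    "discord": ("DISCORD_BOT_TOKEN",),
    "slack": ("SLACK_BOT_TOKEN", "SLACK_APP_TOKEN"),
    "email": ("EMAIL_ADDRESS", "EMAIL_PASSWORD",
              "EMAIL_IMAP_HOST", "EMAIL_SMTP_HOST"),
}


def enabled_adapters(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the adapters whose required variables are all set and non-blank."""
    return [
        name
        for name, variables in REQUIRED_VARS.items()
        if all(env.get(var, "").strip() for var in variables)
    ]
```

For example, an environment containing only TELEGRAM_BOT_TOKEN would enable the Telegram adapter and leave the rest off.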

The test suite covers contracts, planners, adapters, and the ReAct loop:

pytest                        # Run the full suite
pytest tests/test_task_contract_loop.py -v   # Test the anti-hallucination contract system

Distill is released under the MIT License.
