An AI agent that must produce evidence before it can say "done"

wpnews.pro

The self-hosted AI agent that can't say "done" without proof.

Every Distill run is governed by a task contract: before finishing, the agent must produce physical evidence — a file written, a test passing, a service answering on its port. No evidence, no "done".
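The evidence gate described above can be sketched in a few lines. This is a minimal illustration, not Distill's actual `contract.py`: the names `Evidence`, `TaskContract`, `file_written`, and `port_answering` are hypothetical, and the checks cover only two of the evidence kinds mentioned (a file on disk, a service accepting connections).

```python
import socket
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, List


@dataclass
class Evidence:
    """One checkable claim about the outside world (hypothetical type)."""
    description: str
    check: Callable[[], bool]


def file_written(path: str) -> Evidence:
    # Evidence that a non-empty regular file exists at `path`.
    p = Path(path)
    return Evidence(
        f"file written: {path}",
        lambda: p.is_file() and p.stat().st_size > 0,
    )


def port_answering(host: str, port: int, timeout: float = 1.0) -> Evidence:
    # Evidence that something accepts TCP connections on host:port.
    def check() -> bool:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    return Evidence(f"service answering on {host}:{port}", check)


@dataclass
class TaskContract:
    """Declared before the task starts; "done" is gated on every item."""
    required: List[Evidence] = field(default_factory=list)

    def missing(self) -> List[str]:
        # Re-run every check and report the ones that still fail.
        return [e.description for e in self.required if not e.check()]

    def may_finish(self) -> bool:
        return not self.missing()
```

In this sketch an agent loop would call `may_finish()` before emitting a final answer and, if it returns `False`, feed `missing()` back into the next reasoning step instead of finishing.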

Distill also learns your projects across sessions, distills its own reusable skills from experience (each one versioned and rolled back automatically if a newer version measurably regresses), keeps persistent hybrid memory, and reaches you over Telegram, Discord, Slack, email, a real-time streaming control panel, or the terminal.
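One way to picture "versioned and rolled back if a newer version measurably regresses" is a per-skill list of versions with success counters. The sketch below is an assumption-laden illustration, not the project's evaluator: the `Skill` and `SkillVersion` classes, the `min_runs` sample size, and the `margin` threshold are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SkillVersion:
    version: int
    successes: int = 0
    failures: int = 0

    @property
    def runs(self) -> int:
        return self.successes + self.failures

    @property
    def success_rate(self) -> float:
        return self.successes / self.runs if self.runs else 0.0


@dataclass
class Skill:
    name: str
    versions: List[SkillVersion] = field(default_factory=list)
    active: int = -1  # index of the version currently in use

    def publish(self) -> SkillVersion:
        # A newly synthesized version becomes active immediately.
        new = SkillVersion(version=len(self.versions) + 1)
        self.versions.append(new)
        self.active = len(self.versions) - 1
        return new

    def record(self, success: bool, min_runs: int = 5, margin: float = 0.1) -> None:
        # Count the outcome against the active version, then re-evaluate it.
        current = self.versions[self.active]
        if success:
            current.successes += 1
        else:
            current.failures += 1
        self._maybe_rollback(min_runs, margin)

    def _maybe_rollback(self, min_runs: int, margin: float) -> None:
        if self.active <= 0:
            return  # nothing older to fall back to
        current = self.versions[self.active]
        previous = self.versions[self.active - 1]
        # Roll back only with enough samples and a clear drop (assumed rule).
        if current.runs >= min_runs and current.success_rate + margin < previous.success_rate:
            self.active -= 1
```

Requiring a minimum number of runs before comparing keeps a single unlucky failure from discarding a new version; the actual thresholds Distill uses are not stated in this document.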

Launch the interactive terminal interface with:

distill
distill tui

graph TD
    %% External Interfaces
    UI[Control Panel UI <br/>React + Tailwind]
    Adapters[Messaging Adapters <br/>Telegram, Discord, Slack, Email]
    TUI[Terminal UI]

    %% Gateway & Concurrency
    subgraph Gateway Layer
        API[FastAPI Gateway]
        WS[WebSocket Stream]
        Queue[Session FIFO Queue]
        
        API --- WS
        WS --> Queue
    end

    UI --> API
    Adapters --> API
    TUI --> WS

    %% Core Agent Engine
    subgraph Core Agent Engine
        Agent[Agent Engine <br/>ReAct Loop]
        Contract[Task Contract System <br/>Evidence Gating]
        Plan[Plan Management <br/>Action Ledger]
        Eval[Skill Distiller <br/>Evaluator]
        
        Agent <--> Contract
        Agent <--> Plan
        Agent --> Eval
    end

    Queue --> Agent

    %% State & Memory
    subgraph State & Memory
        Checkpoint[(SQLite Checkpointer <br/>State & History)]
        Chroma[(ChromaDB <br/>Semantic Memory)]
        Neo4j[(Neo4j <br/>Graph Memory)]
        
        Agent <--> Checkpoint
        Agent <--> Chroma
        Agent <--> Neo4j
    end

    %% External Services
    LLM((LLM Provider <br/>LiteLLM))
    Agent <--> LLM

    %% Tooling & Execution
    subgraph Tooling & Execution Sandbox
        Tools[Tool Manager]
        MCP[MCP Servers]
        Sandbox[Terminal Sandbox <br/>Docker / Host / Serverless]
        Skills[Evolved Skills <br/>Auto-Maker]
        
        Agent --> Tools
        Tools --> MCP
        Tools --> Sandbox
        Tools --> Skills
        Eval -.->|Synthesizes & Validates| Skills
    end

src/
  agent.py          -- ReAct loop, session management, checkpointing
  contract.py       -- Task-contract system (evidence tracking)
  evaluator.py      -- Skill distiller: trajectory -> reusable MCP skill
  memory.py         -- HybridMemory: ChromaDB (semantic) + Neo4j (graph)
  gateway.py        -- FastAPI app, WebSocket stream, session-per-FIFO-lane
  tools.py          -- MCP servers, terminal, file, process, port tools
control-panel/      -- React + Tailwind chat UI with live token streaming

The reasoning behind these decisions — and the alternatives that were rejected — is documented in docs/DESIGN.md.

- **Task-Contract Execution**: The agent must declare the required execution evidence (files created, services running) before starting a task. The final response is gated on this physical evidence, eliminating "I'll do it now" hallucinations.
- **Skill Distillation**: After a successful complex task, an LLM evaluates the trajectory and synthesizes a parameterized Python tool. New skills are versioned, validated, and automatically rolled back if their success rate drops.
- **Session-per-FIFO-Lane Concurrency**: Every user session gets a dedicated queue and worker task, allowing high concurrency with strict message ordering.
- **Hybrid Memory**: Combines SQLite full-text search, ChromaDB semantic embeddings, and Neo4j graph relationships to recall cross-session context.
- **Universal Sandboxing**: Run shell operations locally, in Docker, or remotely through a single HTTP exec shim that can front any serverless sandbox (Daytona, E2B, Modal, …).
- **Shareable Skills**: Export and import distilled skills via the open `SKILL.md` format.
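The session-per-FIFO-lane idea (a dedicated queue and worker task per session) maps naturally onto `asyncio`. The following is a rough sketch under that assumption; `SessionLanes`, `submit`, `drain`, and `close` are illustrative names, not the API of the project's `gateway.py`.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

Handler = Callable[[str, str], Awaitable[None]]


class SessionLanes:
    """One FIFO queue and one worker task per session (illustrative)."""

    def __init__(self, handler: Handler) -> None:
        self._handler = handler
        self._queues: Dict[str, asyncio.Queue] = {}
        self._workers: Dict[str, asyncio.Task] = {}

    async def submit(self, session_id: str, message: str) -> None:
        # Lazily create the lane the first time a session is seen.
        if session_id not in self._queues:
            self._queues[session_id] = asyncio.Queue()
            self._workers[session_id] = asyncio.create_task(self._run(session_id))
        await self._queues[session_id].put(message)

    async def _run(self, session_id: str) -> None:
        queue = self._queues[session_id]
        while True:
            message = await queue.get()
            try:
                # Messages in one lane are handled strictly one at a time.
                await self._handler(session_id, message)
            finally:
                queue.task_done()

    async def drain(self) -> None:
        # Wait until every queued message has been handled.
        await asyncio.gather(*(q.join() for q in self._queues.values()))

    async def close(self) -> None:
        for task in self._workers.values():
            task.cancel()
        await asyncio.gather(*self._workers.values(), return_exceptions=True)


async def demo() -> List[Tuple[str, str]]:
    seen: List[Tuple[str, str]] = []

    async def handler(session_id: str, message: str) -> None:
        # Session "a" is deliberately slower than session "b".
        await asyncio.sleep(0.01 if session_id == "a" else 0)
        seen.append((session_id, message))

    lanes = SessionLanes(handler)
    for i in range(3):
        await lanes.submit("a", f"a{i}")
        await lanes.submit("b", f"b{i}")
    await lanes.drain()
    await lanes.close()
    return seen
```

Because each lane has exactly one worker, ordering within a session is preserved while a slow session does not block the others. A production version would also need to handle handler exceptions (here an exception ends that lane's worker) and retire idle lanes.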

Distill is a research framework with a stable core and a set of more experimental capabilities around it. This matrix shows where each subsystem stands — treat anything marked Experimental as subject to change.

| Subsystem | Maturity | Notes |
| --- | --- | --- |
| FastAPI gateway (auth, rate-limit, session FIFO lanes) | Stable | Token-gated; returns 503 until `AGENT_API_TOKEN` is set. |
| Contract / evidence-gated ReAct loop | Stable | Core differentiator; final answer gated on physical evidence. |
| SQLite checkpointer + session store | Stable | Durable per-session state and history. |
| Local & Docker sandbox | Stable | Default execution paths. |
| Serverless sandbox (HTTP exec shim) | Experimental | Opt-in via `AGENT_SANDBOX=http`; one endpoint contract fronts any provider (Daytona, E2B, Modal, …). |
| ChromaDB semantic + Neo4j graph memory | Optional | Degrade gracefully when the backends are absent. |
| Skill distillation & evolution | Experimental | Auto-synthesised skills are versioned and auto-rolled-back on regression. |
| Sub-agent delegation (`delegate_task`) | Experimental | Bounded delegated execution; see … |

Spin up the agent gateway in the cloud, add your LLM API key (and optionally a Telegram bot token), and text your agent. Runs lean: SQLite-backed memory, no database to provision.

- Render reads `render.yaml`, generates a gateway token for you, and prompts for your provider key + models.
- Railway uses `railway.json`: create a project from this repo, then set `AGENT_API_TOKEN`, `AGENT_MODEL`, and your provider key.
- Fly.io: `fly launch --copy-config` against `fly.toml`, then `fly secrets set …` (commands are in the file header).

./scripts/quickstart.sh

Generates strong secrets, creates your env file, and brings the full stack up with Docker Compose. Add your LLM API key to `an-api.env` and re-run.

Manual Docker Compose

cp an-api.env.example an-api.env


echo "NEO4J_PASSWORD=$(openssl rand -hex 24)" >> .env

docker compose up -d --build

Control Panel (UI): http://localhost:5173

API / Docs: http://localhost:8000/docs
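Once the stack is up, agent endpoints expect the gateway token as a bearer credential. As a small sketch using only the standard library, the snippet below builds an authenticated request for the `/api/logs` endpoint. That the endpoint accepts a plain `GET` is an assumption, and the helper names `build_logs_request` and `fetch_status` are hypothetical; check the generated docs at `/docs` for the real routes and parameters.

```python
import urllib.request


def build_logs_request(
    token: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    # Attach the gateway token (AGENT_API_TOKEN) as a bearer credential.
    # Assumption: /api/logs accepts a plain GET.
    return urllib.request.Request(
        f"{base_url}/api/logs",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_status(request: urllib.request.Request, timeout: float = 10.0) -> int:
    # Send the request and return the HTTP status code.
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

With the gateway running, `fetch_status(build_logs_request(token))` would perform the call; a 503 response here would be consistent with `AGENT_API_TOKEN` not being set on the server.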

The bootstrap scripts install the prerequisites for you, then hand off to the interactive installer. They are the recommended path on a fresh PC.

Windows (PowerShell):

irm https://raw.githubusercontent.com/Aspct3434/Distill-Agent/master/scripts/bootstrap.ps1 | iex

macOS / Linux (bash):

curl -fsSL https://raw.githubusercontent.com/Aspct3434/Distill-Agent/master/scripts/bootstrap.sh | bash

You can also clone the repo first and run `scripts/bootstrap.ps1` (Windows) or `scripts/bootstrap.sh` (macOS/Linux) directly.

Run the interactive installer with `npx` (no global install required):

npx @aspct/distill-agent install

Prefer to run the installer straight from the GitHub repository instead of the npm registry? Use the repo form (append `#<commit-ish>` to pin an exact revision):

npx --yes github:Aspct3434/Distill-Agent install

After installation, use the CLI to manage the agent:

npm i -g @aspct/distill-agent
distill                           # Open the interactive terminal UI
distill start                     # Start the backend and control panel
distill logs                      # View running logs
distill update                    # Pull the latest changes
distill doctor                    # Diagnose the install and environment

npx @aspct/distill-agent start    # Start the backend and control panel
npx @aspct/distill-agent logs     # View running logs
npx @aspct/distill-agent update   # Pull the latest changes
npx @aspct/distill-agent doctor   # Diagnose the install and environment

Distill is configured via environment variables. Key settings in `an-api.env`:

`AGENT_MODEL` accepts any LiteLLM provider string. The installers can preconfigure Kimi/Moonshot, Ollama, OpenRouter, OpenAI, Anthropic, Gemini, DeepSeek, Groq, xAI Grok, Mistral, or any OpenAI-compatible endpoint (vLLM); set the matching `*_API_KEY` in `an-api.env`.

For OpenAI you can also sign in with your ChatGPT account (Codex OAuth) instead of pasting a key: choose it in the installer, or run `PYTHONPATH=src python -m auth login` (also available in the control panel under Settings → Authentication). The token is stored locally, injected as `OPENAI_API_KEY`, and refreshed automatically before LLM calls.

| Variable | Default | Description |
| --- | --- | --- |
| `AGENT_MODEL` | `moonshot/kimi-k2.5` | The primary LLM to use (supports any LiteLLM provider string). |
| `AGENT_SANDBOX` | (blank) | Leave blank for Docker Compose. Set to `docker` to run Docker-in-Docker, or `http` for serverless. |
| `AGENT_REQUIRE_APPROVAL` | `off` | Set to `risky` or `all` to require human-in-the-loop approval before executing shell commands. |
| `AGENT_API_TOKEN` | required | Bearer token for API/WebSocket access; agent endpoints return 503 until set. |
| `AGENT_ALLOW_INSECURE_NO_AUTH` | `false` | Explicit local-only override for running without API auth. |
| `GATEWAY_RATE_LIMIT_RPM` | `60` | Per-client request limit, keyed by API token or client IP. |
| `AGENT_LOG_DB_PATH` | `./data/gateway_logs.db` | Persistent SQLite log store used by `/api/logs`. |
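To make the defaults and the token gate concrete, here is a small sketch that reads these variables and decides how an agent endpoint might respond. The variable names and defaults come from the table; everything else is assumed for illustration: the `Settings` class, `load_settings`, `auth_status`, and in particular the `401` returned for a wrong token (the document only specifies the `503` case).

```python
import hmac
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    model: str
    sandbox: str
    require_approval: str
    api_token: Optional[str]
    allow_insecure_no_auth: bool
    rate_limit_rpm: int
    log_db_path: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Defaults mirror the table above; an empty token counts as unset.
    return Settings(
        model=env.get("AGENT_MODEL", "moonshot/kimi-k2.5"),
        sandbox=env.get("AGENT_SANDBOX", ""),
        require_approval=env.get("AGENT_REQUIRE_APPROVAL", "off"),
        api_token=env.get("AGENT_API_TOKEN") or None,
        allow_insecure_no_auth=(
            env.get("AGENT_ALLOW_INSECURE_NO_AUTH", "false").lower() == "true"
        ),
        rate_limit_rpm=int(env.get("GATEWAY_RATE_LIMIT_RPM", "60")),
        log_db_path=env.get("AGENT_LOG_DB_PATH", "./data/gateway_logs.db"),
    )


def auth_status(settings: Settings, presented: Optional[str]) -> int:
    """Return the HTTP status an agent endpoint would answer with (sketch)."""
    if settings.api_token is None:
        # No token configured: refuse service unless explicitly overridden.
        return 200 if settings.allow_insecure_no_auth else 503
    if presented is None or not hmac.compare_digest(
        presented.encode(), settings.api_token.encode()
    ):
        return 401  # assumed status for a missing or wrong token
    return 200
```

Comparing tokens with `hmac.compare_digest` avoids leaking information through timing differences, which is a common precaution for bearer-token checks regardless of how the real gateway implements it.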

Adapters are disabled by default and activate when you provide a bot token in `an-api.env`:

- Telegram: Set `TELEGRAM_BOT_TOKEN`. Supports voice note transcriptions via Whisper.
- Discord: Set `DISCORD_BOT_TOKEN`.
- Slack: Set `SLACK_BOT_TOKEN` & `SLACK_APP_TOKEN` (Socket Mode).
- Email: Set `EMAIL_ADDRESS`, `EMAIL_PASSWORD`, `EMAIL_IMAP_HOST`, `EMAIL_SMTP_HOST`.

Each channel provides a live typing indicator, streams real-time tool execution logs, and isolates conversations.

The test suite covers contracts, planners, adapters, and the ReAct loop:

pytest                        # Run the full suite
pytest tests/test_task_contract_loop.py -v   # Test the anti-hallucination contract system

Distill is released under the MIT License.

source & further reading

github.com — original article
