# Ludwig Spec Driven Development MCP

> Published: 2026-06-26 04:40:50+00:00

A specification-driven development framework whose specs are written as close to natural language as possible. The same prose-first markdown spec drives LLM code generation **and** verifies the resulting code. Named after Ludwig Wittgenstein.

Scope: Ludwig targets Rust. Deterministic checks shell out to `cargo test`, so verifying a project requires a Cargo workspace. The spec format itself is plain markdown, but verification is Rust-specific by design; there is no plan to add adapters for other languages.

Today, when a human asks an LLM to write code, the request is a chat message and the contract is whatever the model inferred. The result drifts the moment either side is edited. Ludwig replaces the chat message with a **specification**, a markdown document with a small fixed shape, that is the durable, versioned source of intent. The same document drives generation and verifies the resulting code, so spec and implementation stay coupled.

````
---
id: token-bucket-rate-limiter
title: Token-bucket rate limiter
status: active
implements:
  - src/rate_limit/token_bucket.rs
version: 4
---

## Intent

Protect downstream services from bursty clients by allowing short bursts up to
a configured capacity while enforcing a steady average rate. The limiter is a
building block for per-tenant API quotas; it is not, by itself, a fairness
mechanism between tenants.

## Behavior

- {#b1} A limiter is created with `capacity` and `refill_rate` (tokens/sec).
- {#b2} `try_acquire(n)` returns true and consumes n tokens iff at least n are available.
- {#b3} Tokens refill continuously based on wall-clock time, capped at capacity.

## Examples

``` example name="burst then throttle"
Given a limiter with capacity 5 and refill_rate 1/sec
When try_acquire(1) is called 5 times in immediate succession
Then all 5 calls return true
And the 6th call in the same instant returns false
```

## Invariants

- {deterministic} tokens_consumed <= capacity + refill_rate * elapsed_seconds.
- {property} after waiting C / refill_rate seconds the limiter is full.
- {judgment} Errors surfaced to callers name the limiter and required wait in plain English.
````

Sections are fixed and ordered: **Intent → Behavior → Examples → Invariants** (plus optional **Non-goals**, **Open questions**, **Implementation notes**). The only embedded DSL is `Given/When/Then` inside Examples, Gherkin-shaped so LLMs already know it.

Ludwig supports two complementary directions:

**1. Description → specs.** Start from a project or feature description; the host agent decomposes it into specs and drafts each one; Ludwig validates and persists; the human reviews before any implementation.

```
echo "A URL shortener with per-tenant analytics" | ludwig decompose
# inside Claude Code: /project-decompose A URL shortener with per-tenant analytics
# agent emits JSON, then for each game/spec calls game.create + spec.propose + spec.write
ludwig catalog && cat specs/_index.md   # review, then move drafts to status: active
```

**2. Spec → code.** Once a spec is active, generate the implementation, then verify.

```
ludwig init                          # one-time scaffolding
ludwig new auth/login --game auth    # OR write a spec by hand
/spec-generate login                 # inside Claude Code: LLM writes src/ + tests/ludwig_login.rs
ludwig verify login                  # structural + deterministic + judgment-pending
/spec-verify login                   # inside Claude Code: also evaluates {judgment} invariants
git add specs/ src/ tests/ .ludwig/state.json && git commit
```

When you change the spec, `ludwig diff` flags the implementing files. When you change the code directly, the same diff flags drift on the other side.
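
To make the spec → code direction concrete, here is a minimal sketch of what an implementation of the token-bucket spec shown earlier might look like. It is not Ludwig output. The type name `TokenBucket`, the explicit `now` parameter (the spec writes `try_acquire(n)`; the clock is passed in here only to keep the sketch deterministic), and the helper `burst_then_throttle` are all assumptions made for illustration, as is starting the bucket full, which the example implies but the Behavior section does not state.

```rust
use std::time::Instant;

/// Hypothetical limiter for the token-bucket spec; names are illustrative.
pub struct TokenBucket {
    capacity: f64,
    refill_rate: f64, // tokens per second
    tokens: f64,
    last_refill: Instant,
}

impl TokenBucket {
    /// b1: a limiter is created with `capacity` and `refill_rate`.
    /// Assumption: the bucket starts full.
    pub fn new(capacity: u32, refill_rate: f64, now: Instant) -> Self {
        let capacity = f64::from(capacity);
        Self { capacity, refill_rate, tokens: capacity, last_refill: now }
    }

    /// b3: refill from elapsed wall-clock time, capped at capacity.
    fn refill(&mut self, now: Instant) {
        if now > self.last_refill {
            let elapsed = now.duration_since(self.last_refill).as_secs_f64();
            self.tokens = (self.tokens + elapsed * self.refill_rate).min(self.capacity);
            self.last_refill = now;
        }
    }

    /// b2: consume `n` tokens and return true iff at least `n` are available.
    pub fn try_acquire(&mut self, n: u32, now: Instant) -> bool {
        self.refill(now);
        let needed = f64::from(n);
        if self.tokens >= needed {
            self.tokens -= needed;
            true
        } else {
            false
        }
    }
}

/// Replays the "burst then throttle" example: six calls at the same instant.
pub fn burst_then_throttle() -> Vec<bool> {
    let t0 = Instant::now();
    let mut limiter = TokenBucket::new(5, 1.0, t0);
    (0..6).map(|_| limiter.try_acquire(1, t0)).collect()
}
```

Passing the clock in keeps the Given/When/Then steps reproducible: "in the same instant" becomes a literal shared `Instant` rather than a timing assumption.
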
The trailing `ludwig-spec: <id>@<version> hash=<hash>` comment in each implementing file is load-bearing; don't hand-edit it.

## Verification, in three layers

- **Structural**: frontmatter is well-typed, sections are in order, behavior tags are unique, every file in `implements:` exists and carries a matching `ludwig-spec:` stamp. In-process Rust. Sub-second.
- **Deterministic**: Ludwig scaffolds `tests/ludwig_<id>.rs` with one `#[test]` per Example and per `{deterministic}` invariant, each containing a `todo!()` body and a doc-comment with the Gherkin steps. You replace the bodies. `ludwig verify` shells out to `cargo test --test ludwig_<id>` and parses the results.
- **Judgment**: each `{judgment}` invariant is packaged as a prompt and emitted as JSON. Ludwig itself holds no API keys; the host agent (Claude Code) evaluates each prompt and writes verdicts back via `ludwig verify --ingest-judgments FILE`. Verdicts are keyed by the spec hash, so changing the spec invalidates old verdicts automatically.

`{property}` invariants are parsed but not yet machine-verified: no generator runs. Rather than silently pass, Ludwig reacts to the spec's status. On an `active` spec each property invariant reports `fail` (you can't rely on an invariant nothing checked), and on a draft or deprecated spec it reports `skip`. See `docs/specs/property-invariants-deferred.spec.md`.

## Canonical direction

The `canonical:` setting in `ludwig.yml` decides which side is the source of truth when a spec and its code diverge:

- `spec` (default): the spec leads. On drift, the code is stale; `ludwig diff` tells you to regenerate or bump the spec version.
- `code`: the code leads (spec-from-code). On drift, the spec is the stale side; `ludwig diff` tells you to update the spec to match the code, then re-verify.

The skill guidance flips accordingly. An unknown value is rejected at load time. (Ludwig does not yet *derive* a spec from code; see Deferred.)

## Install

```
git clone <repo-url> && cd ludwig
cargo install --path .
```

Or, from a release build, symlink the binary onto your PATH:

```
cargo build --release
sudo ln -s "$PWD/target/release/ludwig" /usr/local/bin/ludwig
```

Building the Ludwig binary requires a Rust toolchain; any project Ludwig verifies also needs whatever runs `cargo test`.

## Commands

| Command | Purpose |
|---|---|
| `ludwig init` | Scaffold `ludwig.yml`, `specs/`, `.ludwig/`, register Claude Code skill |
| `ludwig new SLUG [--game G]` | Scaffold a new spec from a blank template |
| `ludwig move SLUG [--to-game G] [--force]` | Relocate an existing spec to a different game |
| `ludwig decompose` | Emit a prompt to break a project description (stdin or `-d`) into specs |
| `ludwig propose SLUG -d DESC [-g GAME]` | Emit a prompt for drafting a single spec |
| `ludwig write-spec SLUG [-g GAME]` | Validate spec markdown on stdin and persist it |
| `ludwig game-new NAME [-i INTENT] [-x Term:Defn ...]` | Create a language-game manifest |
| `ludwig parse [PATH] [--quiet]` | Parse one or all specs; report structural errors |
| `ludwig plan ID` | Emit JSON generation brief for the host agent |
| `ludwig verify [ID] [--all] [--json]` | Run the full pipeline; write report under `.ludwig/reports/` |
| `ludwig verify [ID] --emit-judgment-prompts` | Print judgment prompts as JSON |
| `ludwig verify --ingest-judgments FILE` | Ingest judgment verdicts back into state.json |
| `ludwig diff [ID] [--all] [--json]` | Surface drift between specs and code |
| `ludwig catalog` | Regenerate `specs/_index.md` |
| `ludwig mcp [--root PATH]` | Start the MCP server over stdio |

## MCP server

Ludwig speaks MCP (Model Context Protocol) over stdio. Register it with Claude Code:

```
claude mcp add ludwig -- ludwig mcp
# or, scoped to one project:
claude mcp add --scope project ludwig -- ludwig mcp
```

The server discovers the project at each tool call (`$PWD`, then `$LUDWIG_PROJECT`, then `--root`).
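
The discovery order above can be sketched as a first-match search. This is only an illustration under stated assumptions: the text does not say what "discovering" checks for or whether a later source overrides an earlier one, so the sketch assumes a project root is a directory containing `ludwig.yml` and that candidates are tried in the listed order. The function names `candidates` and `discover_root` are hypothetical, not part of Ludwig.

```rust
use std::path::PathBuf;

/// Candidate roots in the order the text lists them:
/// the working directory, `$LUDWIG_PROJECT`, then the `--root` flag.
pub fn candidates(root_flag: Option<PathBuf>) -> Vec<Option<PathBuf>> {
    vec![
        std::env::current_dir().ok(),
        std::env::var_os("LUDWIG_PROJECT").map(PathBuf::from),
        root_flag,
    ]
}

/// Return the first candidate directory that contains a `ludwig.yml`.
/// Assumption: the presence of `ludwig.yml` is what marks a project root.
pub fn discover_root(candidates: &[Option<PathBuf>]) -> Option<PathBuf> {
    candidates
        .iter()
        .flatten() // skip sources that are not set
        .find(|dir| dir.join("ludwig.yml").is_file())
        .cloned()
}
```

Running the search on every tool call, rather than once at startup, matches the statement that discovery happens at each call: a server started outside any project can still pick up a root created later by `ludwig init`.
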

| Tool | Purpose |
|---|---|
| `spec.list` | List all specs in the project |
| `spec.read` | Return the parsed structure of a spec |
| `spec.plan` | Generation brief (drives code generation) |
| `spec.verify` | Run the verification pipeline |
| `spec.diff` | Drift report between a spec and its implementing files |
| `spec.propose` | Emit a prompt for drafting a spec from a description |
| `spec.write` | Validate agent-drafted markdown and persist it |
| `spec.move` | Relocate a spec into a different game |
| `spec.ingest_judgments` | Persist judgment verdicts inline (no file path needed) |
| `project.decompose` | Emit a prompt to break a project into specs + games |
| `game.create` | Create a language-game (`_game.md`) |

Resources:

- `ludwig://spec/<id>`: raw spec markdown
- `ludwig://report/latest`: most recent verification report

Once `ludwig mcp` is registered, you can drive an entire project from a single chat session. The agent calls Ludwig's MCP tools instead of you running the CLI by hand; you stay in the loop for review.

- **Scaffold the project.** In an empty directory, run `ludwig init` once. This creates `ludwig.yml`, `specs/`, `.ludwig/`, and the Claude Code skill manifest. The MCP server discovers this root on every tool call.
- **Describe the app.** Tell the agent what you want to build, e.g. *"Build a URL shortener with per-tenant analytics."* The agent calls `project.decompose` to get a prompt, then proposes a set of games (`auth`, `shorten`, `analytics`, …) and a draft spec per behavior.
- **Persist the drafts.** For each game the agent calls `game.create`; for each spec it calls `spec.propose` to get the drafting prompt, then `spec.write` to validate and persist the markdown. Anything malformed is rejected with a structural error, and the agent retries until the spec parses.
- **Review and activate.** You read `specs/_index.md` (regenerated by `ludwig catalog`), resolve any `Open questions`, and flip drafts to `status: active`. Active is the gate: nothing generates code until you approve the intent.
- **Generate code.** The agent calls `spec.plan` per active spec to get the generation brief (resolved glossary, dependencies, behavior tags, examples) and writes the implementation and test bodies. Each generated source file ends with the `ludwig-spec: <id>@<version> hash=…` stamp.
- **Verify.** The agent calls `spec.verify`, which runs structural and deterministic layers and emits judgment prompts. The agent evaluates each judgment prompt itself, then calls `spec.verify` again with the verdicts ingested. The report lands under `.ludwig/reports/` and is also reachable as `ludwig://report/latest`.
- **Iterate.** When you edit a spec, the agent re-runs `spec.plan` and regenerates the affected files; when you edit code, drift detection flags the seam. The spec is the durable artifact; chat turns are not.

A minimal driver prompt: *"Use the Ludwig MCP tools. Decompose this description into specs, write them, wait for me to activate, then generate and verify."* The agent does the rest.

## Project layout

```
<project>/
  ludwig.yml              # project config
  specs/
    _index.md             # generated: spec catalog
    auth/
      _game.md            # language-game manifest
      login.spec.md
  src/
    auth/login.rs         # ends with `// ludwig-spec: login@1 hash=…`
  tests/
    ludwig_login.rs       # scaffolded once, then user-owned
  .ludwig/
    state.json            # spec hashes, file fingerprints, judgment verdicts
    reports/              # verification reports (JSON + latest.md)
  .claude/
    skills/
      ludwig.yaml         # registered by `ludwig init`
```

The file name `<id>.spec.md` matches the frontmatter `id`. IDs are globally unique within a project.
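
The layout above shows implementing files ending with a `// ludwig-spec: login@1 hash=…` stamp. As a rough sketch of how such a trailing line could be read back, the following parses it into its parts. The `Stamp` type and `parse_stamp` function are hypothetical names, the `<id>@<version>` reading is inferred from the `login@1` example, and the hash is treated as an opaque string because its format is not shown here; Ludwig's own parser may differ.

```rust
/// Parsed form of a trailing `// ludwig-spec: <id>@<version> hash=<hash>` line.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub struct Stamp {
    pub id: String,
    pub version: u32,
    pub hash: String, // opaque: the real hash format is not specified here
}

/// Parse the last non-empty line of a source file as a stamp.
/// Returns `None` if the file is unstamped or the stamp is malformed.
pub fn parse_stamp(source: &str) -> Option<Stamp> {
    let last = source.lines().rev().find(|l| !l.trim().is_empty())?;
    let rest = last.trim().strip_prefix("// ludwig-spec:")?.trim();
    let (id_version, hash_part) = rest.split_once(' ')?;
    let (id, version) = id_version.split_once('@')?;
    let hash = hash_part.trim().strip_prefix("hash=")?;
    if id.is_empty() || hash.is_empty() {
        return None;
    }
    Some(Stamp {
        id: id.to_string(),
        version: version.parse().ok()?,
        hash: hash.to_string(),
    })
}
```

A checker built on this could compare `id` and `version` against the spec's frontmatter and `hash` against a freshly computed spec hash, which is one plausible way to tell a stale stamp from an unstamped file.
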

## Status

v0.1 ships:

- Strict, hand-rolled markdown spec parser
- Project scaffolding, catalog, language-game inheritance
- Generation brief (`ludwig plan`) with transitively-resolved `depends_on`
- Rust adapter: scaffolds `#[test]` stubs from Examples + invariants, runs via `cargo test`
- Structural + deterministic verification, judgment-prompt round-trip
- Three-way drift detection (`ludwig diff`): stale stamp, body changed, missing, unstamped
- Claude Code skill manifest + JSON-RPC 2.0 MCP server over stdio
- 60-test integration suite

Deferred:

- Property-based **generation**: `{property}` invariants are parsed and their verification policy is defined and tested (active → `fail`, non-active → `skip`; see "Verification, in three layers"), but no generator produces or runs property tests yet.
- **spec-from-code generation**: `canonical: code` mode flips drift semantics so the spec is the stale side (see "Canonical direction"), but Ludwig does not yet derive or auto-update a spec from existing code; you update the spec yourself, then re-verify.

## Development

```
cargo build --offline    # the cached crate set is sufficient
cargo test --offline
cargo run --offline -- version
```

## License

MIT.