Ludwig Spec Driven Development MCP

Ludwig Spec Driven Development MCP is a specification-driven framework that uses prose-first markdown specs to drive LLM code generation and verification for Rust projects. The framework ensures spec and implementation stay coupled by using the same document for both generation and verification, with deterministic checks via cargo test. Named after Ludwig Wittgenstein, it targets Rust exclusively and supports two workflows: decomposing descriptions into specs and generating code from active specs.

A specification-driven development framework whose specs are written as close to natural language as possible. The same prose-first markdown spec drives LLM code generation and verifies the resulting code. Named after Ludwig Wittgenstein. Scope:Ludwig targetsRust. Deterministic checks shell out to cargo test , so verifying a project requires a Cargo workspace. The spec format itself is plain markdown, but verification is Rust-specific by design — there is no plan to add adapters for other languages. Today, when a human asks an LLM to write code, the request is a chat message and the contract is whatever the model inferred. The result drifts the moment either side is edited. Ludwig replaces the chat message with a specification — a markdown document with a small fixed shape — that is the durable, versioned source of intent. The same document drives generation and verifies the resulting code, so spec and implementation stay coupled. --- id: token-bucket-rate-limiter title: Token-bucket rate limiter status: active implements: - src/rate limit/token bucket.rs version: 4 --- Intent Protect downstream services from bursty clients by allowing short bursts up to a configured capacity while enforcing a steady average rate. The limiter is a building block for per-tenant API quotas; it is not, by itself, a fairness mechanism between tenants. Behavior - { b1} A limiter is created with capacity and refill rate tokens/sec . - { b2} try acquire n returns true and consumes n tokens iff at least n are available. - { b3} Tokens refill continuously based on wall-clock time, capped at capacity. Examples ​ example name="burst then throttle" Given a limiter with capacity 5 and refill rate 1/sec When try acquire 1 is called 5 times in immediate succession Then all 5 calls return true And the 6th call in the same instant returns false ​ Invariants - {deterministic} tokens consumed <= capacity + refill rate elapsed seconds. - {property} after waiting C / refill rate seconds the limiter is full. - {judgment} Errors surfaced to callers name the limiter and required wait in plain English. Sections are fixed and ordered: Intent → Behavior → Examples → Invariants plus optional Non-goals , Open questions , Implementation notes . The only embedded DSL is Given/When/Then inside Examples — Gherkin-shaped so LLMs already know it. Ludwig supports two complementary directions: 1. Description → specs. Start from a project or feature description; the host agent decomposes it into specs and drafts each one; Ludwig validates and persists; the human reviews before any implementation. echo "A URL shortener with per-tenant analytics" | ludwig decompose inside Claude Code: /project-decompose A URL shortener with per-tenant analytics agent emits JSON, then for each game/spec calls game.create + spec.propose + spec.write ludwig catalog && cat specs/ index.md review, then move drafts to status: active 2. Spec → code. Once a spec is active, generate the implementation, then verify. ludwig init one-time scaffolding ludwig new auth/login --game auth OR write a spec by hand /spec-generate login inside Claude Code: LLM writes src/ + tests/ludwig login.rs ludwig verify login structural + deterministic + judgment-pending /spec-verify login inside Claude Code: also evaluates {judgment} invariants git add specs/ src/ tests/ .ludwig/state.json && git commit When you change the spec, ludwig diff flags the implementing files. When you change the code directly, the same diff flags drift on the other side. The trailing ludwig-spec: <id @<version hash=<sha comment in each implementing file is load-bearing — don't hand-edit it. Structural — frontmatter is well-typed, sections are in order, behavior tags are unique, every file in implements: exists and carries a matching ludwig-spec: stamp. In-process Rust. Sub-second. Deterministic — Ludwig scaffolds tests/ludwig <slug .rs with one test per Example and per {deterministic} invariant, each containing a todo body and a doc-comment with the Gherkin steps. You replace the bodies. ludwig verify shells out to cargo test --test ludwig <slug and parses the results. Judgment — each {judgment} invariant is packaged as a prompt and emitted as JSON. Ludwig itself holds no API keys; the host agent Claude Code evaluates each prompt and writes verdicts back via ludwig verify --ingest-judgments <file . Verdicts are keyed by the spec hash, so changing the spec invalidates old verdicts automatically. {property} invariants are parsed but not yet machine-verified — no generator runs. Rather than silently pass, Ludwig reacts to the spec's status: on an active spec each property invariant reports fail you can't rely on an invariant nothing checked , and on a draft/deprecated spec it reports skip . See docs/specs/property-invariants-deferred.spec.md . The canonical: setting in ludwig.yml decides which side is the source of truth when a spec and its code diverge: spec default — the spec leads. On drift, the code is stale; ludwig diff tells you to regenerate or bump the spec version. code — the code leads spec-from-code . On drift, the spec is the stale side; ludwig diff tells you to update the spec to match the code, then re-verify. The skill guidance flips accordingly. An unknown value is rejected at load time. Ludwig does not yet derive a spec from code — see Deferred. git clone <this repo && cd ludwig cargo install --path . Or, from a release build, symlink the binary onto your PATH: cargo build --release sudo ln -s "$PWD/target/release/ludwig" /usr/local/bin/ludwig Requires a Rust toolchain the Ludwig binary , and — in any project Ludwig verifies — whatever runs cargo test . | Command | Purpose | |---|---| ludwig init | Scaffold ludwig.yml , specs/ , .ludwig/ , register Claude Code skill | ludwig new SLUG --game G | Scaffold a new spec from a blank template | ludwig move SLUG --to-game G --force | Relocate an existing spec to a different game | ludwig decompose | Emit a prompt to break a project description stdin or -d into specs | ludwig propose SLUG -d DESC -g GAME | Emit a prompt for drafting a single spec | ludwig write-spec SLUG -g GAME | Validate spec markdown on stdin and persist it | ludwig game-new NAME -i INTENT -x Term:Defn ... | Create a language-game manifest | ludwig parse PATH --quiet | Parse one or all specs; report structural errors | ludwig plan ID | Emit JSON generation brief for the host agent | ludwig verify ID --all --json | Run the full pipeline; write report under .ludwig/reports/ | ludwig verify ID --emit-judgment-prompts | Print judgment prompts as JSON | ludwig verify --ingest-judgments FILE | Ingest judgment verdicts back into state.json | ludwig diff ID --all --json | Surface drift between specs and code | ludwig catalog | Regenerate specs/ index.md | ludwig mcp --root PATH | Start the MCP server over stdio | Ludwig speaks MCP Model Context Protocol over stdio. Register it with Claude Code: claude mcp add ludwig -- ludwig mcp or, scoped to one project: claude mcp add --scope project ludwig -- ludwig mcp The server discovers the project at each tool call $PWD , then $LUDWIG PROJECT , then --root . | Tool | Purpose | |---|---| spec.list | List all specs in the project | spec.read | Return the parsed structure of a spec | spec.plan | Generation brief drives code generation | spec.verify | Run the verification pipeline | spec.diff | Drift report between a spec and its implementing files | spec.propose | Emit a prompt for drafting a spec from a description | spec.write | Validate agent-drafted markdown and persist it | spec.move | Relocate a spec into a different game | spec.ingest judgments | Persist judgment verdicts inline no file path needed | project.decompose | Emit a prompt to break a project into specs + games | game.create | Create a language-game game.md | Resources: ludwig://spec/<id — raw spec markdown ludwig://report/latest — most recent verification report Once ludwig mcp is registered, you can drive an entire project from a single chat session. The agent calls Ludwig's MCP tools instead of you running the CLI by hand; you stay in the loop for review. - Scaffold the project. In an empty directory, run ludwig init once. This creates ludwig.yml , specs/ , .ludwig/ , and the Claude Code skill manifest. The MCP server discovers this root on every tool call. - Describe the app. Tell the agent what you want to build, e.g. "Build a URL shortener with per-tenant analytics." The agent calls project.decompose to get a prompt, then proposes a set of games auth , shorten , analytics , … and a draft spec per behavior. - Persist the drafts. For each game the agent calls game.create ; for each spec it calls spec.propose to get the drafting prompt, then spec.write to validate and persist the markdown. Anything malformed is rejected with a structural error — the agent retries until the spec parses. - Review and activate. You read specs/ index.md regenerated by ludwig catalog , resolve any Open questions , and flip drafts to status: active . Active is the gate: nothing generates code until you approve the intent. - Generate code. The agent calls spec.plan per active spec to get the generation brief resolved glossary, dependencies, behavior tags, examples and writes the implementation and test bodies. Each generated source file ends with the ludwig-spec: <id @<version hash=… stamp. - Verify. The agent calls spec.verify , which runs structural and deterministic layers and emits judgment prompts. The agent evaluates each judgment prompt itself, then calls spec.verify again with the verdicts ingested. The report lands under .ludwig/reports/ and is also reachable as ludwig://report/latest . - Iterate. When you edit a spec, the agent re-runs spec.plan and regenerates the affected files; when you edit code, drift detection flags the seam. The spec is the durable artifact — chat turns are not. A minimal driver prompt: "Use the Ludwig MCP tools. Decompose this description into specs, write them, wait for me to activate, then generate and verify." The agent does the rest. <project root / ludwig.yml project config specs/ index.md generated: spec catalog auth/ game.md language-game manifest login.spec.md src/ auth/login.rs ends with // ludwig-spec: login@1 hash=… tests/ ludwig login.rs scaffolded once, then user-owned .ludwig/ state.json spec hashes, file fingerprints, judgment verdicts reports/ verification reports JSON + latest.md .claude/ skills/ ludwig.yaml registered by ludwig init <slug .spec.md matches the frontmatter id . IDs are globally unique within a project. v0.1 ships: - Strict, hand-rolled markdown spec parser - Project scaffolding, catalog, language-game inheritance - Generation brief ludwig plan with transitively-resolved depends on - Rust adapter: scaffolds test stubs from Examples + invariants, runs via cargo test - Structural + deterministic verification, judgment-prompt round-trip - Three-way drift detection ludwig diff — stale stamp, body changed, missing, unstamped - Claude Code skill manifest + JSON-RPC 2.0 MCP server over stdio - 60-test integration suite Deferred: - Property-based generation — {property} invariants are parsed and their verification policy is defined and tested active → fail , non-active → skip ; see "Verification, in three layers" , but no generator produces or runs property tests yet. spec-from-code generation — canonical: code mode flips drift semantics so the spec is the stale side see "Canonical direction" , but Ludwig does not yet derive or auto-update a spec from existing code; you update the spec yourself, then re-verify. cargo build --offline the cached crate set is sufficient cargo test --offline cargo run --offline -- version MIT.