Ludwig Spec Driven Development MCP

A specification-driven development framework whose specs are written as close to natural language as possible. The same prose-first markdown spec drives LLM code generation and verifies the resulting code. Named after Ludwig Wittgenstein.

Scope: Ludwig targets Rust. Deterministic checks shell out to `cargo test`, so verifying a project requires a Cargo workspace. The spec format itself is plain markdown, but verification is Rust-specific by design; there is no plan to add adapters for other languages.

Today, when a human asks an LLM to write code, the request is a chat message and the contract is whatever the model inferred. The result drifts the moment either side is edited. Ludwig replaces the chat message with a specification — a markdown document with a small fixed shape — that is the durable, versioned source of intent. The same document drives generation and verifies the resulting code, so spec and implementation stay coupled.

---
id: token-bucket-rate-limiter
title: Token-bucket rate limiter
status: active
implements:
  - src/rate_limit/token_bucket.rs
version: 4
---

## Intent
Protect downstream services from bursty clients by allowing short bursts up
to a configured capacity while enforcing a steady average rate. The limiter
is a building block for per-tenant API quotas; it is not, by itself, a
fairness mechanism between tenants.

## Behavior
- {#b1} A limiter is created with `capacity` and `refill_rate` (tokens/sec).
- {#b2} `try_acquire(n)` returns true and consumes n tokens iff at least n are available.
- {#b3} Tokens refill continuously based on wall-clock time, capped at capacity.

## Examples
``` example name="burst then throttle"
Given a limiter with capacity 5 and refill_rate 1/sec
When try_acquire(1) is called 5 times in immediate succession
Then all 5 calls return true
And the 6th call in the same instant returns false
```

## Invariants
- {deterministic} tokens_consumed <= capacity + refill_rate * elapsed_seconds.
- {property} after waiting C / refill_rate seconds the limiter is full.
- {judgment} Errors surfaced to callers name the limiter and required wait in plain English.

Sections are fixed and ordered: Intent → Behavior → Examples → Invariants (plus optional Non-goals, Open questions, Implementation notes). The only embedded DSL is Given/When/Then inside Examples, which is Gherkin-shaped so LLMs already know it.
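To make the example spec concrete, here is a minimal sketch of an implementation that would satisfy behaviors b1 to b3. It is not Ludwig's generated output. The `TokenBucket` name, the `new_at` and `try_acquire_at` helpers (which take an explicit time so tests need not sleep), the choice to start the bucket full, and the use of the monotonic `Instant` clock in place of wall-clock time are all assumptions made for illustration.

```rust
use std::time::Instant;

/// Token-bucket limiter sketched from behaviors b1-b3 of the example spec.
pub struct TokenBucket {
    capacity: f64,
    refill_rate: f64, // tokens per second
    tokens: f64,
    last_refill: Instant,
}

impl TokenBucket {
    /// b1: created with `capacity` and `refill_rate`. Starting full is an assumption.
    pub fn new(capacity: u32, refill_rate: f64) -> Self {
        Self::new_at(capacity, refill_rate, Instant::now())
    }

    /// Same as `new`, with an explicit start time (hypothetical test helper).
    pub fn new_at(capacity: u32, refill_rate: f64, now: Instant) -> Self {
        let capacity = f64::from(capacity);
        Self { capacity, refill_rate, tokens: capacity, last_refill: now }
    }

    /// b2: returns true and consumes `n` tokens iff at least `n` are available.
    pub fn try_acquire(&mut self, n: u32) -> bool {
        self.try_acquire_at(n, Instant::now())
    }

    /// Same as `try_acquire`, evaluated at an explicit time.
    pub fn try_acquire_at(&mut self, n: u32, now: Instant) -> bool {
        // b3: refill continuously from elapsed time, capped at capacity.
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_rate).min(self.capacity);
        self.last_refill = now;

        let needed = f64::from(n);
        if self.tokens >= needed {
            self.tokens -= needed;
            true
        } else {
            false
        }
    }
}
```

With a capacity of 5 and a refill rate of 1/sec, five `try_acquire_at(1, t0)` calls at the same instant succeed and the sixth fails, which mirrors the "burst then throttle" example.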

Ludwig supports two complementary directions:

1. Description → specs. Start from a project or feature description; the host agent decomposes it into specs and drafts each one; Ludwig validates and persists; the human reviews before any implementation.

echo "A URL shortener with per-tenant analytics" | ludwig decompose
/project-decompose A URL shortener with per-tenant analytics
ludwig catalog && cat specs/_index.md   # review, then move drafts to status: active

2. Spec → code. Once a spec is active, generate the implementation, then verify.

ludwig init                              # one-time scaffolding
ludwig new auth/login --game auth        # OR write a spec by hand
/spec-generate login                     # inside Claude Code: LLM writes src/ + tests/ludwig_login.rs
ludwig verify login                      # structural + deterministic + judgment-pending
/spec-verify login                       # inside Claude Code: also evaluates {judgment} invariants
git add specs/ src/ tests/ .ludwig/state.json && git commit

When you change the spec, `ludwig diff` flags the implementing files. When you change the code directly, the same diff flags drift on the other side. The trailing `ludwig-spec: <id>@<version> hash=<sha>` comment in each implementing file is load-bearing; don't hand-edit it.
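For a sense of what the stamp carries, the sketch below parses a trailing `// ludwig-spec: <id>@<version> hash=<sha>` line into its three parts. This is an illustration only: the `Stamp` type and `parse_stamp` function are hypothetical names, the version is assumed to be an unsigned integer (as in the frontmatter example), and nothing here reflects how Ludwig computes or checks the hash.

```rust
/// Parsed form of a trailing `// ludwig-spec: <id>@<version> hash=<sha>` comment.
#[derive(Debug, PartialEq)]
pub struct Stamp {
    pub id: String,
    pub version: u32,
    pub hash: String,
}

/// Reads the stamp from the last non-empty line of a source file.
/// Returns `None` if that line does not have the expected shape.
pub fn parse_stamp(source: &str) -> Option<Stamp> {
    let last = source.lines().rev().find(|l| !l.trim().is_empty())?;
    let rest = last.trim().strip_prefix("// ludwig-spec:")?.trim();
    // `rest` should now look like `login@1 hash=abc123`.
    let (spec_ref, hash_part) = rest.split_once(' ')?;
    let (id, version) = spec_ref.split_once('@')?;
    let hash = hash_part.trim().strip_prefix("hash=")?;
    if id.is_empty() || hash.is_empty() {
        return None;
    }
    Some(Stamp {
        id: id.to_string(),
        version: version.parse().ok()?,
        hash: hash.to_string(),
    })
}
```

A file without a stamp, or with a hand-edited stamp that no longer parses, yields `None` here, which is one reason to leave the line alone.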

Verification, in three layers

- Structural: frontmatter is well-typed, sections are in order, behavior tags are unique, and every file in `implements:` exists and carries a matching `ludwig-spec:` stamp. Runs in-process in Rust, in under a second.
- Deterministic: Ludwig scaffolds `tests/ludwig_<slug>.rs` with one `#[test]` per Example and per `{deterministic}` invariant, each containing a `todo!()` body and a doc-comment with the Gherkin steps. You replace the bodies. `ludwig verify` shells out to `cargo test --test ludwig_<slug>` and parses the results.
- Judgment: each `{judgment}` invariant is packaged as a prompt and emitted as JSON. Ludwig itself holds no API keys; the host agent (Claude Code) evaluates each prompt and writes verdicts back via `ludwig verify --ingest-judgments <file>`. Verdicts are keyed by the spec hash, so changing the spec invalidates old verdicts automatically.
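The deterministic layer depends on reading `cargo test` output. As a rough sketch of that idea, and not Ludwig's actual parser, the function below extracts per-test outcomes from the human-readable lines that the default test harness prints (`test <name> ... ok`, `... FAILED`, `... ignored`). The `Outcome` type and `parse_test_lines` name are hypothetical.

```rust
/// Outcome of a single test as printed by the default Rust test harness.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Outcome {
    Passed,
    Failed,
    Ignored,
}

/// Extracts `(test name, outcome)` pairs from `cargo test` output.
/// Lines that are not per-test result lines (for example `running 2 tests`
/// or the `test result:` summary) are skipped.
pub fn parse_test_lines(output: &str) -> Vec<(String, Outcome)> {
    output
        .lines()
        .filter_map(|line| {
            let rest = line.trim().strip_prefix("test ")?;
            let (name, status) = rest.rsplit_once(" ... ")?;
            let outcome = match status.trim() {
                "ok" => Outcome::Passed,
                "FAILED" => Outcome::Failed,
                "ignored" => Outcome::Ignored,
                _ => return None,
            };
            Some((name.to_string(), outcome))
        })
        .collect()
}
```

Because `todo!()` panics when executed, a scaffolded test whose body has not been replaced shows up as `FAILED` rather than passing silently.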

`{property}` invariants are parsed but not yet machine-verified; no generator runs. Rather than silently pass, Ludwig reacts to the spec's status: on an `active` spec each property invariant reports `fail` (you can't rely on an invariant nothing checked), and on a draft or deprecated spec it reports `skip`. See `docs/specs/property-invariants-deferred.spec.md`.
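The status-dependent policy for property invariants amounts to a small lookup. The sketch below expresses it as a `match`; the `SpecStatus` and `CheckResult` types and the function name are hypothetical, chosen only to mirror the statuses and outcomes named in the text.

```rust
/// Spec lifecycle states mentioned in the text.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum SpecStatus {
    Draft,
    Active,
    Deprecated,
}

/// Possible outcomes of a single check.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum CheckResult {
    Pass, // never returned for property invariants while no generator exists
    Fail,
    Skip,
}

/// Outcome reported for an unverified `{property}` invariant.
pub fn property_invariant_result(status: SpecStatus) -> CheckResult {
    match status {
        // An active spec cannot rely on an invariant nothing checked.
        SpecStatus::Active => CheckResult::Fail,
        // Non-active specs are not yet (or no longer) binding.
        SpecStatus::Draft | SpecStatus::Deprecated => CheckResult::Skip,
    }
}
```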

Canonical direction

The `canonical:` setting in `ludwig.yml` decides which side is the source of truth when a spec and its code diverge:

- `spec` (default): the spec leads. On drift, the code is stale; `ludwig diff` tells you to regenerate or bump the spec version.
- `code`: the code leads (spec-from-code). On drift, the spec is the stale side; `ludwig diff` tells you to update the spec to match the code, then re-verify. The skill guidance flips accordingly.

An unknown value is rejected at load time. (Ludwig does not yet derive a spec from code — see Deferred.)
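A setting like this maps naturally onto an enum with a fallible parser. The following is a minimal sketch under that assumption, not Ludwig's configuration code: the `Canonical` type, its error message, and the `stale_side` helper are hypothetical names.

```rust
use std::str::FromStr;

/// Which side is the source of truth when spec and code diverge.
#[derive(Debug, PartialEq, Clone, Copy, Default)]
pub enum Canonical {
    #[default]
    Spec,
    Code,
}

impl FromStr for Canonical {
    type Err = String;

    /// Accepts only the two documented values; anything else is an error,
    /// mirroring "an unknown value is rejected at load time".
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "spec" => Ok(Canonical::Spec),
            "code" => Ok(Canonical::Code),
            other => Err(format!("unknown canonical value: {other:?}")),
        }
    }
}

impl Canonical {
    /// The side treated as stale when drift is detected.
    pub fn stale_side(self) -> &'static str {
        match self {
            Canonical::Spec => "code",
            Canonical::Code => "spec",
        }
    }
}
```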

git clone <this repo> && cd ludwig
cargo install --path .

Or, from a release build, symlink the binary onto your PATH:

cargo build --release
sudo ln -s "$PWD/target/release/ludwig" /usr/local/bin/ludwig

Requires a Rust toolchain (for the Ludwig binary) and, in any project Ludwig verifies, whatever runs `cargo test`.

| Command | Purpose |
| --- | --- |
| `ludwig init` | Scaffold `ludwig.yml`, `specs/`, `.ludwig/`, and register the Claude Code skill |
| `ludwig new SLUG [--game G]` | Scaffold a new spec from a blank template |
| `ludwig move SLUG [--to-game G] [--force]` | Relocate an existing spec to a different game |
| `ludwig decompose` | Emit a prompt to break a project description (stdin or `-d`) into specs |
| `ludwig propose SLUG -d DESC [-g GAME]` | Emit a prompt for drafting a single spec |
| `ludwig write-spec SLUG [-g GAME]` | Validate spec markdown on stdin and persist it |
| `ludwig game-new NAME [-i INTENT] [-x Term:Defn ...]` | Create a language-game manifest |
| `ludwig parse [PATH] [--quiet]` | Parse one or all specs; report structural errors |
| `ludwig plan ID` | Emit JSON generation brief for the host agent |
| `ludwig verify [ID] [--all] [--json]` | Run the full pipeline; write report under `.ludwig/reports/` |
| `ludwig verify [ID] --emit-judgment-prompts` | Print judgment prompts as JSON |
| `ludwig verify --ingest-judgments FILE` | Ingest judgment verdicts back into `state.json` |
| `ludwig diff [ID] [--all] [--json]` | Surface drift between specs and code |
| `ludwig catalog` | Regenerate `specs/_index.md` |
| `ludwig mcp [--root PATH]` | Start the MCP server over stdio |

Ludwig speaks MCP (Model Context Protocol) over stdio. Register it with Claude Code:

claude mcp add ludwig -- ludwig mcp
claude mcp add --scope project ludwig -- ludwig mcp

The server discovers the project at each tool call (`$PWD`, then `$LUDWIG_PROJECT`, then `--root`).

| Tool | Purpose |
| --- | --- |
| `spec.list` | List all specs in the project |
| `spec.read` | Return the parsed structure of a spec |
| `spec.plan` | Generation brief (drives code generation) |
| `spec.verify` | Run the verification pipeline |
| `spec.diff` | Drift report between a spec and its implementing files |
| `spec.propose` | Emit a prompt for drafting a spec from a description |
| `spec.write` | Validate agent-drafted markdown and persist it |
| `spec.move` | Relocate a spec into a different game |
| `spec.ingest_judgments` | Persist judgment verdicts inline (no file path needed) |
| `project.decompose` | Emit a prompt to break a project into specs + games |
| `game.create` | Create a language-game (`_game.md`) |

Resources:

- `ludwig://spec/<id>`: raw spec markdown
- `ludwig://report/latest`: most recent verification report
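The two resource URI shapes can be told apart with plain prefix matching. This is a small sketch of that idea from a client's point of view; the `Resource` enum and `parse_resource` function are hypothetical, and it assumes a spec id is simply whatever follows `spec/`.

```rust
/// The two resource kinds listed above.
#[derive(Debug, PartialEq)]
pub enum Resource {
    /// `ludwig://spec/<id>`: raw spec markdown for the given id.
    Spec(String),
    /// `ludwig://report/latest`: most recent verification report.
    LatestReport,
}

/// Classifies a `ludwig://` URI, returning `None` for anything else.
pub fn parse_resource(uri: &str) -> Option<Resource> {
    let path = uri.strip_prefix("ludwig://")?;
    if path == "report/latest" {
        return Some(Resource::LatestReport);
    }
    let id = path.strip_prefix("spec/")?;
    if id.is_empty() {
        None
    } else {
        Some(Resource::Spec(id.to_string()))
    }
}
```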

Once `ludwig mcp` is registered, you can drive an entire project from a single chat session. The agent calls Ludwig's MCP tools instead of you running the CLI by hand; you stay in the loop for review.

1. Scaffold the project. In an empty directory, run `ludwig init` once. This creates `ludwig.yml`, `specs/`, `.ludwig/`, and the Claude Code skill manifest. The MCP server discovers this root on every tool call.
2. Describe the app. Tell the agent what you want to build, e.g. *"Build a URL shortener with per-tenant analytics."* The agent calls `project.decompose` to get a prompt, then proposes a set of games (`auth`, `shorten`, `analytics`, …) and a draft spec per behavior.
3. Persist the drafts. For each game the agent calls `game.create`; for each spec it calls `spec.propose` to get the drafting prompt, then `spec.write` to validate and persist the markdown. Anything malformed is rejected with a structural error, and the agent retries until the spec parses.
4. Review and activate. You read `specs/_index.md` (regenerated by `ludwig catalog`), resolve any `Open questions`, and flip drafts to `status: active`. Active is the gate: nothing generates code until you approve the intent.
5. Generate code. The agent calls `spec.plan` per active spec to get the generation brief (resolved glossary, dependencies, behavior tags, examples) and writes the implementation and test bodies. Each generated source file ends with the `ludwig-spec: <id>@<version> hash=…` stamp.
6. Verify. The agent calls `spec.verify`, which runs the structural and deterministic layers and emits judgment prompts. The agent evaluates each judgment prompt itself, then calls `spec.verify` again with the verdicts ingested. The report lands under `.ludwig/reports/` and is also reachable as `ludwig://report/latest`.
7. Iterate. When you edit a spec, the agent re-runs `spec.plan` and regenerates the affected files; when you edit code, drift detection flags the seam. The spec is the durable artifact; chat turns are not.

A minimal driver prompt: "Use the Ludwig MCP tools. Decompose this description into specs, write them, wait for me to activate, then generate and verify." The agent does the rest.
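Under the hood, each tool invocation the agent makes is a JSON-RPC 2.0 message over stdio, and MCP tool calls use the `tools/call` method. As a rough sketch of what such a request might look like for `spec.verify`, the helper below builds one by hand with no external crates. The `id` argument name is hypothetical, since the tools' input schemas are not shown here, and both function names are illustrative.

```rust
/// Escapes a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a single-line JSON-RPC 2.0 `tools/call` request.
/// The `"id"` key inside `arguments` is an assumed parameter name.
pub fn tools_call_request(request_id: u64, tool: &str, spec_id: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"{}","arguments":{{"id":"{}"}}}}}}"#,
        request_id,
        json_escape(tool),
        json_escape(spec_id)
    )
}
```

In practice the host agent constructs these messages for you; the sketch is only meant to show how thin the layer between a chat turn and a Ludwig tool call is.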

<project root>/
  ludwig.yml                       # project config
  specs/
    _index.md                      # generated: spec catalog
    auth/
      _game.md                     # language-game manifest
      login.spec.md
  src/
    auth/login.rs                  # ends with `// ludwig-spec: login@1 hash=…`
  tests/
    ludwig_login.rs                # scaffolded once, then user-owned
  .ludwig/
    state.json                     # spec hashes, file fingerprints, judgment verdicts
    reports/                       # verification reports (JSON + latest.md)
  .claude/
    skills/
      ludwig.yaml                  # registered by `ludwig init`

The `<slug>` in `<slug>.spec.md` matches the frontmatter `id`. IDs are globally unique within a project.

v0.1 ships:

- Strict, hand-rolled markdown spec parser
- Project scaffolding, catalog, language-game inheritance
- Generation brief (`ludwig plan`) with transitively-resolved `depends_on`
- Rust adapter: scaffolds `#[test]` stubs from Examples + invariants, runs via `cargo test`
- Structural + deterministic verification, judgment-prompt round-trip
- Three-way drift detection (`ludwig diff`): stale stamp, body changed, missing, unstamped
- Claude Code skill manifest + JSON-RPC 2.0 MCP server over stdio
- 60-test integration suite
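The drift states named in the list (stale stamp, body changed, missing, unstamped) can be pictured as a short decision chain over what is known about one implementing file. The sketch below is an assumption about how such a classification could be ordered, not a description of `ludwig diff` internals; the `Drift` and `FileState` types, the `InSync` variant, and the idea of comparing against a recorded fingerprint from `state.json` are illustrative.

```rust
/// Drift categories named in the feature list, plus a hypothetical in-sync state.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Drift {
    Missing,
    Unstamped,
    StaleStamp,
    BodyChanged,
    InSync,
}

/// What is known about one implementing file (hypothetical shape).
pub struct FileState<'a> {
    pub exists: bool,
    /// Hash taken from the trailing stamp, if the file has one.
    pub stamp_hash: Option<&'a str>,
    /// Fingerprint of the file body as it is on disk now.
    pub body_fingerprint: &'a str,
}

/// Classifies one file against the current spec hash and the fingerprint
/// recorded at the last successful verification.
pub fn classify(file: &FileState, spec_hash: &str, recorded_fingerprint: &str) -> Drift {
    if !file.exists {
        return Drift::Missing;
    }
    let Some(stamp_hash) = file.stamp_hash else {
        return Drift::Unstamped;
    };
    if stamp_hash != spec_hash {
        // The spec changed after this file was stamped.
        return Drift::StaleStamp;
    }
    if file.body_fingerprint != recorded_fingerprint {
        // The code changed while the spec stayed the same.
        return Drift::BodyChanged;
    }
    Drift::InSync
}
```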

Deferred:

- Property-based generation: `{property}` invariants are parsed and their verification policy is defined and tested (active → `fail`, non-active → `skip`; see "Verification, in three layers"), but no generator produces or runs property tests yet.
- Spec-from-code generation: `canonical: code` mode flips drift semantics so the spec is the stale side (see "Canonical direction"), but Ludwig does not yet derive or auto-update a spec from existing code; you update the spec yourself, then re-verify.

cargo build --offline    # the cached crate set is sufficient
cargo test --offline
cargo run --offline -- version

MIT.
