llm-nano-vm v0.8.0 — deterministic FSM runtime for LLM pipelines, now with output validation and per-step timeouts

The article introduces **llm-nano-vm v0.8.0**, a deterministic Finite State Machine (FSM) runtime for LLM pipelines that flips the typical architecture by making the runtime the orchestrator rather than the LLM. Key new features include `allowed_outputs` for strict output validation against an explicit enum list and per-step `timeout_seconds` with configurable fallback policies, ensuring the model cannot skip steps or escape guardrails. The system has been benchmarked at over 1 million operations with zero violations and demonstrates crash consistency with 100% hash match on replay.

PyPI: `pip install llm-nano-vm`
GitHub: http://github.com/Ale007XD/nano_vm
MCP gateway: http://github.com/Ale007XD/nano-vm-mcp

I've been building a deterministic FSM execution kernel for LLM workflows. v0.8.0 just shipped to PyPI. Here's what it is, what's new, and where it's going.

Most LLM frameworks treat the model as the orchestrator. nano-vm flips that: the runtime is the orchestrator, and the model is just one step in a deterministic graph.

δ(S, E) → S'

Current state + validated event = next state. The model cannot skip steps, reorder them, or escape guardrails. The FSM is the source of truth.

Four step types: `llm`, `tool`, `condition`, `parallel`. Programs are plain Python dicts. No DSL parser, no heavy framework magic, and zero dependency overhead.

```python
program = Program.from_dict({
    "name": "customer_refund",
    "steps": [
        {
            "id": "analyze",
            "type": "llm",
            "prompt": "Valid refund? Reply 'yes' or 'no'.\nRequest: $user_input",
            "output_key": "decision",
            "allowed_outputs": ["yes", "no"],  # ← v0.8.0
        },
        {
            "id": "guardrail",
            "type": "condition",
            "condition": "'yes' in '$decision'",
            "then": "process_refund",
            "otherwise": "reject",
        },
        {"id": "process_refund", "type": "tool", "tool": "issue_refund", "is_terminal": True},
        {"id": "reject", "type": "tool", "tool": "send_rejection", "is_terminal": True},
    ],
})
```

The guardrail step cannot be bypassed regardless of what the model returns.

## What's new in v0.8.0

### `allowed_outputs`: LLM enum guard

Validates the model's raw output against an explicit list before the value touches anything downstream.

```python
{
    "id": "classify",
    "type": "llm",
    "prompt": "Classify. Reply ONLY with: refund / query / other",
    "allowed_outputs": ["refund", "query", "other"],
    "on_error": "skip",  # → falls back to "refund" (first element) on mismatch
}
```

Three policies on mismatch: `fail` (default, trace → FAILED), `skip` (substitute `allowed_outputs[0]`), `retry` (retry up to `max_retries`, then FAILED).

### `timeout_seconds` + `on_timeout`: per-step LLM timeout

Prevents a hung API call from stalling the entire FSM.
```python
{
    "id": "analyze",
    "type": "llm",
    "timeout_seconds": 5.0,
    "on_timeout": "fallback",  # → falls back to allowed_outputs[0] or ''
}
```

Two policies: `fail` (default) and `fallback`. Both features are independent and composable: you can use either or both on any `llm` step.

## What it can do right now

**Benchmark (v0.8.0):** WSL2 · Python 3.12 · MockAdapter · 3×5×10k

10/10 PASS · 1,096,500 ops · 0 violations

| Scenario | Mean TPS | p95 |
| --- | --- | --- |
| Refund pipeline | 2,200/s | 123 ms |
| Double-execution guard | 2,800/s | 69 ms |
| Budget enforcement | 2,400/s | 97 ms |
| Parallel throughput | 1,000/s | 196 ms |
| MCP store round-trip | 11,000/s | 0.13 ms |
| GovernanceEnvelope | 2,100/s | 108 ms |
| Crash consistency | 11/s | 115 ms |
| Replay equivalence | 1,300/s | 164 ms |
| Adversarial retries | 2,600/s | 87 ms |
| Long-horizon (1k steps) | 95/s | 11,887 ms |

**BM-INT-07 (Crash consistency):** `crash_rate=100%`, `hash_match=100%`. Replay after a simulated crash produces an identical trace hash every time.

**BM-INT-10 (Memory footprint):** peak RSS 76.5 MB, alloc 3.62 MB for 1,000-step programs. No memory leaks detected.

Validated on real payment APIs

## What's coming next

**Phase 0 (Immediate):** `ProgramValidator`, static analysis at Program build time. Detects missing `then`/`otherwise`/`next_step` targets, unreachable steps, and cycles. Currently these fail at runtime; when dealing with LLM-generated workflows, static analysis is a must.

**Phase 1 (Gateway Correctness):** StateContext persistence between MCP calls in SQLite WAL. Right now, if the gateway process restarts after `/create` but before polling completes, you get a new `requestId`, which is a real financial duplicate risk. Closing this with an `execution_contexts` table + upsert on every step. Up next: TRACE projection to SQLite, GovernedToolExecutor (policy-level tool capability enforcement), idempotency store, and native `vm.step` MCP wiring.

**Phase 2 (Dev Agent):** nano-vm-dev-agent, the FSM runtime managing its own development stack (`read_repo_files` → `generate_patch` (llm) → `run_mypy` → `run_pytest` → `write_repo_files`). DA-1 milestone is done (12/12 tests).
DA-2 will be the first live run against a real sprint task (StateContext persistence). Still working on the `search_code` and `reproduce_bug` tool functions before launching live.

**Phase 3 (Observability):** OpenTelemetry span per FSM step + incremental counters in Trace (`llm_calls`, `tool_calls`, `retries_total`).

## Install

```bash
pip install llm-nano-vm==0.8.0
pip install "llm-nano-vm[litellm]==0.8.0"   # LiteLLM provider support
pip install nano-vm-mcp                      # MCP gateway
```

LLMs are completely optional. The runtime works perfectly fine as a pure, lightweight deterministic workflow engine.

Questions / feedback welcome.
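
For anyone who wants to see the `allowed_outputs` and `timeout_seconds` policies in concrete terms, here is a minimal standalone sketch in plain Python. It does not use the nano-vm API and is not how the runtime is implemented. The names `guard_output`, `call_with_timeout`, and `StepFailed` are hypothetical, and exact matching after whitespace stripping is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


class StepFailed(Exception):
    """Hypothetical stand-in for a step whose trace ends in FAILED."""


def guard_output(
    call_model: Callable[[], str],
    allowed: Sequence[str],
    on_error: str = "fail",
    max_retries: int = 2,
) -> str:
    """Check raw model output against an explicit enum list.

    on_error: "fail" raises, "skip" substitutes allowed[0],
    "retry" re-invokes the model up to max_retries times, then raises.
    """
    # Only the "retry" policy gets extra attempts.
    attempts = 1 + (max_retries if on_error == "retry" else 0)
    for _ in range(attempts):
        raw = call_model().strip()  # assumption: exact match after stripping
        if raw in allowed:
            return raw
    if on_error == "skip":
        return allowed[0]
    raise StepFailed(f"output not in {list(allowed)}")


async def call_with_timeout(
    call_model: Callable[[], Awaitable[str]],
    timeout_seconds: float,
    allowed: Sequence[str] = (),
    on_timeout: str = "fail",
) -> str:
    """Bound a single model call so a hung request cannot stall the pipeline."""
    try:
        return await asyncio.wait_for(call_model(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        if on_timeout == "fallback":
            # Mirrors the documented fallback: first allowed value, else ''.
            return allowed[0] if allowed else ""
        raise StepFailed("LLM call timed out")
```

With these helpers, `guard_output(lambda: "maybe", ["refund", "query", "other"], on_error="skip")` returns `"refund"`, and a call that outlives its timeout under `on_timeout="fallback"` yields the first allowed value instead of blocking.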