A Prompt Is a Wish. A Tool Is a Law.

A developer built a platform that lets non-engineers ship AI tools to production by describing workflows in plain English. The platform uses a fixed pipeline of tool calls that form a graph, where each step validates the previous one before returning instructions for the next, preventing the AI from skipping steps. The key engineering challenge was making the rules a property of the tools rather than relying on prompts, ensuring safety for users who cannot read the generated code.

How I let non-engineers ship AI tools to production, and the boring infrastructure that made it safe.

A product manager described a workflow in plain English: "every morning, pull yesterday's failed payments, group them by error code, and post a summary to our channel." Twenty minutes later it was running in production. She never opened an editor. She never saw a line of TypeScript. She talked to an agent, the agent wrote the code, and, once a human had reviewed the pull request, it shipped.

That sentence should make you nervous. It made me nervous, and I'm the one who built the thing. The demo is "look, it wrote the code." The operation is "a marketer's tool now has a path to the payments database and nobody reviewed it." The interesting engineering isn't the part where an LLM writes code; that's the easy, demo-able part. It's the guardrails that decide whether the code it writes is allowed to exist.

Here's the platform, and the five problems I had to solve to make it safe to hand to people who can't read the code that runs.

The platform is a place where anyone (engineers, PMs, designers, QA) can publish a reusable AI tool, and everyone else can use it. Write once, available to all. A few terms up front, because the whole design leans on them:

Under the hood it's three small Workers speaking MCP: a gateway (auth, routing, secrets), a skill-runner, and an agent-runner.
Secrets are fetched by the gateway from a secrets manager: never inlined, never handed to the code that runs user logic unless that code is explicitly an action (more on that distinction below).

Here's the part most "AI platform" posts skip: how it's consumed. You don't install fifty separate agents into your Claude client. You connect one MCP server. Every published tool shows up through that single endpoint. That choice is the difference between a platform and a context-bloat machine, and I'll come back to why.

The tools themselves reach the systems a company runs on: issue trackers, chat, docs, the CMS, the analytics warehouse, the payments database. Some of that data is harmless. Some of it is a compliance incident waiting for one careless fetch. The whole design is organized around that asymmetry.

The authoring flow is a fixed pipeline: plan it, get the plan approved, generate the files, review your own work, open a PR. A nice orderly flow.

The agent refused to respect it. It generated files before the plan was approved. It "reviewed" code by saying "looks good" and immediately opened a PR. It skipped the inconvenient steps and barreled toward the finish, because that's what a model optimizing for "be helpful, complete the task" does. My pipeline existed in my head and in a long instruction file the model treated as a polite suggestion.

I tried the obvious things first, in order of increasing desperation:

The pattern across all three: each lives inside the model's reasoning, and anything inside the model's reasoning is negotiable. A model under task pressure rationalizes its way past text reliably enough that you can't depend on it. Prompts still steer the model; they just can't guarantee it, and a production rule needs a guarantee.

So the trick isn't to tell the model the rules better. It's to make the rules a property of the tools.
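
One way to picture "rules as a property of the tools" is a small transition table that the server consults on every call. The sketch below is illustrative only and is not the platform's actual code: the phase names (apart from "planning"), the underscore tool identifiers, and the `callTool` helper are all assumptions made for the example. It shows the shape of the idea: a tool succeeds only from the phase it requires, and the next step's instructions exist only in the success result.

```typescript
// Sketch only: phase and tool identifiers are hypothetical, chosen to mirror
// the pipeline in the article (plan, approve, generate, review, open a PR).
type Phase = "planning" | "building" | "reviewing" | "finalizing" | "done";

interface Gate {
  requires: Phase;      // phase the session must be in for this tool to succeed
  next: Phase;          // phase the session moves to on success
  instructions: string; // text for the following step, released only on success
}

const gates: Record<string, Gate> = {
  confirm_plan: {
    requires: "planning",
    next: "building",
    instructions: "Plan approved. Generate the files, then call submit_for_review.",
  },
  submit_for_review: {
    requires: "building",
    next: "reviewing",
    instructions: "Review your own diff, then call submit_final.",
  },
  submit_final: {
    requires: "reviewing",
    next: "finalizing",
    instructions: "Review recorded. Call create_pull_request.",
  },
  create_pull_request: {
    requires: "finalizing",
    next: "done",
    instructions: "Pull request opened. Wait for human review.",
  },
};

type GateResult =
  | { ok: true; phase: Phase; instructions: string }
  | { ok: false; phase: Phase; error: string };

// Pure transition check. The caller is assumed to persist `phase`
// somewhere outside the model's context and pass it in on each call.
function callTool(current: Phase, tool: string): GateResult {
  const gate = gates[tool];
  if (gate === undefined) {
    return { ok: false, phase: current, error: `Unknown tool: ${tool}` };
  }
  if (gate.requires !== current) {
    // Wrong phase: the error carries no instructions for any later step.
    return {
      ok: false,
      phase: current,
      error: `${tool} requires phase "${gate.requires}", but the session is in "${current}".`,
    };
  }
  return { ok: true, phase: gate.next, instructions: gate.instructions };
}
```

In this sketch, `callTool("planning", "create_pull_request")` returns a failure that leaves the phase unchanged and contains no instructions, while `callTool("planning", "confirm_plan")` advances to the hypothetical "building" phase and releases the text for the next step.
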
Each step becomes its own tool, and the tools form a graph: a step tool validates that the previous step happened, and only on success does it return the instructions for the next step. The model can't skip ahead, because it physically doesn't have the next instructions until the current gate hands them over, and the gate is the only edge into the next state.

start building → confirm plan → submit for review → submit final → create pull request

This is the part people get wrong, including me at first: the thing that makes a gate a wall is not that a failed tool call is hard to ignore. The model can ignore an error. It can retry, or route around it, the same way it routed around hooks. What it cannot do is fabricate the next step's instructions, because those only exist inside a validated success response. The determinism is in the server-side state gate (every tool checks the persisted phase before it acts), not in the error. The error is just how the gate says "not yet".

Concretely: the agent calls "create pull request" while the phase is still "planning". The gate sees the wrong phase, returns an error, and (the part that matters) never hands back the next step's instructions. The agent isn't forbidden from finishing; it's unable to, because finishing requires words it was never given.

State lives server-side, keyed by session, in Durable Object storage: persisted outside the model's context entirely, so the compaction that killed the in-memory version can't touch it.

```js
// Helpers that build a tool result: an error result or a plain text result.
const fail = (text: string) => ({ isError: true, content: [{ type: "text", text }] });
const ok = (text: string) => ({ content: [{ type: "text", text }] });

export const confirmPlan: ToolDef = {
  name: "confirm plan",
  description: "Submit your implementation plan. Required before writing any code.",
  inputSchema: planSchema,
  run: async ({ plan }, ctx) => {
    // Read the persisted session state before acting.
    const state = await ctx.storage.get
```