{"slug": "stop-letting-the-prompt-be-your-state-machine", "title": "Stop letting the prompt be your state machine", "summary": "A developer argues that using prompts as runtime state machines leads to unpredictable LLM behavior and failed agentic AI projects. The solution is to break workflows into discrete, typed steps with deterministic guardrails, isolating model calls and validating outputs with schemas like Zod.", "body_md": "You shipped an LLM feature six months ago. Now the same user input produces wildly different outputs depending on... nothing you can point to. Something in the sampling? The time the context filled up and a chunk got dropped? Nobody knows. This is what happens when the prompt becomes your runtime.\n\nHere is what the trap looks like in TypeScript:\n\n```ts\n// Hypothetical LLM client and shared mutable state, declared so the snippet type-checks.\ndeclare const llm: { complete(prompt: string): Promise<string> };\ndeclare let someGlobalContext: string;\n\n// Anti-pattern: a single prompt is the entire runtime.\nasync function handleUserRequest(input: string): Promise<string> {\n  const prompt = `\n    You are a helpful assistant.\n    The user said: ${input}\n    Previous context: ${someGlobalContext}\n\n    Decide what to do, gather any information you need,\n    format the response, and return it.\n  `;\n  return llm.complete(prompt);\n}\n```\n\nThe model is doing everything here: deciding the intent, gathering data, formatting the output, and choosing what to persist. That is a footgun. You handed the runtime to a stochastic function.\n\nGartner attributes many failed agentic AI projects to unclear value and inadequate risk controls. Deterministic, testable workflows address both. The fix is not a better prompt. The fix is to stop using the prompt as your architecture.\n\nBe honest about what you can and cannot control.\n\n**You cannot control:** the model's exact output. It is probabilistic by design.\n\n**You can control:** the inputs the model sees, the workflow steps around each model call, the schema its output must pass, and the guardrails that decide what happens next.\n\nDeterminism here means: the same inputs, the same workflow steps, the same guardrails every time. Not the same tokens every time. That is a realistic and achievable target. 
It is also the thing teams skip when they are moving fast.\n\nBreak the work into discrete, typed steps. Each step has a clear input type and a clear output type. The model call is one step in the pipeline, not the whole thing.\n\n```ts\n// App-specific dependencies, assumed to exist elsewhere (hypothetical signatures).\ntype UserContext = Record<string, unknown>;\ndeclare const llm: {\n  complete(prompt: string, options?: { response_format?: { type: \"json_object\" } }): Promise<string>;\n};\ndeclare function fetchUserContext(userId: string): Promise<UserContext>;\ndeclare function fetchRelevantDocs(query: string): Promise<string[]>;\ndeclare function buildClassificationPrompt(input: EnrichedInput): string;\ndeclare function parseAndValidate(raw: string): ModelOutput;\ndeclare function generateResponse(output: ModelOutput): Promise<string>;\n\ntype WorkflowInput = {\n  userId: string;\n  rawRequest: string;\n};\n\ntype EnrichedInput = WorkflowInput & {\n  userContext: UserContext;\n  relevantDocs: string[];\n};\n\ntype ModelOutput = {\n  intent: \"summarize\" | \"search\" | \"draft\" | \"unknown\";\n  confidence: number;\n  payload: string;\n};\n\ntype WorkflowResult = {\n  response: string;\n  audit: {\n    intent: string;\n    humanReviewed: boolean;\n  };\n};\n\nasync function enrich(input: WorkflowInput): Promise<EnrichedInput> {\n  const [userContext, relevantDocs] = await Promise.all([\n    fetchUserContext(input.userId),\n    fetchRelevantDocs(input.rawRequest),\n  ]);\n  return { ...input, userContext, relevantDocs };\n}\n\nasync function classify(enriched: EnrichedInput): Promise<ModelOutput> {\n  // Model call is isolated here, not scattered everywhere\n  const raw = await llm.complete(buildClassificationPrompt(enriched));\n  return parseAndValidate(raw);\n}\n\nasync function respond(output: ModelOutput): Promise<WorkflowResult> {\n  const response = await generateResponse(output);\n  return {\n    response,\n    audit: { intent: output.intent, humanReviewed: false },\n  };\n}\n\nasync function runWorkflow(input: WorkflowInput): Promise<WorkflowResult> {\n  const enriched = await enrich(input);\n  const classified = await classify(enriched);\n  return respond(classified);\n}\n```\n\nEach step is independently unit testable. You can hand `respond` a fixed `ModelOutput`, or mock `classify` to return one, and test everything downstream of the model in complete isolation. That was impossible when the prompt was the runtime.\n\nThe model call step should never return a raw string when you need structured data. 
Use JSON mode, tool calling, or a schema-constrained completion, then validate immediately.\n\n```ts\nimport { z } from \"zod\";\n\nconst ModelOutputSchema = z.object({\n  intent: z.enum([\"summarize\", \"search\", \"draft\", \"unknown\"]),\n  confidence: z.number().min(0).max(1),\n  payload: z.string().min(1),\n});\n\n// Carries the raw model text so a failure can be logged or routed for review.\nclass ClassificationValidationError extends Error {\n  constructor(readonly issues: unknown, readonly raw: string) {\n    super(\"Model output failed schema validation\");\n    this.name = \"ClassificationValidationError\";\n  }\n}\n\nasync function classify(enriched: EnrichedInput): Promise<ModelOutput> {\n  const raw = await llm.complete(buildClassificationPrompt(enriched), {\n    response_format: { type: \"json_object\" },\n  });\n\n  // Malformed JSON is a validation failure too, so it gets the same error type.\n  let parsed: unknown;\n  try {\n    parsed = JSON.parse(raw);\n  } catch (err) {\n    throw new ClassificationValidationError(err, raw);\n  }\n\n  const result = ModelOutputSchema.safeParse(parsed);\n  if (!result.success) {\n    throw new ClassificationValidationError(result.error, raw);\n  }\n\n  return result.data;\n}\n```\n\nZod gives you a contract. If the model drifts, the validation catches it before the rest of your app sees the output. The answer to \"how do you validate LLM responses?\" is: schema validation on parse, not on trust.\n\nValidation failures should neither pass silently nor crash the request. Wrap the model call with a retry budget and a typed failure signal:\n\n```ts\n// Error classes assumed to be exported by the hypothetical LLM client.\ndeclare class RateLimitError extends Error {}\ndeclare class TimeoutError extends Error {}\n\ntype ClassifyResult =\n  | { ok: true; data: ModelOutput }\n  | { ok: false; reason: \"validation\" | \"timeout\" | \"rate_limit\"; raw?: string };\n\nasync function classifySafe(\n  enriched: EnrichedInput,\n  maxAttempts = 2\n): Promise<ClassifyResult> {\n  for (let attempt = 1; attempt <= maxAttempts; attempt++) {\n    try {\n      const data = await classify(enriched);\n      return { ok: true, data };\n    } catch (err) {\n      if (err instanceof ClassificationValidationError) {\n        if (attempt < maxAttempts) continue; // retry schema failures while budget remains\n        return { ok: false, reason: \"validation\", raw: err.raw };\n      }\n      if (err instanceof RateLimitError) {\n        return { ok: false, reason: \"rate_limit\" };\n      }\n      if (err instanceof TimeoutError) {\n        return { ok: false, reason: \"timeout\" };\n      }\n      throw err; // unexpected errors propagate instead of being mislabeled\n    }\n  }\n  // Only reached when maxAttempts < 1.\n  return { ok: false, reason: \"validation\" };\n}\n```\n\nIdempotency matters when retries touch external state. 
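A minimal sketch of an idempotency-key guard, assuming a hypothetical in-memory `Map` as the key store and a hypothetical `createTicket` function standing in for the external API call (neither comes from the original workflow). The key is derived from the request, not from model output, so every retry of the same request maps to the same key; a production version would need a durable store shared across workers:

```typescript
// Hypothetical in-memory key store. A real system would use a durable,
// shared store; this Map is an assumption for illustration only.
const completed = new Map<string, Promise<unknown>>();

// Runs `effect` at most once per key. A retry with the same key gets the
// recorded result (or the in-flight promise) instead of a second side effect.
async function withIdempotencyKey<T>(
  key: string,
  effect: () => Promise<T>,
): Promise<T> {
  const existing = completed.get(key);
  if (existing) {
    return existing as Promise<T>;
  }
  const pending = effect();
  completed.set(key, pending);
  try {
    return await pending;
  } catch (err) {
    // A failed attempt should not block a later retry with the same key.
    completed.delete(key);
    throw err;
  }
}

// Hypothetical external side effect, e.g. a call to a ticketing API.
let ticketsCreated = 0;
async function createTicket(summary: string): Promise<string> {
  ticketsCreated += 1;
  return `ticket-${ticketsCreated}: ${summary}`;
}

// Workflow step: the key comes from the request id, not from model output.
async function fileTicketStep(requestId: string, summary: string): Promise<string> {
  return withIdempotencyKey(`ticket:${requestId}`, () => createTicket(summary));
}
```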
If your workflow calls an API inside the model step, guard that call with an idempotency key so a retry does not double the side effect. The workflow layer controls this. The model itself cannot.\n\nA hybrid memory and retrieval approach (automatic retrieval at request start plus explicit storage) keeps agent state predictable. So does knowing when not to automate the final step.\n\nHigh-impact or irreversible steps should route to a human via a control gate before committing. Not because LLMs are bad. Because some decisions carry real consequences and the cost of a wrong one outweighs the automation gain.\n\n```ts\n// Hypothetical helper that hands the request to a human reviewer.\ndeclare function queueForHumanReview(\n  input: EnrichedInput,\n  reason: string\n): Promise<WorkflowResult>;\n\nasync function runWorkflow(input: WorkflowInput): Promise<WorkflowResult> {\n  const enriched = await enrich(input);\n  const classifyResult = await classifySafe(enriched);\n\n  if (!classifyResult.ok) {\n    return queueForHumanReview(enriched, classifyResult.reason);\n  }\n\n  const { data: classified } = classifyResult;\n\n  // Low-confidence drafts go to a reviewer instead of being sent automatically\n  if (classified.intent === \"draft\" && classified.confidence < 0.85) {\n    return queueForHumanReview(enriched, \"low confidence on draft intent\");\n  }\n\n  return respond(classified);\n}\n```\n\nThe control gate is a typed branch in your workflow, not a prompt instruction. \"Only do this if you are sure\" is not a guardrail. A typed branch is.\n\nIf you want to go deeper on how this fits into a full system, I wrote up the [production architecture for agents](https://mudassirkhan.me/blog/agentic-ai-production-architecture), including how to wire these patterns together at scale.\n\n**How do you make LLM output deterministic?**\n\nYou cannot make the model itself deterministic. You make the system deterministic around it. Schema-validated structured output, typed workflow steps, and retry gates with failure signals are the practical levers. 
The model is one isolated, black-box step in an otherwise typed, testable pipeline.\n\n**What is structured output?**\n\nStructured output means the model returns data in a schema you define rather than freeform prose. Most providers support JSON mode or function calling. You parse and validate the result immediately with a schema library. If it does not match the schema, treat it as a failed call, not a soft warning.\n\n**How do you validate LLM responses?**\n\nParse the response as JSON, then run it through a schema validator. Zod is a common choice in TypeScript projects. A `safeParse` call gives you a typed result: either success with the parsed data or failure with an error you can act on. Failure is an exception to handle, not a case to log and move on.\n\n*If you want a deeper look at how deterministic workflows fit into a full production system, I cover the complete production architecture for agents on my site.*\n\n*If you want Next.js for AI products wired up end to end, that is exactly the kind of work I take on.*\n\n*Drop a comment below. Curious what patterns people use to keep LLM features testable in production.*", "url": "https://wpnews.pro/news/stop-letting-the-prompt-be-your-state-machine", "canonical_source": "https://dev.to/mudassirworks/stop-letting-the-prompt-be-your-state-machine-52pn", "published_at": "2026-06-16 18:34:17+00:00", "updated_at": "2026-06-16 18:47:07.530766+00:00", "lang": "en", "topics": ["large-language-models", "ai-agents", "developer-tools"], "entities": ["Gartner", "Zod"], "alternates": {"html": "https://wpnews.pro/news/stop-letting-the-prompt-be-your-state-machine", "markdown": "https://wpnews.pro/news/stop-letting-the-prompt-be-your-state-machine.md", "text": "https://wpnews.pro/news/stop-letting-the-prompt-be-your-state-machine.txt", "jsonld": "https://wpnews.pro/news/stop-letting-the-prompt-be-your-state-machine.jsonld"}}