{"slug": "don-t-ask-ai-to-stop-guessing-design-a-system-where-it-doesn-t-need-to", "title": "Don't Ask AI to Stop Guessing. Design a System Where It Doesn't Need To.", "summary": "A team building an AI-assisted comparison tool found that an automated agent incorrectly removed a live capability from the tool's recommendations because it relied on a stale handover note instead of the authoritative system of record. The team abandoned prompt-level fixes and instead implemented a source-ranking architecture that prevents prose or summaries from overriding production data, ensuring the model never has to guess which source to trust.", "body_md": "*Part of a short series (1 of 3) on engineering lessons from building governed, AI-assisted production systems. Each piece takes one real failure and the architectural idea it forced. The examples are ours; the principle is meant to be transferable.*\n\nWe removed a capability that genuinely existed — from a tool whose entire job was to represent capabilities fairly. Not because the model hallucinated. Because our architecture made guessing the rational thing to do.\n\nIt reasoned correctly from the inputs it was given. The inputs were the problem — and so was our first instinct about how to fix them. This is a write-up of what failed, why the obvious fix didn't work, and the pattern we ended up with. None of it is about prompting.\n\nWe run a tool that compares options for a buyer and recommends the best fit. It's supposed to be neutral. One of the inputs to that tool is a set of *capabilities* — what each option can actually do.\n\nAn automated agent, doing maintenance work, updated that capability set. It removed one of our own capabilities, on the grounds that it had been discontinued. It cited its source: a handover note from an earlier work session that said, in passing, that the capability had been \"parked.\"\n\nThe capability had not been parked. It was live, published, and in active use. 
But for the duration of that change, a tool that was supposed to be neutral became quietly *unfair* — it stopped representing something that genuinely existed.\n\nThe agent was not careless. If you read the note it was working from, you would have drawn the same conclusion. The note was wrong, and nothing in the workflow forced a check against the thing that was actually true.\n\nThe obvious fix is the one everyone reaches for: tell the model to be more careful. Add instructions. \"Verify before asserting.\" \"Do not rely on summaries.\" \"Check the source of truth.\" Strengthen the prompt.\n\nWe tried versions of this. It moves the failure rate; it does not remove the failure. And once you sit with *why*, that becomes obvious too.\n\nAn LLM reasons over the context it's handed. If the nearest, most fluent description of reality is a summary, the model will use the summary — not because it's lazy, but because the summary is *right there* and reads authoritatively. \"Be careful\" is an instruction to expend extra effort against an unspecified target. It competes with every other instruction in the context, and it degrades exactly when you most need it: under long context, time pressure, or a confidently-worded but stale narrative.\n\nThe deeper issue is that we were treating a **systems problem** as a **behaviour problem**. We had two descriptions of reality — an authoritative one (the live system of record) and a convenient one (prose) — and we left it to the model's judgement to pick the right one every single time. That's not a judgement we should have delegated. The model didn't have a guessing problem. 
*We had given it a reason to guess.*\n\nWe stopped trying to make the model choose correctly between sources, and instead made the authoritative source the only path to a fact.\n\nTwo ideas did the work.\n\n**First: rank the sources explicitly, and let the ranking — not the model — resolve conflicts.**\n\n```\nauthoritative = resolve(\n    production,   # system of record — authoritative\n    derived,      # computed from production\n    override,     # a cited fact, only where production is silent\n    narrative,    # summaries, handovers — NEVER authoritative\n)\n\n# A lower tier is consulted only when every higher tier is silent.\n# Production truth can never silently fall through to a lower one.\nassert not (production.has_answer and authoritative.source != PRODUCTION)\n```\n\nThe point of the `assert` is not defensive coding. It's a statement of intent: when production has an answer, nothing below it gets a vote. Prose can inform where production is silent, but it can never override — or quietly stand in for — a fact the system of record already holds. And — this is the part that bit us — **absence has to be proven from the authoritative source, not inferred from a summary that failed to mention it.**\n\n**Second: derive facts, don't assert them.**\n\nThe capability set is no longer something a human or an agent edits by hand. It is *computed* from the live system of record at build time.\n\n```\ncapabilities = derive_from(system_of_record.published_items())\n# absence of a capability is established by its absence in (system_of_record),\n# never by its absence in a document.\n```\n\nOnce capabilities are derived, the class of bug we hit becomes structurally impossible. You cannot remove a live capability by editing a note, because notes are no longer in the path. The system self-corrects whenever the system of record changes. 
Nobody has to remember to keep the description in sync, because there is no separate description to keep in sync.\n\nThis is the part I'd most want a sceptical reader to notice, because it's where restraint mattered more than cleverness.\n\nWe automated the **source of truth**. We did not automate the **decision**.\n\nWhen the derived capability set and someone's expectation disagree, the system does not silently \"fix\" anything. It surfaces the discrepancy and stops. A human decides whether the difference is a genuine change, a mistake, or an intended exception. We never gave the pipeline the authority to *assert* a new fact about the world — only to *derive* facts from a system that already holds them, and to flag when something looks off.\n\nThe temptation, once you've built a resolver, is to let it auto-resolve everything. We didn't, because \"what exists\" is a fact (derivable) but \"what *should* exist\" is a decision (not). Collapsing those two is how you build a system that is confidently, automatically wrong.\n\nThere's a clean line underneath all of this:\n\nFacts come from systems. Decisions come from people. A pipeline may derive facts and flag conflicts; it may never decide what is true.\n\nWe still rely on behaviour — the agent is expected to reason from the authoritative source. But we no longer *trust* behaviour to be the only line of defence. There is one small machine check that runs in the pipeline: it recomputes the capability set from the system of record and fails the build if what we're about to ship has drifted from what the source actually exposes.\n\nThat check doesn't make the model behave. It makes the *invariant observable*. If behaviour silently regresses, the witness fails loudly before anything reaches a user. Behaviour governs; the check is evidence that the invariant still holds. 
We were careful not to confuse the two — a passing check is not proof of good judgement, only proof that one specific, decidable property is intact.\n\nThe reframe that mattered for us was small but total. We had been asking, \"how do we get the model to stop guessing?\" The better question was, \"why is the model in a position where guessing is reasonable?\" Once we removed the reason, the behaviour took care of itself.\n\n## The Engineering Principle\n\nAn agent guesses when its inputs leave room for guessing. Don't instruct it to stop — remove the ambiguity. Derive facts from the system of record, rank every source so prose can never outrank truth, and surface conflicts instead of resolving them automatically.\n\nDon't ask AI to stop guessing. Design a system where it doesn't need to.", "url": "https://wpnews.pro/news/don-t-ask-ai-to-stop-guessing-design-a-system-where-it-doesn-t-need-to", "canonical_source": "https://dev.to/cpdforge/dont-ask-ai-to-stop-guessing-design-a-system-where-it-doesnt-need-to-3kfm", "published_at": "2026-06-30 17:18:02+00:00", "updated_at": "2026-06-30 17:49:14.174651+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-agents", "ai-safety", "developer-tools"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/don-t-ask-ai-to-stop-guessing-design-a-system-where-it-doesn-t-need-to", "markdown": "https://wpnews.pro/news/don-t-ask-ai-to-stop-guessing-design-a-system-where-it-doesn-t-need-to.md", "text": "https://wpnews.pro/news/don-t-ask-ai-to-stop-guessing-design-a-system-where-it-doesn-t-need-to.txt", "jsonld": "https://wpnews.pro/news/don-t-ask-ai-to-stop-guessing-design-a-system-where-it-doesn-t-need-to.jsonld"}}