[ARTICLE · art-35363] src=dev.to ↗ pub= topic=large-language-models verified=true sentiment=· neutral

Your AI Isn't Broken. Your Architecture Is.

A fintech engineer argues that LLM failures often stem from poor architecture rather than hallucination. They demonstrate that chaining multiple probabilistic LLM calls can drastically reduce reliability, and advocate for separating probabilistic tasks (handled by LLMs) from deterministic ones (handled by rule engines or APIs).

3 min read · published Jun 21, 2026

Everyone blames hallucination. I've started blaming the design.

I work on a fintech banking platform — Java, Spring Boot, microservices. When a payment fails, we don't shrug and say "the network is probabilistic." We trace it. We find the exact hop where something went wrong. We fix it.

But when an LLM-powered feature fails, the default reaction is usually: "yeah, AI hallucinates sometimes."

And after going through a structured ML cohort over the last few weeks, I think I finally understand why.

Large language models are probabilistic by design. They don't look up answers — they generate the most statistically likely next token given context. That means they will occasionally produce plausible-sounding output that isn't grounded in fact.

This is a known property, not a bug. The mistake is building systems that treat this probabilistic step as if it were a deterministic one.

Here's a concrete example. Say you're building a banking chatbot that needs to:

Steps 2 and 3 are deterministic. There's a correct answer. The transactions either exist or they don't. The sum is either right or wrong.
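As a minimal sketch of what "deterministic" means here, assume those steps amount to fetching an account's transactions for a date range and summing them. The table layout, the `total_spent` helper, and the sample rows below are all hypothetical, chosen only for illustration. Amounts are stored as integer cents so the sum is exact.

```python
import sqlite3
from decimal import Decimal


def total_spent(conn, account_id, start, end):
    """Sum transactions for one account with start <= posted_on < end."""
    rows = conn.execute(
        "SELECT amount_cents FROM transactions "
        "WHERE account_id = ? AND posted_on >= ? AND posted_on < ?",
        (account_id, start, end),  # parameterized: no generated SQL text
    ).fetchall()
    cents = sum(row[0] for row in rows)  # integer arithmetic, no rounding
    return Decimal(cents) / 100


# Hypothetical in-memory data standing in for the real ledger.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions "
    "(account_id TEXT, posted_on TEXT, amount_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("acc-1", "2026-05-03", 1250),
        ("acc-1", "2026-05-17", 4999),
        ("acc-1", "2026-06-01", 800),
        ("acc-2", "2026-05-10", 10000),
    ],
)

print(total_spent(conn, "acc-1", "2026-05-01", "2026-06-01"))  # 62.49
```

The same inputs always produce the same total, and a wrong total is a bug you can trace to a specific line rather than a sampling outcome.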

If you route those steps through an LLM (asking it to generate a SQL query, run it mentally, summarize the output), you've introduced a probabilistic component where no ambiguity is acceptable. In a financial context, a "plausible-sounding" transaction summary that's 3% wrong is not a minor UX issue. It's a compliance problem.

Here's what most people miss when they start chaining LLM calls together.

If each step in your pipeline has a 90% success rate (which sounds fine), you have 5 steps, and the steps fail independently of each other, your overall pipeline reliability is: 0.9 × 0.9 × 0.9 × 0.9 × 0.9 = 0.9^5 ≈ 59%

A 5-step agentic workflow where every node is an LLM call fails roughly 4 out of 10 times. Not because any single step is broken. Because the architecture is wrong.
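The arithmetic is easy to check. The sketch below multiplies per-step success rates, which is only valid under the assumption that steps fail independently. The `pipeline_reliability` name is made up for this example, and treating a deterministic step as exactly 1.0 is a simplification (real functions have bugs and outages too).

```python
import math


def pipeline_reliability(step_success_rates):
    """Probability that every step succeeds, assuming independent failures."""
    return math.prod(step_success_rates)


# Five chained LLM calls at 90% each.
all_llm = pipeline_reliability([0.9] * 5)

# Same pipeline with one LLM step and four deterministic steps (modeled as 1.0).
one_llm = pipeline_reliability([0.9, 1.0, 1.0, 1.0, 1.0])

print(f"{all_llm:.1%}")  # 59.0%
print(f"{one_llm:.1%}")  # 90.0%
```

Nothing about the model changed between the two numbers. Only the number of probabilistic hops did.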

This is something I think about in terms of how we handle fraud detection on our platform. The ML model's job is to score a transaction — is this pattern anomalous? That's genuinely probabilistic. Pattern matching under uncertainty is exactly what the model is good at.

But the downstream decision — block the card, flag for review, let it pass — that's a deterministic rule engine. Hard thresholds. Business logic. Audit trails. Putting an LLM in that loop would be architecturally insane, regardless of how good the model is.

The model handles ambiguity. The function handles decisions.
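A rough sketch of that split, not our actual rule engine: the score comes from a model, and a plain function with hard thresholds turns it into a decision. The threshold values, the `decide` function, and the rule labels are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    REVIEW = "review"
    BLOCK = "block"


@dataclass(frozen=True)
class Decision:
    action: Action
    rule: str  # which rule fired, kept for the audit trail


# Hypothetical thresholds; in practice these come from business policy.
REVIEW_THRESHOLD = 0.60
BLOCK_THRESHOLD = 0.90


def decide(score: float) -> Decision:
    """Map a model's anomaly score in [0, 1] to a deterministic action."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= BLOCK_THRESHOLD:
        return Decision(Action.BLOCK, "score >= 0.90")
    if score >= REVIEW_THRESHOLD:
        return Decision(Action.REVIEW, "0.60 <= score < 0.90")
    return Decision(Action.ALLOW, "score < 0.60")


print(decide(0.95).action.value)  # block
print(decide(0.72).action.value)  # review
```

The model can be retrained or swapped without touching `decide`, and every outcome carries the rule that produced it.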

Every LLM tutorial shows you the happy path. Very few show you where the model should be completely absent from the pipeline.

The design question worth asking: which parts of this workflow require genuine judgment or language understanding, and which parts have a correct, verifiable answer?

LLM's job: extract intent, handle ambiguity, generate natural language.

Function call / API / rule engine's job: everything with a ground truth.

This isn't a new insight; it's basically what tool-use and function calling were invented for. The model decides what to do. A real function actually does it. But a lot of builders still treat function calling as a nice-to-have instead of a load-bearing architectural decision.
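Here is a stripped-down sketch of that division of labor, with no real LLM SDK involved. `fake_model` is a stand-in that returns the kind of JSON tool request a model might emit, and `get_balance`, `TOOLS`, and the balance data are hypothetical. The point is the shape: the model's output is treated as an untrusted request, and only registered functions produce answers.

```python
import json

# Hypothetical ledger, in integer cents.
BALANCES_CENTS = {"acc-1": 152340}


def get_balance(account_id: str) -> int:
    """Deterministic lookup; raises KeyError for an unknown account."""
    return BALANCES_CENTS[account_id]


# Registry of functions the model is allowed to request.
TOOLS = {"get_balance": get_balance}


def fake_model(user_message: str) -> str:
    """Stand-in for an LLM call: maps free text to a JSON tool request."""
    return json.dumps({"tool": "get_balance", "args": {"account_id": "acc-1"}})


def run_tool_call(raw: str):
    """Validate the model's request, then let a real function do the work."""
    request = json.loads(raw)
    if not isinstance(request, dict):
        raise ValueError("tool request must be a JSON object")
    name = request.get("tool")
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    return TOOLS[name](**request.get("args", {}))


print(run_tool_call(fake_model("what's my balance?")))  # 152340
```

If the model asks for a tool that isn't registered, the pipeline fails loudly at a known hop instead of returning a plausible-looking number.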

For a while I thought the hard part was getting the model to behave. Prompt engineering. Fine-tuning. Better retrieval. The cohort work I've been doing shifted that. The models are actually pretty capable. What's hard is:

If your AI feature is unreliable, the honest diagnostic question is: how many of my pipeline steps are probabilistic that shouldn't be? The answer is usually more than you think.

Hallucination is real. But it's also one of the most convenient excuses in AI engineering right now.

Most of the failures I've seen — in projects, in tutorials, in production systems discussed in public postmortems — aren't the model generating nonsense. They're systems that were designed without a clear line between "where the LLM is appropriate" and "where a function call is appropriate."

Draw that line first. Build around it. Then see how often the model is actually the problem.
