Your AI Isn't Broken. Your Architecture Is.


Source: dev.to · Published: 2026-06-21T06:49Z · Topic: large-language-models

A fintech engineer argues that LLM failures often stem from poor architecture rather than hallucination. They demonstrate that chaining multiple probabilistic LLM calls can drastically reduce reliability, and advocate for separating probabilistic tasks (handled by LLMs) from deterministic ones (handled by rule engines or APIs).

3 min read · published Jun 21, 2026

Everyone blames hallucination. I've started blaming the design.

I work on a fintech banking platform — Java, Spring Boot, microservices. When a payment fails, we don't shrug and say "the network is probabilistic." We trace it. We find the exact hop where something went wrong. We fix it.

But when an LLM-powered feature fails, the default reaction is usually: "yeah, AI hallucinates sometimes."

And after going through a structured ML cohort over the last few weeks, I think I finally understand why.

Large language models are probabilistic by design. They don't look up answers — they generate the most statistically likely next token given context. That means they will occasionally produce plausible-sounding output that isn't grounded in fact.

This is a known property, not a bug. The mistake is building systems that treat this probabilistic step as if it were a deterministic one.

Here's a concrete example. Say you're building a banking chatbot that needs to:

Steps 2 and 3 are deterministic. There's a correct answer. The transactions either exist or they don't. The sum is either right or wrong.

If you route those steps through an LLM (asking it to generate a SQL query, run it mentally, and summarize the output), you've introduced a probabilistic component where no ambiguity is acceptable. In a financial context, a "plausible-sounding" transaction summary that's 3% wrong is not a minor UX issue. It's a compliance problem.

Here's what most people miss when they start chaining LLM calls together.

If each step in your pipeline has a 90% success rate, which sounds fine, and you have 5 steps that fail independently of each other, your overall pipeline reliability is: 0.9 × 0.9 × 0.9 × 0.9 × 0.9 = 0.9^5 ≈ 59%

A 5-step agentic workflow where every node is an LLM call fails roughly 4 out of 10 times. Not because any single step is broken. Because the architecture is wrong.
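The arithmetic above can be checked with a few lines of Python. This is a minimal sketch, assuming the steps fail independently; the function name `pipeline_reliability` and the "hybrid" pipeline are illustrative, and treating a deterministic step as 100% reliable is an idealisation made only for comparison.

```python
def pipeline_reliability(step_success_rates):
    """Probability that every step succeeds, assuming independent steps."""
    reliability = 1.0
    for p in step_success_rates:
        reliability *= p
    return reliability


# Five chained LLM calls, each assumed to succeed 90% of the time.
all_llm = pipeline_reliability([0.9] * 5)

# Hypothetical hybrid: LLM calls only at the first and last step,
# with three deterministic steps idealised as always succeeding.
hybrid = pipeline_reliability([0.9, 1.0, 1.0, 1.0, 0.9])

print(f"all-LLM pipeline: {all_llm:.0%}")  # all-LLM pipeline: 59%
print(f"hybrid pipeline:  {hybrid:.0%}")   # hybrid pipeline:  81%
```

Under these assumptions, the gain comes from removing probabilistic steps, not from making any single step better.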

I think about this in terms of how we handle fraud detection on our platform. The ML model's job is to score a transaction: is this pattern anomalous? That's genuinely probabilistic. Pattern matching under uncertainty is exactly what the model is good at.

But the downstream decision — block the card, flag for review, let it pass — that's a deterministic rule engine. Hard thresholds. Business logic. Audit trails. Putting an LLM in that loop would be architecturally insane, regardless of how good the model is.

The model handles ambiguity. The function handles decisions.
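A rough sketch of that split, in Python for brevity. Everything here is hypothetical: the thresholds, the `Decision` values, and the `AuditRecord` shape are assumptions for illustration, not the platform's actual rules. The score is whatever the model produced; the decision is plain, auditable code.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PASS = "pass"
    REVIEW = "flag_for_review"
    BLOCK = "block_card"


# Hypothetical thresholds; real values would come from business policy.
REVIEW_THRESHOLD = 0.60
BLOCK_THRESHOLD = 0.90


@dataclass(frozen=True)
class AuditRecord:
    transaction_id: str
    score: float
    decision: Decision
    rule: str  # which rule fired, kept for the audit trail


def decide(transaction_id: str, score: float) -> AuditRecord:
    """Deterministic rule engine: the same score always gives the same decision."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= BLOCK_THRESHOLD:
        return AuditRecord(transaction_id, score, Decision.BLOCK,
                           f"score >= {BLOCK_THRESHOLD}")
    if score >= REVIEW_THRESHOLD:
        return AuditRecord(transaction_id, score, Decision.REVIEW,
                           f"score >= {REVIEW_THRESHOLD}")
    return AuditRecord(transaction_id, score, Decision.PASS,
                       f"score < {REVIEW_THRESHOLD}")


# The score would come from the anomaly model; here it is a literal.
record = decide("txn-001", 0.72)
print(record.decision.value, "|", record.rule)
```

The model can be retrained or swapped without touching `decide`, and every decision can be traced back to a named rule.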

Every LLM tutorial shows you the happy path. Very few show you where the model should be completely absent from the pipeline.

The design question worth asking: which parts of this workflow require genuine judgment or language understanding, and which parts have a correct, verifiable answer?

LLM's job: extract intent, handle ambiguity, generate natural language.

Function call / API / rule engine's job: everything with a ground truth.

This isn't a new insight. It's basically what tool-use and function calling were invented for. The model decides what to do. A real function actually does it. But a lot of builders still treat function calling as a nice-to-have instead of a load-bearing architectural decision.
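Here is a minimal sketch of that division of labour for the banking chatbot case. The LLM call is replaced by a hard-coded stub (`extract_intent`), and the ledger, tool name, and argument shape are all made up for illustration; no real LLM SDK or banking API is being shown.

```python
from decimal import Decimal

# Hypothetical in-memory ledger standing in for a real transactions store.
LEDGER = [
    {"account": "acc-1", "category": "groceries", "amount": Decimal("42.10")},
    {"account": "acc-1", "category": "groceries", "amount": Decimal("17.95")},
    {"account": "acc-1", "category": "travel", "amount": Decimal("120.00")},
]


def extract_intent(user_message: str) -> dict:
    """Stand-in for the LLM step: turn free text into a structured tool request.

    A real system would call a model here; this stub returns a fixed result.
    """
    return {"tool": "sum_spending",
            "args": {"account": "acc-1", "category": "groceries"}}


def sum_spending(account: str, category: str) -> Decimal:
    """Deterministic step: exact decimal arithmetic over stored rows."""
    return sum(
        (row["amount"] for row in LEDGER
         if row["account"] == account and row["category"] == category),
        Decimal("0"),
    )


# Allowlist of functions the model is permitted to request.
TOOLS = {"sum_spending": sum_spending}


def answer(user_message: str) -> str:
    intent = extract_intent(user_message)   # probabilistic: language understanding
    tool = TOOLS.get(intent.get("tool"))
    if tool is None:                         # never execute an unknown request
        raise ValueError(f"unknown tool: {intent.get('tool')!r}")
    total = tool(**intent["args"])           # deterministic: has a ground truth
    # The wording could be handed back to a model, but the number comes from code.
    return f"You spent {total} on {intent['args']['category']}."


print(answer("How much did I spend on groceries?"))
# You spent 60.05 on groceries.
```

The model never sees or computes the total. If it misreads the request, the failure is a wrong or rejected tool call, which can be logged and traced, not a plausible-looking wrong sum.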

For a while I thought the hard part was getting the model to behave. Prompt engineering. Fine-tuning. Better retrieval. The cohort work I've been doing shifted that. The models are actually pretty capable. What's hard is:

If your AI feature is unreliable, the honest diagnostic question is: how many of my pipeline steps are probabilistic that shouldn't be? The answer is usually more than you think.

Hallucination is real. But it's also one of the most convenient excuses in AI engineering right now.

Most of the failures I've seen — in projects, in tutorials, in production systems discussed in public postmortems — aren't the model generating nonsense. They're systems that were designed without a clear line between "where the LLM is appropriate" and "where a function call is appropriate."

Draw that line first. Build around it. Then see how often the model is actually the problem.

Read original on dev.to → dev.to/abhii07/your-ai-isnt-broken-your-architec…
