{"slug": "your-ai-isn-t-broken-your-architecture-is", "title": "Your AI Isn't Broken. Your Architecture Is.", "summary": "A fintech engineer argues that LLM failures often stem from poor architecture rather than hallucination. They demonstrate that chaining multiple probabilistic LLM calls can drastically reduce reliability, and advocate for separating probabilistic tasks (handled by LLMs) from deterministic ones (handled by rule engines or APIs).", "body_md": "Everyone blames hallucination. I've started blaming the design.\n\nI work on a fintech banking platform — Java, Spring Boot, microservices. When a payment fails, we don't shrug and say \"the network is probabilistic.\" We trace it. We find the exact hop where something went wrong. We fix it.\n\nBut when an LLM-powered feature fails, the default reaction is usually: \"yeah, AI hallucinates sometimes.\"\n\nAnd after going through a structured ML cohort over the last few weeks, I think I finally understand why.\n\nLarge language models are probabilistic by design. They don't look up answers — they generate the most statistically likely next token given context. That means they will occasionally produce plausible-sounding output that isn't grounded in fact.\n\nThis is a known property, not a bug. The mistake is building systems that treat this probabilistic step as if it were a deterministic one.\n\nHere's a concrete example. Say you're building a banking chatbot that needs to:\n\nSteps 2 and 3 are deterministic. There's a correct answer. The transactions either exist or they don't. The sum is either right or wrong.\n\nIf you route those steps through an LLM — asking it to generate a SQL query, run it mentally, summarize the output — you've introduced a probabilistic component where zero ambiguity is acceptable. In a financial context, a \"plausible-sounding\" transaction summary that's 3% wrong is not a minor UX issue. 
It's a compliance problem.\n\nHere's what most people miss when they start chaining LLM calls together.\n\nIf each step in your pipeline has a 90% success rate — which sounds fine — and you have 5 steps that all have to succeed, then (assuming the steps fail independently) your overall pipeline reliability is:\n\n**0.9 × 0.9 × 0.9 × 0.9 × 0.9 = ~59%**\n\nA 5-step agentic workflow where every node is an LLM call fails roughly 4 out of 10 times. Not because any single step is broken. Because the architecture is wrong.\n\nThis is something I think about in terms of how we handle fraud detection on our platform. The ML model's job is to score a transaction — is this pattern anomalous? That's genuinely probabilistic. Pattern matching under uncertainty is exactly what the model is good at.\n\nBut the downstream decision — block the card, flag for review, let it pass — that's a deterministic rule engine. Hard thresholds. Business logic. Audit trails. Putting an LLM in that loop would be architecturally insane, regardless of how good the model is.\n\nThe model handles ambiguity. The function handles decisions.\n\nEvery LLM tutorial shows you the happy path. Very few show you where the model should be completely absent from the pipeline.\n\nThe design question worth asking: **which parts of this workflow require genuine judgment or language understanding, and which parts have a correct, verifiable answer?**\n\nLLM's job: extract intent, handle ambiguity, generate natural language.\n\nFunction call / API / rule engine's job: everything with a ground truth.\n\nThis isn't a new insight — it's basically what tool-use and function calling were invented for. The model decides *what* to do. A real function actually *does* it. But a lot of builders still treat function calling as a nice-to-have instead of a load-bearing architectural decision.\n\nFor a while I thought the hard part was getting the model to behave. Prompt engineering. Fine-tuning. Better retrieval.\n\nThe cohort work I've been doing shifted that. The models are actually pretty capable. 
What's hard is:\n\nIf your AI feature is unreliable, the honest diagnostic question is: how many of my pipeline steps are probabilistic that shouldn't be? The answer is usually more than you think.\n\nHallucination is real. But it's also one of the most convenient excuses in AI engineering right now.\n\nMost of the failures I've seen — in projects, in tutorials, in production systems discussed in public postmortems — aren't the model generating nonsense. They're systems that were designed without a clear line between \"where the LLM is appropriate\" and \"where a function call is appropriate.\"\n\nDraw that line first. Build around it. Then see how often the model is actually the problem.", "url": "https://wpnews.pro/news/your-ai-isn-t-broken-your-architecture-is", "canonical_source": "https://dev.to/abhii07/your-ai-isnt-broken-your-architecture-is-5fam", "published_at": "2026-06-21 06:49:36+00:00", "updated_at": "2026-06-21 07:06:30.706586+00:00", "lang": "en", "topics": ["large-language-models", "ai-agents", "ai-products", "ai-infrastructure", "developer-tools"], "entities": ["Java", "Spring Boot"], "alternates": {"html": "https://wpnews.pro/news/your-ai-isn-t-broken-your-architecture-is", "markdown": "https://wpnews.pro/news/your-ai-isn-t-broken-your-architecture-is.md", "text": "https://wpnews.pro/news/your-ai-isn-t-broken-your-architecture-is.txt", "jsonld": "https://wpnews.pro/news/your-ai-isn-t-broken-your-architecture-is.jsonld"}}