{"slug": "the-limits-of-stateless-llms", "title": "The Limits of Stateless LLMs", "summary": "Large language models have reached a hard limit in their ability to handle multi-step, coherent tasks due to their stateless, pattern-matching design, forcing industries to build external components for planning and memory. In software engineering, single-task autonomy is now solved, but vendors are shifting to add persistent state and constraint enforcement around LLMs to achieve multi-task feature ownership. This pattern extends to robotics, legal, and medical fields, where LLMs alone cannot maintain context or enforce rules, driving the development of specialized safety layers and structured systems.", "body_md": "We have reached the end of the major gains available from large language models (LLMs) on their own. LLMs are fundamentally stateless, pattern-matching, next-token predictors that, by design, have no internal causal model and no persistent memory.\n\nSystems that require a plan, coherent handling of constraints, or a long-lived context that stays consistent across several updates are all turning to building these components around the LLM. A larger model alone will no longer suffice.\n\nTake healthcare: an LLM can answer medical questions, but on its own it cannot maintain patient state, track longitudinal data, enforce clinical guidelines, or reason about drug interactions. Medical LLM products are shifting their focus towards clinical decision engines, structured patient records, safety layers and protocol checkers.\n\nThis next stage, AIv2, will require strong product and software engineering to give industries what they need: higher-level planning and multi-stage coherence.\n\n## The Limits of Stateless LLMs\n\nAs of June 2026, the software engineering industry has benefited from rapid gains in LLM capability, driven by building bigger, richer models that can accurately match an ever-larger number of patterns. 
Today, autonomous AI engineers excel at single-task ownership: the ability to complete a single software engineering task so that a human software engineer can decide whether that change should be applied to the wider system.\n\nThe next stage for autonomous AI engineers is to excel at feature ownership: the ability to complete the *multiple* tasks needed for a whole feature to be safely applied to the wider system. A feature requires multiple tasks, such as safely updating the code that runs in the web browser, updating the backend code that the browser code interacts with, and updating any databases needed to store feature data. All of these changes must be mutually consistent, so that no part of the new feature breaks another part of that same feature, *and* no part of the new feature breaks anything that already works in the current system.\n\nBecause LLMs are stateless next-token predictors with no internal causal model and no persistent memory, an LLM on its own cannot maintain a plan, a task graph, invariants, constraints or the long-lived context needed to keep multiple updates consistent.\n\nSafely updating the system to incorporate a multi-task feature requires persistent state, multi-step reasoning, constraint enforcement, and cross-task coherence. These are all part of long-horizon enforcement: the capability to ensure that a plan, a set of constraints, or a set of invariants remains true across an extended sequence of tasks.\n\nBuilding a bigger model will not solve this. 
Vendors of autonomous AI engineers are therefore building components around the LLM to provide this support.\n\nSingle-task autonomy is solved, but multi-task feature coherence remains unsolved.\n\nWe have reached a hard limit of what an LLM can provide on its own.\n\n## This Pattern in Other Industries\n\nLLMs can perform a single step, but by design they cannot maintain state across steps, enforce constraints, or ensure coherence from one step to the next.\n\nThe same hard limit appears in other industries.\n\n### Robotics\n\nRobotics teams report that LLMs alone cannot maintain a world model, track state across multiple steps, plan reliably or adapt to unexpected changes.\n\nWork such as CodeAct, which has the LLM emit executable Python code as its actions and feeds the interpreter's results back into later turns, moves part of this burden out of the model. More broadly, teams are adding symbolic planners, state estimators, constraint solvers and safety layers around the LLM to provide the missing abilities.\n\n### Legal and compliance\n\nLLMs can summarise a single contract, but they cannot maintain consistency across multiple documents, track obligations across clauses, enforce regulatory constraints or reason about dependencies between legal entities.\n\nRules engines, knowledge graphs, audit trails and structured memory are being added to LLMs.\n\nIn their work on CUAD, an expert-annotated dataset for legal contract review, Hendrycks and others observe that many real-world document analysis tasks still do not make use of machine learning. Whether large models can transfer to highly specialized domains remains an open question, and resolving it requires large, specialized datasets.\n\n### Healthcare decision support\n\nAn LLM can answer medical questions, but it cannot maintain patient state, track longitudinal data, enforce clinical guidelines or reason about drug interactions.\n\nIn their work on MedAction, Hsin-Ling Hsu and others report three recurring failure modes in current LLMs: ungrounded test ordering, unreliable diagnostic updates, and degraded multi-turn coherence. 
Together, these reveal a core deficit: existing medical training data teaches models to reason from complete information but not to act under evolving, partial evidence. Acting on evidence that evolves over time is inherently a multi-step task.\n\nClinical decision engines, structured patient records, safety layers and protocol checkers are being put into place to address these issues.\n\n### Finance and trading\n\nLLMs can analyse a single report, but they cannot maintain portfolio state, track risk constraints, reason about multi-step strategies or enforce regulatory rules.\n\nThe survey by Mahdavi and others (citation 11 in the survey) describes work by Zhixuan Chu and others that takes a data-centric approach to improving LLMs on financial tasks, addressing their limitations in integrating and reasoning over complex financial data.\n\nFirms are adding risk engines, rule-based planners, and stateful trading systems to address this.\n\n### Customer support automation\n\nLLMs can answer a single ticket, but they cannot maintain a case history across multiple tickets, track multi-step resolutions, enforce policy constraints or coordinate across channels.\n\nBalaji and others report that, across 703 conversations spanning three domains, their structured workflow orchestration (Dynamic-Prompt-Agent) significantly outperforms prompt-based approaches, enabling even smaller models to exceed larger ones in policy compliance.\n\nTo address these gaps, companies are adding memory integrated with their customer relationship management (CRM) systems, workflow engines, and policy validators.\n\n## Conclusion\n\nLLMs are fundamentally stateless, pattern-matching, next-token predictors. They have no internal causal model and no persistent memory.\n\nThis design has gotten us this far. 
LLMs excel at single-step tasks, but in the richer domain of multi-step work, pattern-matching on its own is no longer enough.\n\nVendors are now building the missing capabilities around their LLMs.\n\nWe are now entering AIv2. AIv1 was single-task pattern-matching. AIv2 takes this further, addressing multiple stages and higher-level capabilities such as coherence across steps.\n\nCurrent products build AIv2 support around an AIv1 LLM. The low-hanging fruit of AIv1 has been picked, and tomorrow's leading products will depend on engineering ability and sound product decisions.\n\n## Further Reading\n\n- Xingyao Wang and others: CodeAct: Executable Code Actions Elicit Better LLM Agents. https://arxiv.org/abs/2402.01030\n- Dan Hendrycks and others: CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review. https://arxiv.org/abs/2103.06268\n- Hsin-Ling Hsu and others: MedAction: Towards Active Multi-turn Clinical Diagnostic LLMs. https://arxiv.org/html/2605.07305v1\n- Mahdavi and others: Integrating Large Language Models in Financial Investments and Market Analysis: A Survey. https://arxiv.org/pdf/2507.01990\n- Sumanth Balaji and others: Beyond IVR: Benchmarking Customer Support LLM Agents for Business-Adherence. https://arxiv.org/pdf/2601.00596", "url": "https://wpnews.pro/news/the-limits-of-stateless-llms", "canonical_source": "https://phroneses.com/articles/build/notes/the-limits-of-stateless-llms.html", "published_at": "2026-06-07 00:00:00+00:00", "updated_at": "2026-06-12 10:06:32.278331+00:00", "lang": "en", "topics": ["large-language-models", "artificial-intelligence", "ai-agents", "ai-products", "ai-research"], 
"entities": [], "alternates": {"html": "https://wpnews.pro/news/the-limits-of-stateless-llms", "markdown": "https://wpnews.pro/news/the-limits-of-stateless-llms.md", "text": "https://wpnews.pro/news/the-limits-of-stateless-llms.txt", "jsonld": "https://wpnews.pro/news/the-limits-of-stateless-llms.jsonld"}}