{"slug": "tokens-and-dreams", "title": "Tokens and Dreams", "summary": "A senior engineer at a software company found that generative AI produced a polished interactive dashboard with relevant visualizations from a simple CSV prompt, yet the same tool introduced cascading errors when tasked with writing code for a basic open-source search server. The engineer argues that while AI appears magical for data analysis, its coding output requires aggressive manual pruning that ultimately costs more time than it saves, challenging the narrative that AI is ready to replace human developers.", "body_md": "The one great principle of the English law is, to make business for itself.\n\n**Note:** I've revised my thinking somewhat since I wrote this, continuation [here](http://charlesleifer.com/blog/cave-of-forgotten-dreams/), *June 2nd*.\n\nThe recurring theme running through my mind the last few months has been\ncomplexity within a software application. Forget coding. Sales is using AI to\nwrite all new code, so for us engineers there's not a hell of a lot to do\n*besides* think (and be there to hold the bag).\n\nLast week I generated a CSV of some internal company metrics. With only a\nsentence or two of prompt, generative AI extrapolated meaningful signals,\ncorrelated changes in the data with external signals that were not explicitly\nexpressed (e.g. interest rate hikes), and built a polished interactive\ndashboard with relevant visualizations. Nevermind the fetishization of\ndark-mode or the tell-tale slop signs (what *is* it with that fucking font?) -\nmost people would never notice these, it's coded to look \"modern\" and it looks\nthe part. I didn't even ask for the dashboard or any visualizations. Results\nlike these seem magical. 
I believe this is how most people experience\ngenerative AI.\n\nAround the same time, I ran another AI coding experiment on one of my smaller\nopen-source libraries, [scout](https://github.com/coleifer/scout), and the\nprocess was so riddled with flaws and subtle failures that I know I lost\ntime (and sanity) by even attempting to let AI write code. You see, *scout* is\njust a dead-simple RESTful search server written as a [flask](https://flask.palletsproject.com/)\napp. This is not frontiers of engineering shit, it's about as mechanical as it\ngets in terms of implementation. As in my [previous experiments](http://charlesleifer.com/blog/tags/ai/)\nwith AI, the strength of the tool in coding tasks was that it could trace logic\nbugs and find inconsistencies precisely and accurately. The weakness is that as\nsoon as it began to write code it produced tangles of weeds that had to be\naggressively hand-pruned, because each iteration the weeds had a tendency to\nspread...and spread.\n\nThis is why I'm stuck: [the evidence is puzzling](https://www.youtube.com/watch?v=EOzc0_bfSvc). To push back when people's daily experience of\nAI is of the *magical* variety is seen as almost perverse. I find myself\nconstantly wanting to say \"No! I embrace these tools! This is not thinly-veiled\nself-preservation *(actually it is, just not in the way you think)*! Just hear me out...\" But how do I express this when, at\nevery turn, a new silver bullet for agent orchestration, automatic coding,\nautomatic review, automatic thinking is being announced? Going further, as one\nconcerned with code as ground-truth for a system, how do I take the leap of\nfaith and relinquish control to a swarm of agents and markdown files?\n\nThese dynamics, the rise of agentic coding loops, and some unrelated [UFO stuff](https://en.wikipedia.org/wiki/Jacques_Vall%C3%A9e)\nhad me thinking about cybernetics (of all things). 
Cybernetics emerged after\nWWII as a framework for studying control mechanisms in complex systems. The\ncanonical example is a thermostat that kicks on heating or cooling when the\ntemperature falls outside the specified range, and then returns to passive mode\nwhen back within the acceptable range. The central idea is *feedback*.\n\nThe \"first law\" of cybernetics, Ashby's *Law of Requisite Variety*, states that\nin order to control a system, the regulating function (feedback) must be able\nto match the state-space complexity of the operating environment. The idea is\nthat *without* adaptive control, the environment dominates the system and\neventually leads to failure. In software engineering, I see a two-layered\nsystem where at the surface you have the software artifact itself, the\napplication that users interact with. It must be able to encode and handle the\ncomplexity of its intended uses. And then beneath that you have the actual\ncode, the primary source of truth, where it is the programmer who is the\ncontrol function for the overall system. The programmer's job, then, is\ntwo-fold: to manage the state of the code so that it can produce an artifact\nwhich, in turn, correctly handles its designed use-case.\n\nThe framing also explains to me why I've found the greatest utility in AI\ntooling in analysis tasks. When directed to do deep analyses on existing\ncode-bases, reason about design tradeoffs, trace deadlocks or diagnose memory\nleaks, AI has been amazing. 
In cybernetic terms, AI extends the amount of\n*variety* I'm able to cope with, and allows me to better regulate the code-base.\nYet when directed top-down with specs, no matter how detailed, AI replaces the\nregulator with its own loop, made from the same substrate as the thing being\nregulated - the model watching the code and the model producing the code\nare now the same kind of process, and control dissolves.\n\nAccording to that first law, the programmer must be able to match the state-space complexity of the code itself, in order to be able to effectively wield it and adapt it over time. Over the years, approaches like Agile, YAGNI, KISS all tend towards optimizing for this kind of adaptability. The core idea is to keep the system simple and minimal enough that both the programmer and the software artifact can adapt as things unfold. On the other end of the spectrum, domain-driven design and spec-driven development emphasize explicit front-loading of complexity modeling. This way the operating modes of the system are well-understood beforehand and the programmer's role becomes more mechanical. Formal methods, meanwhile, are in their own special corner. They front-load, too, but are anchored to machine-verifiable proofs and are the opposite of a vibed-out markdown file.\n\nThose readers who are familiar with my open-source work can probably guess\nwhich camp I belong to. I prefer smaller tools, [built bottom-up](https://www.nasa.gov/history/rogersrep/v2appf.htm),\nwhere the design, behavior and invariants can reasonably be held in your head.\nDesigning software from the bottom-up means building the lower-level component\npieces to be clean and orthogonal, so that they can be *composed* into larger\nstructures. When done correctly, new features tend to write themselves as new\npatterns **emerge**. 
For instance, working on [huey](https://huey.readthedocs.io/),\nthings like retry delays, revocation, rescheduling, ETAs, rate-limiting, chords\n-- all these features came out as natural consequences from a core set of\nbuilding blocks. They are robust because the underlying structures are robust\nand compose well.\n\nSo where does AI-written code live in this framework? To me it lands very firmly in the top-down world. There's been a recent wave of hype around \"spec-driven\" AI development, where you front-load all design requirements into markdown beforehand. But more importantly, in the two-layered model of control, AI tools eliminate the programmer-as-mediator of the system. All that exists is the artifact, produced by AI, and the specification - some of which exists in markdown, some of which is nothing but a dim spectre haunting a long-forgotten context window.\n\nWhen the programmer is removed as the control system for managing software complexity, what happens? AI evangelists would argue that control is retained; it has simply shifted to the network of prompts, code, tests, and agents. I would argue, based on my own experience, that this is actually where things begin to break down. An AI-modulated feedback loop inevitably becomes self-referential and at some point the loop closes, because the thing anchoring it to reality - the programmer - can no longer keep up. Code gets written, reviewed, tested and modified using the same system that produced it. Because the speed at which AI produces code far exceeds what any person can reasonably review and fully understand, there's a kind of event horizon that gets crossed. The break occurs and beyond that point the ownership of the code is implicitly transferred.\n\nThe consequences of an AI feedback loop go beyond the loss of that lower layer of control (the ability of human programmers to understand the system). 
Errors in design have a way of compounding in AI-written code, so that you end up with many islands which are internally consistent, but do not compose well with one another, much less produce a coherent whole. Even in my tiny 1,000 LOC image viewer prototype, AI produced two completely independent thumbnail caching mechanisms, redundant image display widgets, and three nearly-identical implementations of a context-menu. When prompted to refactor, the result ended up being worse - skeletal remnants of the old APIs calling into \"refactored\" functions that held the same old logic grafted into new (redundant) functions.\n\nWorse still, this drift leads to real costs. Every iteration consumes tokens,\nso the code-base is not merely accumulating noise but *paying* to accumulate\nit. The feedback loop becomes an economically\nself-reinforcing pit of quicksand. Maybe just a few more hours of token-spend\nwill fix it... But the tens- or hundreds-of-thousands of lines of code sitting\nbehind the software artifact resist attempts to refactor, because the refactor\nrequires the same tools which introduced the problems in the first place.\n\nI recently had a call with an AI-native developer where, with refreshing candor, he showed me his AI development process. Several times he mentioned, almost apologetically and without a trace of defensiveness, that he was not a \"real developer\", yet he had vibe-coded a real product. He expressed frustration with opaque token spend, noted he had paid several thousand dollars over his normal usage just to get his agents \"un-stuck\" and the looms spinning again. One of the features in his application was a complex visualization of a knowledge-graph, each node brightly illuminated against a background web of connections. 
But for reasons which remain obscure, the graph had a tendency to wiggle around and reconfigure itself so that one was forced to mechanically mouse over nodes at random until the tooltip informed you that you'd arrived at the node you were interested in. How many more thousand dollar re-ups would it take to get the graph to sit still and behave, I wondered?\n\nLook how thou steppest!\n\nTake heed thou do not trample with thy feet\n\nThe heads of the tired, miserable brothers!\n\nAshby's Law gives us a few ways a thermostat can fail: it doesn't sample the environment frequently enough, it models the wrong system, or the environment changes too quickly to enable it to respond effectively. AI coding tools manage to hammer at all three of these at once, and the programmer can no longer be an effective regulator of the system. Iterating becomes a closed circuit of AI driving AI, while code bloats, errors compound, and prompts drift. The artifact may appear correct, but the underlying code is such a mess that no one can be sure.\n\n[Anthropic](https://github.com/anthropics/claude-code/issues/42796), OpenAI,\nGoogle all want us to believe that these tools will speed up and simplify the\nprocess of developing software. And in a way they do... right up until the\nevent-horizon is crossed and the loop closes. Beyond there is nothing but\niteration upon iteration, token burn, loss of grounding and increased spend. As\nthe software system evolves and code grows, there is an almost addictive sense\nof *making progress* - something the AI-native developer spoke about to me -\nbut towards whose goal? 
In the end the system may run, the dashboard may\ncontinue to render, and nobody will be able to say why.", "url": "https://wpnews.pro/news/tokens-and-dreams", "canonical_source": "http://charlesleifer.com/blog/tokens-and-dreams/", "published_at": "2026-05-05 16:48:03+00:00", "updated_at": "2026-06-03 04:07:15.133745+00:00", "lang": "en", "topics": ["generative-ai", "artificial-intelligence", "ai-tools", "ai-products"], "entities": ["scout", "Charles Leifer"], "alternates": {"html": "https://wpnews.pro/news/tokens-and-dreams", "markdown": "https://wpnews.pro/news/tokens-and-dreams.md", "text": "https://wpnews.pro/news/tokens-and-dreams.txt", "jsonld": "https://wpnews.pro/news/tokens-and-dreams.jsonld"}}