AI Won't Start For You

A developer argues that current artificial narrow intelligence (ANI) systems cannot replace human developers because they lack autonomy. While AI can generate code and answer questions, it cannot define problems, verify solutions against real-world requirements, or close the gap between a user's description and their actual needs. The developer illustrates this with a real project example where a repository query passed its tests on H2 but failed in production on PostgreSQL, because the two databases handled a null parameter differently.

Hey, it has been a while. Good to be back.

Every few months the same debate restarts. Will AI replace developers? Will it replace designers, writers, support staff? The answers range from "absolutely yes, within five years" to "no, never, human creativity is irreplaceable." Both sides are usually talking about different things without knowing it.

The AI available today - every chat model, every code assistant, every image generator - is ANI. Artificial narrow intelligence. It does specific tasks, and it does them well. In many cases, better than a human doing the same task.

ANI is harder to build than it sounds. Getting a model to generate plausible-looking code was not obvious. Getting it to answer questions accurately was not obvious. The engineering that went into current models is genuinely impressive, and the capabilities are real.

But "does specific tasks well" is not the same as "replaces the person who decides which tasks to do." That gap has a name. It is called autonomy.

AI does not ask you what the problem is. You ask AI questions, and it answers them. That sentence sounds obvious. Most of its practical consequences get ignored.

Think about what happens when you open a chat window and type "build me a website with a comments section." You will get something. It will look reasonable. The comments section will function. The CSS will be decent.

But you started. You defined the problem. You decided what "website" meant, what "comments section" meant, what the user flow should be, what data gets stored, and what counts as done. The AI generated an answer to your question. It did not generate your question.

A model running without input does not decide to build something. It does not notice that a problem exists. It does not get frustrated with a slow workflow and think "there should be a better way." It waits. Every AI system in production today, no matter how capable, is waiting for a human to tell it what to do.

This is not a limitation waiting to be patched in the next model version. It is structural. A tool does not have a problem statement. That is what makes it a tool.

Here is the part that catches people off guard. When AI generates code, it cannot verify whether that code solves your problem. It can check if the code is syntactically valid. It can reason about whether a function matches its signature. What it cannot do is look at the output and ask: does this match what the user actually needed? Because it was never the one who needed it.

You described a problem in words. The AI translated those words into code. But the original problem - the real one, with business logic, edge cases, and a specific user in front of it - never transferred. The AI has a description. You have a problem. Those are different things.

This is why generated code that passes every test can still be wrong. The tests were also generated from the description. Both the code and the tests are internally consistent with what you typed. Neither is verified against what you meant.

A concrete version of this from a real project. While building Proof Integrity (https://proofintegrity.net), a JPQL repository query passed a nullable String search param. Tests passed cleanly on H2. Production, on PostgreSQL, threw:

    function lower(bytea) does not exist

H2 silently treats null parameters as untyped. PostgreSQL's prepared-statement parser defaulted to bytea when it could not infer the type. The fix was four lines of cast(:search as string).

The query and the tests were internally consistent with each other the entire time. Neither was verified against what the real database actually required.

That structure is identical to the AI verification problem. Replace "H2" with "your description" and "PostgreSQL" with "your actual requirement" and it is the same gap. The surrogate passes. Reality does not.

The only one who can close that gap is you. AI cannot audit its output against a problem statement it never had.

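
For readers who have not hit this one, here is a minimal sketch of what the H2 versus PostgreSQL query described above might have looked like before and after the fix. The real query is not shown in this post, so the Document entity and its title field are hypothetical, and everyUseIsCast is an illustrative helper written for this sketch, not part of JPA or Hibernate.

```java
// Sketch only: entity name (Document), field name (title), and the helper
// below are assumptions for illustration, not the project's actual code.
final class NullSearchQuerySketch {

    // Assumed original shape: the nullable :search parameter is used bare,
    // so nothing in the query text tells the database what type it has.
    public static final String ORIGINAL_JPQL =
        "select d from Document d "
        + "where :search is null "
        + "or lower(d.title) like lower(concat('%', :search, '%'))";

    // Assumed fixed shape: every use of :search is wrapped in
    // cast(:search as string), making the type explicit even when it is null.
    public static final String FIXED_JPQL =
        "select d from Document d "
        + "where cast(:search as string) is null "
        + "or lower(d.title) like lower(concat('%', cast(:search as string), '%'))";

    // Naive text check: true only if every occurrence of ":<param>" is
    // immediately preceded by "cast(". It does not parse JPQL.
    public static boolean everyUseIsCast(String jpql, String param) {
        String token = ":" + param;
        String castPrefix = "cast(";
        int from = 0;
        while (true) {
            int at = jpql.indexOf(token, from);
            if (at < 0) {
                return true; // no more occurrences left to check
            }
            int start = at - castPrefix.length();
            if (start < 0 || !jpql.startsWith(castPrefix, start)) {
                return false; // found a bare, untyped use of the parameter
            }
            from = at + token.length();
        }
    }

    public static void main(String[] args) {
        System.out.println("original: " + everyUseIsCast(ORIGINAL_JPQL, "search"));
        System.out.println("fixed:    " + everyUseIsCast(FIXED_JPQL, "search"));
    }
}
```

A text check like this is one more surrogate: it looks at the query string, not at the database. Running the repository tests against a real PostgreSQL instance is still the check that would have caught the failure.
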
None of this means AI is less useful than advertised. It means the bottleneck is not the AI. The bottleneck is always the question. And the conversation after it.

A vague question produces a plausible answer that solves nothing. "Build me a website" is a vague question. "Build me a single-page PHP application with a form that posts to /comments and stores the input in a SQLite table with four columns: id, name, body, created_at" is a precise question. The output quality on the second prompt is dramatically higher - not because the AI got better, but because you did the thinking first.

The same applies when the AI responds. If it gives you a threaded comment structure and you say "looks good," you have accepted its interpretation of your problem. If you say "no, flat comments only, threading adds complexity I do not want to maintain," you pushed back from your actual understanding. The AI can only work with what you give it at every turn, not just the first one. Weak replies compound the same way weak questions do.

You can extend what the model knows with a RAG system - embed your documents, retrieve relevant chunks at query time, inject them as context. You can fine-tune the model's behavior if you have the compute and a clear behavioral gap to close. Both are real tools for real problems. Neither gives the model autonomy. It still waits for your query. It still cannot decide what to search for, or notice that a search is needed.

Every AI capability scales with the quality of the input you give it. The input is always yours.

AGI - artificial general intelligence - is the version that would actually change this. The line between ANI and AGI is not raw capability. Current models already exceed human performance on many narrow tasks. The line is autonomy: can the system identify that a problem exists, decide it is worth solving, ask the questions needed to start, act on the answers, and then verify its output against the original problem it set for itself?

That loop - problem identification, initiation, action, verification - is what humans do and what ANI does not. If a model could close that loop without human input at each step, the replacement conversation would look very different.

Whether AGI is achievable, and on what timeline, is a genuinely open question. Reasonable people disagree. What is not debatable is that none of the systems available today do this. They are extraordinarily capable tools that require a human at the start, at every turn in the conversation, and at the end to decide if the output is actually right.

Can AI replace humans entirely? With ANI, no. Not because of some irreducible human essence, but because the job of a developer - or any knowledge worker - is not just to answer questions. It is to notice which questions need asking, in which order, and to own the answer when the output is wrong.

ANI handles the answering part at impressive speed. The noticing part, the initiating part, the owning-the-problem part: those have not moved.

That is what makes the current moment interesting. The execution cost dropped dramatically. The thinking cost stayed the same. Which means the thinking is now the only part that matters.

I write about software development and building PointArt - a zero-dependency PHP micro-framework. Previous articles are in the PointArt Devlog Series.

PS: These are mostly my own opinions. I am genuinely curious where others land on this - drop a comment if you disagree or see it differently.