AI does exactly what you ask — that's the problem


I asked Claude to add pagination to a product list. Response in 30 seconds: clean, functional, complete. And completely disconnected from the rest of the app. Wrong pagination component (we already had one), invented styles, existing filters broken. Technically correct. Unusable as-is.

The problem wasn't the AI. It was my prompt. I wrote: "Add pagination to this list." That's exactly what it did. Nothing more, nothing less.

Current models (Claude 4.x, GPT-4o) have dropped the "infer intent" behavior. They take prompts literally. That's progress overall, but it fundamentally changes how you need to prompt for code.

A good code prompt isn't about tricks. It's about giving the AI the same context you'd give a junior dev joining the project. I tested and scored dozens of formulations across the four most common coding tasks. Here's what actually works.

The structure that applies to everything

Before getting into specific cases, there are four elements present in every good code prompt:

- Stack/context: language, version, framework, relevant files
- Precise task: what you want, phrased as an instruction, not a question
- Constraints: what NOT to touch, scope limits
- Expected output: diff, full code, explanation only, CVSS score…

Anthropic's golden rule sums it up: show your prompt to a colleague with no context. If they'd be confused, the AI will be too.

For larger prompts (multiple files, complex instructions), XML tags help separate sections. Tests show 30–39% improvement in response quality when prompts are structured with clear tags:

    <context>PHP 8.1, Laravel 10, multi-tenant app</context>
    <task>Add pagination to ProductList</task>
    <constraints>Do not touch getFilteredProducts()</constraints>
    <code>[paste files here]</code>

Second rule: code first, question last. Placing context and files at the top with the instruction at the bottom improves precision by ~30% on large contexts. The AI reads everything before acting.
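The two rules above (tagged sections, code first and instruction last) can be combined in a small helper. This is only a sketch: `build_prompt` and the `<file name="...">` wrapper are hypothetical names chosen for illustration, not part of any library or of Anthropic's guidance, and the file body is a placeholder.

```python
def build_prompt(context: str, files: dict[str, str],
                 constraints: list[str], task: str) -> str:
    """Assemble an XML-tagged prompt with the code before the instruction."""
    # Wrap each file in its own tag so the model can tell the files apart.
    # The <file name="..."> tag is an illustrative choice, not a standard.
    file_blocks = "\n".join(
        f'<file name="{name}">\n{body.strip()}\n</file>'
        for name, body in files.items()
    )
    constraint_lines = "\n".join(f"- {c}" for c in constraints)

    # Order matters: context and code go first, the task goes last.
    sections = [
        ("context", context),
        ("code", file_blocks),
        ("constraints", constraint_lines),
        ("task", task),
    ]
    return "\n".join(f"<{tag}>\n{text}\n</{tag}>" for tag, text in sections)


prompt = build_prompt(
    context="PHP 8.1, Laravel 10, multi-tenant app",
    files={"ProductList.php": "<?php // existing component source goes here"},
    constraints=["Do not touch getFilteredProducts()"],
    task="Add pagination to ProductList",
)
print(prompt)
```

Keeping the section order in one list makes it hard to accidentally put the instruction above a large paste of files.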
Bug fix: describe the symptom, not the theory

The classic mistake: paste code and ask "this doesn't work." The AI doesn't know if it's crashing, returning a wrong value, or just too slow. It will invent a plausible problem and "fix" it.

❌ Vague prompt (score: 3/10)

    This code doesn't work, help me.

    function getUserById($id) {
        $result = $db->query("SELECT * FROM users WHERE id = $id");
        return $result->fetch();
    }

Typical result: the AI fixes the SQL injection (fair, real issue), ignores the actual error, and rewrites with PDO prepared statements. Technically sound, but not what you needed.

✅ Precise prompt (score: 9/10)

    Stack: PHP 8.1 + PDO, Laravel 10.

    Current behavior:
    Call to a member function fetch() on bool
    → only when the ID doesn't exist in the table

    Expected behavior:
    Return null if user doesn't exist (no exception thrown)

    Already tried:
    isset($result) before fetch(): same error

    Code:
    function getUserById($id) {
        $result = $db->query("SELECT * FROM users WHERE id = $id");
        return $result->fetch();
    }

The "already tried" line is critical. Without it, the AI will re-suggest exactly what you already attempted. With it, it looks elsewhere and finds that PDO::query() returns false on failure (when PDO is not in exception mode), so the fix is $result !== false ? $result->fetch() : null.

Bug fix template:

    Stack: language + version + framework
    Current behavior: exact symptom (error message, wrong return value, observed behavior)
    Expected behavior: what the code should do
    Already tried: previous attempts and why they didn't work
    Code: minimal code that reproduces the bug

New feature: acceptance criteria and out-of-scope

"Add a comment system." The AI will choose a stack (database, probably), a UI (modals, inline, dedicated page?), and validation (client-side, server-side?). Each of those choices may be incompatible with your project.

❌ Without integration context (score: 3/10)

    Add a comment system to this blog.

Likely result: MySQL solution, jQuery form, a design that matches nothing in the existing project.
✅ With context, criteria, and out-of-scope (score: 9/10)

    Stack: PHP 8 + Bootstrap 3, no database (JSON file storage).

    Task: Add comments to blog articles.

    Acceptance criteria:
    - Form: name + message (no email, no account required)
    - Server-side validation only (no JS)
    - Storage: one JSON file per article at /blog/comments/{slug}.json
    - Email notification to author on each new comment (PHPMailer already installed)

    Out of scope:
    - No moderation for now
    - No nested replies
    - No changes to existing CSS

    Files involved: blog footer.php, template.php

The out-of-scope section is the key difference. Telling the AI what not to implement is just as important as what to build. Without it, the AI will either over-engineer (moderation, auth, nested threads) or invent constraints that don't exist.

New feature template:

    Stack: technical context
    Integration context: what it must integrate with (existing components, project patterns)
    Task: precise feature description
    Acceptance criteria: list of expected behaviors
    Out of scope: what we do NOT want implemented now
    Files involved: files to modify or create

Refactoring: define the problem, not the solution

"Refactor this code" is the worst possible refactoring prompt. The AI will choose its own quality criteria, likely change method signatures, add unsolicited abstractions, and break the existing public API.

❌ Without goal or constraints (score: 2/10)

    Refactor this code to make it cleaner.
    [450 lines of OrderService]

Typical result: 8 extracted private methods, renames, interfaces added, an abstract class "for future flexibility." It compiles. Tests break.

✅ Concrete problem + goal + constraints (score: 9/10)

    Concrete problem:
    OrderService::processOrder() (450 lines) is called from 8 places with different parameter combinations. Impossible to know which cases are covered by existing tests.

    Goal:
    Extract pricing rules into immutable Value Objects. One class = one rule (e.g., DiscountRule, TaxRule, ShippingRule).
    Hard constraints:
    - Observable behavior unchanged: existing tests must pass without modification
    - Do not modify public method signatures
    - Do not change return types

    Scope: pricing calculation only, not persistence or validation.

    Code: [OrderService.php]

The key phrase: "observable behavior unchanged: existing tests must pass without modification." Without this constraint, the AI optimizes by its own criteria. With it, it stays on track.

Refactoring template:

    Concrete problem: why this code is problematic (not "it's messy", but the real impact)
    Goal: what we want after the refactoring
    Hard constraints:
    - what must not change: signatures, tests, observable behavior
    Scope: what's IN and what's OUT
    Code: the code to refactor

Code review: choose your angle

Without a focus, the AI runs a generic style review. It will flag missing docstrings, suggest more descriptive variable names, and miss the SQL injection sitting at the bottom of the file. AI code review only pays off if you tell it where to focus.

❌ Generic review (score: 4/10)

    Review this code as a senior dev and give me improvement suggestions.

Typical result: 80% style suggestions, naming, comments. 20% real issues, buried in noise.

✅ Targeted angle + intentional decisions (score: 9/10)

    Context: Public endpoint POST /api/documents/upload
    Unfiltered incoming data, filesystem access, JWT auth verified upstream.

    Review focus: security only
    - Injection (command, SQL, path)
    - Path traversal
    - Malicious file uploads (server-side execution)
    - IDOR

    Intentional decisions (do not flag):
    - The (int) cast on ID is deliberate
    - Minimal error handling is intentional (internal app, centralized logs)
    - Variable name $tmp follows team convention

    Output format:
    For each issue: severity (critical/high/medium) + description + fix with code.

    Code: [upload-handler.php]

The "intentional decisions" section cuts noise in half. The AI doesn't flag what you already know about. It focuses on what you asked for.
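If you write review prompts often, the targeted prompt above can be generated from a small structure that makes the "one angle at a time" rule hard to break. This is a minimal sketch: `Focus`, `ReviewRequest`, and `render` are hypothetical names invented for this example, and the code string passed in is a placeholder.

```python
from dataclasses import dataclass, field
from enum import Enum


class Focus(Enum):
    """Review angles named in the article; a request carries exactly one."""
    SECURITY = "security"
    PERFORMANCE = "performance"
    BUSINESS_LOGIC = "business logic"
    ARCHITECTURE = "architecture"


@dataclass
class ReviewRequest:
    context: str
    focus: Focus                                          # a single angle
    checks: list[str]                                     # what to look for
    intentional: list[str] = field(default_factory=list)  # do not flag these
    output_format: str = (
        "For each issue: severity (critical/high/medium) "
        "+ description + fix with code."
    )

    def render(self, code: str) -> str:
        """Return the prompt text, with the code to review placed last."""
        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        parts = [
            f"Context: {self.context}",
            f"Review focus: {self.focus.value} only\n{bullets(self.checks)}",
        ]
        if self.intentional:
            parts.append(
                "Intentional decisions (do not flag):\n"
                + bullets(self.intentional)
            )
        parts.append(f"Output format:\n{self.output_format}")
        parts.append(f"Code:\n{code}")
        return "\n\n".join(parts)


request = ReviewRequest(
    context="Public endpoint POST /api/documents/upload",
    focus=Focus.SECURITY,
    checks=["Injection (command, SQL, path)", "Path traversal", "IDOR"],
    intentional=["The (int) cast on ID is deliberate"],
)
text = request.render("<?php // upload-handler.php contents")
print(text)
```

Because `focus` is a single enum value rather than a list, asking for two angles means writing two requests, which matches the advice to review one angle at a time.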
Code review template:

    Context: endpoint type, who calls it, input data, auth in place
    Focus: security / performance / business logic / architecture (one angle at a time)
    Intentional decisions (do not flag): deliberate choices in the code
    Output format: severity + description + fix with code / simple list / …
    Code: the code to review

Two cross-cutting techniques that change everything

1. Two passes beat one

For complex tasks (large refactors, deep reviews), separating critique from implementation consistently outperforms asking for everything at once:

    Pass 1: "What's wrong with this code? List the problems without fixing anything."
    Pass 2: "Now fix problems 1, 2, and 4 you identified. Skip 3."

The separation forces the AI to analyze before acting. Second-pass results are significantly more precise.

2. "Suggest" vs "Change": it matters now

With Claude 4.x, this distinction became literal:

- "Can you suggest some changes?" → AI lists suggestions, touches nothing
- "Change this function to improve its performance." → AI modifies

If you want the AI to act, use action verbs: "Modify", "Fix", "Extract", "Rename". Not "Could you maybe look at whether...".

Summary: the 4 templates

| Task | Essential elements | Most common mistake |
| --- | --- | --- |
| Bug fix | Exact symptom + expected behavior + "already tried" | Forgetting previous attempts |
| New feature | Integration context + criteria + out-of-scope | Not defining out-of-scope |
| Refactoring | Concrete problem + goal + behavioral constraints | No "tests must pass" constraint |
| Code review | Targeted angle + intentional decisions + output format | Generic review without focus |

Conclusion

The real value of these templates isn't about the AI. It's about you. Writing "current behavior vs expected behavior" forces you to precisely characterize the bug. Writing out-of-scope forces you to decide what isn't a priority. Writing refactoring constraints forces you to define what must not move.
Multiple times, while drafting a structured prompt, I realized the task itself was poorly defined from the start. The AI didn't need to run: the prompt had already done the work.

AI doesn't solve the problem for you. It solves the problem you described. You might as well make that description accurate.

📄 Associated CLAUDE.md