{"slug": "what-happens-when-every-prompt-slot-says-something-different", "title": "What Happens When Every Prompt Slot Says Something Different", "summary": "A developer found that when conflicting instructions are placed in different prompt slots (system prompt, user message, tool description), Qwen 2.5-Coder 3B predominantly follows the user message instruction (60% of runs), while Claude Haiku 4.5 and Claude Sonnet 4.6 follow the user message instruction in every run. The experiment also revealed that nearly a third of Qwen runs produced no expected marker, and 6% showed multiple conflicting markers.", "body_md": "*A controlled experiment exploring how Claude and Qwen resolve conflicting instructions across system prompts, user messages, and tool descriptions.*\n\nCross-posting from Medium:\n\n[https://medium.com/@rajkundalia/where-you-put-the-instruction-matters-more-than-what-it-says-2d5ffcdd9369](https://medium.com/@rajkundalia/where-you-put-the-instruction-matters-more-than-what-it-says-2d5ffcdd9369)\n\nIn the first experiment of the series, **Where You Put the Instruction Matters More Than What It Says**, I asked a simple question:\n\nDoes it matter where you place an instruction?\n\nThe answer depended entirely on the model.\n\nFor **Qwen 2.5-Coder 3B**, the answer was **yes**. The same instruction produced dramatically different compliance rates depending on whether it lived in the system prompt, user message (or task prompt), or tool description.\n\nFor **Claude Haiku 4.5** and **Claude Sonnet 4.6**, the answer appeared to be **no**. 
Both models followed the instruction perfectly regardless of where it was placed.\n\nThat experiment measured **placement strength**.\n\nBut it left an obvious follow-up question unanswered.\n\nWhat happens when every prompt slot says something different?\n\nThat's what this experiment measures.\n\n**GitHub repository:**\n\n[https://github.com/rajkundalia/prompt-placement-anatomy](https://github.com/rajkundalia/prompt-placement-anatomy)\n\nThe underlying task is unchanged from Part 1.\n\nThe agent counts TODO markers across five markdown files using two filesystem tools: `list_files` and `read_file`.\n\nThe models are the same.\n\nThe agent loop is the same.\n\nThe only thing that changes is the prompt.\n\nIn Part 1, the same instruction was placed into one slot at a time.\n\nIn Part 2, every slot contains a different instruction simultaneously.\n\n| Slot | Instruction | Marker |\n|---|---|---|\n| System prompt | End your final answer with the marker `[DONE]` | `[DONE]` |\n| User message | End your final answer with the marker `[FINISHED]` | `[FINISHED]` |\n| Tool description (`read_file`) | End your final answer with the marker `[COMPLETE]` | `[COMPLETE]` |\n\nEvery instruction is active in every run.\n\nThe model cannot satisfy all three.\n\nIt has to choose one, ignore them entirely, or produce some mixture of them.\n\nUnlike Part 1, this experiment isn't measuring compliance.\n\nIt's measuring **which instruction wins.**\n\nEach run falls into one of five possible outcomes.\n\n| Outcome | Meaning |\n|---|---|\n| System | Response ends with `[DONE]` |\n| User | Response ends with `[FINISHED]` |\n| Tool | Response ends with `[COMPLETE]` |\n| None | None of the expected markers appear |\n| Conflict in output | Multiple markers appear |\n\nThe final 150 characters of every response are searched using case-insensitive regular expressions.\n\nThe first thing I noticed was how familiar these numbers looked.\n\nIn Part 1, placing the instruction in the 
user message produced **64% compliance**, while the system prompt managed **8%** and the tool description **2%**.\n\nNow, under direct competition, the user message wins **60%** of the time, the system prompt wins **2%**, and the tool description never wins at all.\n\nAlthough the experiments ask different questions, they tell a remarkably consistent story.\n\nThe slot that was strongest in isolation is also the slot that dominates when every instruction competes.\n\nThe conflict condition also exposed behavior that Part 1 could never reveal.\n\nNearly a third of the runs (**32%**) ended without any expected marker.\n\nAnother **6%** produced multiple competing markers in the same response.\n\nInstead of consistently selecting one instruction, the model sometimes failed to produce a single clear winner.\n\nOne implementation detail is important when interpreting these results.\n\nUnlike the Claude models, Qwen never successfully executed the tool loop.\n\nRather than producing structured tool calls, it emitted tool-call JSON as plain text and completed every run in a single turn.\n\nThis means the tool description was never exercised as part of an actual tool invocation.\n\nIt existed only as text inside the context window.\n\nThat limitation is consistent with the results from Part 1, where the tool description also had almost no observable influence for Qwen.\n\n| Outcome | Frequency |\n|---|---|\n| User `[FINISHED]` | 100% |\n| System `[DONE]` | 0% |\n| Tool `[COMPLETE]` | 0% |\n| None | 0% |\n| Conflict in output | 0% |\n\nEvery run produced exactly the same outcome.\n\nThe model completed the tool loop correctly, used three turns, and always finished with `[FINISHED]`.\n\nThis is where the experiment becomes interesting.\n\nPart 1 suggested that every prompt slot was equally effective because each placement achieved **100% compliance**.\n\nPart 2 reveals a more nuanced picture.\n\nWhen every slot contains the same instruction, every slot can successfully deliver 
that instruction.\n\nOnce those instructions conflict, however, the model consistently resolves the disagreement in favor of the user message.\n\nThe placement experiment and the conflict experiment are measuring different properties of the model.\n\n| Outcome | Frequency |\n|---|---|\n| User `[FINISHED]` | 100% |\n| System `[DONE]` | 0% |\n| Tool `[COMPLETE]` | 0% |\n| None | 0% |\n| Conflict in output | 0% |\n\nClaude Sonnet was tested across **12 runs**. Testing stopped early once the pattern was clearly established: the user instruction determined the final formatting of the response.\n\n| Model | Type | System | User | Tool | None | Conflict |\n|---|---|---|---|---|---|---|\n| qwen2.5-coder:3b | Small local (Ollama) | 2% | 60% | 0% | 32% | 6% |\n| claude-haiku-4.5 | Small frontier (Anthropic) | 0% | 100% | 0% | 0% | 0% |\n| claude-sonnet-4.6 | Large frontier (Anthropic) | 0% | 100% | 0% | 0% | 0% |\n\nThree observations stand out:\n\n`[COMPLETE]` never emerged as the surviving instruction.\n\nAlthough both experiments involve prompt placement, they answer different questions.\n\n**Part 1**\n\nCan this prompt slot successfully deliver an instruction?\n\n**Part 2**\n\nWhen multiple instructions compete, which one determines the final output?\n\nFor Qwen:\n\nThe user message was the strongest placement in isolation, and it remained the dominant placement under direct competition.\n\nFor the Claude models:\n\nPart 1 showed that all three prompt slots could successfully deliver an instruction when no competing instruction existed.\n\nPart 2 showed that once conflict was introduced, the user message consistently determined the final formatting in this experiment.\n\nTogether, the two experiments show that **instruction visibility** and **instruction priority** are different characteristics of an LLM.\n\nA model may reliably process instructions from every prompt slot while still preferring one slot whenever those instructions disagree.\n\nIf you're building agents 
with smaller open-weight models, prompt placement is more than a stylistic choice.\n\nAcross both experiments, the user message was consistently the most reliable place for formatting instructions.\n\nSystem prompts and tool descriptions were substantially less effective, particularly when competing instructions existed.\n\nFor the Claude models tested here, the practical takeaway is different.\n\nThey successfully followed instructions regardless of placement when no conflict existed.\n\nHowever, in this experiment, conflicting formatting instructions were consistently resolved in favor of the user message.\n\nIt's important to keep the scope of that finding in mind.\n\nThis experiment only examined formatting instructions within a controlled agent loop.\n\nIt does **not** imply that user prompts override safety policies or other system-level behaviors, which are governed by different mechanisms and would require a different experimental design.\n\nThe markers `[DONE]`, `[FINISHED]`, and `[COMPLETE]` are different strings.\n\nThey differ in length and may differ in how frequently similar tokens appeared during model training.\n\nRotating the markers across prompt slots would control for that effect, but it would also triple the size of the experiment and was not done here.\n\nThe sample sizes also differ across models:\n\nThe Anthropic models exhibited highly consistent behavior, allowing the experiments to stop once the dominant pattern was established.\n\nFinally, these results are model- and task-specific.\n\nDifferent architectures, quantization levels, or tasks may produce different behaviors.\n\nThe goal of this experiment is not to establish a universal prompt hierarchy, but to measure how these particular models behave under controlled conditions.\n\nStatistical confidence intervals were calculated during analysis but are omitted here because the dominant winner was unambiguous.\n\nThe most interesting result wasn't that the user message 
won.\n\nIt was that two experiments, built to measure different properties, kept arriving at the same answer.\n\nFor one model, the strongest placement in isolation was also the strongest placement under conflict.\n\nFor the others, perfect placement compliance concealed a deterministic preference that only became visible once the prompts disagreed.\n\nSometimes the most interesting model behavior doesn't appear when there's only one correct instruction.\n\nIt appears when every prompt slot asks for something different, and the model has to decide which one deserves the final word.\n\n**Follow me on LinkedIn:** [Raj Kundalia](https://www.linkedin.com/in/rajkundalia/)", "url": "https://wpnews.pro/news/what-happens-when-every-prompt-slot-says-something-different", "canonical_source": "https://dev.to/rajkundalia/what-happens-when-every-prompt-slot-says-something-different-33c1", "published_at": "2026-06-28 10:11:28+00:00", "updated_at": "2026-06-28 10:34:09.415273+00:00", "lang": "en", "topics": ["large-language-models", "ai-research", "developer-tools"], "entities": ["Qwen 2.5-Coder 3B", "Claude Haiku 4.5", "Claude Sonnet 4.6", "Raj Kundalia", "GitHub"], "alternates": {"html": "https://wpnews.pro/news/what-happens-when-every-prompt-slot-says-something-different", "markdown": "https://wpnews.pro/news/what-happens-when-every-prompt-slot-says-something-different.md", "text": "https://wpnews.pro/news/what-happens-when-every-prompt-slot-says-something-different.txt", "jsonld": "https://wpnews.pro/news/what-happens-when-every-prompt-slot-says-something-different.jsonld"}}