{"slug": "a-prompt-is-not-a-conversation-it-s-a-component-contract", "title": "A prompt is not a conversation. It's a component contract.", "summary": "A developer has introduced a structured approach to prompt engineering for large language models, arguing that a prompt should be treated as a \"component contract\" rather than a conversation. The framework breaks effective prompts down into four building blocks (instruction, context, input data, and output indicator) and organizes best practices into four dimensions: clarity, context, precision, and role-play. The developer emphasizes that LLM outputs have two distinct audiences, a parser and a person, each requiring its own contract to avoid either loud failures (code crashes) or quiet failures (subtly off-target answers).", "body_md": "Most of us use LLMs by trial and error. This post gives you a structure: the building blocks of an LLM prompt, and a reusable template for writing production prompts.\n\nFoundation models are very large models pretrained on internet-scale data; they are the basis of generative AI. With a foundation model, you can adapt one pretrained model to many tasks.\n\nA Large Language Model (LLM) is a foundation model for text, and at its core the same model can be used for many tasks: summarisation, classification, translation, code generation.\n\nWhat an LLM does is predict the next word (more precisely, the next token) in a sequence. At each step it looks at the context it has seen so far and produces a probability distribution over the possible next tokens. Running this in a loop produces fluent, coherent new content. So basically, an LLM takes text and returns text. The input is called the prompt and the output is called the completion.\n\nA prompt is any input given to a generative model to produce a desired output. Prompt engineering is the practice of designing and refining those prompts to get the best possible results from the model. 
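
The next-token loop described above can be sketched in a few lines of Python. This is a toy illustration, not how a real LLM is built. The hypothetical `TOY_LOGITS` table stands in for the neural network and conditions only on the previous token. Decoding is greedy, meaning it always picks the most likely token.

```python
import math

# Hypothetical toy 'model': for each previous token, raw scores (logits) for
# candidate next tokens. A real LLM conditions on the whole context so far,
# not just the last token; this table only illustrates the loop.
TOY_LOGITS = {
    '<start>': {'The': 2.0, 'A': 1.0},
    'The': {'cat': 1.5, 'dog': 1.2},
    'A': {'dog': 1.0, 'cat': 0.5},
    'cat': {'sat': 2.0, '<end>': 0.1},
    'dog': {'ran': 1.8, '<end>': 0.2},
    'sat': {'<end>': 3.0},
    'ran': {'<end>': 3.0},
}


def softmax(logits):
    # Turn raw scores into a probability distribution over next tokens.
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


def generate(prompt_tokens, max_new_tokens=10):
    # Autoregressive loop: predict one token, append it, repeat.
    tokens = list(prompt_tokens)
    completion = []
    for _ in range(max_new_tokens):
        probs = softmax(TOY_LOGITS[tokens[-1]])
        next_token = max(probs, key=probs.get)  # greedy: most likely token
        if next_token == '<end>':
            break
        tokens.append(next_token)
        completion.append(next_token)
    return completion


print(generate(['<start>']))  # ['The', 'cat', 'sat']
```

With the start token as the prompt, the loop emits `The`, `cat`, `sat`. It stops when `<end>` becomes the most likely next token. Sampling settings such as `temperature` only matter once you replace the greedy choice with random sampling from `probs`.
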
Refining a prompt means experimenting with the factors that influence the model's output. A vague prompt admits many reasonable responses; every constraint you add shrinks the space of possible responses.\n\nA well-structured prompt is usually assembled from four parts.\n\n| Building block | Purpose |\n|---|---|\n| Instruction | The task you want performed. |\n| Context | Relevant background that frames the situation for the model. |\n| Input data | The specific content the task should operate on. |\n| Output indicator | A description (or example) of the form the response should take. |\n\nThe instruction and the context map onto two of the template slots we'll assemble at the end: **Task** and **Context**.\n\nBest practices for writing prompts can be organised into four dimensions, and strong prompts attend to all four.\n\n| Dimension | Practice |\n|---|---|\n| Clarity | Use simple, direct language. Avoid ambiguous or overly complex terminology so the prompt is easily understood. |\n| Context | Provide relevant background and specific details to guide the model's understanding of the situation. |\n| Precision | Clearly state the type of response you want, and use examples to illustrate the expected output. |\n| Role-play / Persona | Write the prompt from the perspective of a specific character or expert, with enough detail for the model to assume that role effectively. |\n\nThink of a prompt as a search beam. A vague prompt is a wide beam: the model lands somewhere in a large valid region. Each constraint narrows the beam. Specificity isn't politeness toward the model; it's aiming.\n\nThe output is the more interesting part for us. An LLM's output has two possible audiences, a parser and a person, and you write a contract for each. Even when a person reads the output, \"looks fine\" isn't the same as \"matches what I needed.\" When an LLM's output is read by *code* instead of *eyes*, the output is an API response and the prompt is its schema. 
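
To make the parser contract concrete, here is a minimal sketch of the code side. It assumes a hypothetical sentiment task whose output must be a JSON object with exactly two keys: a `label` from a fixed set and a numeric `confidence`. The sketch uses only Python's standard `json` module. `parse_completion`, the field names, and the label set are illustrative and are not part of any provider's API.

```python
import json

# Hypothetical contract: a JSON object with exactly a label and a numeric confidence.
ALLOWED_LABELS = {'positive', 'negative', 'neutral'}


def parse_completion(raw_text):
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on
    # chatty preamble or markdown fences: the failure is loud, not silent.
    data = json.loads(raw_text)
    if not isinstance(data, dict):
        raise ValueError('expected a JSON object')
    # Invented or renamed keys break the contract, so check them explicitly.
    if set(data) != {'label', 'confidence'}:
        raise ValueError(f'unexpected keys: {sorted(data)}')
    label, confidence = data['label'], data['confidence']
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        raise ValueError(f'unknown label: {label!r}')
    # Numbers sent as strings (e.g. '0.92') are a classic contract break.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError(f'confidence must be a number, got {confidence!r}')
    return data


ok = parse_completion(json.dumps({'label': 'positive', 'confidence': 0.92}))
print(ok)  # {'label': 'positive', 'confidence': 0.92}

for bad in [
    'Sure! Here is the JSON: ' + json.dumps({'label': 'positive', 'confidence': 0.92}),
    json.dumps({'sentiment': 'positive', 'confidence': 0.92}),  # renamed key
    json.dumps({'label': 'positive', 'confidence': '0.92'}),  # number as string
]:
    try:
        parse_completion(bad)
    except ValueError as err:
        print('rejected:', err)
```

Each rejected case is a typical parser-breaking output: chatty preamble, a renamed key, and a number sent as a string. Failing fast at this boundary is the point. The pipeline stops instead of passing a malformed answer downstream.
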
A human forgives a messy answer; `json.loads()` doesn't. It either succeeds or throws. Without an explicit spec, the *model* decides format, length, tone, and depth, and it picks something plausible but generic. Controlling output means moving that decision from the model to you.\n\n**Two kinds of control:**\n\n**Each audience fails differently, and one of them fails silently:**\n\n| | Parser (output read by code) | Person (output read by a human) |\n|---|---|---|\n| What breaks | Markdown fences, chatty preamble, invented/renamed keys, numbers as strings | Too long, wrong tone, missing a section, pitched at the wrong level |\n| How it breaks | Loud: `json.loads()` throws, the pipeline stops; you notice immediately | Quiet: output looks fluent and complete; the gap only shows on a second read, and nobody flags it |\n\nA loud failure is annoying; a quiet failure is more dangerous because a subtly off-target answer can go unnoticed.\n\nThere are some mitigations you can apply for better results:\n\n*For the parser:*\n\n- `temperature: 0`.\n\n*For the person:*\n\nThe point is that prompt-only format control is a request; decode-time constraints are a guarantee. Treat the model's output as an API response, and the prompt as its schema.\n\n**Constraints** and **Output** are template slots too: the rules that prune behavior, and the exact shape you contract for.\n\nWe already saw role-play (persona) as one of the four dimensions of an effective prompt. It's the part lots of people forget about.\n\nA prompt can include three types of messages: system, user, and assistant. The \"type\" is usually set through the `role` field of each message.\n\nThe system message is the component's configuration, while the user message is the input for a specific call. The system prompt defines persistent behavior and rules; the user message provides the variable data for that particular request. 
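
Here is a minimal sketch of that split, using the same `role`/`content` message shape as the examples in this post. The support-ticket classifier, `SYSTEM_PROMPT`, and `build_messages` are hypothetical. The system message stays identical across calls and only the final user message changes. Any worked examples sit between the two as user/assistant pairs.

```python
import json

# Hypothetical configuration: identical on every call, so it lives in the system message.
SYSTEM_PROMPT = (
    'You are a support-ticket classifier. '
    'Reply with exactly one word: billing, bug, or other.'
)


def build_messages(ticket_text, examples=()):
    # System message first: persistent behavior and rules.
    messages = [{'role': 'system', 'content': SYSTEM_PROMPT}]
    # Optional worked demonstrations as user/assistant pairs.
    for example_input, example_output in examples:
        messages.append({'role': 'user', 'content': example_input})
        messages.append({'role': 'assistant', 'content': example_output})
    # User message last: the variable data for this particular request.
    messages.append({'role': 'user', 'content': ticket_text})
    return messages


first = build_messages('I was charged twice this month.')
second = build_messages('The export button crashes the app.')
# Only the final user message differs between calls.
print(first[:-1] == second[:-1])  # True
print(json.dumps(second, indent=2))
```

Keeping the fixed part byte-for-byte identical across calls also makes it a natural candidate for prefix caching.
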
The system sets how the component behaves in general; the user message tells it what to do right now.\n\nWhy do roles work? It's not magic.\n\nWhen you say, \"Act as a senior security engineer,\" the model shifts its output toward patterns statistically associated with that kind of writing in its training data.\n\nLikewise, \"Explain this to a junior developer\" pushes the model toward simpler, more educational, and more heavily explained responses.\n\nA role doesn't give the model a real personality. It changes the probability distribution of the kind of text the model is likely to generate. This is the **Role** slot, the first line of the template.\n\n**Why the system/user split matters in production:**\n\nOne good prompt gets you far. A few techniques get you further. The most useful one: showing the model examples.\n\nIn few-shot prompting, the prompt includes a few worked demonstrations. The model uses in-context learning to infer the pattern and apply it to the new input.\n\nThe **few-shot examples are unit tests that double as a spec.**\n\n**Why it works**: the model is a pattern continuator. 
A few input→output pairs establish a strong, low-ambiguity pattern, and the model continues it.\n\n**When to use which:**\n\n**Zero-shot:** simple, common tasks the model has clearly seen many times.\n\n```\n[\n  {\n    \"role\": \"user\",\n    \"content\": \"Summarize this article in 3 bullet points.\"\n  }\n]\n```\n\n**One-shot:** when you mainly need to pin down the *format*.\n\n```\n[\n  {\n    \"role\": \"user\",\n    \"content\": \"Convert countries to JSON.\\n\\nExample:\\nFrance -> {\\\"country\\\": \\\"France\\\"}\\n\\nNow convert:\\nBrazil\"\n  }\n]\n```\n\n**Few-shot:** classification, structured output, code style, anything with a specific schema or edge cases the model wouldn't guess.\n\nThose worked demonstrations are the **Examples** slot.\n\nIn the example below, the last message is the new input; the model produces the next `assistant` turn (here, the sentiment label).\n\n```\n[\n   {\"role\": \"user\",      \"content\": \"Today the weather is fantastic\"},\n   {\"role\": \"assistant\", \"content\": \"positive\"},\n   {\"role\": \"user\",      \"content\": \"I don't like your attitude\"},\n   {\"role\": \"assistant\", \"content\": \"negative\"},\n   {\"role\": \"user\",      \"content\": \"That shot selection was awful\"}\n]\n```\n\nLLMs are not free, and the token is the meter: it's the unit of both cost and latency. For most of us the mental model is familiar: tokens are network payload, and each LLM call is a billable, latency-bearing API request.\n\nOutput tokens dominate latency. The model generates text one token at a time, and each new token depends on all the previous ones, so generation has to happen sequentially.\n\nInput works differently: the prompt is processed in a single parallel \"prefill\" pass, which is relatively fast (even though you still pay for those tokens).\n\nFour levers do most of the work:\n\n**Compress the input**\n\nInstead of sending entire documents, summarize them first or extract only the relevant parts. 
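
For the first lever, here is a deliberately naive sketch of extracting only the relevant parts. It keeps the paragraphs that share words with the question, within a word budget. Everything here is an assumption for illustration: the `select_relevant` helper, the stop-word list, and word counts as a rough stand-in for tokens (tokenizers split text differently). A production system would more likely use a proper retriever, such as embedding-based search.

```python
# Hypothetical stop-word list, kept tiny for illustration.
STOP_WORDS = {'a', 'an', 'and', 'are', 'do', 'how', 'in', 'is', 'of',
              'the', 'to', 'was', 'what', 'within'}


def keywords(text):
    # Crude normalization: lowercase words with surrounding punctuation removed.
    words = {word.strip('.,;:!?()').lower() for word in text.split()}
    return words - STOP_WORDS - {''}


def select_relevant(paragraphs, question, max_words=120):
    # Naive heuristic: score each paragraph by word overlap with the question.
    query = keywords(question)
    ranked = sorted(
        enumerate(paragraphs),
        key=lambda item: len(query & keywords(item[1])),
        reverse=True,  # best matches first; ties keep document order
    )
    chosen, used = [], 0
    for index, paragraph in ranked:
        if not (query & keywords(paragraph)):
            break  # everything from here on shares no words with the question
        size = len(paragraph.split())  # word count as a rough proxy for tokens
        if used + size > max_words:
            continue  # would exceed the budget; try the next paragraph
        chosen.append(index)
        used += size
    # Return the selected paragraphs in their original document order.
    return [paragraphs[i] for i in sorted(chosen)]


doc = [
    'Our refund policy allows returns within 30 days of purchase.',
    'The company was founded in 2009 and has offices in three countries.',
    'Refunds are issued to the original payment method within 5 business days.',
]
print(select_relevant(doc, 'How long do refunds take?'))
# ['Refunds are issued to the original payment method within 5 business days.']
```

Note the miss: the first paragraph says `refund`, not `refunds`, so exact word matching drops it. That blind spot is why keyword overlap is only a starting point.
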
Most prompts ship context the task never uses, and you pay for every one of those tokens.\n\n**Limit and structure the output**\n\nSet `max_tokens`, prefer structured formats like JSON or arrays over long prose, and ask for concise answers, for example \"keep under 120 tokens.\"\n\n**Use the right model for the task**\n\nSmall models are faster and cheaper for simple work like classification, routing, or extraction. Save the stronger models for tasks that actually require reasoning or synthesis.\n\n**Use caching**\n\nThere are two different kinds:\n\n**Prompt/prefix caching**\n\nReuses stable prompt sections like system prompts, examples, or reference documents. Because the provider caches the processed prefix server-side, repeated calls skip recomputing the input-processing (prefill) step for that part of the prompt.\n\nPractical implication: put stable content first and variable content last to maximize cache hits.\n\n**Response/semantic caching**\n\nYour own infrastructure stores previous answers and reuses them when the same or a very similar request appears again. This caches outputs, not prompts.\n\nEverything we've seen so far gives us the structure for a template to use when working on a production prompt.\n\nThe template **R-T-C-C-E-O**:\n\n```\n[ROLE]        Who the model acts as. Sets the output distribution.\n[TASK]        The one thing to do, stated unambiguously.\n[CONTEXT]     Inputs, background, data; clearly delimited from instructions.\n[CONSTRAINTS] Rules that prune the space. Each maps to a real failure mode.\n[EXAMPLES]    1–3 representative input→output pairs (for structured/edge-case tasks).\n[OUTPUT]      The exact shape of the response. 
Schema + example if machine-consumed.\n```\n\nNot every prompt needs all six, but every prompt should be a *deliberate* subset, not an accident.\n\nFinally, the production checklist, a pre-flight pass before a prompt ships:\n\n- `temperature` matched to the task?\n- `max_tokens` capped; output format compact?\n\nSo we covered six slots (Role, Task, Context, Constraints, Examples, Output) plus a pre-flight checklist. Run every production prompt through both, and you've turned a hopeful request into a tested component. **A prompt is not a conversation. It's a component contract.**", "url": "https://wpnews.pro/news/a-prompt-is-not-a-conversation-it-s-a-component-contract", "canonical_source": "https://dev.to/csalda3a/a-prompt-is-not-a-conversation-its-a-component-contract-4jk8", "published_at": "2026-05-25 21:48:25+00:00", "updated_at": "2026-05-25 22:03:23.155199+00:00", "lang": "en", "topics": ["large-language-models", "generative-ai", "natural-language-processing", "artificial-intelligence", "machine-learning"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/a-prompt-is-not-a-conversation-it-s-a-component-contract", "markdown": "https://wpnews.pro/news/a-prompt-is-not-a-conversation-it-s-a-component-contract.md", "text": "https://wpnews.pro/news/a-prompt-is-not-a-conversation-it-s-a-component-contract.txt", "jsonld": "https://wpnews.pro/news/a-prompt-is-not-a-conversation-it-s-a-component-contract.jsonld"}}