# Five Stages of Building with LLMs

> Source: <https://llmfn.com/signals/fivestages>
> Published: 2026-06-03 16:32:20+00:00

### Building a demo LLM product is... easy

We all know this team. Strong engineers. The CEO or product owner is pushing them to ship more and more AI features. They decide: this is *it*. They build an LLM feature over the weekend. The demo is beautiful. They show it to an existing customer. The customer pastes a messy document, clicks a button, and the product produces a clean output. Someone asks a question in plain English, and the product answers like a colleague who has read everything.

We can now hear the silence. Until it is broken by the CEO, who says, "Let's go all-in and ship this."

### Building a production-ready product is... not easy!

The first ten sessions work. The next ten are mixed. Then a customer uploads a PDF with tables split across pages. Another asks a question that needs data from both last month and this morning. Someone else expects the system to remember what they did yesterday. The answer is kinda *right*, but not quite. The tone is confident. The mistake is subtle. Customers notice. Slack fills up with... stinkers!

The team is now changing the prompt. For the moment, it gets better.

*Then it breaks somewhere else.*

They switch models. They add more examples. They make the prompt longer. They try the fancy new framework. Then comes a vector database. And a couple of them are already learning fine-tuning.

At this point the team is no longer building a product feature. They are *debugging a fog*. I am sure you have either lived this story or seen it play out many times in your circles.

Well, LLMs are very useful and usable.

**But "building with LLMs" has 5 different stages!**

Each stage, or way of working, targets a specific kind of problem and a specific kind of team (the two go hand in hand). Each has its own strengths and failure modes. The engineering discipline required to pull it off changes drastically from one stage to the next. Here's an overview.

## 5 Stages of building a product using LLMs

### Stage 0: LLM as Chat

This is where all of us begin.

You open ChatGPT or Claude in a browser. You ask it to rewrite an email. You paste in a rough plan and ask for gaps. You ask it to explain an error message. You use it to brainstorm product names, summarize a call, draft a proposal, or understand a new concept.

There is no product yet. Just a human and an LLM, taking turns. It is primarily used for personal productivity.

While this stage is easy to underestimate because it feels informal, it has drastically changed how people work. A founder uses it to shape a landing page before talking to a designer. A lawyer uses it to turn dense notes into a first draft. A product manager uses it to compare customer interview themes.

The key actor here is the human: the human is still the system of record. The human checks the answer, edits the output, decides what matters, and carries the context from one task to another. Hence the skill that matters most here, disproportionately, is **judgement**.

Can you ask a better question? Can you notice when the answer sounds plausible but weak? Can you bring enough context into the conversation? Can you use the model as a thinking partner without outsourcing your thinking?

But remember that this is not yet "software", at least not in the sense in which LLMs have changed the field.

### Stage 1: LLM for Software

The next stage begins when the LLM moves closer to the codebase. You are changing how the product gets built. But the product for the end user isn't changing.

You could be using Cursor to modify three source code files at once. Or you could be using Claude Code to explain why a test is failing. The LLM writes the code, plans the migration, drafts unit tests, converts a React component, or reviews a pull request for (obvious) mistakes.

The customer may never know this happened. The button looks the same. The dashboard behaves the same. The API returns the same shape.

But the development process is different. The engineer is now turbo-charged with a highly capable intern who can do much of the development work. The engineer has become a manager and orchestrator.

At Stage 1, LLMs act like a new layer in the engineering workflow. They reduce blank-page friction. They make unfamiliar code less intimidating. They turn "I need to understand this module" from a two-hour reading task into a guided tour.

But this stage also has a trap. Because the LLM can produce code quickly, it can also produce confusion quickly. It may copy a pattern from the wrong part of the codebase. It may invent an API that does not exist. It may solve the local problem while quietly breaking the architecture.

So the core skill changes from prompting to reviewing.

Can you tell when the generated code matches the local conventions? Can you run the right tests? Can you keep ownership of the design? Can you use the LLM to accelerate engineering without letting it dissolve engineering discipline?

Stage 1 makes teams faster. It does not automatically make their products intelligent.

### Stage 2: LLM Augmented Software

Stage 2 is where most product teams first try to "add AI" (the cooler term I keep hearing these days is "*infuse AI*"!).

The existing product remains mostly traditional software. There are users, accounts, permissions, workflows, databases, settings, and reports. The LLM is added as one component inside that system.

A support product adds an AI reply suggestion. A CRM adds account summaries. A document tool adds "summarize this contract". An analytics product adds a natural-language query box. An internal dashboard lets the operations team ask, "Which orders are at risk today?"

Product shaping is super important in this stage. The value has to be clear. The scope is typically well-contained. The LLM does not run the whole product. It only has to improve one workflow.

But Stage 2 is also where teams discover that a demo and a product feature are different objects.

**In a demo, you control the input. In production, the input controls you.**

Users paste messy text. Documents have missing sections. Search results include stale information. The model returns a paragraph when the UI needs JSON. The answer is useful, but too long. The answer is short, but misses the important caveat. The model says it found something, but cannot show where it came from.

This is when the team realizes the prompt is only one layer. To build a reliable and predictable product, you need a lot more in your arsenal.

You need **retrieval** so the model has the right information. You need **schemas** so the output can be used by the application, in a predictable manner. You need **guardrails** for what the system should refuse. You need **fallbacks** for when confidence is low. You need **logs**, examples, and **evals** so you can tell whether changes improve the feature or just move the errors around.
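To make the schema and fallback layers concrete, here is a minimal Python sketch. It is an illustration under assumptions, not a reference implementation: `Summary`, `parse_summary`, `summarize`, and the `call_model` parameter are all hypothetical names, and `call_model` stands in for whatever LLM client you actually use.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Summary:
    """The shape the UI expects (hypothetical, for illustration)."""
    title: str
    bullets: list


def parse_summary(raw: str) -> Optional[Summary]:
    """Return a Summary if the raw model output matches the schema, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # the model returned a paragraph, not JSON
    if not isinstance(data, dict):
        return None
    title = data.get("title")
    bullets = data.get("bullets")
    if not isinstance(title, str) or not title.strip():
        return None
    if not isinstance(bullets, list) or not bullets:
        return None
    if not all(isinstance(b, str) for b in bullets):
        return None
    return Summary(title=title.strip(), bullets=bullets)


def summarize(document: str, call_model: Callable[[str], str], max_attempts: int = 2) -> dict:
    """Call the model, validate the output, and fall back if it never validates."""
    for attempt in range(1, max_attempts + 1):
        raw = call_model(document)  # assumption: returns the model's text output
        summary = parse_summary(raw)
        if summary is not None:
            return {"status": "ok", "attempts": attempt, "summary": summary}
    # Fallback: let the product show a non-AI path instead of a broken answer.
    return {"status": "fallback", "attempts": max_attempts, "summary": None}
```

The point of the design is that the application never sees unvalidated model text. Either it gets a `Summary` it can render, or it gets an explicit `"fallback"` status it can handle.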

At Stage 2, the key question is: How do we make an LLM useful inside an existing product workflow?

That question is narrower than "How do we build an AI company?" and more serious than "Can we add a chatbot?"

It is the stage where AI becomes a product capability.

### Stage 3: LLM Integrated Application

While Stage 2 adds an LLM capability to the product, Stage 3 takes it to an entirely new level. Here the LLM is the engine of the product. In Stage 2, the product works perfectly and is useful even without the LLM. In Stage 3, on the other hand, the product makes absolutely no sense without the LLM!

Think of a coding agent that plans changes, edits files, runs tests, and iterates. Yup - Cursor and Claude Code are examples - but here the product *is* Cursor/Claude Code. In Stage 1, we used such a tool to build *our product*. Here, the tool itself is the product we are building!

More examples: Think of an AI tutor that adapts to a student's mistakes over weeks. Think of an analyst that investigates a business question by reading dashboards, writing queries, checking assumptions, and explaining the result. Think of a research assistant that does not just summarize sources, but decides what to read next.

At this stage, the product is managing work, not just generating text.

At this stage, the problems become **architectural**. The system needs **context**, but maybe not all context. It needs **memory**, but not every past interaction. It needs **tools**, but must know when to use them. It needs **state**, because a long-running task has steps, partial results, and decisions. It needs **evaluation**, because "the answer looks good" is not a production metric.

This is the stage where many teams get stuck. Getting this right will give rise to a new class of products and reach users who were not served well in the earlier software era. A useful Stage 3 product often feels simple to the user. "Find the issue." "Prepare the brief." "Help me study this topic." "Analyze churn for last quarter."

Behind that simplicity is a system that has to decide what to search, what to remember, what to ask, what to ignore, what tool to call, how to recover from a bad step, and how to explain what happened.
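A rough way to picture that system is a loop over explicit state. The sketch below is only an illustration under assumptions: `TaskState`, `run_task`, and the action dictionary format are hypothetical, and `decide` stands in for the LLM call that picks the next action.

```python
from dataclasses import dataclass, field


@dataclass
class TaskState:
    """Everything a long-running task has done so far (hypothetical shape)."""
    goal: str
    steps: list = field(default_factory=list)  # (tool, argument, result) tuples
    done: bool = False
    answer: str = ""


def run_task(goal, decide, tools, max_steps=5):
    """Ask `decide` for the next action until it finishes or the step budget runs out."""
    state = TaskState(goal=goal)
    for _ in range(max_steps):
        action = decide(state)  # assumption: in a real system this is an LLM call
        if action["type"] == "finish":
            state.done = True
            state.answer = action["answer"]
            break
        tool = tools.get(action["tool"])
        if tool is None:
            result = f"error: unknown tool {action['tool']!r}"
        else:
            try:
                result = tool(action["argument"])
            except Exception as exc:
                # Record the bad step so the next decision can recover from it.
                result = f"error: {exc}"
        state.steps.append((action["tool"], action["argument"], result))
    return state


# Usage with a scripted `decide` and a stubbed tool (the value is made up).
def scripted_decide(state):
    if not state.steps:
        return {"type": "tool", "tool": "lookup_churn", "argument": "last quarter"}
    return {"type": "finish", "answer": f"Churn was {state.steps[-1][2]}"}


final = run_task("Analyze churn for last quarter", scripted_decide,
                 {"lookup_churn": lambda period: "4.2%"})
```

Because every step is recorded in `state.steps`, the same structure that drives the loop can also be used to explain what happened afterwards.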

**At Stage 3, you are designing behavior.**

### Stage 4: Domain Specialization

Stage 4 is where generic capability stops being enough. The product now lives in a domain where the details matter deeply. Healthcare. Law. Finance. Manufacturing. Insurance. Scientific research. Cybersecurity. Education at scale.

In these domains, a mostly-correct answer can be worse than no answer. The system needs to understand specialized language, domain constraints, regulatory expectations, edge cases, and evaluation standards.

This does not always mean fine-tuning. Sometimes the right answer is better retrieval. Sometimes it is a carefully designed workflow. Often it is a smaller model trained for a narrow classification task. It will require a highly specialized, domain-specific eval set built from real cases. It will require human involvement at the right point in the loop.

At this stage, the product can no longer rely only on the broad competence of a frontier model. Imagine a clinical workflow where the model helps prepare a radiology report for an oncology scan. The language is specialized. The cost of omission is high. The evaluation cannot be, "Does this sound reasonable?" It has to be measured against expert judgement, known cases, and the actual workflow of radiologists, the patient care team, and surgeons. It is domain reasoning over messy material, with traceability.

In this stage, data quality, domain evaluation, expert feedback, and workflow integration become central.

Unfortunately, it is also the stage many teams talk about too early. They say, "We need to fine-tune our own model," when they have not yet built reliable retrieval. They worry about domain adaptation when their product cannot preserve state. They compare themselves to specialized AI labs when they are still trying to ship their first LLM feature inside an existing app.

Stage 4 is real. But almost always, at least for now, it is not the best next step.

## The stage mismatch and the path forward

As teams become truly cross-functional (in how the work itself gets done), knowing what stage you are in makes communication easier.

Do not fail because you have misplaced expectations.

A Stage 2 team reads about Stage 4 and decides it needs fine-tuning. But maybe the real problem is that the model is missing context. A Stage 3 team treats its product like Stage 2 and keeps adding isolated features, when what it needs is orchestration, state, memory, tools, and evals. A Stage 1 team celebrates velocity, but never builds the review habits that keep AI-written code maintainable.

Each stage has a different question.

- Stage 0 asks: how do I think better with an LLM?
- Stage 1 asks: how do we build software faster with LLMs?
- Stage 2 asks: how do we add LLM capability into an existing product?
- Stage 3 asks: how do we build a product where the LLM is the core engine?
- Stage 4 asks: how do we specialize the system for a domain where generic answers are not enough?

These questions build on each other. And they are not interchangeable.

Across the entire organization - CXOs, product, data, ops, and engineering - everyone needs to understand what stage the company's product is at and what they are working towards.

Structure customer conversations to get useful feedback. Not all failures are a model problem. Most times, the failure happens in spite of a frontier model.

Enable the team (and the entire Org) to ask better questions:

- What information did the system need?
- What format did the product expect?
- What state was missing?
- What tool should have been called?
- What would a good answer look like?
- How would we know if the system improved?
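The last question is the easiest one to turn into code. Here is a minimal sketch of an eval harness, assuming each case is a plain dictionary with a `name`, an `input`, and a `check` function; `run_evals` and `compare` are hypothetical names, and `system` stands in for the feature under test.

```python
def run_evals(system, cases):
    """Return the pass rate and the failing case names for one version of the system."""
    if not cases:
        raise ValueError("need at least one eval case")
    failures = []
    for case in cases:
        output = system(case["input"])
        if not case["check"](output):
            failures.append(case["name"])
    passed = len(cases) - len(failures)
    return {"pass_rate": passed / len(cases), "failures": failures}


def compare(before, after):
    """List what a change fixed and what it broke, not just the net score."""
    before_fail = set(before["failures"])
    after_fail = set(after["failures"])
    return {
        "fixed": sorted(before_fail - after_fail),
        "broke": sorted(after_fail - before_fail),
    }
```

Comparing failing case names, rather than only the pass rate, is what catches a change that fixes one case and quietly breaks another while the overall score stays flat.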

Shift your thinking from playing with LLMs to building with them.

If you're building at Stage 2 or 3, adding LLM features or making the LLM your core product, then we are running a one-day workshop in Bengaluru on 13 June.

We will build one product together, layer by layer:

**Search → Prompt → Schema → Context → State → Memory → Tools → Evals**

Each layer fixes a real failure.

One day. One product. Eight layers.

· 13 June · Bengaluru · ₹4,999 · 30 seats

If interested, book your tickets here: [llmfn.com](https://llmfn.com)