Source: dev.to

Harness Engineering 101: Prompt Engineering wasn't enough. Neither was context. The harness was.

A developer describes the evolution from prompt engineering to context engineering and finally to building a 'harness'—a persistent structure of files, memory, and tool access that enables AI models to work effectively with personal and project-specific knowledge. The harness solves the cold-start problem by maintaining context across sessions through root and project-level configuration files, a growing memory of markdown files, and MCP connectors to real data sources.

9 min read · Published Jun 18, 2026

TL;DR

Agent = Model + Harness. Two camps now argue about it and miss that they agree: the edge moved out of the model and into the structure around it.

A harness is everything you build around a model so it can do real work in your world: your files, your accounts, your standards, your history. The model is the swappable part. The harness is the part that makes the model useful, and it is also the part nobody screenshots.

About a year and a half ago, my whole relationship with these models was prompt engineering. I collected phrasings that worked. "Act as a senior React Native engineer." "Think step by step." "Return only the diff." I had a notes file of magic openers. When an output was bad, my first instinct was that I had said it wrong.

If you remember the wave of new AI influencers back then ("steal these prompts," "the prompt that killed marketing," and so on), the whole premise was that a better prompt was the fix. That works until it doesn't. The problem with prompt engineering is that the model still knows nothing about you. A perfect prompt produces a good answer to a generic question. I was not asking generic questions. I was asking about my codebase, my locked decisions, my half-built product. The prompt was clean and the answer was still confidently wrong, because the model had no idea what Morrow Self was or that the accent color was already decided.

So I moved to context engineering, which is the obvious next step. Stop tuning the words, start assembling the right context window. Paste the relevant file. Paste the conventions. Paste yesterday's decision. (I wrote a piece a while back on context engineering.)

The answers got dramatically better. Then I hit the wall, and the wall was me.

I was the context. Every morning I sat there hand-assembling the same window: who I am, what I am building, what is locked, what shipped yesterday. I was a human glue layer copying my own life into a text box, and the moment the session ended, all of it evaporated. Context engineering made the model smarter per session and did nothing about the fact that every session started cold.

That cold start is the actual problem. Not the wording. Not even the context itself. The fact that none of it persisted, so I rebuilt it by hand every day.

One of the first AI use cases that pulled me in was the second brain approach. I started early, and I will say one thing: it is amazing. I would recommend a second brain to anyone. No AI use case is better, for me personally. I have a whole guide to help you get started: the second brain starter guide.

I did not solve the cold-start problem on purpose. I solved it one annoyance at a time, and only later found out the pile of fixes had a shape.

It started with a single file. A root CLAUDE.md that tells the model who I am: nine years in React Native, how I write, what I am launching, which decisions are locked and not up for debate. Then a CLAUDE.md per project, so inside the Wire RN repo it knows that codebase's rules, and inside my vault it knows the content rules. The model stopped starting cold. It started as someone who had worked with me for months.
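The layering is easier to see in code. Below is a minimal sketch, assuming a root file plus optional per-directory files that should be read from most general to most specific. The function names and the `# From ...` header are hypothetical, and this does not claim to reproduce how any particular tool actually loads these files.

```python
from pathlib import Path


def collect_context_files(project_dir: Path, root_dir: Path, name: str = "CLAUDE.md") -> list[Path]:
    """Return context files from the root down to the project, outermost first."""
    project_dir = project_dir.resolve()
    root_dir = root_dir.resolve()
    if project_dir != root_dir and root_dir not in project_dir.parents:
        raise ValueError(f"{project_dir} is not inside {root_dir}")

    # Walk upward from the project to the root, collecting every directory on the way.
    chain = [project_dir]
    while chain[-1] != root_dir:
        chain.append(chain[-1].parent)

    # Reverse so general instructions come first and project rules come last.
    return [d / name for d in reversed(chain) if (d / name).is_file()]


def build_preamble(project_dir: Path, root_dir: Path) -> str:
    """Concatenate the layered files into one block of session-opening text."""
    sections = []
    for path in collect_context_files(project_dir, root_dir):
        body = path.read_text(encoding="utf-8").strip()
        sections.append(f"# From {path}\n{body}")
    return "\n\n".join(sections)
```

Putting the project file last is a design choice in this sketch: the most specific rules are the last thing read before the session starts.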

Then memory. Nearly a hundred markdown files now (97 the morning I counted), one fact each, with an index file the model reads at the top of every session.

The index is now big enough that it trips its own size limit, which tells you something honest about how this accretes. I do not re-explain my own business every morning anymore. It remembers, and when it is wrong, I fix one file instead of repeating myself for the hundredth time.
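One way to picture the index and its size limit: a minimal sketch, assuming each memory file's first non-empty line works as its summary. The `INDEX.md` name and the `MAX_INDEX_CHARS` budget are assumptions for illustration; the real limit is not stated here.

```python
from pathlib import Path

MAX_INDEX_CHARS = 2_000  # hypothetical budget, not the real limit


def first_line(path: Path) -> str:
    """Use the first non-empty line of a memory file as its one-line summary."""
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.strip():
            # Drop a leading markdown heading marker such as "# ".
            return line.strip().lstrip("# ").strip()
    return "(empty)"


def build_index(memory_dir: Path) -> tuple[str, bool]:
    """Return the index text and whether it exceeds the size budget."""
    entries = [
        f"- {path.name}: {first_line(path)}"
        for path in sorted(memory_dir.glob("*.md"))
        if path.name != "INDEX.md"  # never index the index itself
    ]
    index = "\n".join(entries)
    return index, len(index) > MAX_INDEX_CHARS
```

The boolean is the useful part: when it flips to `True`, that is the signal to prune or split the memory rather than raise the budget.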

Then access. MCP connectors into Gmail, Calendar, Drive. The model reads my actual schedule and my actual inbox, not a sentence describing them. Context engineering was me narrating my calendar. This is the model just having the calendar.

Then delegation, which is where my one real rule lives. When I need ten files grepped or a codebase mapped, that runs in a separate context and hands back the conclusion. This is the same principle I run on every build: the newest, most capable model plans, a cheaper and simpler one executes. The expensive brain decides what to do. The cheap one does the grunt work in its own window and never pollutes mine.
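The delegation rule can be sketched without any real model API. This is a minimal illustration, assuming a model is just a callable that takes a list of messages and returns a string. `Model`, `delegate`, and `plan_and_execute` are hypothetical names, not part of any library.

```python
from typing import Callable

# Assumption: a model is anything that maps a list of messages to a reply string.
Model = Callable[[list[str]], str]


def delegate(task: str, worker: Model, documents: list[str]) -> str:
    """Run a task in a fresh context and return only the worker's conclusion."""
    # The worker gets its own message list; the bulky documents live only here.
    worker_context = [f"Task: {task}", *documents, "Reply with a short conclusion only."]
    return worker(worker_context)


def plan_and_execute(goal: str, planner: Model, worker: Model, documents: list[str]) -> list[str]:
    """The planner decides the subtasks; the worker does each one in isolation."""
    main_context = [f"Goal: {goal}", "List one subtask per line."]
    subtasks = [line.strip() for line in planner(main_context).splitlines() if line.strip()]
    for subtask in subtasks:
        conclusion = delegate(subtask, worker, documents)
        # Only the conclusion enters the main context, never the raw documents.
        main_context.append(f"{subtask} -> {conclusion}")
    return main_context
```

The point of the shape is what never happens: the raw documents are passed to the worker and are never appended to `main_context`.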

On top of all of it sit the skills. Sixty-some of them: write a LinkedIn post in my voice, draft a newsletter, run the daily plan, plus scheduled jobs that fire without me sitting there.

None of that was clever. Every piece exists because I got tired of repeating myself. That is the whole thing, and it is much more boring than the posts about it sound.

Three months ago, someone replied to one of my posts to explain harness engineering to me. Kindly. Like I had never heard of it. He linked a newsletter, told me an agent is only as good as the scaffolding around it, and signed off with "wild stuff, right?"

It was wild. I had been doing it since December. I just did not have the word.

Here is the word. Mitchell Hashimoto coined "harness engineering" in early 2026: Agent = Model + Harness. The model is the brain. The harness is everything around it that lets the brain act in your world. People break the harness into roughly five parts:

I mapped my own setup against that list expecting gaps and found I had quietly built all five.

The number everyone repeats comes from a teardown of Claude Code, where the claim is that something like 98% of the system is harness and under 2% is the model. I will be honest: I have not verified what that figure actually counts, and most people reposting it have not either, so hold the exact number loosely. Directionally it matches what I see every day. The model is the small, swappable part. The scaffolding is where the work lives. (Martin Fowler's notes and HumanLayer's practitioner write-up are the two least hyped explainers I have read if you want the real version.)

While Hashimoto was naming the harness, another builder, Jake Van Clief, went the opposite direction and grew a community of tens of thousands in about six weeks, telling everyone to stop using agentic frameworks entirely. His pitch: delete LangChain, delete the orchestration libraries, replace all of it with numbered folders and markdown files. A folder and a model, he argues, beats a custom agent. Big shoutout to Jake. I love the guy, I follow him, and the advice and content are genuinely good. Highly recommend you follow him too: youtube.com/@JEVanClief.

So one camp says build more scaffolding and the other says tear the framework out and use the filesystem. They sound like enemies. They are saying the same thing.

Both are telling you the model is not the point. The architecture around the model is the point. Whether your architecture is a LangChain graph or a folder named 02-draft, the bet is identical: the edge moved out of the model and into the structure you wrap around it.

That is the thing I had been saying for six months before I had either of their vocabularies. I wrote a piece called "I spent 6 months on architecture, then redesigned everything in 2 hours". The redesign was fast because the harness was already there. The harness debate is the same argument in a newer hoodie. It blew up because two people gave a clean name to something a lot of us had already half-built and could suddenly point at.

Here is the part the harness posts leave out. A harness is not a one-time build. It is maintenance, and the maintenance is the actual job.

Memory files rot. Mine contradict each other if I do not prune them. A good chunk of my files were one launch date out of sync within a month of being written. A stale memory is worse than no memory, because the model trusts it and so do you. People who run bigger memory systems than mine clear them out on a schedule, quarterly, and I now understand exactly why.

Skills rot the same way. I have sixty installed. In a normal week maybe twelve fire. The other forty-eight are clutter I keep meaning to audit. A harness left untended does not stay neutral. It quietly fills with confident lies about your own life.
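The skills audit is simple arithmetic once there is a log of which skill fired when. Here is a minimal sketch, assuming such a log exists as `(date, skill_name)` pairs. That log format and the `audit_skills` name are assumptions, not something any tool is known to produce.

```python
from datetime import date, timedelta


def audit_skills(
    installed: set[str],
    fired: list[tuple[date, str]],
    today: date,
    window_days: int = 7,
) -> tuple[list[str], list[str]]:
    """Split installed skills into those that fired in the window and those that did not."""
    cutoff = today - timedelta(days=window_days)
    # Keep only firings inside the window that belong to a skill still installed.
    recent = {name for day, name in fired if cutoff < day <= today and name in installed}
    used = sorted(recent)
    unused = sorted(installed - recent)
    return used, unused
```

With sixty installed skills and twelve distinct names in the last week's log, `unused` has forty-eight entries: the list to review before adding anything new.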

So when someone tells you the harness is the new moat, the honest version is that the harness is the new gym membership. Owning it does nothing. Showing up to maintain it is the entire return.

If you are starting from zero, you do not need a framework, a course, or a community of thirty thousand people. You need three things: CLAUDE.md (or its equivalent) that tells the model who you are and what is locked.

That is a harness. Everything past that is refinement, not foundation.

If you already have one, do not build more. Audit. Open your own memory files and count how many are still true. Count how many of your skills actually fired this week. The number will be humbling, and the prune will make the whole thing run better than any new addition would.

Here is the gap I keep staring at. Everyone writing about this is pointing it at their own desk. Coding agents. Research assistants. Second brains like mine. Dev tools and knowledge work.

Nobody is building a harness for shipping a consumer mobile app.

That is the unclaimed corner, and it is the one I am standing in. The same idea, a model wrapped in structure it can trust, is what lets a mobile app render a different onboarding flow per user instead of the same six hard-coded questions for everyone.

The harness for a product is not a CLAUDE.md. It is a validated component registry, a streaming runtime that survives on a real phone, and an agent that drives screens instead of chat. That is what I have spent six months building into Wire RN, and it is the same lesson as the second brain, pointed at a different surface.
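To make "validated component registry" concrete, here is a language-agnostic sketch written in Python. It assumes the model emits a JSON list of components and that the app renders only what passes validation. The `REGISTRY` contents, the component names, and `validate_screen` are all hypothetical; this is not Wire RN's actual API.

```python
import json

# Hypothetical registry: component name -> required props and their expected types.
REGISTRY = {
    "Question": {"id": str, "prompt": str},
    "Choice": {"id": str, "options": list},
}


def validate_screen(raw: str) -> list[dict]:
    """Parse model output and accept it only if every component passes the registry."""
    components = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(components, list):
        raise ValueError("expected a JSON list of components")
    for item in components:
        if not isinstance(item, dict):
            raise ValueError("each component must be a JSON object")
        name = item.get("type")
        if not isinstance(name, str) or name not in REGISTRY:
            raise ValueError(f"unknown component: {name!r}")
        schema = REGISTRY[name]
        props = item.get("props")
        if not isinstance(props, dict) or set(props) != set(schema):
            raise ValueError(f"{name}: props must be exactly {sorted(schema)}")
        for key, expected in schema.items():
            if not isinstance(props[key], expected):
                raise ValueError(f"{name}.{key}: expected {expected.__name__}")
    return components
```

Rejecting the whole screen on the first bad component is a deliberate choice in this sketch: a partially valid screen is harder to reason about than a fallback to a known-good default.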

That is also what I am shipping this week. Wire RN hits Product Hunt in a few days, and the next issue takes this exact harness idea and points it at an actual app, with the runtime and the component registry on screen instead of in theory.

The people naming the harness are right. They are just looking at their own desk. The more interesting move is what happens when you put the harness inside the thing you ship.

Now there is a shift toward loop engineering. I have already started playing with it, but as always, I want to test things first rather than write a generic, AI-generated article about a new concept.

I write a weekly issue on building AI-native software, mostly on mobile, mostly with receipts like these. If the cold-start problem in this piece sounds familiar, the next one shows the harness running inside a real app. codemeetai.substack.com.
