Your Agent Can Learn at Three Layers — Most Teams Only Think About One

LangChain co-founder Harrison Chase argues that AI agents can learn at three distinct layers—model, harness, and context—but most teams focus only on retraining the model. The harness layer, which optimizes scaffolding using run history, and the context layer, which includes memory and instructions, offer more practical and flexible opportunities for continual learning in production systems.

When someone says “the AI needs to keep learning,” almost everyone pictures the same thing: retraining the model. New data, updated weights, a fresh checkpoint. But for agents , that instinct quietly throws away two of the three places where learning actually happens — and they’re the two most teams can realistically use. This framing comes from Harrison Chase LangChain’s co-founder , and it’s one of those breakdowns that’s obvious in hindsight and clarifying once you have it. The claim: an agentic system can learn at three distinct layers — the model, the harness, and the context https://www.langchain.com/blog/continual-learning-for-ai-agents — and knowing which layer you’re operating on changes how you build systems that improve over time. This post walks through all three, what learning looks like at each, the trade-offs, and the thing that quietly powers all of them. If you build agents and “continual learning” still means “fine-tune the model” in your head, this will widen the picture. Strip an agentic system down and it has three parts that can change independently: The cleanest way to feel the distinction is to map a real agent onto it. Take Claude Code: the model is Claude Sonnet or whichever model you point it at , the harness is Claude Code itself, and the context is your CLAUDE.md, your /skills, your mcp.json. Same model, same harness, totally different behavior depending on the context you hand it. Map it onto OpenClaw https://docs.openclaw.ai/ and the same shape holds: many possible models, a harness built on Pi plus scaffolding, and context that lives in its SOUL.md and the skills it pulls from clawhub. The layers are universal even when the implementations differ wildly. Here’s the key move: most people jump straight to the model when they hear “learning.” In reality, the system can learn at all three — and the higher layers are often where the practical wins live. This is the one everyone already pictures. You update the weights with supervised fine-tuning, or reinforcement learning like GRPO, so the model itself gets better at your task. It’s the most powerful lever — you’re changing the raw intelligence — but it comes with the heaviest baggage. The central hazard is catastrophic forgetting : train the model on new tasks and it tends to quietly degrade on things it used to do well. That’s still an open research problem, not a solved engineering step. In practice, model-layer learning is almost always done for the agent as a whole rather than per user. You could imagine a LoRA adapter per user, but almost nobody does — the cost and operational complexity don’t pencil out. For most teams shipping agents, this layer is something you reach for rarely, if ever. This is the layer most people don’t even know is a learning surface, and it’s the most interesting of the three. The harness is the scaffolding — the loop, the tools, the always-on system instructions. The insight is that this code can be optimized automatically using the agent’s own run history. The pattern, described well in the Meta-Harness work https://yoonholee.com/meta-harness/ , goes like this: Read that again, because it’s a genuinely different idea: a frozen model, with a coding agent rewriting the scaffolding around it to climb an eval score. No weight updates at all — the system gets better by improving its own plumbing. LangChain used exactly this pattern traces → coding agent → harness changes to improve its open-source Deep Agents harness on a terminal benchmark. Like model training, this is usually done at the agent level, though in principle you could learn a different harness per user. Context is everything that sits outside the harness and configures it — instructions, skills, and especially memory . And this is, pragmatically, where most agent “learning” happens in production right now. What makes this layer so useful is how flexibly it can be scoped. You can learn context at: And you can mix them — agent-level and user-level and org-level context updates riding on the same base model. There are two ways the updates actually happen, and the distinction matters: The beauty of this layer is that it’s cheap, scopable, and personalizable. One evolving context per company, per team, or per user — no GPU cluster required, no catastrophic forgetting to fear. No layer is strictly best; they trade off along predictable axes: ModelHarnessContext What changesWeightsScaffolding code/toolsInstructions, skills, memoryCost & difficultyHighestMediumLowestMain riskCatastrophic forgettingRegressions in agent codeStale/bloated memoryEasy to personalize?Rarely per-agent Usually per-agentYes — per agent/user/orgTypical cadenceInfrequent, offlinePeriodic, offlineReal-time or nightly The practical takeaway: getting to something that feels like continual learning will involve all three layers, but the higher you go, the cheaper and more scopable the learning gets. Most teams should be doing far more at the harness and context layers than they currently are. Here’s the unifying punchline. Every one of these loops — retraining the model, rewriting the harness, evolving the context — is powered by the same artifact: traces , the full execution record of what the agent actually did. That’s not a coincidence; it’s the deep reason agents are different from traditional software. In classic software, the code is the source of truth — read it and you know what will happen. With an agent, the logic lives in a non-deterministic model, so the only faithful record of what happened is the trace of a real run. Want to train the model? You need traces to learn from. Want to improve the harness? A coding agent reads the traces. Want to evolve context? You extract insights from traces. Which leads to the conclusion worth carrying out of all this: if traces are what every learning loop consumes, then capturing them well — and having a serious way to evaluate what’s “good” in them — isn’t a nice-to-have. It’s the precondition for your agent learning anything at all, at any layer. So before you reach for fine-tuning, ask the cheaper question first: am I even capturing the traces that would let me learn at the harness or context layer? For most teams, that’s where the next improvement is hiding. Harrison Chase’s original framing is on the LangChain blog ; the harness-optimization idea is detailed in the Meta-Harness writeup , and OpenClaw’s “dreaming” memory pass is documented here . If you’re running agents in production, I’m curious which layer you’ve actually shipped learning at — my bet is context, with harness as the most underused. Your Agent Can Learn at Three Layers — Most Teams Only Think About One https://pub.towardsai.net/your-agent-can-learn-at-three-layers-most-teams-only-think-about-one-f041b234bdf8 was originally published in Towards AI https://pub.towardsai.net on Medium, where people are continuing the conversation by highlighting and responding to this story.