# Presentation: From Hype to Strong Foundations: What the Rise, Fall and Resurgence of Agents Can Teach Us About Outlasting the Cycle

> Source: <https://www.infoq.com/presentations/llm-compound-ai-systems/?utm_campaign=infoq_content&utm_source=infoq&utm_medium=feed&utm_term=global>
> Published: 2026-06-17 11:04:00+00:00

## Transcript

**Aditya Kumarakrishnan:** I'm Aditya Kumarakrishnan. I'm a technical fellow at Walmart Global Tech. I'm here to talk to you about Agents: The Missing Manual. Part of this talk is an experience report. Part of it is theory. It's essentially about how we're thinking about fundamentals as we build agents. An interesting observation I've had over the last two years building agents is how looking back at ideas can feel like you're looking into the future. I'm here to talk to you about four big ideas that are actually from the past, but today they feel like they'd be something from the future. I'm a huge AI agent bull. The bull case is that I think agents are inevitable. What I mean by inevitable is that the conceptual idea of an agent is an inevitability. That doesn't mean that the realization of them is inevitable. What I find is that we're in the amnesia phase of building agents. We're building them on shaky foundations. We're making meandering progress. We're recommitting mistakes and relearning lessons from previous cycles of building agents.

## Layers of a Lasting Foundation

I want to spend most of the talk really telling you about four big ideas we should all be thinking about that I don't think enough people are talking about. One is that we need to be embracing a much stronger notion of agents than most of us embrace today. Two, we need to be building modular, extensible agents. Three, we need to learn from and leverage process science. Finally, we need to be Terraforming the environment that our agents act in. These four ideas, I don't think are getting enough play time. Most of the content out there about agents is mostly about context engineering, figuring out what I'm supposed to build, how to get agents to talk to each other through A2A. I think there are more foundational things we should be leaning on if we want to build agents that withstand the test of time.

Why do I think they're inevitable? They've always been at the core of the project of computing, way back from when Alan Turing conceptualized Turing machines, he really had agents in mind. Even from the early pioneers of AI, John McCarthy, and so on, agents were the core central project of AI. The story of AI, you know how they say there's only 12 different kinds of stories. There's tragedies. There's rags to riches. AI is one of peaks and valleys. With each peak comes renewed fervor in building agents. With each valley comes this catastrophic forgetting of all of the lessons learned in the previous peak. Like many other things in computing, agents are no stranger to catastrophic forgetting. The general tendency of the trends in computing over the last 50 years, have been towards more and more human orientation of systems. If you look at one trend in computing, and this kind of framing is inspired by Michael Wooldridge's fantastic "Introduction to MultiAgent Systems," that you all should check out.

One trend that's very apparent in computing is ubiquity. Computing devices are everywhere, and increasingly so, thanks to Moore's Law, thanks to the commodification of hardware, and so on. One paradigm is IoT devices that track this trend. The other is interconnection of computing devices that's becoming more and more the norm, again, because of the commodification of network devices and of hardware, and so on. Here you see the cloud emerge as a paradigm, microservices emerge as a paradigm. Then you see human orientation as another trend in computing systems. Here you see ideas like the semantic web emerge right at the intersection of interconnection, ubiquity, and human orientation. Then you've got delegation. We're delegating more and more tasks to computing systems. Here you've got process management and workflow systems that really shine at this trend of computing. At the intersection of delegation, human orientation and interconnection, you've got business process. Then you've got intelligence as another trend in computing systems. We're delegating more and more complex tasks to computing systems.

Here you see machine learning show up as a paradigm of intelligence. At the middle of all of this is this idea of an agent which brings together these five very apparent trends in where computing has been going over the last 50 years. It's that section of all of this, you delegate a task to an agent. It's, you talk to it in a human oriented way, you think of it as a proxy for a human. It's intelligent, of course. It's ubiquitous, everyone might have an agent one day, or many agents. It's interconnected, it needs to be able to talk to systems and other agents. It sits at the crux of all of this. This is why I think the conceptualism of an agent is inevitable, because these five things have been ongoing, and they'll continue to keep going, so they all converge on this idea of an agent. That's why people have been talking about it for so long, but it's incredibly complex to get right, so we're talking about it again.

## 1. Embrace a Stronger Notion of Agents

The first big idea I think we should all embrace is a stronger notion of agents. What I mean by that is, we should be liberating the agent from the LLM. We should be thinking about agents as a general-purpose computing abstraction. An abstraction that really puts the human at the center. You all have seen this picture many times before, it's actually lifted from the seminal work in AI. An agent perceives its environment and acts on it. The crucial thing here is an agent decides what action to take and when to take it. This is an interesting definition, because you could describe anything this way, as something that senses an environment and acts on it. This, again, comes from Russell & Norvig, the notion of an agent is simply a tool, is a modeling tool, not a characterization that divides the world into agents and non-agents. It's a way of understanding a system. It's like light is both a wave and a particle, it's a way of interpreting a phenomenon.

From Yoav Shoham, Agent Oriented Programming in the '90s, an agent, again, is an analysis tool, as much as it is an actual implementation. If you think about it this way, if the idea of an agent is so ubiquitous, why is it helpful? If you can model anything as an agent, what's the point? There's a funny quote in Agent Oriented Programming, where he says, you could even think of a light switch as an agent. Gemini did a fantastic job generating this picture of a light switch as an agent for me. It's perfectly coherent, he says, to think about a light switch as an agent. It has beliefs, intentions. You switch it on, it's acting on your behalf, and so on. The idea is that we don't need to think about a light switch as an agent. The light switch is described by a much simpler abstraction, so we don't need to reach for a more complex one. That's the idea.

As our systems get more and more complex, they become less and less describable by simple abstractions. A light switch can be described as a finite state automaton, which is a much simpler abstraction than an agent is. Why reach for agents? Like I said, we're building more and more complex computing systems, so we need to be reaching for more and more complex abstractions. That's the idea. We should reserve the analysis paradigm of an agent to the systems that really do require human orientation and so on. As they become more complex, we need more complex abstractions. The agent, like I said, is such an abstraction. Examples are many. If you think about a personal shopper, if you think about a self-driving car, you really can't describe a self-driving car adequately using a finite state automaton. You could describe it using the agent paradigm. That's the idea is really think about the most complex human oriented systems, and those are which the agent paradigm fits best for.

Now you get to language agents or AI agents, which is what the popular discourse today is. My claim is that we should actually be liberating the agent away from this. You all have seen this picture many times, this is actually from the really popular and great essay from Anthropic, Building Effective Agents, where they say an agent is effectively an LLM in a feedback loop of this environment. It observes something, it gets feedback, and then it maybe carries on another action. This is a perfectly useful abstraction, and it is what an AI agent is. Oftentimes, where do AI agents fit? There's the idea of an agent, which may or may not need AI, which may or may not need LLMs. Then there's an idea of a compound AI system, which is effectively a system of many interacting pieces, an LLM could be one of them. AI agents are at the middle. They are a useful thing to think about. I think we would be much better off thinking about agents as bigger than just AI agents.

The reason why I think this needs to happen is because it shifts our mindset from being implementation oriented to problem driven. If we always think about agents as something that need an LLM to be in a feedback loop with an environment, we don't think about solving problems anymore, we think about specific implementations. Let's liberate ourselves and think about agents as being a little bit broader. Then, I also think it's helpful to be future proof. If you think about agents outside of the paradigm of LLMs being in a feedback loop with the environment today, then it makes you future proof. If a new paradigm comes up, you don't suddenly have to throw out all of your work, you've built on a more general abstraction, and not a more specific one.

## 2. Build Modular and Extensible Agents

That was a theoretical rant. The second is maybe a more practical one, we should be building modular and extensible agents. This is easier said than done. Actually, the trend of innovation in agent architectures stands counter to needing to do this. I've charted the state-of-the-art innovation in agent architectures over the last two years, I've punctuated them with these four punctuation marks. The first one is chain of thought, which was cutting edge as of two-and-a-half years ago, where you asked an LLM, please explain your thinking, or think before answering. That seemed to elicit much more intelligent responses. That was cutting edge. You built agents by just adding that line to the prompt, and it did something much better than it did before that. Then came things like reflection and ReAct where you asked an agent to output something structured, like a tool call, and then you invoked that tool and so on. These are all evolutions in agent architectures.

I couldn't believe that this evolution treats each new one as a bespoke web of context engineering steps, so prompt engineering steps. This isn't modular. We can't evolve an agent from chain of thought to ReAct, or chain of thought to reflection from reflection to ReAct from ReAct to CodeAct in a modular way, like we want our systems to evolve. All of this talk about agent architectures evolving are treating each one as this bespoke thing. What that hints at is that we don't have strong abstractions for agents. Every new innovation in agent architectures defines new concepts, defines new ideas, and builds brand new ideas. What this has meant is that if you want to evolve your agent, you effectively have to scrap the old thing and build from scratch all over again. I've had to do this through this progression now three or four times in my professional career. CodeAct seems to be the new cool kid in town. It seems like if you can get an LLM to write some code, it does really well.

It seems like we've evolved now from ReAct to CodeAct agents, but we haven't built ReAct agents to be modular to make this evolution easy. It's effectively a rewrite. Toss-it-all rewrites are the norm today, if we want to make our agents better. I implore all of us practitioners to not do this. When you build your agents, think about the right modules so that when they evolve, when models get better, when there's a new way to call tools, now there are skills, you don't have to toss everything out and rewrite it. There are reasons for why this is not good: tight coupling, there's no easy way to migrate, and you've got to reinvent the wheel every time.

A really compelling abstraction here to build modular agents is this idea of a cognitive architecture. What you have here is you've got an agent decomposed into a few well-defined subcomponents, memory, and there's different flavors of memory here, procedural memory being the key, and an LLM is at the core of this agent. It just has a tightly scoped role to play. An LLM isn't everything in this agent, it's got a very well-defined place. It's namely the implicit procedural memory of this agent, the procedural details are implicit within the weights of the LLM. This is a fantastic paper, you all should go check out, called CoALA, that modularizes an agent and then casts all of these specific innovations in agent architectures along these abstractions. This is how we've started to build agents at Walmart, and they've become the de facto way of thinking about these things.

There are many reasons for why this is great, it lets us evolve from ReAct to CodeAct in a much easier way, because we've said, the action space is a separate module. The only difference between ReAct and CodeAct is that you no longer have a JSON tool calling action space, you've got a code sandbox as your action space. You just swap that piece out, and now you've evolved from ReAct to CodeAct. In the paper, they analyze the different agent architectures, and they cast it in this CoALA formalism. It's really powerful. You see that these are very different agent architectures, ReAct, Voyager, Generative Agents, Tree of Thoughts, but they're really just mapped to four permutations, like four or five fields being rooted. You've got digital grounding, or you've got agent grounding. You've got reasoning and retrieval, or reasoning, retrieval, and learning, and so on. You can define well-defined APIs for each of these modules. You can also share the workload of building separate modules with different teams. Not every agent has to be this monolithic thing that one team has to build.

## 3. Learn from and Leverage Process Science and Techniques

Now comes the third most important thing I think we all should be leveraging, which is process science. Process science is a very old tradition, been around for at least 40 or so years now. The reason I think it's important, the reason I think it's the missing foundation for agents today in a lot of the discourse is that if we want our agents to generate economic value, if we want them to be useful in the enterprise, if we want them to be useful as personal assistants, they need to enact processes, multi-step, long-running processes that go do some actual work that coordinate a bunch of tasks and so on. Agents have to operate within the complexities of organizational and interpersonal process. Process science is what provides rigorous grounding for processes. Or how could we not lean on the learnings from process science if we want to build actually useful agents. We should be building on the foundations of process science and not trying to reinvent, which is the trend I've seen agents try to do.

I'll give some specific examples in a bit. There are three specific ideas in process science that I think are important for agents. One of them is this procedural memory. What is procedural memory? Procedural memory just says it is the memory of the agent that tells it how to do some task x. Procedural memory might be for a human, how do you brush your teeth? You take your toothbrush, you squeeze out some toothpaste, you put it on the brush and you brush your teeth. That's procedural memory that I've got locked in my mind somewhere. I don't have to reason about that every time I'm brushing my teeth, it's muscle memory. There's a place for that in humans. There's absolutely a place for that in agents. Organizations have a ton of procedural memory locked up in flow diagrams, workflow engines, processes, and so on in people's heads. Procedural memory is a key bit of what an agent needs to be effective. Process science has a lot to say about procedural memory.

There are four representations of processes that are executable, that are understandable and interpretable that we can give to agents for them to go carry out. Claude skills are the most recent example of us discovering the idea of procedural memory in the terms of a Markdown file in Python scripts, but it's effectively procedural memory. You're taking your Python scripts, getting a Markdown instruction, and you're telling, in this case, Claude Code, how to carry out a procedure. How to run data analysis, how to send an email, and so on. Process science has a strong foundation here we can lean on. Then there's workflow systems. [inaudible 00:19:23]. I think we should be leaning on all of the innovation that's happened in this field also for agents, because if [inaudible 00:19:40]. Then, finally, our agents need to learn and understand the process before they go enact it, before they go optimize it. Take an agent, say in Walmart, we've got a [inaudible 00:19:59]. If I just run up Claude Code and said, go issue a purchase [inaudible 00:20:08]. I can write tabula rasa Claude Code skills, but that's not going to get far either. We need to first issue purchase orders, and then we need to enact it. That's where process mining as a subfield of process science comes in.

There's a myth that I'd like to address on, or this myth articulated many times, which is, workflows mean rigid sequence of tasks, and agents are almost antithetical to this, agents are open [inaudible 00:20:47]. Process science has thought about process flexibility. If you look at the top left quadrant, this is usually what the myth manifests itself as, which is, at design time, you specify exactly how flexible the process is going to be, you encode a bunch of if then else statements. That's design time flexibility, which is fairly rigid, because at design time, you have to specify every path. Now we figure out that we want to build a full process and just lean on specifying everything at design time. You need to build much more robust and flexible processes. Here you get different flavors of flexibility. One of them that is especially important to focus on is deviation.

What deviation means is at runtime, can I get some flexibility, where I specify a set of possible tasks, but I don't specify exactly when and how and how many times it should happen, and it just happens. That's the deviation way of specifying processes. Actually, most of our agents today simply lean on the deviation way of getting flexibility in the process. This is an example of a process. This can be modeled in many different workflow languages. The concept that's important is this thing called the ad hoc subprocess. What the ad hoc subprocess basically tells us is that it gives deviation flexibility. Here I said, I've got a bunch of MCP servers in here, I've got a workflow down here, a deterministic one. The ad hoc subprocess basically says, I'm not telling you when to do what, I'm simply telling you, these are the possible things you can do, and you could do them either sequentially or in parallel.

If you think about what an agent is, it's effectively this, you've given them your tools, you've given them some skills, you haven't told them when to do what and what the inputs of each is going to be. You've just specified at design time that these are the possible things you can do. The scientists figured this out for a while, and ad hoc subprocesses have been part of the BPMN specification for the last 15 years. There's a lot that we can learn from [inaudible 00:23:18]. We can get an ad hoc subprocess to be either sequential or parallel. Which means here, I'm executing three tool calls in parallel, I could just as easily have done this sequentially, the toggle of this. Because I've specified this in a formal workflow language that's executed on a workflow engine, what this means is you get durability. If the second tool call fails, I don't lose the data from the first tool call.

These are all the things that agent builders are having to reimplement and rediscover, but exist from the process science formalism. Many AI agents today do exactly this. You simply at design time say which tools to go play with, and the LLM decides what to do. In this process map, effectively that's what I have, just note there is a call to an LLM, it sees all of these MCP servers. I'm not telling it when to invoke which tool, I'm simply saying these are the things you can possibly do, do them however many times you want, do them in the order you want. That's what this process diagram is declaratively specifying. Then you've got this idea of worklets, which is effectively agent skills that has existed in the process science world for quite a while now. The idea is basically, worklets are interesting, because like Claude Code skills, the LLM, the agent doesn't even see the skill until it needs access to it.

You're not even specifying it at design time, you're just saying, when you need it, if you need it, it's here. That's what happens in the worklet world.

I just wanted to give you a flavor of this. You can think of agents as just flexible processes. You can specify them as flexible processes, they can enact business processes. Then you might ask, why don't I just write my own workflow DSL? This is what your popular agent frameworks end up doing today. You look at LangGraph, it's a custom DSL for building workflows. If you look at what Google ADK has done, it's a custom DSL for building workflows. Any of the agent frameworks are converging to this idea that like workflows matter. If you specify a workflow, there's a runtime that knows how to execute workflows and give you all the functional benefits of a runtime. They're all starting the same old song and dance where they're coming up with their own workflow DSL, and making the same mistakes of the previous generation. Then you might say, ok, fine, I'll just use an existing workflow DSL. Why don't I just write my own engine?

Having written many workflow engines in my life, there are a ton of got you's to this that you probably don't want to do. It's domain independent, you probably just want to lean on an existing well tested solution for this. Why do they matter? For various reasons that aren't specific to agents. They provide durability, you can get scalability. A workflow is effectively configuration for this engine, and it'll execute it at scale durably cross-region, you get all that stuff. You don't have to worry about it as the implementer of the agent. You just specify the agent process, and the engine takes over the rest. You get all the control plane niceties. The async is another key one. You pick a robust enough workflow language, asynchrony, message handling, all of that stuff is just provided out of the box, you don't have to reimplement all of that.

The last thing about process science that's key is process mining. If you look at what this is, it's effectively 1 and 2, your systems, your humans, your customers all emitting transactional events. Someone walks into the store, we count it as an event. Someone checks out and adds items to their cart, that's captured as an event in some transactional system somewhere. That's all event data. Process mining is a field of process science that says, give me all [inaudible 00:27:36] as is. This is the old process. This is process as is. This is key, because if you look at what enterprises are struggling to do today with agents is, how do I get an agent to do the process of a bunch of people, that a bunch of people are doing today or a bunch of systems are doing? The claim is that there's no [inaudible 00:27:56] first know what that process is. Most of our enterprises don't have a full understanding of process.

What ends up happening is there's some automation, and then for all of the tail cases [inaudible 00:28:10]. Now you can't just throw an agent at it because the agent doesn't know the process either. This is where process mining comes in. We've seen some success from trying to do this where you discover a process and then you codify the process in a system, and then you ask an agent do it. You can automate process discovery and process management this way. The big picture is that agents should be able to enact business processes otherwise they won't be very valuable to businesses. Two, workflow engines offer mature infrastructure for executing processes, and agents can leverage this if they can be modeled in a process language that offers flexibility. You don't want a process language that doesn't give you the flexibility of building an agent, so make sure you pick a good one. There's three examples there. The last thing is agents and agent builders need to leverage process mining to discover and understand existing processes.

Together, I think this is what will help us build agents that withstand the hype cycle. Now you've got agents that can do any business process. You've got an agent that executes durably, that's effectively configuration for an engine, that executes at scale, and every agent builder doesn't have to worry about it. Then, lastly, now you've got agents that understand actual process as is. If you've got these three, then you've got agents that can actually do meaningful work.

## 4. Terraform the Environment for Agents

Last but not least, another topic I don't think gets enough airtime, which is Terraforming the environment for agents, sort of a provocative title. The claim is that the most capable agent is only as effective as the environment that it operates in. If you don't set up your agent for success by programming its environment, then it's doomed to fail. The bold claim here is we shouldn't just be focusing as enterprise like software engineers, not just on building agents, but we should be thinking about reshaping the world that they inhabit. The claim stated differently is that we obsess over agent architectures, but we neglect everything around it. We've got legacy systems, poorly documented APIs, very little defensibility in our systems and services, and then we expect our agents to succeed in this environment. It doesn't work that way. The answer I've seen to this is usually MCP. MCP is not an answer to any of these problems, it's simply an interface.

It provides your agent a natural language interface to a system so that an LLM can decide when and when not to use something. It could be the interface to your actual solution of reshaping the environment, but it's not the underlying implementation. There's an idea from Agent Oriented Programming, this idea anyway is about 10, 15 years old, which is effectively environment programming. The idea is that you want to reshape your environment to make it navigable, observable, and actionable for agents. The analogy that I find really powerful here is we as humans have Terraformed the environment around us to make it safe for us to exist in it. Our kids exist in it. The elderly exist in it just fine. We exist in it just fine. We've built, we've Terraformed the environment enough so that we can exist in it safely. My claim is that we need to do the same for agents. Our digital environment today is not very ergonomic for agents. There are many things that you get from Terraforming.

One is that you can implement guardrails, not just at the agent, but at the environment. You can get legibility and feedback loops and so on. Why is this important? Why do I think our environment is not ergonomic for agents today? One is, at best, our systems today are built for multi-tenancy. The agent world is, I claim, hyper-tenancy. Our systems are built for multi-tenancy, barely built for multi-tenancy. Agents represent all of these three things which our systems are built for, which is hyper-tenancy [inaudible 00:32:41], interacting with that. That's what I mean by hyper-tenancy. Agents are also unpredictable. They can be unpredictable. Our services are not very unpredictable. Agents are cross-functional. Services are usually tightly scoped. Agents, they're these human-oriented systems, they're cross-functional. They might [inaudible 00:33:10] as they're executing a pull. You might tell an agent to go check something out at an e-commerce store for you. It needs to search. It needs to touch product details. It needs to touch account. It needs to touch checkout. Usually, you don't have [inaudible 00:33:30]. That's the thing that makes an agent different from services.

What does this mean? The environment is hostile for agents today. We typically have them interacting with our environments at the lowest level. We expose MCP, which is fine. MCP doesn't provide any opinions about higher-level abstraction. It provides no opinions about governance and so on. All of those things need to be implemented underneath MCP. The environments today, like I said, are built for few trusted integrations. They assume a very tight scoping of access. They typically don't have very many guardrails. If your systems are anything like the systems I interact with on a daily basis, we don't build them very defensively. We expect the client [inaudible 00:34:22], and we expect the server side to handle other validations and so on. They don't typically have many guardrails. Policy is usually pretty distributed, like policy enforcement. That's also not great for agents. The other thing about agents, because they're hyper-tenant, is you could have systems where one agent undoes or overrides the work of some other agent, very naturally.

Now you've got hundreds of thousands of these things that could happen. You need arbitration, governance, auditability, and so on, which our environments today don't provide. Here there's this idea called an artifact, which you all should go look up. The idea is that an artifact is a specific interface environment. It encapsulates some functionality. It provides higher-level abstraction. It provides operating instructions for agents. It is the ergonomics of the environment for the agent. There's an artifact at multiple levels you can build for agents. The lowest level, you've got your services. At the second level, you might have ontology-based APIs. Here, issuing a purchase order isn't hitting five APIs. It's one event-driven command to an artifact. It's a higher-level abstraction. Then, finally, you can provide an artifact that actually [inaudible 00:35:51]. This is one of our environments for the agents that are enacting at Walmart.

There are three flavors of artifacts. I'll focus on the first one. The point of a boundary artifact is that it is interface to a bunch of capabilities. It provides security and organizational control, and it provides a high-level abstraction for an agent. Here's a difficult-to-read example. The idea is you've got domain services way at the bottom, and you've got agents way at the top. Now you're bracing for a world where you're going to have a proliferation of agents, hundreds if not more agents up top. The integration with each of the domain services underneath just simply isn't feasible. They're not defensible, like I said. You can't audit who does what very easily, and so on. What we're doing is we're building this middleware stack of these boundary artifacts. The boundary artifacts expose a very simple API to the agents, and they provide auditability, governance, and so on. In the middle, these boundary artifacts are implemented as event-sourced entities.

What that means is if an agent wants to issue a purchase order, for example, it simply calls the purchase order artifact and says, create purchase order. Now what happens is that boundary artifact tracks that request in an immutable event stream, so you get auditability. I know my Aditya's agent has requested that a purchase order be created, and that's an immutable fact that will exist in part of that artifact forever and ever. Now tomorrow, if the purchase order team comes in and says, cut access to all purchase order requests from Aditya's agent, you've got one place where that can be enforced, because all agents are talking to these services through that artifact. Then we expose an MCP layer on top of the artifacts, which is fine. The key is that the artifact serves a specific purpose. It makes the environment much more [inaudible 00:38:02]. In this case, the boundary artifact is providing abstraction. It's also providing auditability, governance, and arbitration.

For example, the purchase order team might say, four agents from department X only allow department X to issue a million dollars' worth of purchase orders in a 48-hour period. That's a reasonable rule for them to think up, and that can be enforced at the artifact. Every request from every agent is tracked. Hyper-tenancy is allowed, because the artifact absorbs all of that responsibility. The services can evolve like they've evolved. You don't touch the services. The services can continue to function how they functioned. You're just providing this middleware for agents to interact with the environment.

The key thing is artifacts are not equals. Tools are functions agents can call. Artifacts are first-class environment entities. They've got different responsibilities. First of all, they don't need to be called by an agent. They don't even need to be exposed as a tool to an LLM. An agent might decide to deterministically call an artifact. That's totally fine. Artifacts can be wrapped and exposed as tools to agents. They provide a different capability. Tools today, mostly what you see is they take stateless REST API actions. They've got flat namespaces. They don't provide auditability out of the box and so on. Artifacts can provide all of this, because boundary artifacts anyway, are implemented as these event source entities. You get stateful, observable entities. You can take long-running asynchronous actions and so on.

The key thing here is as we think about engineering our environments for agents, we should make sure to think about the abstractions they're providing, the governance and operability that they're providing, and that they're ready for this hyper-tenancy that the agent world is poised to enact. One of the seminal works on the artifacts is the environment isn't just a passive container for agents. This is how we've been talking about services and MCP tools. They're just these things that exist that get wrapped as an API. They don't do anything active. They just wait there for something to happen. Actually, the artifact abstraction gives you a way of molding, to think about the environment as something that's moldable for an agent. The idea here is MCP gives you syntax of interacting with the environment. Artifacts give you the semantics of interacting with your environment. MCP just says, here's a tool, here's a description, here's a schema. That's just the syntax. In this case, the artifact gives you governance, abstraction, and so on, which is the semantics of the environment.

## Summary

I'll wrap up by just summarizing the four big ideas that I think deserve more attention. One is, let's embrace a much stronger notion of agents so that we're not caught two steps behind, and that we can actually solve meaningful problems. We're problem oriented and not solution implementation driven. The second is, let's really think about extensibility for these agents, especially because the models are only getting better faster. More and more agents are going to have to get rewritten. Let's just be careful that we don't scrap everything and rewrite every time that happens. Let's really think about the right abstractions for our agents so that they can be evolved in a modular way. They can be optimized in a modular way and so on. CoALA is one good example here. The programming model of DSPy is also another interesting one. You think about an agent as a compound AI system with separate modules. DSPy gives you other guarantees. At its core, it's a modular way of thinking about an agent, just like CoALA is.

I would say lean into those approaches a little bit more. We're experimenting with CoALA and DSPy, but I'm sure there are other ways of thinking about agents in a modular way. The core idea here is an agent should have the right subcomponents that withstand how quickly agent architectures and models are changing. Because at their core, agents are memory modules, they're action modules, and it's a combination of all of that and an LLM at the center, AI agents anyway. If we think about it that way and expose the right APIs around those modules, we can evolve agents no matter what happens to the underlying architecture without having to rewrite it. Third, and perhaps most importantly, let's lean on process science and not reinvent all of that stuff with agents. Finally, let's really lean on changing the environment so that they're well suited for agents. Let's not just slop MCP layers on top of existing REST APIs. That's the hot take for the last one.

Really think about making your environment ergonomic for the hyper-tenant, unpredictable, cross-functional characteristics that agents bring that services did not bring. Those three are key differentiators of the agent world from the services world. I often hear this claim, agents are just services with LLMs in the middle. You can think of them that way, but really, you're violating one. Embrace a stronger principle of an agent because they will let you think about systems in a different way than the services way of understanding systems did. Taken with these four in mind, I think this gives you the right historically grounded formal ways of approaching building agent systems that have existed for a while and will continue to exist. Like I said, these are old ideas, but I bet you, at least for some of you, these seemed like they were from the future. We've been thinking about it this way. I think these four ideas have withstood the test of time and they'll continue to do so, so that when we build our agents on these foundations, they too will do the same, regardless of what happens with LLM and AI agents.

## Questions and Answers

**Participant 1:** Talk about the difference between an artifact and a tool. Would you say that tools should always be fast [inaudible 00:44:51], MCP tools?

**Aditya Kumarakrishnan:** MCP is a perfectly reasonable interface, I think. It's actually a fantastic interface if you want an LLM to be the driver of calling your tools, because you've got semantic, natural language descriptions of your tools, and LLMs are great at that stuff. MCP is fantastic. I'm not poopooing on MCP at all. What I'm instead saying is, what is the underlying implementation of your MCP tool? Is it just a wrapper over a REST API? I hope not. Really, let's think about, these are the three canonical examples of artifacts. Boundary artifacts give you governance and security and auditability. That's an abstraction. Resource artifacts mediate access to things like databases and so on. Then coordination artifacts are an interesting one. Coordination artifacts are, if you want two agents to coordinate, they might need some shared state. That can be exposed as a blackboard artifact, where two agents share a little thing that's namespaced for both of them, and it's temporary and so on. That can be exposed as an MCP tool, but it implements a specific capability. All of these can have MCP wrappers, but it's really about the underlying semantics that they're providing, not just the syntax.

**Participant 2:** Can you give an example of those artifacts [inaudible 00:46:22]?

**Aditya Kumarakrishnan:** I don't think there's a one-size-fits-all answer to that, because it really does depend on the resource. If you look at the artifact literature, there's an interesting notion of workspaces, what this does is, a workspace is a collection of artifacts. It's an abstraction that lets you say, I belong to this workspace, and this workspace has a bunch of artifacts in it already, so I have access to that artifact. You can group a bunch of resource artifacts in workspaces. You can have fine-grained access control at the workspace level, which is a collection of artifacts. That's a matter of how Wild West do you want to be with what your agent can do on top of that resource. The abstraction is sound. The semantics are there. If you start thinking about resource artifacts, you immediately start thinking about, can I hide implementation details? What is the observable state, and so on? Then you group them as workspaces.

**Participant 3:** This is amazing, the connection to process science and the wisdom from the previous decades on these topics. Can you share, if there are any, what's the best example of frameworks or abstractions that currently exist that people could go check out and use that are on GitHub or something that would help them to formulate how they design their agents in this way, using the principles that you're talking about? I'm just thinking about things like REST or other common things that we use, even things like JSON. What are the emerging things that are going to help people follow these types of patterns?

**Aditya Kumarakrishnan:** I would lean more on understanding the problems that your agents are solving and look for the ways those problems have been solved well in the past. If you want your agents to go enact a process in a scalable way, reach for everything from the process science world that's done that already and see how that can be extended for agents. If I want to just specify my agent as a flexible process and I want to execute it in a scalable way and I want it to be observable, if those are your three requirements, then I would say look at a mature workflow engine that has a flexible workflow specification language. One example might be Camunda and BPMN. BPMN has long been a standard for workflow execution engines. It's got some of the flexible primitives like the ad hoc subprocess I showed you. The ad hoc subprocess basically captures the flexibility that our AI agents do today as an available primitive. That's a powerful primitive to reach for.

Another one is if you've got a workflow engine that can understand a Turing complete language. Take something like Temporal where you can just write Java code or Python code and then it'll just persist all of that for you, give you the non-functional benefits. That's also a great option if you want to keep writing Python code and have it run durably. Then what you have to do is you have to build these flexible primitives. You have to build something like the ad hoc subprocess, which specifies something at runtime that you didn't at design time, declaratively. That would be my suggestion for running agents at scale.

**See more presentations with transcripts**