Introducing eve

Vercel has launched eve, an open-source agent framework for building, running, and scaling AI agents with built-in production features like durable execution, sandboxed compute, and human-in-the-loop approvals. The framework aims to standardize agent development, similar to how Next.js standardized web development, by providing a structured approach that eliminates the need for teams to build common infrastructure from scratch.

Today, we are proud to introduce eve (https://vercel.com/eve), an open-source agent framework for building, running, and scaling agents.

eve is designed around the idea that building an agent should mean defining what it does without assembling all of the pieces that it needs to run in production. Instead, eve comes with production already built in: durable execution, sandboxed compute, human-in-the-loop approvals, subagents, evals, and more.

eve is the framework that we build and run our own agents on.

Agents today are where the web was before frameworks, with everyone hand-rolling the same plumbing and nothing carrying over to the next one. Next.js (https://nextjs.org/) ended this for the web, and eve is doing the same for agents.

This is an eve agent. Each file describes one component of the agent, so at a glance, the tree tells you what an agent is, what it does, where it lives, and when it acts on its own.

Every agent starts with its definition. The agent.ts file is where you configure the agent itself. You can define the model with one line, with provider fallbacks supported through AI Gateway (https://vercel.com/docs/ai-gateway), and compaction, model options, and other optional fields (https://beta.eve.dev/docs/agent-config#other-defineagent-fields) are there when you need them.

Giving your agent a job and personality is as simple as creating an instructions.md file, which serves as the system prompt that eve puts in front of every model call.

You create files for what your agent does, like post chart.ts and revenue-definitions.md for tools and skills, and eve wires them into a working agent without any boilerplate or plumbing to manage. You can just focus on what your agent does instead of how it does it.

We had built agents for years at Vercel, v0 (https://v0.app/) among them. But once coding agents made building one something anyone could do, everyone did. We shipped hundreds of agents and internal apps, and it looked like a productivity revolution.
But underneath it, every team was building and rebuilding the same plumbing before their agent could do anything, and none of it carried over from one use case to the next. Each agent was designed for a different task, but they all had the same needs, and the same structure kept emerging to meet them.

Agents have a shape. eve is that shape made into a framework. Every generation of software earns its abstractions once enough people have built the same thing the hard way, and agents are there now.

Everything an agent needs in production ships with the framework.

Agents wait on people, call slow systems, and run for hours, days, or weeks. In eve, every conversation is a durable workflow with each step checkpointed, so a session can pause, survive a crash or a deploy, and resume exactly where it stopped. This durability is built on the open-source Workflow SDK (https://workflow-sdk.dev/).

The code your agents write should be treated as untrusted, so eve keeps agent-generated code out of your application runtime entirely. Every agent gets its own sandbox, an isolated environment for shell commands, scripts, and file reads and writes, running in a separate security context from the harness that controls the agent. The backend behind this sandbox is an adapter. When deployed, it runs on Vercel Sandbox (https://vercel.com/docs/sandbox). Locally, it runs on Docker, microsandbox, or just-bash (https://justbash.dev/), and you can write an adapter for any other provider.

Agents act on real systems, and some of those actions should require a person to approve them. Any action in eve can be configured to require approval, and the agent will pause there and wait, indefinitely if it has to, without consuming any compute. Once approved, eve continues the task right from where it left off.

Agents need to connect to your backends, data, and other third-party services. In eve, a connection is a file that points at an MCP server or any API with a compatible OpenAPI document.
eve discovers the remote tools, hands them to the model, and brokers the auth, and the model never sees the connection's URL or credentials. Vercel Connect (https://vercel.com/connect) handles interactive OAuth with consent and token refresh built in. At launch, eve agents can connect to Slack, GitHub, Snowflake, Salesforce, Notion, and Linear, plus anything else you can reach over OAuth, an API key, or an MCP server.

Most agents live in exactly one place because every new surface is its own integration to build. In eve, the same agent serves every surface, and each channel is just a small adapter file. The HTTP API is on by default, with Slack, Discord, Teams, Telegram, Twilio, GitHub, and Linear included, and defineChannel covers custom channels. One channel can also hand off to another, so an incident webhook can open an investigation thread in Slack.

When an agent gets something wrong, the first question is what the agent actually did. In eve, every run produces a trace. Each model call and tool call appears in order with its inputs and outputs, down to the commands the agent ran in its sandbox, so you can replay the run instead of piecing it together from logs. The spans are standard OpenTelemetry and export to any tracing service you already run, whether that is Braintrust, Honeycomb, Datadog, or Jaeger. On Vercel, they surface in an Agent Runs tab under Observability, giving you one place to watch every session and drill into any run. Evals let you go further, with scored test suites you can run locally or wire into CI.

That leaves the part no framework can write for you: what your agent actually does.

The most common way to give an agent capabilities is to give it tools, and to teach it how to do things with skills. Today that means building the tool, writing the skill, and then wiring both into whatever runs your agent loop. With eve, a tool is one TypeScript file and a skill is one markdown file. Notice what is missing.
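
To make the file-per-capability idea concrete, here is a minimal sketch of how a framework could derive tools and skills purely from file paths. It is an illustration under stated assumptions, not eve's implementation: the `tools/` and `skills/` directory names, the `run-query.ts` file, the `Capability` type, and the `discover` function are all hypothetical.

```typescript
// Hypothetical sketch: turn a list of project files into named capabilities.
// The "tools/" and "skills/" directory names are assumptions for illustration.

type Capability = { kind: "tool" | "skill"; name: string; path: string };

function discover(paths: string[]): Capability[] {
  const found: Capability[] = [];
  for (const path of paths) {
    if (path.startsWith("tools/") && path.endsWith(".ts")) {
      // A tool is one TypeScript file; its name is the file name without ".ts".
      found.push({ kind: "tool", name: path.slice("tools/".length, -".ts".length), path });
    } else if (path.startsWith("skills/") && path.endsWith(".md")) {
      // A skill is one markdown file; its name is the file name without ".md".
      found.push({ kind: "skill", name: path.slice("skills/".length, -".md".length), path });
    }
    // Other files (agent.ts, instructions.md) are not capabilities and are skipped.
  }
  return found;
}

const capabilities = discover([
  "agent.ts",
  "instructions.md",
  "tools/run-query.ts",
  "skills/revenue-definitions.md",
]);

console.log(capabilities.map((c) => `${c.kind}:${c.name}`).join(", "));
```

Nothing in the file list registers itself; the name and location carry all of the information the sketch needs.
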
Instead of writing all of the boilerplate to wire these up and register them with your agent, eve handles it for you. A file's name and place in the tree are its definition. eve picks up the tool and skill at build time, hands the model their descriptions, and the model takes it from there. Just as Next.js (https://nextjs.org/) turns a folder into a route by owning the routing, eve turns a file into an ability by owning the agent loop.

Requiring approval for an action is one field on the tool. Now you can guard the expensive query, the destructive write, or anything else you would not want running unsupervised.

The tools you define aren't the ceiling. eve gives your agent a real computer with a shell, so it can run bash, grep, and anything else you'd run in a terminal. When a job calls for code that doesn't exist yet, the agent writes and runs it. Your agent can solve problems on its own in a secure sandbox, reshaping a dataset, running a one-off analysis, or writing whatever code a job needs that no tool covers.

An eve agent can also delegate. A subagent is the same shape one level down, a directory inside subagents/ with its own instructions, tools, and sandbox. The parent calls it just like it calls a tool. The child starts with a clean context window and only the tools you gave it, does the work, and hands the result back to the parent.

Now comes the part every developer looks forward to: testing their agent. That used to mean starting the process, asking a question, and reading logs, with no simple view of which tools were used, what the model loaded, or why it answered the way it did. You wanted to talk to your agent and watch it work, and what you got was stdout.

With eve, the dev loop is one command. To start an eve agent, you run its dev server.

Everything the agent did is visible in the TUI. The agent loaded the skill, ran the query, answered by the team's rules, and each of those lines is a checkpointed step in the durable session.
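
The idea of a session made of checkpointed steps can be sketched in a few lines. This is a simplified, in-memory illustration and not the Workflow SDK's API: `DurableSession`, `step`, `Journal`, and the step names are hypothetical, and a real system would persist the journal somewhere that survives a crash.

```typescript
// Hypothetical sketch of checkpointed steps: each step's result is recorded in a
// journal, and a resumed run replays recorded results instead of redoing the work.

type Journal = Map<string, unknown>;

class DurableSession {
  private journal: Journal;

  constructor(journal: Journal) {
    this.journal = journal;
  }

  step<T>(name: string, run: () => T): T {
    // If this step already completed in an earlier run, reuse its checkpoint.
    if (this.journal.has(name)) return this.journal.get(name) as T;
    const result = run();
    this.journal.set(name, result); // checkpoint before moving on
    return result;
  }
}

const executed: string[] = []; // records which steps actually ran

function answerQuestion(session: DurableSession, crashBeforeAnswer: boolean): string {
  const rules = session.step("load-skill", () => {
    executed.push("load-skill");
    return "net revenue excludes refunds";
  });
  const rows = session.step("run-query", () => {
    executed.push("run-query");
    return [120, 80];
  });
  if (crashBeforeAnswer) throw new Error("simulated crash");
  return session.step("answer", () => {
    executed.push("answer");
    const total = rows.reduce((sum, n) => sum + n, 0);
    return `total=${total} (${rules})`;
  });
}

// The Map stands in for durable storage shared by both attempts.
const journal: Journal = new Map();
try {
  answerQuestion(new DurableSession(journal), true);
} catch {
  // The first attempt dies after two checkpoints.
}
const answer = answerQuestion(new DurableSession(journal), false);

console.log(answer, executed);
```

The second attempt reaches the final step without loading the skill or running the query again, which is the property that lets a session resume where it stopped.
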
The terminal UI is just a client, and the agent serves the same structured events over HTTP, so curl, a test script, or CI can drive it and check exactly what it did.

Talking to the agent proves one run at a time. Evals test your agent the way you test the rest of your software, with scored checks written in files like everything else in the project. You can run eve eval locally or point it at a deployed app, so a prompt change or a model swap shows you what it broke before your users do.

The agent has lived on your laptop long enough. Shipping it is normally the step where the agent work stops and the infrastructure work begins. With eve there is nothing to provision, because the agent is an ordinary Vercel project, and it deploys the way any other frontend or backend does. Nothing about your agent changes when you deploy, because eve was designed from the ground up with adapters in mind. At launch eve deploys to Vercel, with support for other platforms on the way.

The same directory runs in production exactly as it ran on your laptop. The sandbox swaps to Vercel Sandbox without a code change, and the agent you were talking to in dev is now reachable at a public URL. Deploying does not even interrupt the agent; a session that is mid-task when you push finishes on the version it started on. There is no dashboard step required in any of this. The same coding agent that built your agent can ship it and verify its work.

But deployed is not the same as done. In production, an agent has users to meet and work to do on its own schedule.

Getting an agent into Slack used to mean building a Slack app first, including the app config, bot token, event subscriptions, webhook endpoint, and signing secret, all before the agent said a word. With eve, a channel is one command. The command writes channels/slack.ts, a single file that ships like any other code change, and the agent you just deployed now answers in Slack.
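
The notion of one agent behind several small channel adapters can be sketched as follows. This is a toy illustration, not eve's `defineChannel` API or any real platform payload: `ChannelAdapter`, `serve`, the event shapes, and the stand-in `agent` function are all hypothetical.

```typescript
// Hypothetical sketch: one agent, several thin adapters that translate each
// surface's input into a message and render the reply in that surface's style.

type AgentReply = { text: string; approvalFor?: string };
type Agent = (message: string) => AgentReply;

interface ChannelAdapter<In, Out> {
  toMessage(event: In): string;
  render(reply: AgentReply): Out;
}

// Stand-in agent: asks for approval on anything that looks destructive.
const agent: Agent = (message) =>
  message.includes("delete")
    ? { text: "This needs approval.", approvalFor: "delete-records" }
    : { text: `You asked: ${message}` };

// A chat-style channel renders approvals as buttons (simplified shapes).
type ChatEvent = { user: string; text: string };
type ChatMessage = { text: string; buttons: string[] };

const chatAdapter: ChannelAdapter<ChatEvent, ChatMessage> = {
  toMessage: (event) => event.text,
  render: (reply) => ({
    text: reply.text,
    buttons: reply.approvalFor ? ["Approve", "Deny"] : [],
  }),
};

// An HTTP-style channel returns the structured reply as JSON.
type HttpBody = { prompt: string };

const httpAdapter: ChannelAdapter<HttpBody, string> = {
  toMessage: (body) => body.prompt,
  render: (reply) => JSON.stringify(reply),
};

function serve<In, Out>(adapter: ChannelAdapter<In, Out>, event: In): Out {
  return adapter.render(agent(adapter.toMessage(event)));
}

const chatOut = serve(chatAdapter, { user: "U1", text: "delete old rows" });
const httpOut = serve(httpAdapter, { prompt: "weekly revenue?" });

console.log(chatOut, httpOut);
```

The agent function never changes between surfaces; only the adapter decides how input arrives and how a reply or an approval request is presented.
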
The platform affordances come with the channel, so approvals render as Slack buttons, questions as select menus, and the agent posts typing indicators while it works. Route the credentials through Vercel Connect (https://vercel.com/connect) and there is no bot token to copy into a .env file. Run the command again with discord or teams, and the same agent is there too, one file per channel.

Channels are the user interface of your agents, and sessions move between them. A question asked in Slack can continue on the web, and an incident webhook arriving over HTTP can open an investigation thread in Slack and finish the work where the team already is.

The Monday revenue report should not wait for someone to ask. A schedule is one more file, a cron expression and a handler that starts the agent on its own clock. On Vercel, each schedule deploys as a Vercel Cron Job (https://vercel.com/docs/cron-jobs), so the report posts every Monday with nobody on the hook to remember it.

An agent your team depends on is production software, and a change to its instructions can break it as surely as a change to its code. Because an eve agent is files in a directory, it lives in Git like the rest of your code, and a new prompt, tool, or skill is a commit with a diff, a review, and a history. Wire eve eval into CI and the suites you wrote become the deploy gate, scoring every commit so a regression stops in CI rather than in production.

Every commit also gets its own preview deployment, and it carries the agent's channels with it. The team can talk to the next version of your Slack bot before it replaces the one they use every day. And when a change goes bad in a way no eval caught, you can roll production back (https://vercel.com/docs/instant-rollback) to the previous version instantly.

We run more than a hundred agents in production at Vercel, and they are part of how the company operates every day, each one taking on a role in the business. Here are a few of them.

The most-used internal tool at Vercel is an agent, handling more than 30,000 questions a month. Anyone can ask d0 anything in Slack and get an answer from the warehouse. Every query is scoped to the asker's own permissions, so d0 can never show you a table you could not already see.

Lead Agent runs the playbook of our best rep around the clock. It works every new lead the moment it comes in and follows up on its own, so none go cold overnight. It costs about $5,000 a year to run, returns 32 times that, and one engineer maintains it part-time.

RevOps built Athena in six weeks without engineers. It answers pipeline and forecast questions from Snowflake and Salesforce in plain language, and pipeline coverage nearly doubled after it went live.

Vertex is our support agent that handles tickets across the help center, docs, and Slack around the clock, ensuring people get a fast response no matter when they ask. It reads the ticket, finds the right answer, and responds, solving 92% of tickets on its own and escalating the rest to the support team so they can focus on the problems that most need their attention.

Anyone at Vercel can write, not just the content team. draft0 runs a full review pipeline, catching the most glaring issues and building up an analysis of what the piece is actually about before it ever reaches us. By the time it does, the obvious work is done and we have a much clearer picture of what it needs. That means smaller pieces move fast, and we can give our full attention to the ones that demand it, like this one.

We rely on hundreds of agents every day, but keeping track of which one handles which workloads is not efficient. So instead of routing tasks ourselves, everything goes to V in Slack first. V figures out which agent can actually answer the task and routes it there, which means the whole fleet works like one agent instead of a hundred different options.

These agents all began as separate projects on separate stacks, each with its own way of holding state, brokering credentials, and emitting logs, which is where most teams find themselves after their second or third agent. Today they live in one monorepo, and are built, observed, and upgraded the same way, no matter which team owns them. Because they all share the same shape, a hundred agents run with the same tools and the same conventions as one.

A year ago, agents triggered less than 3% of the deployments on Vercel. Now, they trigger around 29%, and we expect half of all deployments to come from agents soon.

You have probably built an agent already, and the next one does not have to start from scratch. The public preview is open today, and the CLI wizard walks you through your first agent, from picking a model to a running dev server, in under a minute. Coding agents just need a prompt.

Everything eve can do is at docs.eve.dev (https://docs.eve.dev/), and development happens in the open at github.com/vercel/eve (https://github.com/vercel/eve), where issues, discussions, and contributions are welcome.

Hundreds of agents already run on eve at Vercel. What will you build?