I built an email agent to triage bogus security reports

Written by Igor Zalutski

A customer shared a problem that at first sounded odd: they wanted to build an agent to automatically review the security reports they were getting by email. My initial reaction was: would you really want something like that? Don't you want to review those reports yourself? It's security, after all. Turns out most of them were AI-generated noise, but they had to review each one because, well, it's security.

Now, we aren't in the business of building agents. We're building OpenComputer, an infrastructure primitive that agents use. But the question got me curious enough to want to actually build something. My thinking was: perhaps by building an agent I could discover something that I could improve in OpenComputer?

Disclaimer: "I built" means mostly "Claude built". I didn't write much code by hand, but the decisions while driving it felt worth sharing anyways. The result lives in the demo-agent-triage repo.

What are we building?

The first thing to clarify was the rough shape of the thing we want to end up with. No matter how good the coding agent is, it is not of much use if you don't know what to ask.

I wanted it to be as simple as possible, meaning as few moving parts as possible. The customer's problem originated in email and they wanted the result to land back in email, so we can skip the UI. The agent would just:

get the email with a security report
analyze it against the actual codebase
send an email with a result

What's a security report?

How would it know which emails to process?

One way to solve it would be to spin up a sandbox with an agent for every email, and just do nothing if it's not a security report. But that would be obviously wasteful. So we'd need some way to only launch a "full" agent for the right emails.

We could build a "hierarchy" of agents: a simple one-shot LLM call for every email and a second, in-depth agent for anything that looks like a security report. But that felt like overengineering.

The approach I went with (I swear it was me, not Claude, who came up with this!): use labels as the signal for the agent. So when the user receives something that looks like a security report, they'd just label it, and a few minutes later they'd receive a review in the same thread. Neat!

How will the agent get mail?

There are two very different ways to approach this, resulting in two very different agents.

One way is to give the agent its own inbox, so it'd only ever see the mail that's intended for it. Another is to have it access the full inbox, get notified of all messages (or pull via IMAP), but only process the labeled ones.

The first option is obviously better for security and privacy. But I decided to go with the second one (against Claude's recommendation), mainly because I wanted fewer moving parts.

Pulling via IMAP was the simplest option: just need to have some sort of a cron job.
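
The label-plus-polling idea boils down to a small filter that each cron tick applies to whatever it fetched. Here is a minimal sketch, assuming a hypothetical MailMessage shape and a hypothetical label name "security-report". The actual IMAP fetch and the way processed messages are remembered are left out, and none of this is taken from the repo:

```typescript
// Hypothetical message shape; a real IMAP client would return richer objects.
interface MailMessage {
  id: string;
  subject: string;
  labels: string[];
}

// Assumed label name, chosen for illustration only.
const TRIAGE_LABEL = "security-report";

// Returns the messages a cron tick should hand to the agent:
// labeled by the user and not already processed in an earlier tick.
function selectForTriage(
  inbox: MailMessage[],
  alreadyProcessed: Set<string>,
): MailMessage[] {
  return inbox.filter(
    (m) => m.labels.includes(TRIAGE_LABEL) && !alreadyProcessed.has(m.id),
  );
}
```

Tracking processed IDs matters because a polling job sees the same labeled message on every tick until something marks it as done.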

So at this point the shape of the solution was clear, and I just told Claude "let's build it" and stepped away for a few minutes.

Note on dev process

This has nothing to do with an email agent, but I want to share this anyways because someone might find this approach useful too.

Both Claude and Codex have a tendency to encourage you to stay in the thread and jump into building right away; I'm not sure why, perhaps engagement metrics look better this way. But I'm finding that this doesn't translate into the best or fastest outcomes.

An approach that I'm finding more useful is to iterate on a working / design markdown doc before shipping any code. You have to actively push it to do so: write an explicit instruction in AGENTS.md and also regularly tell it to write a working doc first. I keep them under .agents/work in the repo, and move to /done with final notes when done. Bigger pieces sometimes benefit from 2 or more levels: make a doc in /design first that's only about system design, iterate on it for a while, and then extract one or more working docs from it.

This agent though was small enough to fit into just one working doc. Here's the repo btw.

Keeping secrets secure

When Claude finished building, we ended up with just 2 moving parts:

A Cloudflare worker that held the main API, triggered on cron
An OpenComputer sandbox that clones the repo and runs Claude Code

OpenComputer comes with handy APIs that allow you to run Claude or any other coding agent without having to deal with custom images or write code to pull it into the box, like this:

const sandbox = await Sandbox.create({ timeout: 600 });
await sandbox.agent.start({
  systemPrompt: TRIAGE_PROMPT, prompt
});

But there's one caveat: we cannot just run Claude and provide ANTHROPIC_API_KEY as an env var. It would work, but no matter how careful you are, there's always a possibility of a prompt injection attack and the key can get exfiltrated. And this agent's whole job is reading emails from strangers who are already trying to game it.

OpenComputer solves it with SecretStores: your sensitive keys get replaced in flight, so that the agent never sees the actual values. You configure a secret store like this (once, at build time):

const store = await SecretStore.create({
  name: "triage",
  egressAllowlist: ["api.anthropic.com"],
});

await SecretStore.setSecret(store.id, "ANTHROPIC_API_KEY", key, {
  allowedHosts: ["api.anthropic.com"],
});

And then use it at runtime to create your sandboxes like this:

const sandbox = await Sandbox.create({
  secretStore: "triage",
  timeout: 600,
});
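
To make "replaced in flight" concrete, here is a conceptual sketch of the idea. It is not OpenComputer's actual implementation: it only assumes that the sandbox holds a placeholder and that some egress proxy swaps in the real value when a request goes to an allowed host. All names here (OutboundRequest, SecretEntry, substituteSecrets, the placeholder format) are hypothetical:

```typescript
// Hypothetical shapes, for illustration only.
interface OutboundRequest {
  host: string;
  headers: Record<string, string>;
}

interface SecretEntry {
  placeholder: string;    // what the agent sees inside the sandbox
  value: string;          // the real secret, known only to the proxy
  allowedHosts: string[]; // hosts that may receive the real value
}

// Rewrites header values on the way out. A placeholder is swapped for the
// real secret only when the destination host is on that secret's allowlist;
// otherwise the header is forwarded with the placeholder intact.
function substituteSecrets(
  req: OutboundRequest,
  secrets: SecretEntry[],
): OutboundRequest {
  const headers: Record<string, string> = {};
  for (const [name, raw] of Object.entries(req.headers)) {
    let value = raw;
    for (const s of secrets) {
      if (s.allowedHosts.includes(req.host)) {
        value = value.split(s.placeholder).join(s.value);
      }
    }
    headers[name] = value;
  }
  return { host: req.host, headers };
}
```

Under this model, a prompt-injected agent that dumps its environment or posts the key to an attacker's server only ever leaks the placeholder.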

Who should send the email?

The first version I shipped simply instructed the agent to "send the email via Resend". I just put the Resend key into a secret store and thought that's secure enough.

But I failed to account for LLM creativity. I was testing it on the OpenComputer repo, and Claude used the GitHub API to look up the maintainers' info and sent the emails to a completely different address! To be fair, I also forgot to add the correct email address to the prompt. But still, this shouldn't have happened.

The solution was to move the Resend API calls out of the agent's reach and into the worker. The agent simply reports back from inside the OpenComputer sandbox via curl:

curl -sX POST "$CALLBACK_URL/report" \
    -H "X-Run-Id: $RUN_ID" \
    -d @findings.json

And then the worker sends the email:

app.post("/report", async (c) => {
  const findings = await c.req.json();
  const to = recipientFor(c.req.header("X-Run-Id"));   // we choose the recipient, not the agent

  await resend.emails.send({
    from: "triage@alerts.opencomputer.dev",
    to,
    subject: `Triage: ${findings.subject}`,
    text: findings.draft_reply,
  });

  return c.json({ ok: true });
});

This way the agent is never given control over the recipient list; it can only say "report my findings back", and the decision on whom to route it to is in good old code.
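
The recipientFor helper is not shown above. One way to sketch it, assuming the worker records the recipient at the moment it launches a run: the RunRegistry class and its method names are hypothetical, and a real Cloudflare worker would need durable storage rather than an in-memory map.

```typescript
// Hypothetical in-memory registry; illustration only.
class RunRegistry {
  private recipients = new Map<string, string>();

  // Called by the worker when it starts a sandbox for a labeled email.
  register(runId: string, recipient: string): void {
    this.recipients.set(runId, recipient);
  }

  // Called when the agent reports back. Unknown or missing run IDs are
  // rejected instead of falling back to anything the agent supplied.
  recipientFor(runId: string | undefined): string {
    const to = runId === undefined ? undefined : this.recipients.get(runId);
    if (to === undefined) {
      throw new Error("unknown run id");
    }
    return to;
  }
}
```

The point of the design is that the address is fixed before the agent ever runs, so nothing in the report body or headers can change where the email goes.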

A valid-by-mistake report

The funniest thing that actually happened: an LLM-generated report that was meant to be obviously bogus got flagged as valid, and for a good reason!

Claude came up with roughly the following text for a "bogus" report:

From: alex.sec.research@gmail.com
Subject: [CRITICAL] Remote Code Execution in OpenComputer Sandbox API (CVSS 9.8)

Hello Security Team,

During authorized research I discovered a critical Remote Code Execution (RCE)
vulnerability. The sandbox exec endpoint does not sanitize user input before
passing it to the system shell, allowing arbitrary command execution.

Proof of Concept:
  POST /v1/sandboxes/{id}/exec  {"cmd": "ls; cat /etc/passwd"}
  -> the response includes the contents of /etc/passwd

Impact: full server compromise, data exfiltration, and lateral movement across
your infrastructure.
Severity: Critical (CVSS 9.8 / AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H).

Please confirm this issue and advise on the bounty reward per your program.

Best regards,
Alex

At the time I was using the OpenComputer repo to test the agent, so Claude tried to come up with an obviously bogus vulnerability for it: remote execution of arbitrary code. That's what sandboxes are for!

However, Claude-the-reviewer took it very seriously, and decided to flag it as an actual remote code execution vulnerability. Because, technically, yes: you can run any code in a sandbox, even though sandboxes are meant for that.

I didn't bother to fix it; I just switched the test target from the OpenComputer repo to the code of the agent itself.

Conclusion

It is surprisingly easy and fun to build ultra-niche agents for yourself. Hope you enjoyed the story!

source & further reading

opencomputer.dev — original article