# Auditing What Your Email Agent Actually Did

> Source: <https://dev.to/qasim157/auditing-what-your-email-agent-actually-did-ejc>
> Published: 2026-06-15 01:20:40+00:00

Debugging a misbehaving email agent at 2am is a special kind of miserable. Your application logs say the LLM "decided to follow up." Cool — with whom? Saying what? Did the message actually go out, or did it bounce? Agent frameworks log *intentions*; what you need during an incident is a record of *actions*. For email agents there's a piece of good news hiding in plain sight: the mailbox itself is that record.

An Agent Account (currently in beta) is a real hosted mailbox with six system folders: `inbox`, `sent`, `drafts`, `trash`, `junk`, and `archive`. The `sent` folder is the part security reviewers should care about: every outbound message the agent produces is stored there as a real message object, timestamped, addressed, and fetchable through the same Messages API you already use.

This holds across every path into the mailbox. As the [mailboxes guide](https://developer.nylas.com/docs/v3/agent-accounts/mailboxes/) notes, anything sent over IMAP/SMTP appears in the API, and anything sent via the API appears in the Sent folder in a mail client. There's no separation between protocol traffic and API traffic — so there's no side door an agent (or an attacker holding its credentials) can send through without leaving a copy.

One more property matters for audit integrity: sends are stamped with the grant's own address. An Agent Account can't spoof other identities, so a message in `sales-agent@`'s sent folder was sent *as* that agent, full stop.

Reviewing what an agent recently sent is a single API call against its grant; add a `received_after` filter to narrow the listing to the last day:

```
curl --request GET \
  --url "https://api.us.nylas.com/v3/grants/$GRANT_ID/messages?limit=50&in=sent" \
  --header "Authorization: Bearer $NYLAS_API_KEY"
```
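The query above returns the 50 most recent sent messages regardless of age. To scope it to a time window, here is a minimal Python sketch that builds the same URL with a `received_after` filter. It assumes `received_after` takes a Unix timestamp in seconds, and the helper names are illustrative rather than part of any SDK.

```python
import time
from urllib.parse import quote, urlencode

API_BASE = "https://api.us.nylas.com/v3"


def sent_messages_url(grant_id, since_epoch, limit=50):
    """Build the sent-folder listing URL, filtered to messages after since_epoch.

    Assumption: received_after is expressed in Unix seconds.
    """
    query = urlencode({"in": "sent", "limit": limit, "received_after": int(since_epoch)})
    return f"{API_BASE}/grants/{quote(grant_id, safe='')}/messages?{query}"


def last_24h_url(grant_id, now=None):
    """Same URL, with the window starting 24 hours before `now` (default: current time)."""
    now = int(time.time()) if now is None else int(now)
    return sent_messages_url(grant_id, now - 24 * 60 * 60)
```

Pass the resulting URL to curl or any HTTP client with the same `Authorization: Bearer` header as the request above.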

And because replies group into conversations using standard `Message-ID`, `In-Reply-To`, and `References` headers, you can reconstruct the full back-and-forth around any send, covering what the agent received, what it said, and what came back:

```
curl --request GET \
  --url "https://api.us.nylas.com/v3/grants/$GRANT_ID/threads/$THREAD_ID" \
  --header "Authorization: Bearer $NYLAS_API_KEY"
```

That thread view is the difference between "the agent sent 14 messages" and "the agent sent 14 messages because someone kept replying with a question it couldn't answer." Context is what turns a log into an explanation.
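Once you have the messages in a thread, turning them into a readable timeline takes little code. The sketch below assumes each message is a dict with a `date` field in Unix seconds, a `from` list of `{"email": ...}` entries, and a `subject`. Check those field names against the responses you actually get back. `thread_timeline` is a hypothetical helper.

```python
from datetime import datetime, timezone


def thread_timeline(messages, agent_address):
    """Order a thread's messages oldest-first and mark who sent each one."""
    agent = agent_address.lower()
    rows = []
    for msg in sorted(messages, key=lambda m: m.get("date", 0)):
        # Assumed shape: "from" is a list of {"name": ..., "email": ...} dicts.
        senders = [s.get("email", "").lower() for s in msg.get("from", [])]
        direction = "agent" if agent in senders else "counterpart"
        when = datetime.fromtimestamp(msg.get("date", 0), tz=timezone.utc)
        rows.append((when.isoformat(), direction, msg.get("subject", "")))
    return rows
```

Reading the rows top to bottom shows whether a burst of agent sends was answering a burst of inbound replies.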

Knowing the agent *attempted* a send isn't the same as knowing it landed. Because the SMTP path for Agent Accounts is owned end to end, deliverability comes back as webhooks on every outbound message:

| Trigger | What it tells you |
|---|---|
| `message.send_success` | The recipient server accepted the message |
| `message.send_failed` | The send died first: an outbound rule block, policy limit, or deliverability gate |
| `message.bounce_detected` | Hard or soft bounce from the remote server |

Pipe these into the same place as your application logs and you get a server-side stream of agent outcomes that doesn't depend on the agent's own logging being honest or complete. Inbound mail produces `message.created` events the same way. Note that bodies over roughly 1 MB arrive as `message.created.truncated` with the body omitted, so fetch the full message by ID in that case.
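A webhook receiver can reduce each notification to a small audit record before it is logged. The following is a sketch under one stated assumption: the payload carries the trigger name in `type` and the message under `data.object`, with `grant_id` and `id` fields. Verify that shape against a real notification before relying on it. `to_audit_record` is a hypothetical name.

```python
SEND_OUTCOMES = {
    "message.send_success": "accepted",
    "message.send_failed": "failed",
    "message.bounce_detected": "bounced",
}


def to_audit_record(event):
    """Reduce a webhook notification to an audit record, or None if irrelevant."""
    trigger = event.get("type") or ""
    # Assumed payload shape: {"type": ..., "data": {"object": {...}}}
    obj = (event.get("data") or {}).get("object") or {}
    if trigger in SEND_OUTCOMES:
        kind = SEND_OUTCOMES[trigger]
    elif trigger.startswith("message.created"):
        kind = "inbound"
    else:
        return None
    return {
        "trigger": trigger,
        "kind": kind,
        "grant_id": obj.get("grant_id"),
        "message_id": obj.get("id"),
        # Truncated notifications omit the body, so flag them for a follow-up fetch by ID.
        "needs_full_fetch": trigger.endswith(".truncated"),
    }
```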

Rule evaluations are logged for audit as well: when a policy rule blocks, routes, or flags a message, there's a record of which rule fired and why, which answers the other 2am question — "why did this message never reach the inbox?" You can pull those records per grant:

```
curl --request GET \
  --url "https://api.us.nylas.com/v3/grants/$GRANT_ID/rule-evaluations?limit=50" \
  --header "Authorization: Bearer $NYLAS_API_KEY"
```

Put the pieces together and a "what did the agent do?" investigation has a fixed shape. Say a customer complains the agent sent them something strange yesterday:

1. List the sent folder with `received_after` narrowing to yesterday. You now have every candidate message: not what the framework logged, but what actually left the mailbox.
2. Pull the surrounding conversation by `thread_id`. Now you can see what the customer wrote that triggered the reply, and whether the agent's response was reasonable in context.
3. Look up the send outcome for the `message_id`: did it deliver, fail, or bounce?

A few lookups, one grant ID, no archaeology in application logs. The same shape works in reverse for "why didn't the agent reply to this customer?": check the inbox, check rule evaluations for a block, check whether `message.created` ever fired.
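The investigation described above amounts to joining a sent-folder listing against the send outcomes you have already stored. Here is a minimal sketch of that join. It assumes sent messages are dicts with `id`, `thread_id`, `subject`, and a `to` list of `{"email": ...}` entries, and that `outcomes_by_message_id` is a hypothetical mapping you built from the send-outcome webhooks.

```python
def investigate(sent_messages, recipient, outcomes_by_message_id):
    """Return one finding per sent message addressed to `recipient`."""
    target = recipient.lower()
    findings = []
    for msg in sent_messages:
        to_addrs = [r.get("email", "").lower() for r in msg.get("to", [])]
        if target not in to_addrs:
            continue
        findings.append({
            "message_id": msg.get("id"),
            # Use thread_id for the follow-up thread fetch to see the context.
            "thread_id": msg.get("thread_id"),
            "subject": msg.get("subject", ""),
            # "unknown" means no outcome webhook was recorded for this message.
            "outcome": outcomes_by_message_id.get(msg.get("id"), "unknown"),
        })
    return findings
```

Each finding carries the `thread_id` needed to pull the conversation, so the output doubles as a worklist for the reviewer.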

The sent folder records what happened; the `drafts` folder can record what *almost* happened. Drafts support full CRUD on `/v3/grants/{grant_id}/drafts`, and sending an existing draft (`POST /drafts/{draft_id}`) behaves exactly like a direct send. Run the agent in draft-first mode and you get a reviewable record of every proposed message before delivery, which makes the approval step itself part of the audit trail: who reviewed, what changed between draft and send, what was rejected outright.
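To capture "what changed between draft and send," one option is to store a line diff alongside the approval. This sketch uses Python's standard `difflib` and assumes you have already extracted both bodies as plain text (real message bodies are often HTML). `draft_to_send_diff` is a hypothetical helper, not an API feature.

```python
import difflib


def draft_to_send_diff(draft_body, sent_body):
    """Return unified-diff lines from the drafted text to the text that was sent."""
    diff = difflib.unified_diff(
        draft_body.splitlines(),
        sent_body.splitlines(),
        fromfile="draft",
        tofile="sent",
        lineterm="",
    )
    # An empty list means the reviewer approved the draft unchanged.
    return list(diff)
```

Storing the diff with the reviewer's identity and a timestamp gives you the approval record in one row.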

There's also a low-tech supervision channel: each grant can carry an `app_password` for IMAP and SMTP, so a human can connect Outlook or Apple Mail to the agent's mailbox and skim it like any other inbox. Protocol traffic and API traffic land in the same mailbox, so the reviewer in a mail client sees exactly what the API sees.

Honest limits, so you don't design around capabilities that aren't there:

- `message.opened` and `message.link_clicked` aren't emitted for messages sent directly through `POST /messages/send` on an Agent Account. You'll know a message was accepted or bounced, not whether a human read it.
- The mailbox records what was sent, not why the agent decided to send it. Keep that reasoning in your application logs, tagged with the `message_id` so the two records join.

Also remember the practical ceiling: outbound messages are capped at 40 MB, and recipient servers often enforce lower limits around 25 MB, so a `send_failed` near those sizes is probably the payload, not the agent.

The cheap version of agent observability: subscribe to the three send-outcome triggers, store payloads keyed by `grant_id`, and schedule a weekly skim of each agent's sent folder. Fifteen minutes of reading what your agent actually wrote will teach you more about its failure modes than any eval suite.
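Storing payloads keyed by `grant_id` doesn't need much infrastructure. As one possible starting point, this sketch keeps them in SQLite using Python's standard library. The table layout and function names are illustrative, and the payload shape (`type`, `data.object.grant_id`, `data.object.id`) is an assumption to verify against real notifications.

```python
import json
import sqlite3

SEND_TRIGGERS = {"message.send_success", "message.send_failed", "message.bounce_detected"}


def open_store(path=":memory:"):
    """Open (or create) the outcome store; defaults to an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS send_outcomes ("
        "grant_id TEXT, message_id TEXT, trigger_type TEXT, payload TEXT)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_outcomes_grant ON send_outcomes (grant_id)")
    return conn


def record_outcome(conn, event):
    """Store one send-outcome notification; return False for any other trigger."""
    if event.get("type") not in SEND_TRIGGERS:
        return False
    obj = (event.get("data") or {}).get("object") or {}
    conn.execute(
        "INSERT INTO send_outcomes VALUES (?, ?, ?, ?)",
        (obj.get("grant_id"), obj.get("id"), event["type"], json.dumps(event)),
    )
    conn.commit()
    return True


def outcomes_for_grant(conn, grant_id):
    """Return (message_id, trigger_type) pairs for one agent, oldest first."""
    cur = conn.execute(
        "SELECT message_id, trigger_type FROM send_outcomes WHERE grant_id = ? ORDER BY rowid",
        (grant_id,),
    )
    return cur.fetchall()
```

Keeping the raw payload in its own column means you can re-derive fields later if your first guess at the schema turns out wrong.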

If you run an email agent today, here's the test: can you produce, in under five minutes, every message it sent yesterday and the thread context around each one? If not, wiring up the sent-folder query above is the fastest observability win available to you.
