The most effective safety control for an email agent isn't a better model, a longer system prompt, or a stricter eval suite. It's a draft folder.
Here's the setup. Nylas Agent Accounts (currently in beta) are hosted mailboxes your application creates and controls entirely through the API. Each one is a real address with a grant_id that works against the existing Messages, Drafts, Threads, and Folders endpoints, and each mailbox ships with six system folders: inbox, sent, drafts, trash, junk, and archive. That drafts folder is where your approval workflow lives.
A common pattern for support mailboxes: an LLM drafts replies to common questions, and humans approve the sensitive ones via a webhook flow. The agent handles the boring 80% on its own (password reset instructions, shipping status, "where's the invoice"), and anything touching refunds, legal language, or an angry customer goes through a person first.
The threat you're mitigating is mundane: a model that's confidently wrong. Hallucinated discounts, replies to the wrong thread, a tone-deaf response to a complaint. None of these are exotic attacks. They're the everyday failure modes of putting a probabilistic system on an outbound channel, and the mitigation is to put a deterministic gate between "the model wrote something" and "a customer received it."
The flow: a message.created webhook fires when mail arrives, your classifier decides the risk level, and high-risk replies become drafts instead of sends. Drafts support full CRUD at /v3/grants/{grant_id}/drafts, so the agent creates one like this:
curl --request POST \
  --url "https://api.us.nylas.com/v3/grants/$GRANT_ID/drafts" \
  --header "Authorization: Bearer $NYLAS_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "subject": "Re: Refund request for order 4821",
    "body": "Hi Sam, I have processed the refund...",
    "to": [{ "email": "sam@example.com" }],
    "reply_to_message_id": "<INBOUND_MESSAGE_ID>"
  }'
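As a rough sketch of the routing step, the Python below decides whether a reply becomes a draft or a direct send, and plans the request without issuing it. The keyword classifier, the `route_reply` name, the risk terms, and the simplified `inbound` dictionary shape are all illustrative assumptions; only the two endpoint paths come from the Nylas API.

```python
from dataclasses import dataclass

API_BASE = "https://api.us.nylas.com/v3"

# Hypothetical risk terms; a real classifier would likely be a model or a rules engine.
HIGH_RISK_TERMS = ("refund", "legal", "lawsuit", "chargeback", "complaint")


@dataclass
class PlannedRequest:
    method: str
    url: str
    payload: dict


def is_high_risk(subject: str, body: str) -> bool:
    """Placeholder classifier: flag the message if any risk term appears."""
    text = f"{subject} {body}".lower()
    return any(term in text for term in HIGH_RISK_TERMS)


def route_reply(grant_id: str, inbound: dict, reply_body: str) -> PlannedRequest:
    """Plan a draft for high-risk mail and a direct send for everything else."""
    payload = {
        "subject": f"Re: {inbound['subject']}",
        "body": reply_body,
        "to": [{"email": inbound["from_email"]}],
        "reply_to_message_id": inbound["id"],
    }
    # High-risk replies wait in the drafts folder; the rest go straight out.
    path = "drafts" if is_high_risk(inbound["subject"], inbound["body"]) else "messages/send"
    return PlannedRequest("POST", f"{API_BASE}/grants/{grant_id}/{path}", payload)
```

Keeping the decision in one function means there is exactly one place where "send" can be chosen, which makes the gate easier to audit.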
Nothing leaves the mailbox yet. The draft sits in the agent's drafts folder until your approval UI (or a Slack button, or a daily review queue) signs off. Approval is a single POST to the draft itself, and sending an existing draft behaves exactly like POST /messages/send:
curl --request POST \
  --url "https://api.us.nylas.com/v3/grants/$GRANT_ID/drafts/$DRAFT_ID" \
  --header "Authorization: Bearer $NYLAS_API_KEY"
Rejection is just as clean: update the draft with edits, or delete it. Because reply_to_message_id was set at draft time, the approved reply threads correctly in the recipient's client with no extra work.
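The three reviewer outcomes map onto three requests against the same draft URL. Here is a minimal sketch of that mapping; the `review_request` helper and the decision names are hypothetical, and PUT as the update verb is an assumption to check against the Drafts reference.

```python
from typing import Optional, Tuple

API_BASE = "https://api.us.nylas.com/v3"


def review_request(
    grant_id: str, draft_id: str, decision: str, edits: Optional[dict] = None
) -> Tuple[str, str, Optional[dict]]:
    """Translate a reviewer decision into (method, url, json_body)."""
    url = f"{API_BASE}/grants/{grant_id}/drafts/{draft_id}"
    if decision == "approve":
        return ("POST", url, None)  # sends the draft as written
    if decision == "edit":
        if not edits:
            raise ValueError("an edit decision needs at least one changed field")
        return ("PUT", url, edits)  # assumed update verb
    if decision == "reject":
        return ("DELETE", url, None)
    raise ValueError(f"unknown decision: {decision!r}")
```

Raising on an unknown decision keeps a typo in the approval UI from silently turning into a send.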
One detail that makes this pattern nicer in practice: an Agent Account grant can carry an app_password for IMAP and SMTP access. That means a reviewer can connect Outlook or Apple Mail directly to the agent's mailbox and read pending drafts in a normal mail client, with no custom review dashboard required for v1. The mailboxes guide covers how API traffic and mail-client traffic share the same underlying mailbox: anything sent via the API shows up in the client's sent folder, and vice versa.
Don't make this binary. A useful split:
/messages/send.
The scoping principle from the agent security guide applies directly: an agent that drafts replies for review only needs the ability to create drafts; a person hits send. You can enforce that boundary in your agent's own code paths rather than trusting the prompt to behave.
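One way to enforce that boundary in code is to hand the agent a wrapper that can only create drafts. The sketch below assumes a hypothetical `DraftOnlyMailbox` class and an injected `transport` callable standing in for your HTTP client; neither is part of any Nylas SDK.

```python
from typing import Callable

# (method, path, json_body) -> parsed response; a stand-in for a real HTTP client.
Transport = Callable[[str, str, dict], dict]


class DraftOnlyMailbox:
    """Wrapper handed to the agent: it can create drafts and nothing else."""

    def __init__(self, grant_id: str, transport: Transport):
        self._grant_id = grant_id
        self._transport = transport

    def create_draft(self, payload: dict) -> dict:
        return self._transport("POST", f"/v3/grants/{self._grant_id}/drafts", payload)

    def send(self, *_args, **_kwargs):
        # Deliberately unsupported, so a stray call fails loudly instead of sending.
        raise PermissionError("agent code may only create drafts; a reviewer sends")
```

The agent's tool layer gets a `DraftOnlyMailbox`, while the approval service keeps the only client that is able to send.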
Size the human side honestly, too. The send cap is 200 messages per account per day on the free plan, which sounds like a lot until you realize a reviewer approving even a quarter of that volume is doing 50 reviews daily. If your queue grows past what a human can clear, that's a signal to tighten the classifier (promote more template families to auto-send) rather than rubber-stamp faster.
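The arithmetic is worth keeping next to your queue metrics. This is a small sketch; the two-minutes-per-review figure in the usage line is purely an assumption for illustration.

```python
def daily_reviews(send_cap: int, reviewed_fraction: float) -> int:
    """Reviews per day if the given share of capped sends needs approval."""
    return round(send_cap * reviewed_fraction)


def review_minutes(send_cap: int, reviewed_fraction: float, minutes_per_review: float) -> float:
    """Total reviewer time per day, in minutes."""
    return daily_reviews(send_cap, reviewed_fraction) * minutes_per_review


print(daily_reviews(200, 0.25))        # 50
print(review_minutes(200, 0.25, 2.0))  # 100.0, assuming 2 minutes per review
```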
The draft gate lives in your application code, which means a bug in your application code can bypass it. A misrouted code path that calls /messages/send directly skips the queue entirely, and the model never knows the difference. Defense in depth here is an outbound rule, a server-side check Nylas evaluates before any send reaches the email provider, regardless of which code path issued it:
curl --request POST \
  --url "https://api.us.nylas.com/v3/rules" \
  --header "Authorization: Bearer $NYLAS_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "name": "Block outbound to high-risk domains",
    "trigger": "outbound",
    "match": {
      "conditions": [
        { "field": "recipient.domain", "operator": "in_list", "value": ["<LIST_ID>"] }
      ]
    },
    "actions": [{ "type": "block" }]
  }'
Attach the rule to the agent's workspace via its rule_ids array and every Agent Account in that workspace inherits it. A blocked send returns 403 to the caller and no sent copy is stored; the message never existed as far as the recipient is concerned. The recipient.domain condition matches any recipient, including BCC and SMTP envelope recipients, so a prompt-injected "also BCC this address" doesn't slip past it. And every evaluation is logged: GET /v3/grants/{grant_id}/rule-evaluations shows which rule matched, at which stage, and what action was applied, which is exactly what you want when someone asks why a send failed at 2 a.m.
You can also split rules by outbound.type, which is reply when the send carries reply_to_message_id (or In-Reply-To/References headers) and compose for brand-new messages. A reasonable posture: let approved replies flow, but treat any compose from the agent as suspicious, since agents in a reply loop shouldn't be starting new conversations.
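The same reply-versus-compose distinction can be mirrored in your own code as a pre-flight check before a send is attempted. This sketch follows the description above and is not Nylas's evaluator; the `outbound_type` helper is hypothetical, and the `custom_headers` field shape (a list of name/value pairs) is an assumption.

```python
def outbound_type(send_request: dict) -> str:
    """Classify a planned send as 'reply' or 'compose' from its own fields."""
    if send_request.get("reply_to_message_id"):
        return "reply"
    # Assumed shape: [{"name": "...", "value": "..."}]
    header_names = {h["name"].lower() for h in send_request.get("custom_headers", [])}
    if header_names & {"in-reply-to", "references"}:
        return "reply"
    return "compose"
```

An agent-side guard can then refuse anything where `outbound_type(...) == "compose"` and leave the server-side rule as the backstop.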
Approval isn't the end of the message's life. After the reviewer sends the draft, Nylas reports what happened on the wire through three webhook triggers: message.send_success when the recipient's server accepts the message, message.send_failed when the send dies before reaching the recipient, and message.bounce_detected for hard and soft bounces. Wire these into the same approval UI: a reviewer who approved a reply that then bounced should see that, because the right next action (correct the address, escalate to a human channel) is also a review decision.
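A minimal sketch of feeding those three triggers back into the review record might look like this. The `review_log` dictionary, the status labels, and the `apply_delivery_event` name are hypothetical; only the trigger names come from the text above.

```python
from typing import Optional

DELIVERY_STATUS = {
    "message.send_success": "delivered",
    "message.send_failed": "failed",
    "message.bounce_detected": "bounced",
}

# Outcomes that should reappear in the reviewer's queue.
NEEDS_FOLLOW_UP = {"failed", "bounced"}


def apply_delivery_event(review_log: dict, trigger: str, message_id: str) -> Optional[str]:
    """Record a delivery outcome against the approved message, if the trigger is one."""
    status = DELIVERY_STATUS.get(trigger)
    if status is None:
        return None  # not a delivery trigger; leave the log untouched
    entry = review_log.setdefault(message_id, {})
    entry["delivery"] = status
    entry["needs_follow_up"] = status in NEEDS_FOLLOW_UP
    return status
```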
One payload detail worth handling up front: if an inbound message body exceeds roughly 1 MB, the webhook arrives as message.created.truncated with the body omitted. Your classifier should detect that case and fetch the full message with GET /messages/{id} before deciding the risk level, because classifying a truncated payload means classifying on the subject line alone.
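Here is a hedged sketch of that guard. It assumes the webhook payload exposes the trigger under `type` and the message under `data.object`, and it takes a hypothetical `fetch_message` callable that wraps the GET request.

```python
from typing import Callable

TRUNCATED = "message.created.truncated"


def message_for_classification(event: dict, fetch_message: Callable[[str], dict]) -> dict:
    """Return a message with its body present, fetching it when the webhook was truncated."""
    message = event["data"]["object"]  # assumed payload shape
    if event["type"] == TRUNCATED:
        # The webhook omitted the body, so fetch the full message by ID.
        return fetch_message(message["id"])
    return message
```

Passing the fetcher in as an argument keeps the function easy to test with a stub and keeps network calls out of the classifier itself.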
Two things bite teams building this:
The quickstart gets you from API key to a working mailbox in under 5 minutes, and the drafts endpoints work immediately on any account you create. Start with everything routed through the draft gate, measure how often the human actually edits the model's output, and loosen from there.
What's your edit rate? If reviewers are approving 95% of drafts untouched, I'd love to hear in the comments how you decided which categories were safe to automate.