Here's how the story usually goes. Saturday afternoon, you wire a language model to a mailbox for the first time. You type "summarize my unread mail" and watch it actually happen — the model scans, picks out the thread from your landlord, nails the summary. Magic. Sunday morning, drunk on possibility, you add a send capability. Sunday evening, you're reading a transcript where a newsletter's footer text nearly convinced the model to forward something it shouldn't, and you quietly remove the send tool until you understand what just happened.
The gap between Saturday and Sunday is the actual engineering of an AI email assistant. The model can't touch a mailbox on its own — you give it tools: small server-side functions that wrap email endpoints, run when the model asks, and hand results back. The model decides; your code acts. Getting that boundary right is the whole game.
The pattern works identically for ChatGPT, Claude, or any model with function calling: a tool is a JSON schema with a name, description, and typed parameters. Define three: list_messages, get_message, send_email. The descriptions are what the model reasons over, so write them like instructions, and keep parameter counts low: models pick correctly from 3 to 5 fields far more reliably than from 15.
{
  "name": "send_email",
  "description": "Send an email from the user's mailbox. Requires human approval first.",
  "parameters": {
    "type": "object",
    "properties": {
      "to": { "type": "string", "description": "Recipient email address" },
      "subject": { "type": "string" },
      "body": { "type": "string", "description": "HTML or plain text body" }
    },
    "required": ["to", "subject", "body"]
  }
}
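The two read-only tools follow the same schema shape. Here is a minimal sketch of them as Python dicts, assuming the parameter names the dispatcher reads (unread, limit, message_id). The description wording and the READ_ONLY_TOOLS name are illustrative, not taken from any official schema:

```python
# Hypothetical schemas for the two read-only tools.
# Parameter names mirror what the dispatcher reads; descriptions are illustrative.
LIST_MESSAGES = {
    "name": "list_messages",
    "description": "List recent messages in the user's mailbox. Call this first to find message IDs.",
    "parameters": {
        "type": "object",
        "properties": {
            "unread": {"type": "boolean", "description": "Only return unread messages"},
            "limit": {"type": "integer", "description": "Max messages to return (1-200)"},
        },
        "required": [],  # both filters are optional
    },
}

GET_MESSAGE = {
    "name": "get_message",
    "description": "Fetch one full message by ID. Use only for messages worth reading in full.",
    "parameters": {
        "type": "object",
        "properties": {
            "message_id": {"type": "string", "description": "ID returned by list_messages"},
        },
        "required": ["message_id"],
    },
}

# Register only these two while you are still building trust in the model.
READ_ONLY_TOOLS = [LIST_MESSAGES, GET_MESSAGE]
```

Each description tells the model when to call the tool, not just what it does, which is the "write them like instructions" advice in practice.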
All three tools map to two endpoints: list and get both hit GET /v3/grants/{grant_id}/messages (get appends the message ID to the path), and send hits POST /v3/grants/{grant_id}/messages/send. One dispatcher handles the lot:
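import os
import requests

# Assumed setup: v3 base URL and API key come from the environment
# (the variable names here are placeholders).
NYLAS_API = os.environ.get("NYLAS_API", "https://api.us.nylas.com/v3")
HEADERS = {"Authorization": f"Bearer {os.environ['NYLAS_API_KEY']}"}

def run_tool(name, args, grant_id):
    base = f"{NYLAS_API}/grants/{grant_id}/messages"
    if name == "list_messages":
        # Default 50, hard cap 200, so one call never floods the prompt
        params = {"limit": min(args.get("limit", 50), 200)}
        if args.get("unread"):
            params["unread"] = "true"
        return requests.get(base, headers=HEADERS, params=params).json()
    if name == "get_message":
        return requests.get(f"{base}/{args['message_id']}", headers=HEADERS).json()
    if name == "send_email":
        if not args.get("approved"):  # human-in-the-loop gate
            return {"status": "pending_approval"}
        payload = {"to": [{"email": args["to"]}],
                   "subject": args["subject"], "body": args["body"]}
        return requests.post(f"{base}/send", headers=HEADERS, json=payload).json()
    # Unknown tool name: report it instead of silently returning None
    return {"error": f"unknown tool: {name}"}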
The grant_id identifies whose mailbox you're operating: a connected Gmail or Outlook account, or an Agent Account (a hosted mailbox the assistant owns outright, currently in beta) if you'd rather the bot have its own address. Same endpoints either way; sends work across 6 providers (Google, Microsoft, Yahoo, iCloud, IMAP, and EWS) with zero SMTP setup.
Token cost scales with what you feed the model, and raw API responses are bloated for this purpose — a list response carries dozens of fields per message. Triage needs four:
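def slim(message):
    # Keep only the four fields triage needs; tolerate missing sender or subject
    sender = (message.get("from") or [{}])[0]
    return {
        "id": message["id"],
        "from": sender.get("email", ""),
        "subject": message.get("subject", ""),
        "snippet": (message.get("snippet") or "")[:200],
    }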
Trimming a 50-message list this way cuts the payload by about 80% versus full message objects. The flow becomes: list (slim) → model picks the IDs that matter → get_message for those few full bodies → summarize. List returns 50 messages by default with a 200 maximum, so cap the limit and never dump a 200-message inbox into one prompt.
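To see the trimming on your own data, you can compare serialized sizes before and after. The sketch below is self-contained and uses a made-up one-message response. It assumes list results sit under a "data" key, and slim_list and fake_response are hypothetical names. The saving you measure will depend on how large your real message bodies are.

```python
import json

def slim(message):
    """Reduce a message to the four fields triage needs."""
    sender = (message.get("from") or [{}])[0]
    return {
        "id": message["id"],
        "from": sender.get("email", ""),
        "subject": message.get("subject", ""),
        "snippet": (message.get("snippet") or "")[:200],
    }

def slim_list(response):
    """Slim every message in a list response (assumes messages live under "data")."""
    return [slim(m) for m in response.get("data", [])]

# Made-up list response with one message; real responses carry many more fields.
fake_response = {
    "data": [{
        "id": "msg_1",
        "from": [{"name": "Landlord", "email": "landlord@example.com"}],
        "to": [{"email": "you@example.com"}],
        "subject": "Lease renewal",
        "snippet": "Please sign by Friday.",
        "body": "<html><body>" + "Long HTML body. " * 200 + "</body></html>",
        "folders": ["INBOX"],
        "unread": True,
    }]
}

trimmed = slim_list(fake_response)
full_size = len(json.dumps(fake_response["data"]))
slim_size = len(json.dumps(trimmed))
print(f"full: {full_size} chars, slim: {slim_size} chars")
```

Character count is only a rough proxy for tokens, but it is enough to check that the slim list is the cheap step and the full body is the expensive one.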
Trace "summarize my unread mail and flag anything urgent" through the machinery:
1. The model calls list_messages with {"unread": true, "limit": 50}.
2. The dispatcher calls GET /v3/grants/{grant_id}/messages, slims each result to four fields, and returns the trimmed list as the tool output.
3. The model picks the messages that look urgent (3 in this trace) and issues get_message calls for them.
4. The model drafts a reply, calls send_email ... and gets {"status": "pending_approval"} back, because nothing leaves without a human click.
Two details to notice. The model never saw an API key, a raw header, or a message it didn't ask for. And the expensive step (full bodies) happened for 3 messages, not 50. That's the shape of every well-built turn: broad and cheap, then narrow and complete.
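The turn traced above is a loop: ask the model for its next step, run any tool it requests, hand the result back, and stop when it answers in text. The sketch below is provider-agnostic and uses a scripted stand-in instead of a real model so the control flow is visible. run_turn, scripted_model, fake_dispatch, and the step format are all hypothetical names for illustration, not any vendor's API.

```python
def run_turn(next_step, dispatch, max_steps=8):
    """Drive one assistant turn.

    next_step(history) returns either
      {"type": "tool", "name": ..., "args": {...}}  or
      {"type": "final", "text": ...}.
    dispatch(name, args) runs the tool and returns its output.
    """
    history = []
    for _ in range(max_steps):
        step = next_step(history)
        if step["type"] == "final":
            return step["text"], history
        result = dispatch(step["name"], step["args"])
        history.append({"call": step, "result": result})
    raise RuntimeError("model did not finish within max_steps")

def scripted_model(history):
    """Stand-in for a real model: list first, fetch one body, then answer."""
    if not history:
        return {"type": "tool", "name": "list_messages",
                "args": {"unread": True, "limit": 50}}
    if len(history) == 1:
        first_id = history[0]["result"][0]["id"]
        return {"type": "tool", "name": "get_message",
                "args": {"message_id": first_id}}
    return {"type": "final", "text": "1 urgent message: lease renewal due Friday."}

def fake_dispatch(name, args):
    """Stand-in for the real dispatcher; returns canned, already-slimmed data."""
    if name == "list_messages":
        return [{"id": "msg_1", "from": "landlord@example.com",
                 "subject": "Lease renewal", "snippet": "Please sign by Friday."}]
    if name == "get_message":
        return {"id": args["message_id"], "body": "Please sign the renewal by Friday."}
    return {"error": f"unknown tool: {name}"}

text, history = run_turn(scripted_model, fake_dispatch)
```

The max_steps cap is a design choice worth keeping in a real loop, since a confused model can otherwise keep calling tools indefinitely.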
When the human does approve, the confirmation is just the same tool call with the gate flag set:
draft = {"to": "ada@example.com",
         "subject": "Re: Q2 plan",
         "body": "Thanks Ada, 9am PT works. I'll send an invite."}
draft["approved"] = True  # set by your code after the human approves, never by the model
run_tool("send_email", draft, grant_id)
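Because the gate is a field in the tool arguments, it only holds if the model can never set that field itself. One way to enforce this, sketched here under assumptions, is to queue model-issued sends server-side and let only the approval handler add the flag. PENDING, handle_model_send, and approve are hypothetical names, and the in-memory dict stands in for real storage.

```python
import uuid

# draft_id -> draft awaiting a human decision (in-memory, for illustration only)
PENDING = {}

def handle_model_send(args):
    """Handle every model-issued send_email call. Never sends, only queues."""
    # Copy just the schema fields, which drops any "approved" the model supplied.
    draft = {key: args[key] for key in ("to", "subject", "body")}
    draft_id = uuid.uuid4().hex
    PENDING[draft_id] = draft
    return {"status": "pending_approval", "draft_id": draft_id}

def approve(draft_id, send):
    """Called by your UI after a person has read the full draft and approved it.

    `send` is whatever performs the real send, for example a wrapper
    around the dispatcher's send_email branch.
    """
    draft = PENDING.pop(draft_id)  # KeyError if unknown or already used
    return send({**draft, "approved": True})
```

With this split, a prompt-injected "approved": true in the model's arguments is discarded, and each draft can be approved at most once.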
Back to that send tool. Four practices cover the failure modes that cause real incidents:
send_email keeps returning pending_approval until a person sees the full draft and signs off. This one gate neutralizes both hallucinated sends and injected ones, at the cost of one click, and one wrong send costs far more than that click.
The complete recipe (full dispatcher, both provider wrappings, the security checklist) is in the ChatGPT email plugin guide. When you outgrow single-turn chat, the email triage agent runs the same tools on a cron, and inbox zero with an agent keeps a human approving every action.
Next step: implement just list_messages and get_message tonight (read-only, no send tool at all) and ask the model to triage your real inbox. You'll learn more from twenty minutes of watching its tool calls than from any post about it, this one included.
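For that read-only first pass, a small allowlist in front of the dispatcher is one way to guarantee nothing can be sent even if a send tool is registered by mistake. This is a sketch: READ_ONLY and read_only_dispatch are hypothetical names, and run stands for whatever function executes your tools.

```python
READ_ONLY = {"list_messages", "get_message"}

def read_only_dispatch(name, args, run):
    """Run a tool only if it is on the read-only allowlist."""
    if name not in READ_ONLY:
        # Refuse here rather than relying on the model not to ask.
        return {"error": f"tool not available: {name}"}
    return run(name, args)
```

For example, read_only_dispatch("send_email", draft, run) returns the error dict without ever calling run.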