Support Threads That Span Days: Agent Memory via Email

wpnews.pro

Most conversational state management assumes the conversation is happening — a chat session, a websocket, a context window. Email breaks that assumption rudely: a customer replies five days after your agent's last message, and your code is expected to pick up exactly where things left off, with no session, no socket, and a process that has restarted twelve times since.

The good news: email already solved durable conversation tracking, decades ago, in three headers. Build on them properly and the thread itself becomes the agent's memory. This is the pattern behind the multi-day support agent recipe, which runs an LLM support agent on its own mailbox via Agent Accounts — currently in beta.

Every email carries a globally unique Message-ID

. A reply adds In-Reply-To

(the Message-ID being answered) and References

(the full chain of Message-IDs, oldest to newest). That's how Gmail, Outlook, and Apple Mail all decide what belongs to one thread — subject-line matching is only a fallback, and the email threading docs explain why relying on it breaks: recipients edit subjects, two prospects can receive identical subjects, and forwards keep the subject while changing the conversation entirely.

You don't manage these headers yourself. Pass reply_to_message_id

on a send and the platform populates In-Reply-To

and References

automatically:

curl --request POST \
  --url "https://api.us.nylas.com/v3/grants/<GRANT_ID>/messages/send" \
  --header "Authorization: Bearer <NYLAS_API_KEY>" \
  --header "Content-Type: application/json" \
  --data '{
    "reply_to_message_id": "<MESSAGE_ID>",
    "to": [{ "email": "alice@example.com" }],
    "subject": "Re: Trouble accessing my account",
    "body": "Thanks for the extra detail, Alice — here is what I found..."
  }'

The reply threads correctly in the recipient's client and lands in the same thread in the agent's own mailbox.

When a reply arrives, the message.created

webhook payload includes thread_id

. That's the durable session identifier. The pattern from the docs:

thread_id

mapped to your internal state — ticket record, workflow step, whatever the agent was doing.thread_id

. Found it? Restore context and continue. Didn't? It's a brand-new conversation — classify and route it.The dispatch logic at the top of the webhook handler is short, but every line is a guard that earns its place:

app.post("/webhooks/support", async (req, res) => {
  res.status(200).end();

  const event = req.body;
  if (event.type !== "message.created") return;

  const msg = event.data.object;
  if (msg.grant_id !== SUPPORT_GRANT_ID) return;

  // Skip messages the agent itself sent.
  if (msg.from?.[0]?.email === SUPPORT_EMAIL) return;

  // Deduplicate — webhook delivery is at-least-once.
  if (await db.alreadyProcessed(msg.id)) return;
  await db.markProcessed(msg.id);

  // Reply to an existing ticket, or a brand-new conversation?
  const ticket = await db.tickets.findByThreadId(msg.thread_id);
  ticket ? await handleFollowUp(msg, ticket) : await handleNewTicket(msg);
});

The self-sent check matters more than it looks: the agent's own replies land in the same mailbox and fire the same trigger. Without that guard, the agent treats its own answer as a customer follow-up and responds to it — an email-based feedback loop you'll discover via a very confused customer.

The lookup table must live in a database, not memory. Support threads span days; in-memory maps don't survive deploys.

When a dormant thread revives, the agent re-reads the whole conversation through the Threads API: each thread object carries an ordered message_ids

list, the participants, and last-activity timestamps. Fetch the messages, sort by date, label each as agent or customer, and feed the transcript to the LLM for reclassification — not just reply generation. The recipe is insistent on this: a conversation that opened as a general question often turns into a billing dispute by message two, and routing should adapt.

The recipe also hard-codes lifecycle guards around the LLM:

A support agent that confidently sends a wrong billing answer is worse than one that says "let me get a human."

When the agent does hand off, it should pass the human everything it knows — the ticket category, the turn count, the escalation reason, and a pointer to the thread — so the human doesn't re-read the conversation from scratch. The recipe marks the ticket escalated

in the store, and the follow-up handler checks that status first: once a human owns a thread, the agent stays out of it.

The handoff mechanism itself is pleasingly low-tech. Because an Agent Account is a real mailbox, the human team can connect to it over IMAP from Outlook or Apple Mail and read or answer the escalated thread directly. The API and IMAP share the same mailbox, so if the ticket is later de-escalated, the human's replies are right there in the thread history the agent rehydrates.

Two operational numbers from the recipe worth tracking from day one:

And log everything: the classification result, the confidence score, and the generated reply for every interaction. Support emails are auditable communications; don't ship an agent that talks to customers without an audit trail.

Mostly you won't — thread_id

is more stable for application logic than Message-IDs, since the platform assigns it and it spans the whole conversation. But when you need the chain itself, pass fields=include_basic_headers

on a message GET to receive just Message-ID

, In-Reply-To

, and References

without the full header payload, which is often larger than the message body.

One more design note: don't assume one reply per outbound. Two people on a thread can both respond, and your agent shouldn't double-reply to the same thread because two webhooks fired.

Build the minimal loop — send, store thread_id

, reply to yourself from a personal account, watch the webhook restore context — then kill your process between the send and the reply. If the agent still picks up the conversation after a restart, your memory model is real. If it doesn't, you've found the bug before a customer did. How long do conversations in your domain go quiet before they come back?

source & further reading

dev.to — original article Auto-Generating an Index of Your Claude Code Custom Agents from Their Frontmatter Operable Over Sophisticated: What Shipping AI Agents at Scale Actually Looks Like Why Replit's AI Agent Deleted a Production Database

Support Threads That Span Days: Agent Memory via Email

Run your AI side-project on zahid.host