Multi-Turn Email Conversations for LLM Agents

wpnews.pro

Day 0, 10:00 — your agent sends a demo follow-up. Day 2, 14:37 — the prospect replies with a question. Day 2, 14:39 — they send a second thought. Day 5 — silence, then a reply to something the agent said a week ago. Somewhere between day 0 and day 5, your process restarted twice and deployed once.

A single send-and-forget email is easy. The timeline above is the actual job: a conversation spanning five exchanges over days, where the agent has to remember what it said, what it's waiting for, and where in the workflow it stands — across restarts, deploys, and hours of dead air. The multi-turn conversation recipe builds this loop on a Nylas Agent Account (the feature's in beta), running entirely on webhooks and the Threads API — no polling, no missed messages.

The core design decision: every active conversation gets a durable record keyed by the thread ID.

const conversationRecord = {
  threadId: "nylas-thread-id",
  grantId: AGENT_GRANT_ID,
  contactEmail: "prospect@example.com",
  purpose: "demo_followup",   // What started this conversation
  step: "awaiting_reply",     // Where in the workflow we are
  turnCount: 1,
  maxTurns: 10,               // Safety cap before escalation
  lastActivityAt: "2026-04-14T10:00:00Z",
  metadata: {},
};

The step field is the heart of it: a tiny state machine tracking what the agent is waiting for, which determines how the next inbound message gets handled. The store has to be durable (Postgres, Redis with AOF, DynamoDB). The gap between messages can be days, so in-memory state is a non-starter.
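One way to keep that state machine honest is an explicit transition table, so that whatever next step the model suggests is validated before it is persisted. This is a minimal sketch under assumptions. Only awaiting_reply, escalated, and completed appear in the recipe. The scheduling_demo step and the transition map itself are hypothetical placeholders for your own workflow.

```javascript
// Allowed transitions for the conversation's step field.
// awaiting_reply, escalated and completed appear in the recipe;
// "scheduling_demo" is a hypothetical workflow-specific step.
const TRANSITIONS = {
  awaiting_reply: ["awaiting_reply", "scheduling_demo", "escalated", "completed"],
  scheduling_demo: ["awaiting_reply", "escalated", "completed"],
  escalated: [], // terminal: a human owns the thread now
  completed: [], // terminal: later replies must not reanimate the workflow
};

// A step is terminal if it is known and has no outgoing transitions.
function isTerminal(step) {
  const allowed = TRANSITIONS[step];
  return allowed !== undefined && allowed.length === 0;
}

// Returns the step to persist. Terminal steps never move; unknown or illegal
// suggestions escalate instead of silently corrupting the workflow position.
function nextStepFor(currentStep, suggestedStep) {
  if (isTerminal(currentStep)) return currentStep;
  const allowed = TRANSITIONS[currentStep] ?? [];
  return allowed.includes(suggestedStep) ? suggestedStep : "escalated";
}
```

For example, nextStepFor("awaiting_reply", "completed") returns "completed", while an unrecognized suggestion such as "made_up" returns "escalated". Escalating on a bad step is a design choice: a model can name a step that doesn't exist, and a human review is cheaper than a stuck or looping conversation.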

Starting a conversation means sending the first message and persisting the record under the threadId that the send returns:

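// Assumed setup: Nylas Node SDK v7+ with an API key in NYLAS_API_KEY.
// AGENT_GRANT_ID and the durable `db` module are defined elsewhere.
import Nylas from "nylas";

const nylas = new Nylas({ apiKey: process.env.NYLAS_API_KEY });

async function startConversation({ to, subject, body, purpose, metadata }) {
  // Send the opening message from the Agent Account's grant.
  const sent = await nylas.messages.send({
    identifier: AGENT_GRANT_ID,
    requestBody: {
      to: [{ email: to.email, name: to.name }],
      subject,
      body,
    },
  });

  // Persist the conversation under the thread ID the send returned.
  await db.conversations.create({
    threadId: sent.data.threadId,
    grantId: AGENT_GRANT_ID,
    contactEmail: to.email,
    purpose,
    step: "awaiting_reply",
    turnCount: 1,
    maxTurns: 10,
    lastActivityAt: new Date().toISOString(),
    metadata: metadata ?? {},
  });

  return sent.data;
}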

Email threading does the heavy lifting from there: every future reply arrives carrying the same thread_id, which is your lookup key back into the agent's memory.

When message.created fires, the handler runs three checks before any LLM gets involved:

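async function handleMessageCreated(event) {
  const msg = event.data.object;

  // Check 1: only handle messages for the agent's grant.
  if (msg.grant_id !== AGENT_GRANT_ID) return;

  // Check 2: outbound fires message.created too, so don't reply to yourself.
  // agentEmail is the Agent Account's address; compare case-insensitively.
  const sender = msg.from?.[0]?.email?.toLowerCase();
  if (sender === agentEmail.toLowerCase()) return;

  // Check 3: only threads with a stored record are active conversations.
  const conversation = await db.conversations.findByThreadId(msg.thread_id);
  if (!conversation) {
    await triageNewInbound(msg); // Not a reply to anything we sent.
    return;
  }

  // ...continue the conversation with the record in hand.
}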

That middle check is the classic footgun: message.created fires for the agent's own sends. Skip the sender check and the agent enters a polite infinite loop with itself.

The webhook payload only carries summary fields, so the handler fetches the full message, then pulls the entire thread and every message in it, sorts them by date, and formats a transcript with agent/contact roles. The LLM gets the transcript plus the current step and purpose, generates the reply, and returns a nextStep that advances the state machine. The reply goes out with replyToMessageId set so it threads correctly on the recipient's side, and the record updates: increment turnCount, bump lastActivityAt, and merge any new metadata.
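The two pure pieces of that step, formatting the transcript and updating the record, can be sketched without any network calls. The sketch assumes messages shaped like Nylas v3 Message objects (from as an array of { email }, date as Unix seconds, body or snippet as text). It also assumes an LLM result of the hypothetical form { reply, nextStep, metadata }. The function names are illustrative, and the thread and message fetches are left out. In practice you would likely strip HTML and quoted history from the body first, and validate nextStep before persisting it.

```javascript
// Turns a thread's messages into a role-labelled transcript for the LLM.
// Assumes Nylas-style messages: from[0].email, date (Unix seconds), body or snippet.
function formatTranscript(messages, agentEmail) {
  const agent = agentEmail.toLowerCase();
  return [...messages] // copy so the caller's array isn't reordered
    .sort((a, b) => a.date - b.date) // oldest first
    .map((m) => {
      const sender = (m.from?.[0]?.email ?? "").toLowerCase();
      const role = sender === agent ? "agent" : "contact";
      return `${role}: ${m.body ?? m.snippet ?? ""}`;
    })
    .join("\n\n");
}

// Returns the updated record after a reply goes out: advance step,
// increment turnCount, bump lastActivityAt, merge new metadata.
// llmResult is a hypothetical { reply, nextStep, metadata } object.
function applyTurn(conversation, llmResult, now = new Date()) {
  return {
    ...conversation,
    step: llmResult.nextStep,
    turnCount: conversation.turnCount + 1,
    lastActivityAt: now.toISOString(),
    metadata: { ...conversation.metadata, ...(llmResult.metadata ?? {}) },
  };
}
```

Keeping applyTurn free of side effects means the same function works whether the store is Postgres, Redis, or DynamoDB: compute the new record, then write it in one call.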

One efficiency note from the recipe that pays for itself fast: the model doesn't need every message. For long threads, summarize the early turns and pass only the last 3–4 messages in full. Token usage stays sane without losing the context that matters.
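Here is a sketch of that windowing, assuming the messages are already sorted oldest first. The summarize callback is hypothetical (for example, a cheap LLM call) and is only invoked when there are older turns to compress. The keepLast default of 4 follows the recipe's "last 3–4 messages" guidance.

```javascript
// Splits a date-sorted thread into older turns (summarized) and the most
// recent turns (passed in full). summarize is a hypothetical async callback.
async function buildContext(sortedMessages, { keepLast = 4, summarize }) {
  const cut = Math.max(0, sortedMessages.length - keepLast);
  const older = sortedMessages.slice(0, cut);
  const recent = sortedMessages.slice(cut);
  // Short threads skip the summarization call entirely.
  const summary = older.length > 0 ? await summarize(older) : null;
  return { summary, recent };
}
```

Caching the summary on the conversation record (for example in metadata) would avoid re-summarizing the same early turns on every reply. That caching is an assumption, not part of the recipe.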

The recipe treats lifecycle edges as first-class features, not error handling:

- Cap turns by checking turnCount against maxTurns. An unbounded loop is a token sink and a risk; 10 turns is the recipe's default, which you tune to whatever's realistic for the workflow.
- Escalate by setting step to escalated, recording the reason, and notifying a human through whatever you use (Slack, PagerDuty, an internal API).
- Close finished conversations by setting step to completed, so a later reply on the same thread doesn't reanimate the workflow.

A dormancy check (is this thread reopening after a week of silence?) takes a few lines in the webhook handler, before the conversation continues:

// 3,600,000 ms per hour; 168 hours = 7 days.
const hoursSinceLastActivity =
  (Date.now() - new Date(conversation.lastActivityAt).getTime()) / 3600000;

if (hoursSinceLastActivity > 168) {
  await escalate(conversation, "dormant thread reopened after 7+ days");
  return;
}
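The lifecycle edges and the dormancy check can also be folded into a single pure function that runs before any LLM call. This is a sketch under assumptions: the action names, and the choice to ignore replies on closed threads rather than route them to triage, are hypothetical. The thresholds are the recipe's: maxTurns on the record (default 10) and 168 hours of dormancy.

```javascript
const DORMANT_AFTER_HOURS = 168; // 7 days, as in the recipe
const MS_PER_HOUR = 3_600_000;

// Decides what the handler should do with a new inbound reply.
// Action names ("ignore", "escalate", "continue") are hypothetical.
function lifecycleAction(conversation, now = Date.now()) {
  if (conversation.step === "completed" || conversation.step === "escalated") {
    // Closed or human-owned threads don't reanimate the workflow.
    return { action: "ignore", reason: `conversation is ${conversation.step}` };
  }
  if (conversation.turnCount >= conversation.maxTurns) {
    return { action: "escalate", reason: `turn cap of ${conversation.maxTurns} reached` };
  }
  const hours = (now - new Date(conversation.lastActivityAt).getTime()) / MS_PER_HOUR;
  if (hours > DORMANT_AFTER_HOURS) {
    return { action: "escalate", reason: "dormant thread reopened after 7+ days" };
  }
  return { action: "continue" };
}
```

Passing now as a parameter keeps the function deterministic, which makes the dormancy threshold easy to test without waiting a week.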

Escalation itself is just a state transition plus a notification: set step: "escalated", store the reason in metadata, and ping the human channel. The thread stays intact, so whoever picks it up reads the same transcript the agent had.
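A minimal sketch of such an escalate function follows. Unlike the recipe's two-argument escalate(conversation, reason) call, it takes its dependencies as a third argument so it can be tested. store.update(threadId, patch) and notify(text) are hypothetical adapters for your durable store and human channel, and the escalationReason and escalatedAt field names are assumptions.

```javascript
// Escalation as a state transition plus a notification.
// store and notify are hypothetical adapters (database, Slack/PagerDuty/internal API).
async function escalate(conversation, reason, { store, notify, now = new Date() }) {
  const patch = {
    step: "escalated",
    metadata: {
      ...conversation.metadata,
      escalationReason: reason,
      escalatedAt: now.toISOString(),
    },
  };
  // Persist first: if the process dies after this line, the thread is still
  // marked escalated and the agent won't keep replying.
  await store.update(conversation.threadId, patch);
  await notify(
    `Conversation with ${conversation.contactEmail} (thread ${conversation.threadId}) escalated: ${reason}`
  );
  return { ...conversation, ...patch };
}
```

Writing the state before notifying is a judgment call: if notify fails, the record is already escalated, so retrying just the notification is a safe recovery.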

Two more behaviors that separate demos from production: batch rapid-fire replies (a 30–60 second delay turns two quick messages into one turn instead of two separate generated replies), and treat webhook redelivery plus concurrent workers as a day-one concern — dedup and locking, not an edge case for later.
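Here is a single-process sketch of both behaviors. A per-thread debounce collects quick follow-ups into one batch, and a seen-set drops webhook redeliveries by message ID. Several assumptions need flagging. The sketch runs in one worker. It keeps state in memory, so a restart loses pending batches and the seen-set, which also grows without bound here. The onBatch callback is hypothetical: it would generate one reply for the whole batch. Across multiple workers the same ideas need shared state, for example a unique constraint on processed message IDs in Postgres, or a Redis SET with NX and an expiry as a per-thread lock.

```javascript
// Batches rapid-fire replies per thread and ignores duplicate deliveries.
class ReplyBatcher {
  constructor({ delayMs = 45_000, onBatch }) {
    this.delayMs = delayMs; // within the recipe's 30 to 60 second range
    this.onBatch = onBatch; // hypothetical: (threadId, messages) => void
    this.pending = new Map(); // threadId -> { messages, timer }
    this.seen = new Set(); // message IDs already accepted
  }

  // Returns false when the message ID was already seen (webhook redelivery).
  add(msg) {
    if (this.seen.has(msg.id)) return false;
    this.seen.add(msg.id);

    const entry = this.pending.get(msg.thread_id) ?? { messages: [], timer: null };
    entry.messages.push(msg);
    // Restart the quiet-period timer on every new message in the thread.
    clearTimeout(entry.timer);
    entry.timer = setTimeout(() => this.flush(msg.thread_id), this.delayMs);
    this.pending.set(msg.thread_id, entry);
    return true;
  }

  // Hands the collected messages to onBatch as a single turn.
  flush(threadId) {
    const entry = this.pending.get(threadId);
    if (!entry) return;
    this.pending.delete(threadId);
    this.onBatch(threadId, entry.messages);
  }
}
```

Because the timer restarts on each message, a contact who sends three notes in quick succession gets one reply once the thread has been quiet for delayMs.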

Chat sessions evaporate when the tab closes. An email thread is durable, human-readable, and auditable — the conversation state machine on top of it can crash, redeploy, and resume, because the source of truth (the thread) and the workflow position (the record) both survive. That's a genuinely good persistence model for any agent whose counterpart is a human on their own schedule.

A focused way to start: implement just startConversation and the webhook handler with the three filters, hard-code one purpose, and run a single conversation with yourself across two days, including one process restart in the middle. If the agent picks the thread back up correctly, the rest is iteration. What's the longest-running conversation you'd trust an agent to hold?

Source: dev.to (original article)

