{"slug": "multi-turn-email-conversations-for-llm-agents", "title": "Multi-Turn Email Conversations for LLM Agents", "summary": "A Nylas developer has built a multi-turn email conversation system for LLM agents that maintains state across days, restarts, and deploys using a durable record keyed by thread ID. The system runs entirely on webhooks and the Threads API, with a state machine tracking the agent's step in the workflow to determine how each inbound message is handled. The implementation includes a sender-check to prevent the agent from replying to its own outbound messages, avoiding infinite loops.", "body_md": "Day 0, 10:00 — your agent sends a demo follow-up. Day 2, 14:37 — the prospect replies with a question. Day 2, 14:39 — they send a second thought. Day 5 — silence, then a reply to something the agent said a week ago. Somewhere between day 0 and day 5, your process restarted twice and deployed once.\n\nA single send-and-forget email is easy. The timeline above is the actual job: a conversation spanning five exchanges over days, where the agent has to remember what it said, what it's waiting for, and where in the workflow it stands — across restarts, deploys, and hours of dead air. 
The [multi-turn conversation recipe](https://developer.nylas.com/docs/cookbook/agent-accounts/multi-turn-conversations/) builds this loop on a Nylas Agent Account (the feature is in beta), running entirely on webhooks and the Threads API: no polling, no missed messages.\n\nThe core design decision: every active conversation gets a durable record keyed by the thread ID.\n\n``` js\nconst conversationRecord = {\n  threadId: \"nylas-thread-id\",\n  grantId: AGENT_GRANT_ID,\n  contactEmail: \"prospect@example.com\",\n  purpose: \"demo_followup\",   // What started this conversation\n  step: \"awaiting_reply\",     // Where in the workflow we are\n  turnCount: 1,\n  maxTurns: 10,               // Safety cap before escalation\n  lastActivityAt: \"2026-04-14T10:00:00Z\",\n  metadata: {},\n};\n```\n\nThe `step` field is the heart of it: a tiny state machine tracking what the agent is waiting for, which determines how the next inbound message gets handled. The store has to be durable (Postgres, Redis with AOF, DynamoDB); the gap between messages can be days, so in-memory state is a non-starter.\n\nStarting a conversation means sending the first message and persisting the record under the `threadId` the send returns:\n\n``` js\nasync function startConversation({ to, subject, body, purpose, metadata }) {\n  const sent = await nylas.messages.send({\n    identifier: AGENT_GRANT_ID,\n    requestBody: {\n      to: [{ email: to.email, name: to.name }],\n      subject,\n      body,\n    },\n  });\n\n  await db.conversations.create({\n    threadId: sent.data.threadId,\n    grantId: AGENT_GRANT_ID,\n    contactEmail: to.email,\n    purpose,\n    step: \"awaiting_reply\",\n    turnCount: 1,\n    maxTurns: 10,\n    lastActivityAt: new Date().toISOString(),\n    metadata: metadata ?? 
{},\n  });\n\n  return sent.data;\n}\n```\n\nEmail threading does the heavy lifting from there: every future reply arrives carrying the same `thread_id`, which is your lookup key back into the agent's memory.\n\nWhen `message.created` fires, the handler runs three checks before any LLM gets involved:\n\n``` js\nconst msg = event.data.object;\nif (msg.grant_id !== AGENT_GRANT_ID) return;\n\n// Outbound sends fire message.created too, so don't reply to yourself.\nif (msg.from?.[0]?.email === agentEmail) return;\n\nconst conversation = await db.conversations.findByThreadId(msg.thread_id);\nif (!conversation) {\n  await triageNewInbound(msg);  // Not a reply to anything we sent.\n  return;\n}\n```\n\nThat middle check is the classic footgun: `message.created` fires for the agent's *own* sends. Skip the sender check and the agent enters a polite infinite loop with itself.\n\nThe webhook payload only carries summary fields, so the handler fetches the full message, then pulls the entire thread and every message in it, sorts by date, and formats a transcript with `agent` / `contact` roles. The LLM gets the transcript plus the current `step` and `purpose`, generates the reply, and returns a `nextStep` that advances the state machine. The reply goes out with `replyToMessageId` set so it threads correctly on the recipient's side, and the record updates: increment `turnCount`, bump `lastActivityAt`, merge any new `metadata`.\n\nOne efficiency note from the recipe that pays for itself fast: the model doesn't need every message. For long threads, summarize the early turns and pass only the last 3–4 messages in full. Token usage stays sane without losing the context that matters.\n\nThe recipe treats lifecycle edges as first-class features, not error handling:\n\n- Check `turnCount` against `maxTurns`. 
An unbounded loop is a token sink and a risk; 10 turns is the recipe's default, so tune it to whatever is realistic for the workflow.\n- Set `step` to `escalated`, record the reason, and notify a human through whatever you use (Slack, PagerDuty, an internal API).\n- Mark a finished conversation `completed` so a later reply on the same thread doesn't reanimate the workflow.\n\nThe dormancy check is a few lines in the webhook handler, before the conversation continues:\n\n``` js\nconst hoursSinceLastActivity =\n  (Date.now() - new Date(conversation.lastActivityAt).getTime()) / 3600000;\n\nif (hoursSinceLastActivity > 168) {\n  await escalate(conversation, \"dormant thread reopened after 7+ days\");\n  return;\n}\n```\n\nEscalation itself is just a state transition plus a notification: set `step: \"escalated\"`, store the reason in `metadata`, ping the human channel. The thread stays intact, so whoever picks it up reads the same transcript the agent had.\n\nTwo more behaviors that separate demos from production: batch rapid-fire replies (a 30–60 second delay turns two quick messages into one turn instead of two separate generated replies), and treat webhook redelivery plus concurrent workers as a day-one concern that needs dedup and locking, not an edge case for later.\n\nChat sessions evaporate when the tab closes. An email thread is durable, human-readable, and auditable. The conversation state machine on top of it can crash, redeploy, and resume, because the source of truth (the thread) and the workflow position (the record) both survive. That's a genuinely good persistence model for any agent whose counterpart is a human on their own schedule.\n\nA focused way to start: implement just `startConversation` and the webhook handler with the three filters, hard-code one `purpose`, and run a single conversation with yourself across two days, including one process restart in the middle. If the agent picks the thread back up correctly, the rest is iteration. 
What's the longest-running conversation you'd trust an agent to hold?", "url": "https://wpnews.pro/news/multi-turn-email-conversations-for-llm-agents", "canonical_source": "https://dev.to/qasim157/multi-turn-email-conversations-for-llm-agents-280j", "published_at": "2026-06-12 12:37:20+00:00", "updated_at": "2026-06-12 12:42:17.532800+00:00", "lang": "en", "topics": ["ai-agents", "large-language-models", "ai-products", "ai-tools", "natural-language-processing"], "entities": ["Nylas", "Nylas Agent Account", "Threads API"], "alternates": {"html": "https://wpnews.pro/news/multi-turn-email-conversations-for-llm-agents", "markdown": "https://wpnews.pro/news/multi-turn-email-conversations-for-llm-agents.md", "text": "https://wpnews.pro/news/multi-turn-email-conversations-for-llm-agents.txt", "jsonld": "https://wpnews.pro/news/multi-turn-email-conversations-for-llm-agents.jsonld"}}