Build the Reply Loop: Receive, Think, Respond

A developer building an email agent using Nylas Agent Accounts details the receive-think-respond loop. The post highlights edge cases such as the `message.created.truncated` webhook for bodies over 1 MB and the need to avoid replying to the agent's own outbound messages. The implementation uses a state machine keyed by `thread_id` to route replies, with the LLM returning a `nextStep` value to advance the conversation.

About 1 MB. That's the body-size threshold where the `message.created` webhook quietly changes shape: the trigger becomes `message.created.truncated` and the body is omitted entirely. If your email agent reads bodies straight off webhook payloads, it works fine for months. Then it silently drops the one reply that contained a forwarded contract.

That detail is a good preview of this whole topic. The receive-think-respond loop is conceptually simple, and every interesting bug lives in the edges. Let's wire the loop properly, using a [Nylas Agent Account](https://developer.nylas.com/docs/v3/agent-accounts/) (in beta) as the agent's mailbox.

A `message.created` webhook fires when mail arrives. Treat it as a notification only:

```js
app.post("/webhooks/nylas", async (req, res) => {
  res.status(200).end(); // ack fast, work async

  const event = req.body;
  // Bodies over ~1 MB arrive as message.created.truncated. Accept both types,
  // since the full message is fetched from the API before any model call anyway.
  if (event.type !== "message.created" && event.type !== "message.created.truncated") return;

  const msg = event.data.object;
  if (msg.grant_id !== AGENT_GRANT_ID) return;

  // Outbound fires message.created too -- don't reply to yourself.
  if (msg.from?.[0]?.email === AGENT_EMAIL) return;

  const conversation = await db.conversations.findByThreadId(msg.thread_id);
  conversation
    ? await continueConversation(msg, conversation)
    : await triageNewInbound(msg);
});
```

Three lines in there are load-bearing:

- **The grant check** keeps other accounts' traffic out.
- **The `from` check** matters because the webhook fires for outbound mail too. Skip it and your agent replies to its own replies, forever.
- **The `thread_id` lookup** is how a reply gets recognized as a reply. Messages are grouped into threads using the `In-Reply-To` and `References` headers. If your agent sent the original message, the inbound reply lands on a thread you already have state for, with no header parsing on your side.

The payload carries summary fields such as `subject`, `from`, and `snippet`.
Before the model decides anything, fetch the real data:

```js
const fullMessage = await nylas.messages.find({
  identifier: AGENT_GRANT_ID,
  messageId: msg.id,
});

const thread = await nylas.threads.find({
  identifier: AGENT_GRANT_ID,
  threadId: msg.thread_id,
});
// thread.data.messageIds -> fetch each, sort by date, build transcript
```

An LLM answering "sounds good, let's do Thursday" needs to know what was proposed, and the full thread is the conversation memory. For long threads you don't need every message verbatim. Summarize the early turns and pass the last 3–4 in full: same context, a fraction of the tokens.

Your own state machine supplies the other half of the context. A conversation record keyed by `thread_id` tracks a `step` field, and the handler routes on it before any model call happens:

```js
async function routeReply(message, history, context) {
  switch (context.step) {
    case "awaiting_confirmation":
      // The agent proposed something and is waiting for a yes/no.
      await handleConfirmation(message, history, context);
      break;
    case "awaiting_info":
      // The agent asked a question and needs the answer.
      await handleInfoResponse(message, history, context);
      break;
    case "closed":
      // The conversation was resolved but the person wrote back.
      await handleReopenedThread(message, history, context);
      break;
    default:
      // Unknown state -- log and escalate.
      await escalateToHuman(message, context);
  }
}
```

A "yes" means something different depending on what the agent asked. The default branch matters too: an unknown state should escalate, not improvise.

The [multi-turn recipe](https://developer.nylas.com/docs/cookbook/agent-accounts/multi-turn-conversations/) has another useful trick. Have the LLM return a `nextStep` value along with the reply text, so the model itself advances the state machine instead of your code guessing where the conversation went.
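The "summarize the early turns, keep the last 3–4 verbatim" idea can be sketched as a pure function. This is a minimal sketch with a few assumptions:

- Messages have Nylas-style fields: `date` in Unix seconds, `from` as an array of `{ email }`, and `body`. In practice you would probably strip HTML from `body` first; that step is not shown.
- `summarize` is a hypothetical callback you would back with a cheap LLM call. It is synchronous here to keep the sketch short; real code would likely `await` it.

```javascript
/**
 * Build an LLM-ready transcript from a thread's messages.
 * Assumes Nylas-style message objects: { date, from: [{ email }], body }.
 * `summarize` is a hypothetical callback that condenses older turns into one string.
 */
function buildTranscript(messages, { keepLast = 4, summarize } = {}) {
  // Oldest first, so the transcript reads top to bottom.
  const sorted = [...messages].sort((a, b) => a.date - b.date);

  const format = (m) => `${m.from?.[0]?.email ?? "unknown"}: ${m.body ?? ""}`;

  // Everything except the last `keepLast` messages gets condensed.
  const older = sorted.slice(0, Math.max(0, sorted.length - keepLast));
  const recent = sorted.slice(older.length);

  const parts = [];
  if (older.length > 0) {
    // Fall back to a plain count when no summarizer is supplied.
    const summary = summarize
      ? summarize(older.map(format).join("\n"))
      : `[${older.length} earlier messages omitted]`;
    parts.push(`Summary of earlier turns: ${summary}`);
  }

  // The most recent turns go in verbatim.
  parts.push(...recent.map(format));
  return parts.join("\n\n");
}
```

Keeping the summarizer injectable means the token-saving policy (`keepLast`) can be tuned and tested without any model in the loop.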
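Letting the model return `nextStep` only helps if your code refuses values it doesn't recognize. This is a minimal sketch, assuming the model is prompted to answer with a JSON object like `{ "reply": "...", "nextStep": "..." }`. The field names and the `ALLOWED_STEPS` list are illustrative, not taken from the recipe. Malformed output, or a step outside the allowed set, falls back to escalation, mirroring the router's default branch.

```javascript
// Steps the state machine knows about (illustrative; keep in sync with your router).
const ALLOWED_STEPS = new Set(["awaiting_confirmation", "awaiting_info", "closed"]);

/**
 * Parse an LLM response expected to look like
 *   {"reply": "...", "nextStep": "awaiting_info"}
 * Returns { reply, nextStep } or, when the output is unusable,
 * { reply: null, nextStep: "escalated", reason }.
 */
function parseModelDecision(rawText) {
  let parsed;
  try {
    parsed = JSON.parse(rawText);
  } catch {
    return { reply: null, nextStep: "escalated", reason: "unparseable model output" };
  }

  if (typeof parsed?.reply !== "string" || parsed.reply.trim() === "") {
    return { reply: null, nextStep: "escalated", reason: "missing reply text" };
  }

  if (!ALLOWED_STEPS.has(parsed.nextStep)) {
    // Unknown state: escalate rather than let the model invent a step.
    return { reply: null, nextStep: "escalated", reason: `unknown step: ${parsed.nextStep}` };
  }

  return { reply: parsed.reply, nextStep: parsed.nextStep };
}
```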
Once the model has produced `replyBody`, send it as a threaded reply:

```js
const sent = await nylas.messages.send({
  identifier: AGENT_GRANT_ID,
  requestBody: {
    replyToMessageId: msg.id,
    to: fullMessage.data.from,
    subject: `Re: ${fullMessage.data.subject}`,
    body: replyBody,
  },
});
```

Passing `reply_to_message_id` makes the platform set the `In-Reply-To` and `References` headers on the outbound message, so the recipient's mail client renders a threaded reply instead of a disconnected new email. Skip it and every reply starts a new thread. That is the fastest way to make an agent feel broken to the human on the other end. The [handle-replies recipe](https://developer.nylas.com/docs/cookbook/agent-accounts/handle-replies/) covers the mechanics in depth.

After sending, update the conversation record: bump the turn count, set the next `step`, and stamp `lastActivityAt`.

Now for the edges:

- **Self-reply loops.** Covered above, but it's the #1 footgun. One missing `from` check equals an infinite conversation with yourself.
- **Duplicate replies.** Webhook redelivery and concurrent workers both re-trigger your handler, at any volume and not just at scale. Without dedup and locking, the same inbound message generates two LLM calls and two replies. Treat idempotency as a launch requirement, not a hardening task.
- **Rapid-fire corrections.** Humans send "let's do Thursday" and then "actually, Friday" eleven seconds apart. A 30–60 second cooldown before responding lets you batch consecutive inbound messages into one coherent reply instead of answering each individually.
- **Runaway conversations.** An unbounded loop is a token sink and a risk. The [multi-turn recipe](https://developer.nylas.com/docs/cookbook/agent-accounts/multi-turn-conversations/) bakes a `maxTurns` cap into the conversation record, escalating to a human when it's hit. 10 is a reasonable default.
- **Zombie threads.** Someone replies to a conversation that went quiet weeks ago. Decide the behavior up front. A sane rule is to escalate anything dormant past 168 hours (one week) rather than let the agent auto-resume with stale context.
- **Multiple repliers on one thread.** CC someone and you've invited a second voice into the conversation, so two people might both reply to the same agent message. Process each inbound independently. Before generating another reply, check whether the agent has already responded since the last inbound.
- **Lost state.** The gap between turns can be days, so conversation records belong in durable storage: Postgres, Redis with AOF, DynamoDB, anything that survives restarts. In-memory state means every deploy lobotomizes your agent mid-conversation.

Not every conversation ends with the agent's final word, and the exits deserve code too. Escalation is a state change plus a notification:

```js
async function escalate(conversation, reason) {
  // Mark the conversation so the router stops auto-replying on this thread.
  await db.conversations.update(conversation.threadId, {
    step: "escalated",
    metadata: { ...conversation.metadata, escalationReason: reason },
  });

  // Tell a human which thread needs attention and why.
  await notifyHumanOperator({
    threadId: conversation.threadId,
    contact: conversation.contactEmail,
    reason,
  });
}
```

Completion is the same move with `step: "closed"`, and it's not just bookkeeping. When the prospect books the meeting or the support question gets answered, marking the record done changes how the next inbound on that thread routes. It hits the `closed` branch of your router instead of generating an out-of-context continuation. The state machine's exits are what make its middle states trustworthy.

One last note on the front door: verify the `X-Nylas-Signature` header before your handler does anything. An unverified webhook endpoint is an API that lets anyone on the internet make your agent send email.

Build the loop in this order:

1. Webhook handler with the three guard checks.
2. Thread fetching.
3. A hardcoded reply, with no LLM yet.
4. Verify threading works in a real mail client.
5. Swap in the model.

Wiring the LLM first is the classic mistake; you end up debugging prompt quality and webhook delivery simultaneously.

Which failure mode bit you first? Mine's universal enough that I'll guess: the agent replied to itself.
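The post-send bookkeeping and the `maxTurns` cap from the failure-mode list fit naturally in one place. This is a minimal sketch as a pure function, with these assumptions:

- The conversation record has hypothetical `turnCount`, `step`, `lastActivityAt`, and `metadata` fields, named after the prose above.
- `MAX_TURNS = 10` follows the suggested default.

Persisting the returned record, and notifying a human when it comes back with `step: "escalated"` (for example via the `escalate()` function above), is left to your own storage layer.

```javascript
const MAX_TURNS = 10; // reasonable default per the runaway-conversation note

/**
 * Return an updated copy of a conversation record after the agent sends a reply.
 * Field names (turnCount, step, lastActivityAt, metadata) are illustrative.
 */
function advanceConversation(conversation, nextStep, now = new Date(), maxTurns = MAX_TURNS) {
  const turnCount = (conversation.turnCount ?? 0) + 1;

  const updated = {
    ...conversation,
    turnCount,
    step: nextStep,
    lastActivityAt: now.toISOString(),
  };

  // Runaway guard: once the cap is reached on an open conversation,
  // hand off to a human instead of letting the loop continue.
  if (turnCount >= maxTurns && nextStep !== "closed") {
    updated.step = "escalated";
    updated.metadata = { ...conversation.metadata, escalationReason: "maxTurns reached" };
  }

  return updated;
}
```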
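For duplicate replies, there are two defenses:

- An idempotency check keyed on the inbound message ID.
- A per-thread lock, so concurrent work on one thread runs one at a time.

This is a minimal single-process sketch. The `Set` and the promise-chain lock are in-memory stand-ins; with more than one worker you would need a shared store, for example a Redis `SET key value NX EX <seconds>` for the dedup key. `handleOnce` and its parameters are hypothetical names.

```javascript
// In-memory stand-ins: fine for one process, not for a fleet of workers.
const processedMessageIds = new Set();
const threadLocks = new Map(); // threadId -> tail promise of that thread's queue

/**
 * Run `work` at most once per inbound message ID, serialized per thread.
 * Resolves true if work ran, false if this was a duplicate delivery.
 * The ID stays claimed even if work throws (at-most-once): a missed reply
 * is preferred over a double-send.
 */
async function handleOnce(messageId, threadId, work) {
  // Claim the ID before any await so a concurrent redelivery sees it at once.
  if (processedMessageIds.has(messageId)) return false;
  processedMessageIds.add(messageId);

  // Queue behind whatever is already running for this thread.
  const previous = threadLocks.get(threadId) ?? Promise.resolve();
  const run = previous.then(() => work());
  // Store a non-rejecting tail so one failure doesn't block the thread forever.
  const tail = run.catch(() => {});
  threadLocks.set(threadId, tail);

  try {
    await run;
  } finally {
    // Drop the lock entry only if nobody queued up behind us.
    if (threadLocks.get(threadId) === tail) threadLocks.delete(threadId);
  }
  return true;
}
```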
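The 30–60 second cooldown for rapid-fire corrections is effectively a per-thread debounce:

- Wait until the newest unanswered inbound is at least `cooldownMs` old.
- Then answer all pending messages in one reply.

This is a minimal sketch of just that decision, written as a pure function you could call from a scheduled job or a delayed queue message; the scheduling itself is not shown. It assumes `pending` holds the thread's not-yet-answered inbound messages with Nylas-style `date` fields in Unix seconds. The 45-second default is an arbitrary pick from the 30–60 range.

```javascript
/**
 * Decide whether a thread's pending inbound messages are ready to answer.
 * pending: unanswered inbound messages, each with `date` in Unix seconds.
 * nowMs: current time in milliseconds.
 * Returns the batch (oldest first) once the newest message has aged past
 * the cooldown, or null if the sender may still be typing a follow-up.
 */
function readyBatch(pending, nowMs, cooldownMs = 45_000) {
  if (pending.length === 0) return null;

  const sorted = [...pending].sort((a, b) => a.date - b.date);
  const newestMs = sorted[sorted.length - 1].date * 1000;

  // Each new message inside the window resets the clock, like a debounce.
  return nowMs - newestMs >= cooldownMs ? sorted : null;
}
```

Feeding the whole batch to the model as one turn is what turns "Thursday" followed by "actually, Friday" into a single coherent reply.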
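The zombie-thread rule and the multiple-repliers check both run before any model call, so they can share one gate. This is a minimal sketch, with these assumptions:

- The conversation record has a hypothetical `lastActivityAt` ISO timestamp.
- Thread messages use Nylas-style fields: `date` in Unix seconds and `from` as an array of `{ email }`.
- `agentEmail` is passed in rather than read from a global.

A later agent message is treated as covering the inbound. That holds when each reply is generated from a fresh thread fetch.

```javascript
const DORMANT_AFTER_MS = 168 * 60 * 60 * 1000; // 168 hours (one week)

/**
 * Decide what to do with an inbound message before calling the model.
 * Returns "escalate" for threads dormant past the limit,
 * "skip" if the agent already sent something after this inbound arrived,
 * otherwise "reply".
 */
function preReplyDecision({ conversation, inbound, threadMessages, agentEmail, nowMs }) {
  // Zombie thread: don't auto-resume with stale context.
  const lastActivityMs = Date.parse(conversation.lastActivityAt);
  if (nowMs - lastActivityMs > DORMANT_AFTER_MS) return "escalate";

  // Multiple repliers: has the agent responded since this inbound?
  const alreadyAnswered = threadMessages.some(
    (m) => m.from?.[0]?.email === agentEmail && m.date > inbound.date
  );
  return alreadyAnswered ? "skip" : "reply";
}
```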
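For the front-door check, here is a minimal sketch using Node's built-in `crypto`. It assumes the scheme described in Nylas's webhook docs: `X-Nylas-Signature` carries a hex-encoded HMAC-SHA256 of the raw request body, keyed with your webhook secret. Confirm the details against the current docs before relying on it.

The HMAC must be computed over the exact bytes received, not over re-serialized JSON. With Express, one way to keep those bytes is `express.json({ verify: (req, res, buf) => { req.rawBody = buf; } })`.

```javascript
const crypto = require("node:crypto");

/**
 * Verify a webhook's X-Nylas-Signature header.
 * Assumes the header is a hex-encoded HMAC-SHA256 of the raw body,
 * keyed with the webhook secret.
 * rawBody: Buffer or string holding the exact bytes received.
 */
function verifyNylasSignature(rawBody, signatureHeader, webhookSecret) {
  if (typeof signatureHeader !== "string" || signatureHeader.length === 0) return false;

  const expected = crypto
    .createHmac("sha256", webhookSecret)
    .update(rawBody)
    .digest();

  // Invalid hex decodes to a shorter buffer and fails the length check below.
  const received = Buffer.from(signatureHeader, "hex");

  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (received.length !== expected.length) return false;
  return crypto.timingSafeEqual(received, expected);
}
```

In the handler, this check would run before the `res.status(200)` ack. For example, reject with a 401 when `verifyNylasSignature(req.rawBody, req.get("X-Nylas-Signature"), WEBHOOK_SECRET)` is false, where `WEBHOOK_SECRET` is a hypothetical config value.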