# Build the Reply Loop: Receive, Think, Respond

> Source: <https://dev.to/qasim157/build-the-reply-loop-receive-think-respond-2me7>
> Published: 2026-06-16 17:05:58+00:00

About 1 MB. That's the body-size threshold where the `message.created` webhook quietly changes shape: the trigger becomes `message.created.truncated` and the body is omitted entirely. If your email agent reads bodies straight off webhook payloads, it works fine for months and then silently drops the one reply that contained a forwarded contract. That detail is a good preview of this whole topic: the receive-think-respond loop is conceptually simple, and every interesting bug lives in the edges.

Let's wire the loop properly, using a [Nylas Agent Account](https://developer.nylas.com/docs/v3/agent-accounts/) (in beta) as the agent's mailbox.

A `message.created` webhook fires when mail arrives. Treat it as a notification only:

``` js
app.post("/webhooks/nylas", async (req, res) => {
  res.status(200).end(); // ack fast, work async

  const event = req.body;
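  // Oversized messages arrive as message.created.truncated (body omitted).
  // Accept both triggers; the full message is fetched by ID later anyway.
  const triggers = ["message.created", "message.created.truncated"];
  if (!triggers.includes(event.type)) return;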

  const msg = event.data.object;
  if (msg.grant_id !== AGENT_GRANT_ID) return;

  // Outbound fires message.created too -- don't reply to yourself.
  if (msg.from?.[0]?.email === AGENT_EMAIL) return;

  const conversation = await db.conversations.findByThreadId(msg.thread_id);
  conversation
    ? await continueConversation(msg, conversation)
    : await triageNewInbound(msg);
});
```

Three load-bearing lines in there. The grant check keeps other accounts' traffic out. The `from` check matters because **the webhook fires for outbound mail too**: skip it and your agent replies to its own replies, forever. And the `thread_id` lookup is how a reply gets recognized as a reply: messages are grouped into threads using the `In-Reply-To` and `References` headers, so if your agent sent the original message, the inbound reply lands on a thread you already have state for. No header parsing on your side.

The payload carries summary fields: `subject`, `from`, `snippet`. Before the model decides anything, fetch the real data:

``` js
const fullMessage = await nylas.messages.find({
  identifier: AGENT_GRANT_ID,
  messageId: msg.id,
});

const thread = await nylas.threads.find({
  identifier: AGENT_GRANT_ID,
  threadId: msg.thread_id,
});
// thread.data.messageIds -> fetch each, sort by date, build transcript
```

An LLM answering "sounds good, let's do Thursday" needs to know what was proposed — the full thread is the conversation memory. For long threads, you don't need every message verbatim: summarize the early turns and pass the last 3–4 in full. Same context, fraction of the tokens.
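The summarize-early, keep-recent split can be sketched as two small pure functions. This is a minimal sketch under assumptions, not the recipe's code: `splitThread`, `renderTranscript`, and the `keepFull` parameter are hypothetical names, and each message is assumed to be a plain object with a numeric `date`, a `from` array, and a `body` string. The summary itself would come from whatever model call you already use.

```js
// Hypothetical helper: order a thread by date and separate the turns that
// get summarized from the last `keepFull` turns that are passed verbatim.
function splitThread(messages, keepFull = 4) {
  const sorted = [...messages].sort((a, b) => a.date - b.date);
  const cut = Math.max(0, sorted.length - keepFull);
  return { early: sorted.slice(0, cut), recent: sorted.slice(cut) };
}

// Hypothetical helper: combine an already-written summary of the early
// turns with the recent turns in full.
function renderTranscript(earlySummary, recent) {
  const lines = recent.map(
    (m) => `${m.from?.[0]?.email ?? "unknown"}: ${m.body}`
  );
  if (earlySummary) lines.unshift(`Earlier in this thread: ${earlySummary}`);
  return lines.join("\n\n");
}
```

Keeping the split separate from the rendering means short threads skip the summarization call entirely: when `early` is empty, there is nothing to summarize.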

Your own state machine supplies the other half of the context. A conversation record keyed by `thread_id` tracks a `step` field, and the handler routes on it before any model call happens:

``` js
async function routeReply(message, history, context) {
  switch (context.step) {
    case "awaiting_confirmation":
      // The agent proposed something and is waiting for a yes/no.
      await handleConfirmation(message, history, context);
      break;
    case "awaiting_info":
      // The agent asked a question and needs the answer.
      await handleInfoResponse(message, history, context);
      break;
    case "closed":
      // The conversation was resolved but the person wrote back.
      await handleReopenedThread(message, history, context);
      break;
    default:
      // Unknown state -- log and escalate.
      await escalateToHuman(message, context);
  }
}
```

A "yes" means something different depending on what the agent asked, and the `default` branch matters: an unknown state should escalate, not improvise. The other useful trick from the [multi-turn recipe](https://developer.nylas.com/docs/cookbook/agent-accounts/multi-turn-conversations/): have the LLM return a `nextStep` value along with the reply text, so the model itself advances the state machine instead of your code guessing where the conversation went.
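If the model advances the state machine, its output deserves the same suspicion as any other input. Here is a minimal sketch of validating it before anything is sent or stored. The JSON shape (`reply` plus `nextStep`), the `parseModelTurn` name, and the set of allowed steps are assumptions for illustration; the allowed steps simply mirror the router's cases.

```js
// Steps the router knows how to handle. Anything else is rejected so the
// caller can escalate instead of storing an unroutable state.
const KNOWN_STEPS = new Set(["awaiting_confirmation", "awaiting_info", "closed"]);

// Hypothetical parser for model output expected to look like
// {"reply": "...", "nextStep": "awaiting_info"}.
function parseModelTurn(rawText) {
  let parsed;
  try {
    parsed = JSON.parse(rawText);
  } catch {
    return { ok: false, reason: "invalid_json" };
  }
  if (typeof parsed?.reply !== "string" || parsed.reply.trim() === "") {
    return { ok: false, reason: "missing_reply" };
  }
  if (!KNOWN_STEPS.has(parsed?.nextStep)) {
    return { ok: false, reason: "unknown_step" };
  }
  return { ok: true, reply: parsed.reply, nextStep: parsed.nextStep };
}
```

A failed parse is a natural trigger for the same escalation path as the router's `default` branch.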

``` js
const sent = await nylas.messages.send({
  identifier: AGENT_GRANT_ID,
  requestBody: {
    replyToMessageId: msg.id,
    to: fullMessage.data.from,
    subject: `Re: ${fullMessage.data.subject}`,
    body: replyBody,
  },
});
```

Passing `replyToMessageId` (`reply_to_message_id` in the raw API) makes the platform set `In-Reply-To` and `References` on the outbound message, so the recipient's mail client renders a threaded reply instead of a disconnected new email. Skip it and every reply starts a new thread, which is the fastest way to make an agent feel broken to the human on the other end. The mechanics are covered in depth in the [handle-replies recipe](https://developer.nylas.com/docs/cookbook/agent-accounts/handle-replies/).

After sending, update the conversation record: bump the turn count, set the next `step`, stamp `lastActivityAt`.

**Self-reply loops.** Covered above, but it's the #1 footgun. One missing `from` check equals an infinite conversation with yourself.

**Duplicate replies.** Webhook redelivery and concurrent workers will both re-trigger your handler — at any volume, not just at scale. Without dedup and locking, the same inbound message generates two LLM calls and two replies. Treat idempotency as a launch requirement, not a hardening task.
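The core of dedup is "claim the message ID before doing any work." A minimal sketch, assuming an in-memory `Set` as a stand-in: the `ProcessedMessages` class and `handleOnce` wrapper are hypothetical names, and a real deployment would need the claim to be an atomic insert-if-absent in shared, durable storage so that concurrent workers agree.

```js
// In-memory stand-in for a durable store. Within one Node process the
// has/add pair runs without interruption; across workers it would not.
class ProcessedMessages {
  constructor() {
    this.seen = new Set();
  }

  // Returns true only for the first caller to claim a given message ID.
  claim(messageId) {
    if (this.seen.has(messageId)) return false;
    this.seen.add(messageId);
    return true;
  }
}

const processed = new ProcessedMessages();

// Hypothetical wrapper: run `handler` at most once per message ID.
async function handleOnce(msg, handler) {
  if (!processed.claim(msg.id)) return "duplicate";
  await handler(msg);
  return "handled";
}
```

Claiming before the work starts means a crash mid-handler leaves the message marked as processed. Whether to release the claim on failure, and retry, is a design choice this sketch leaves open.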

**Rapid-fire corrections.** Humans send "let's do Thursday" and then "actually, Friday" eleven seconds apart. A 30–60 second cooldown before responding lets you batch consecutive inbound messages into one coherent reply instead of answering each individually.
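One way to sketch the cooldown is as a pure check that a worker runs periodically per thread. The names here are assumptions: `batchIfQuiet` is hypothetical, `receivedAt` is an assumed millisecond timestamp you record when the webhook arrives, and 45 seconds is just a point inside the 30–60 second range.

```js
const COOLDOWN_MS = 45_000;

// `pending` holds the unanswered inbound messages for one thread.
// Returns them oldest-first once the thread has been quiet for the full
// cooldown, or null while the sender may still be typing.
function batchIfQuiet(pending, now = Date.now(), cooldownMs = COOLDOWN_MS) {
  if (pending.length === 0) return null;
  const newest = Math.max(...pending.map((m) => m.receivedAt));
  if (now - newest < cooldownMs) return null;
  return [...pending].sort((a, b) => a.receivedAt - b.receivedAt);
}
```

Measuring from the newest message means each correction restarts the clock, so "Thursday" followed by "actually, Friday" reaches the model as one batch.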

**Runaway conversations.** An unbounded loop is a token sink and a risk. The [multi-turn recipe](https://developer.nylas.com/docs/cookbook/agent-accounts/multi-turn-conversations/) bakes a `maxTurns` cap into the conversation record (10 is a reasonable default) and escalates to a human when it's hit.

**Zombie threads.** Someone replies to a conversation that went quiet weeks ago. Decide the behavior up front; a sane rule is escalating anything dormant past 168 hours (one week) rather than letting the agent auto-resume with stale context.
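The turn cap and the dormancy rule fit in one guard that runs before any model call. A minimal sketch, assuming the conversation record stores `turnCount`, an optional `maxTurns`, and `lastActivityAt` as milliseconds since the epoch; `escalationReason` is a hypothetical name.

```js
const MAX_TURNS = 10;
const DORMANT_MS = 168 * 60 * 60 * 1000; // 168 hours, one week

// Returns a reason string when the thread should go to a human,
// or null when the agent may keep replying.
function escalationReason(conversation, now = Date.now()) {
  const maxTurns = conversation.maxTurns ?? MAX_TURNS;
  if (conversation.turnCount >= maxTurns) return "max_turns_reached";
  if (now - conversation.lastActivityAt > DORMANT_MS) return "dormant_thread";
  return null;
}
```

The returned string can be passed straight through as the `reason` when escalating, which keeps the operator notification specific.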

**Multiple repliers on one thread.** CC someone and you've invited a second voice into the conversation — two people might both reply to the same agent message. Process each inbound independently, and check whether the agent has already responded since the last inbound before generating another reply.
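The already-responded check can be a simple scan over the fetched thread. This sketch assumes each message carries a numeric `date` and a `from` array, as in the earlier handler; `agentRepliedSince` is a hypothetical helper.

```js
// True when the agent has already sent something on this thread that is
// newer than the inbound message currently being processed.
function agentRepliedSince(threadMessages, inbound, agentEmail) {
  const agent = agentEmail.toLowerCase();
  return threadMessages.some(
    (m) => m.from?.[0]?.email?.toLowerCase() === agent && m.date > inbound.date
  );
}
```

When it returns true, the second replier's message may already be covered by the reply that went out, so the handler can skip the model call or decide whether a short follow-up is still needed.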

**Lost state.** The gap between turns can be days, so conversation records live in Postgres, Redis with AOF, DynamoDB — anything that survives restarts. In-memory state means every deploy lobotomizes your agent mid-conversation.

Not every conversation ends with the agent's final word, and the exits deserve code too. Escalation is a state change plus a notification:

``` js
async function escalate(conversation, reason) {
  await db.conversations.update(conversation.threadId, {
    step: "escalated",
    metadata: { ...conversation.metadata, escalationReason: reason },
  });
  await notifyHumanOperator({
    threadId: conversation.threadId,
    contact: conversation.contactEmail,
    reason,
  });
}
```

Completion is the same move with `step: "completed"`, and it's not just bookkeeping. When the prospect books the meeting or the support question gets answered, marking the record done changes how the *next* inbound on that thread routes: it hits the `closed` branch of your router instead of generating an out-of-context continuation. (Keep the stored value and the router's `case` label identical; a record saved as `"completed"` would fall through to `default` in the router shown earlier.) The state machine's exits are what make its middle states trustworthy.

One last note on the front door: verify the `X-Nylas-Signature` header before your handler does anything. An unverified webhook endpoint is an API that lets anyone on the internet make your agent send email.
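Here is a sketch of that check using Node's built-in `crypto` module. It rests on an assumption you should confirm against the Nylas webhook docs: that the header carries a hex-encoded HMAC-SHA256 of the raw request body, keyed with your webhook secret. `isValidSignature` is a hypothetical helper name.

```js
const { createHmac, timingSafeEqual } = require("node:crypto");

// Assumption: signature = hex(HMAC-SHA256(webhookSecret, rawBody)).
// `rawBody` must be the exact bytes received, not re-serialized JSON.
function isValidSignature(rawBody, signatureHeader, webhookSecret) {
  if (typeof signatureHeader !== "string") return false;
  const expected = createHmac("sha256", webhookSecret).update(rawBody).digest();
  const received = Buffer.from(signatureHeader, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

Because the HMAC covers the raw bytes, the route needs access to the unparsed body. With Express, one option is mounting `express.raw({ type: "application/json" })` on the webhook route and calling `JSON.parse` only after the signature passes.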

Build the loop in this order: webhook handler with the three guard checks → thread fetching → a hardcoded reply (no LLM yet) → verify threading works in a real mail client → then swap in the model. Wiring the LLM first is the classic mistake; you end up debugging prompt quality and webhook delivery simultaneously.

Which failure mode bit you first? Mine's universal enough that I'll guess: the agent replied to itself.