Scaling to Thousands of Agent Mailboxes


Week one: a single test mailbox on a trial domain, provisioned by hand from the dashboard. Week twelve: a fleet of agent mailboxes spread across customer domains, each sending real mail with its own quota and reputation. The API calls are the same at both scales — what changes is everything around them: how you provision, how you share configuration, and how you keep one bad sender from pausing the fleet.

Here's what the path from one to thousands looks like with Nylas Agent Accounts, which are currently in beta.

There's no OAuth dance to scale around. Creating a mailbox is one POST with "provider": "nylas" (no refresh token, no consent screen), so a fleet provisioner is just iteration:

curl --request POST \
  --url "https://api.us.nylas.com/v3/connect/custom" \
  --header "Authorization: Bearer <NYLAS_API_KEY>" \
  --header "Content-Type: application/json" \
  --data '{
    "provider": "nylas",
    "workspace_id": "<WORKSPACE_ID>",
    "settings": {
      "email": "agent-0042@agents.yourcompany.com"
    }
  }'

Two scaling-relevant details in that request. First, the domain: one application can manage accounts across any number of registered domains, and the docs explicitly recommend splitting high-volume outbound across multiple domains (sales-a.yourcompany.com, sales-b.yourcompany.com) so reputation damage on one doesn't contaminate the rest. Second, the workspace_id: passing it at creation is how each account picks up its configuration, which brings us to the part that makes fleets manageable.
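The "just iteration" provisioner can be sketched as follows. This is a minimal sketch, not a reference client: the endpoint and body come from the curl example above, while the helper names (build_create_request, fleet_addresses, provision), the agent-NNNN naming scheme, and the round-robin spread across domains are assumptions made for illustration.

```python
import json
import urllib.request

API_BASE = "https://api.us.nylas.com"  # host taken from the curl example


def build_create_request(api_key: str, workspace_id: str, email: str) -> urllib.request.Request:
    """Build (but do not send) the POST that creates one Agent Account."""
    body = {
        "provider": "nylas",
        "workspace_id": workspace_id,
        "settings": {"email": email},
    }
    return urllib.request.Request(
        url=f"{API_BASE}/v3/connect/custom",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fleet_addresses(domains: list[str], count: int, prefix: str = "agent") -> list[str]:
    """Spread `count` mailboxes round-robin across the registered domains.

    The naming scheme is hypothetical; any unique local part would do.
    """
    return [
        f"{prefix}-{i:04d}@{domains[(i - 1) % len(domains)]}"
        for i in range(1, count + 1)
    ]


def provision(api_key, workspace_id, domains, count, send=urllib.request.urlopen):
    """Create each mailbox in turn; `send` is injectable so the loop can be dry-run."""
    results = []
    for email in fleet_addresses(domains, count):
        request = build_create_request(api_key, workspace_id, email)
        with send(request) as response:
            results.append((email, response.status))
    return results
```

A real provisioner would add retries and record the grant returned for each mailbox; both are left out here to keep the loop visible.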

At fleet scale, per-grant configuration is a non-starter. The model here is indirection: policies and rules attach to workspaces, and every account in a workspace inherits them. One policy object — daily send quota, storage cap, retention windows, spam sensitivity — governs a thousand mailboxes, and updating it updates all of them at once.

The recommended carve-up is one workspace per agent archetype: outreach agents get a workspace with high send quotas and strict outbound rules; triage agents get one with aggressive spam filtering and modest quotas. With auto_group enabled, new accounts join the right workspace by email domain automatically, so your provisioner can't misfile them.

Allow/block lists scale the same way. A list holds domains, TLDs, or addresses; rules reference it through in_list; and you can add up to 1,000 items per request. Update the list and every rule referencing it picks up the change immediately: no redeploys, and non-engineers can own the contents.
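Because a single request accepts at most 1,000 items, a bulk import has to be split into batches. A minimal sketch of that split, assuming entries can be trimmed, lowercased, and de-duplicated before upload (the normalization and the helper name batches are illustrative choices, not something the docs prescribe):

```python
from typing import Iterator, Sequence

MAX_ITEMS_PER_REQUEST = 1000  # per-request limit stated above


def batches(items: Sequence[str], size: int = MAX_ITEMS_PER_REQUEST) -> Iterator[list[str]]:
    """Yield normalized, de-duplicated items in chunks of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # dict.fromkeys drops duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(
        item.strip().lower() for item in items if item.strip()
    ))
    for start in range(0, len(unique), size):
        yield unique[start:start + size]
```

Each yielded batch would become the body of one list-update request.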

Send volume isn't the hard limit; deliverability is. The platform tracks each account's rolling bounce and complaint rates against recent send volume, and the thresholds are unforgiving at fleet scale:

Signal	Healthy	Under review	Sending paused
Bounce rate	Under 2%	5%	10%
Complaint rate	Under 0.1%	0.1%	0.5%

Only hard bounces count (full mailboxes and greylisting don't), and the denominator is a recent representative send volume, not a fixed time window, so the rate stays meaningful at any scale. But a paused account doesn't resume on a timer: clearing a pause requires contacting support with the cause and the fix. Multiply that by a fleet and the lesson is obvious: you want your own circuit breakers tripping before the platform's do.
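To make the thresholds concrete, here is a minimal sketch that maps raw counts onto the table's bands. Two things are assumptions: thresholds are treated as inclusive, and the unlabeled 2% to 5% bounce band is called "elevated" here because the table does not name it.

```python
# Thresholds from the table, expressed as fractions of recent send volume.
BOUNCE_HEALTHY, BOUNCE_REVIEW, BOUNCE_PAUSED = 0.02, 0.05, 0.10
COMPLAINT_REVIEW, COMPLAINT_PAUSED = 0.001, 0.005


def rate(events: int, sends: int) -> float:
    """Event rate over recent sends; zero sends means nothing to measure."""
    return events / sends if sends else 0.0


def status(hard_bounces: int, complaints: int, sends: int) -> str:
    bounce = rate(hard_bounces, sends)
    complaint = rate(complaints, sends)
    if bounce >= BOUNCE_PAUSED or complaint >= COMPLAINT_PAUSED:
        return "paused"
    if bounce >= BOUNCE_REVIEW or complaint >= COMPLAINT_REVIEW:
        return "under review"
    if bounce >= BOUNCE_HEALTHY:
        return "elevated"  # hypothetical label for the gap in the table
    return "healthy"
```

Note how little it takes at low volume: status(0, 1, 500) is a 0.2% complaint rate, which already lands in "under review".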

"Under review" is silent to your application: sending continues. A pause is not: outbound send requests start failing with a 400 Bad Request carrying text from the underlying infrastructure about the account being suspended or paused. Fleet send paths should recognize that shape, alongside the more mundane 400 for "domain is not verified" (a provisioning step got skipped) and 429 for "rate limit exceeded".

On quotas: the free plan allows 200 messages per account per day, paid plans have no daily cap by default, and a policy can set a stricter per-account quota. At fleet scale that policy quota doubles as a cheap circuit breaker — a runaway agent hits its own ceiling long before it can damage the domain's reputation.

The docs hand you the telemetry for exactly that. Four webhook triggers (message.transactional.delivered, .bounced, .complaint, and .rejected) carry the same events the rates are computed from. Wire them into your own per-account counters and pause your outbound logic when an account trends toward 5% bounces. You'll see the problem in your own metrics before enforcement does.
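One way to build those per-account counters is a rolling window of recent outcomes per grant. The sketch below is illustrative only: the window size, minimum sample, and trip points are hypothetical values chosen to sit below the platform's thresholds, and it treats every .bounced event as a hard bounce, which the real payload may let you refine.

```python
from collections import defaultdict, deque

PREFIX = "message.transactional."
KINDS = ("delivered", "bounced", "complaint", "rejected")

WINDOW = 2000            # hypothetical: recent outcomes kept per account
MIN_ATTEMPTS = 50        # hypothetical: don't trip on the first few sends
TRIP_BOUNCE = 0.03       # hypothetical: trip before the platform's 5%
TRIP_COMPLAINT = 0.0005  # hypothetical: at this window, one complaint trips


class AccountHealth:
    """In-process circuit breaker fed by delivery webhooks."""

    def __init__(self) -> None:
        self._events = defaultdict(lambda: deque(maxlen=WINDOW))
        self.paused: set[str] = set()

    def record(self, grant_id: str, trigger: str) -> None:
        if not trigger.startswith(PREFIX):
            return  # not a delivery event
        kind = trigger[len(PREFIX):]
        if kind not in KINDS:
            return
        window = self._events[grant_id]
        window.append(kind)
        # A complaint follows a delivery, so it is not a separate attempt.
        attempts = len(window) - window.count("complaint")
        if attempts < MIN_ATTEMPTS:
            return
        bounce_rate = window.count("bounced") / attempts
        complaint_rate = window.count("complaint") / attempts
        if bounce_rate >= TRIP_BOUNCE or complaint_rate >= TRIP_COMPLAINT:
            self.paused.add(grant_id)

    def can_send(self, grant_id: str) -> bool:
        return grant_id not in self.paused
```

The send path checks can_send(grant_id) before each message. The breaker stays tripped until something removes the grant from paused, which mirrors the platform's "no resume on a timer" behavior; a fleet would normally back this with shared storage rather than process memory.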

The other enforcement path is the abuse restriction: a 403 with "send blocked by abuse restriction", applied by the Nylas operations team rather than by a threshold. Restrictions can scope to a single sender address, a sender domain (including its subdomains), an organization, an application, or one specific grant; the most specific match applies. The application-level case is the one that matters for fleets: it stops every Agent Account under the application, not just the misbehaving one. Recovery means contacting support with the application ID, the grant ID, and one example error response; once the restriction is lifted, sends succeed on the next attempt with no propagation delay. Fleet code should treat 429 and 403 as first-class states, not exceptions to log and forget.
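Treating these responses as first-class states can start with a small classifier in the send path. This is a sketch under assumptions: it matches on substrings of the error text quoted in this article, and the exact wording of the suspended/paused message is not documented here, so the matching would need checking against real responses. The names SendOutcome and classify are hypothetical.

```python
from enum import Enum


class SendOutcome(Enum):
    PAUSED = "paused"                    # 400: stop sending, contact support
    DOMAIN_UNVERIFIED = "unverified"     # 400: a provisioning step was skipped
    RATE_LIMITED = "rate_limited"        # 429: back off and retry later
    ABUSE_RESTRICTED = "restricted"      # 403: may cover the whole application
    OTHER = "other"


def classify(status: int, message: str) -> SendOutcome:
    """Map an HTTP status and error text onto a send-path state."""
    text = message.lower()
    if status == 403 and "abuse restriction" in text:
        return SendOutcome.ABUSE_RESTRICTED
    if status == 429:
        return SendOutcome.RATE_LIMITED
    if status == 400 and "domain is not verified" in text:
        return SendOutcome.DOMAIN_UNVERIFIED
    # Assumption: the pause error mentions "suspended" or "paused".
    if status == 400 and ("suspended" in text or "paused" in text):
        return SendOutcome.PAUSED
    return SendOutcome.OTHER
```

Only RATE_LIMITED is worth an automatic retry; PAUSED and ABUSE_RESTRICTED need a human and a support ticket, so they should page someone rather than loop.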

The boring hygiene matters too, because the complaint threshold is tiny — at low volume, a handful of recipients clicking "mark as spam" can put an account under review at 0.1%. Validate recipient addresses before sending, skip anything that has hard-bounced before, honor unsubscribes immediately, use double opt-in for lists you care about, and get DKIM, SPF, and DMARC right on every domain — misconfigured authentication shows up as extra hard bounces from servers that refuse the mail outright.

The receiving side scales more gracefully than you'd expect, because webhook subscriptions are application-level, not per-grant. One message.created subscription covers every mailbox in the fleet; each payload carries the grant_id, so your handler routes by grant. There's no per-mailbox registration step in the provisioning loop and no subscription cleanup in the teardown path.

That makes the consumer architecture the standard one for any high-volume webhook source: receive, enqueue keyed by grant_id, process from the queue. Nothing exotic; the point is that the per-account overhead on the inbound path is zero.
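The receive-and-enqueue step can be sketched with in-process queues standing in for a real broker. The location of grant_id inside the payload (data.object.grant_id below) is an assumption to verify against an actual notification, and the partition count and function names are hypothetical.

```python
import queue
import zlib

NUM_PARTITIONS = 8  # hypothetical: one queue per worker
partitions = [queue.Queue() for _ in range(NUM_PARTITIONS)]


def extract_grant_id(payload: dict):
    # Assumption: grant_id sits under data.object; adjust to the real shape.
    return payload.get("data", {}).get("object", {}).get("grant_id")


def partition_for(grant_id: str, n: int = NUM_PARTITIONS) -> int:
    # crc32 is stable across processes, unlike the built-in hash() for str.
    return zlib.crc32(grant_id.encode("utf-8")) % n


def receive(payload: dict) -> bool:
    """Webhook handler body: enqueue and return quickly, no processing here."""
    grant_id = extract_grant_id(payload)
    if not grant_id:
        return False  # nothing to route on; a real handler would log this
    partitions[partition_for(grant_id)].put(payload)
    return True
```

Keying the partition on grant_id means all events for one mailbox land on the same queue in arrival order, so a single worker per partition can process them without cross-worker coordination.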

Condensed from the three docs above:

- Pass workspace_id in every provisioning call; treat auto_group as a backstop.
- Subscribe to the message.transactional.* triggers and build per-account bounce/complaint counters from day one.
- Handle 429 and 403 as expected states in your send path.

If you're sketching a fleet right now, start with the workspace layout: it's the one decision that's annoying to retrofit. What's your target mailbox count, and which threshold worries you more: the 0.5% complaint threshold or the application-wide abuse block?

source & further reading

dev.to (original article)
