# The 5 pieces of AI plumbing every SaaS needs in 2026 (with code)

> Source: <https://dev.to/mt211211/the-5-pieces-of-ai-plumbing-every-saas-needs-in-2026-with-code-1270>
> Published: 2026-06-12 20:49:51+00:00

Every SaaS is adding AI features in 2026. Most teams burn the first two weeks on the same five pieces of plumbing — none of which are the actual product. Here's each one, with working TypeScript for Next.js 15.

Users won't stare at a spinner for 20 seconds. Stream tokens as they are generated, using server-sent events:

``` js
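// app/api/chat/route.ts
// Assumes `anthropic`, `tools`, `messages`, and SYSTEM_PROMPT are defined elsewhere in this handler.
const encoder = new TextEncoder();
const encode = (s: string) => encoder.encode(s); // the response stream carries bytes, not strings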
const runner = anthropic.beta.messages.toolRunner({
  model: "claude-opus-4-8",
  max_tokens: 64000,
  thinking: { type: "adaptive" },
  system: SYSTEM_PROMPT,
  tools,
  messages,
  stream: true,
});

const stream = new ReadableStream({
  async start(controller) {
    for await (const messageStream of runner) {
      for await (const event of messageStream) {
        if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
          controller.enqueue(encode(`data: ${JSON.stringify({ text: event.delta.text })}\n\n`));
        }
      }
    }
    controller.close();
  },
});
return new Response(stream, { headers: { "Content-Type": "text/event-stream" } });
```

The difference between a chatbot and a product is tools — the model acting on *your* data. Define them once with Zod; the SDK's tool runner handles the execution loop:

``` js
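import { betaZodTool } from "@anthropic-ai/sdk/helpers/beta/zod";
import { z } from "zod";

// `db` is your own data layer.
export const searchOrders = betaZodTool({
  name: "search_orders",
  description: "Look up a customer's orders. Call when the user asks about order status.",
  inputSchema: z.object({ email: z.string().email() }),
  // Tool results go back to the model as text, so serialize the rows.
  run: async ({ email }) => JSON.stringify(await db.orders.findByEmail(email)),
});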
```

No manual agentic loop, no JSON schema by hand, inputs typed end to end.

One enthusiastic user on your £10/month plan can generate £200 of API costs. Meter every request and weight output tokens (they cost ~5x input):

``` js
export function billableUnits(u: Usage): number {
  return u.input_tokens + (u.cache_read_input_tokens ?? 0) / 10 + u.output_tokens * 5;
}
// After each response:
await recordUsage(userId, billableUnits(message.usage));
// Before each request:
if (await getUsage(userId) > planLimit) return quotaExceeded();
```
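To make the weighting concrete, here is a minimal sketch of the meter with an in-memory store. The `Usage` shape only mirrors the usage fields read by the formula above; `InMemoryMeter`, `record`, and `overLimit` are hypothetical names for illustration, and a real app would persist totals in a database rather than a `Map`.

```typescript
// Hypothetical usage shape: only the fields the weighting formula reads.
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number | null;
}

// Same weighting as above: cached reads at 1/10, output at 5x.
function billableUnits(u: Usage): number {
  return u.input_tokens + (u.cache_read_input_tokens ?? 0) / 10 + u.output_tokens * 5;
}

// Illustrative in-memory meter; swap the Map for a database table in production.
class InMemoryMeter {
  private totals = new Map<string, number>();

  // Adds one response's weighted usage and returns the user's running total.
  record(userId: string, usage: Usage): number {
    const total = (this.totals.get(userId) ?? 0) + billableUnits(usage);
    this.totals.set(userId, total);
    return total;
  }

  // True once the running total has passed the plan limit.
  overLimit(userId: string, planLimit: number): boolean {
    return (this.totals.get(userId) ?? 0) > planLimit;
  }
}

const meter = new InMemoryMeter();
// 1,000 fresh input + 5,000 cached input + 200 output tokens
const total = meter.record("user_1", {
  input_tokens: 1000,
  cache_read_input_tokens: 5000,
  output_tokens: 200,
});
console.log(total); // 1000 + 500 + 1000 = 2500
```

Note that the 200 output tokens contribute as much as the 1,000 uncached input tokens, which is why the weighting matters for chatty responses.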

Prompt caching can cut input costs ~90%, but it's a *prefix match*: one interpolated timestamp in your system prompt and you pay full price on every request. Keep the cached block byte-for-byte stable and mark it with `cache_control`:

``` js
export const SYSTEM_PROMPT = [{
  type: "text" as const,
  text: STABLE_INSTRUCTIONS,          // never interpolate into this
  cache_control: { type: "ephemeral" as const },
}];
```

Verify it works: `usage.cache_read_input_tokens` should be non-zero from the second request on.
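If you do need per-request context such as the current time, one option is to put it in a second, uncached block after the cached one so the prefix never changes. The sketch below assumes that layout; `buildSystem` and `cacheHitRatio` are hypothetical helpers (not SDK functions), and the instruction text is a placeholder.

```typescript
type TextBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

const STABLE_INSTRUCTIONS = "You are a support assistant."; // placeholder text

// Stable block first (cached), volatile block second (not cached).
function buildSystem(now: Date): TextBlock[] {
  return [
    { type: "text", text: STABLE_INSTRUCTIONS, cache_control: { type: "ephemeral" } },
    { type: "text", text: `Current time: ${now.toISOString()}` },
  ];
}

// Share of all input tokens that were served from cache, computed from a
// response's usage object. `input_tokens` counts only the uncached part.
function cacheHitRatio(u: {
  input_tokens: number;
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
}): number {
  const read = u.cache_read_input_tokens ?? 0;
  const all = u.input_tokens + read + (u.cache_creation_input_tokens ?? 0);
  return all === 0 ? 0 : read / all;
}

console.log(cacheHitRatio({ input_tokens: 100, cache_read_input_tokens: 900 })); // 0.9
```

Logging this ratio per request makes a broken prefix obvious: it drops to 0 as soon as something volatile leaks into the cached block.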

When reading the stream, parse the SSE buffer across chunk boundaries. A naive `split` on each chunk drops tokens whenever an event straddles two chunks:

``` js
let buffer = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n\n");
  buffer = lines.pop() ?? "";   // keep the partial event for the next chunk
  for (const line of lines) handleEvent(line);
}
```
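The same buffering logic can be pulled into a pure function, which makes the chunk-boundary behaviour easy to test without a network. This is a sketch: `createSseParser` is a hypothetical name, and it assumes each event is a single `data: <json>` line shaped like the frames the streaming route emits.

```typescript
// Returns a function that accepts raw text chunks and returns the parsed
// payloads of every event completed by that chunk.
function createSseParser<T = unknown>(): (chunk: string) => T[] {
  let buffer = "";
  return (chunk: string): T[] => {
    buffer += chunk;
    const events = buffer.split("\n\n");
    buffer = events.pop() ?? ""; // trailing partial event waits for more data
    const out: T[] = [];
    for (const event of events) {
      if (!event.startsWith("data: ")) continue; // skip comments and blank frames
      out.push(JSON.parse(event.slice("data: ".length)) as T);
    }
    return out;
  };
}

const parse = createSseParser<{ text: string }>();
// One event split across two chunks, followed by a second complete event.
const first = parse('data: {"text":"Hel');
const second = parse('lo"}\n\ndata: {"text":" world"}\n\n');
console.log(first.length, second.map((e) => e.text).join("")); // prints: 0 Hello world
```

The first call returns nothing because its only event is incomplete; the second call completes it and returns both events in order.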

All five pieces above are open source (MIT) in [agentship-lite](https://github.com/mt211211/agentship-lite) — copy them into any Next.js app.

If you want the full SaaS around it — Stripe subscriptions, auth, Postgres schema, plan gating wired to the metering — that's [AgentShip](https://mt211211.github.io/agentship-site/), currently £49 early access.
