Why I Move AI Model Calls to the Server: Security, Performance, and Everything In Between

When I was building Logicvisor (https://logicvisor.vercel.app/), an AI-powered tool that reviews your algorithmic code, breaks down time and space complexity, and gives you the kind of feedback you'd want before a technical interview, I had to make a foundational architectural decision early on. Where do the AI calls actually live?

It sounds simple. It really isn't. And I think it's a decision a lot of developers make too quickly, usually defaulting to whatever gets something running fastest. So I want to walk through how I thought about it, what the tradeoffs actually look like in practice, and why for Logicvisor (and honestly most production projects I work on) the answer was never really up for debate.

When your app needs to talk to an AI model (Gemini, Claude, GPT, whatever), that HTTP request to the model provider has to originate somewhere. You have two options:

Client-side: The browser makes the call directly to the AI provider's API.
Server-side: The browser calls your server, your server calls the AI provider, and the response comes back through your infrastructure.

That's the whole decision. But the consequences of each branch run deep.

Let's be fair to the other side first, because client-side AI calls aren't just laziness; there are legitimate reasons to reach for them.

Zero backend overhead.
If you're prototyping, building an MVP, or hacking something together for a weekend project, standing up a server just to proxy AI calls adds friction you might not need yet. The client calls the API, gets a response, done.

One less network hop. Client → AI provider is a straight line. Client → your server → AI provider is also a straight line, but a longer one. Every additional hop is a potential source of latency, and if your server is not geographically close to the AI provider, that gap compounds.

Fast iteration during development. Tweak a prompt, refresh the page, see the result. No redeployment cycle, no server restart. For the early exploratory phase of building with AI, this feedback loop is genuinely valuable.

Fine for purely client-facing tools. If you're building something that doesn't touch your own database, doesn't need user sessions, and doesn't have sensitive business logic (a personal productivity tool, a browser extension, an internal utility), client-side calls can be perfectly appropriate.

So that's the honest upside. Now here's where it falls apart.

API key exposure is the most obvious problem, but it's worth being precise about why it's as bad as it is. When you make an API call from the browser, your API key has to be in that request. There's no way around this: the provider needs to authenticate you. And since that request is made from the browser, the key is accessible to anyone who opens DevTools, intercepts traffic, or extracts it from your bundled JavaScript.

The consequence isn't just that someone can see your key. It's that they can use it. At your expense. Without your knowledge. AI API billing is usage-based, which means a single bad actor with your key can run up a bill that drains your account before your monitoring even fires an alert, if you have monitoring at all. Key rotation helps, but it's reactive. The damage is usually already done.

Prompt exposure gets less attention but matters more than people realize.
The prompts you write are often where your actual product value lives. If you've spent time crafting a system prompt that makes your AI reviewer give structured, consistent, high-quality feedback on algorithmic code, that prompt is the product. Client-side calls expose it completely. A competitor can open DevTools, read your system prompt, and replicate your core feature in an afternoon. On the server, your prompts never leave your infrastructure. The client sends input; the server decides what to do with it.

On the client side, there's nothing stopping a user from writing a script that hammers your AI endpoint in a loop. Every one of those requests hits the AI provider and costs you tokens. You have no rate limiting, no request validation, no way to enforce quotas per user. You're not just vulnerable to malicious actors either: a bug in your own frontend code that causes unintended re-fetching can silently burn through your API budget.

AI API calls cost money per token. If multiple users ask your tool to review functionally identical code, why would you want to pay for that same inference three hundred times? On the client side, you can't cache at the API level. Every identical request goes to the provider, incurs latency, and costs tokens. On the server, you can cache responses intelligently: hash the input, check your cache layer, return the cached result. You pay once.

Without server-side infrastructure, you have no centralized view of how your AI layer is actually being used. Which prompts are performing well? Which inputs are producing garbage responses? Which users are hitting rate limits? Where is your token spend going? Client-side AI calls mean you're guessing at all of this. Logs, monitoring, and observability, the basic instrumentation of a production system, all require a server in the loop.

With that context established, here's what you actually get when the AI calls live on the server.

Your key lives in an environment variable on the server.
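To make that server-side shape concrete, here is a minimal sketch rather than Logicvisor's actual code. It assumes a hypothetical `AI_API_KEY` environment variable and a hypothetical `callModel` function standing in for whichever provider SDK you use. It also folds in the caching idea from earlier: hash the input, check the cache, and only call the provider on a miss.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-in for a provider SDK call; not a real library API.
type ModelCaller = (apiKey: string, code: string) => Promise<string>;

// In-memory cache keyed by a hash of the submitted code. This is an
// assumption for illustration: a real deployment would more likely use
// a shared store with an expiry policy.
const reviewCache = new Map<string, string>();

// Hash the (whitespace-trimmed) input so identical submissions share a key.
function cacheKey(code: string): string {
  return createHash("sha256").update(code.trim()).digest("hex");
}

// Runs only on the server. The key is read from the environment here
// and is never sent to the browser.
async function reviewCode(
  code: string,
  callModel: ModelCaller,
  env: Record<string, string | undefined> = process.env,
): Promise<string> {
  const apiKey = env["AI_API_KEY"]; // hypothetical variable name
  if (!apiKey) throw new Error("AI_API_KEY is not set on the server");

  const key = cacheKey(code);
  const cached = reviewCache.get(key);
  if (cached !== undefined) return cached; // cache hit: no provider call

  const review = await callModel(apiKey, code); // cache miss: pay once
  reviewCache.set(key, review);
  return review;
}
```

A route handler would call `reviewCode` with the code from the request body and return the result to the browser. In this sketch the key only ever exists inside the server process.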
The client has zero knowledge of it, zero access to it, and zero ability to extract it. This is the minimum acceptable security posture for any application that will see real users.

You decide how many requests a given user can make in a given window. You can enforce this per account, per IP, per session, whatever your threat model calls for. Abuse becomes something you manage rather than something that happens to you.

export async function enforceAIRateLimit(userId: string, request?: NextRequest): Promise