{"slug": "i-built-an-ai-chrome-extension-with-zero-backend-cost-here-s-the-exact", "title": "I built an AI Chrome extension with zero backend cost — here's the exact architecture", "summary": "A developer built three AI-powered Chrome extensions—PR summarization, risk scoring, and draft review generation—with zero backend cost by using a Bring Your Own Key (BYOK) architecture. The extensions call AI providers directly from the browser using the user's own API key, eliminating the need for a server and addressing privacy concerns. The approach supports multiple providers including OpenAI, Groq, Mistral, and local models via Ollama.", "body_md": "You want to add AI to your Chrome extension.\n\nThe obvious path: spin up a Node.js server, hold a master API key, charge users monthly, eat the AI cost. That's what everyone does.\n\nI didn't do that. I built three Chrome extensions with AI features — PR summarization, risk scoring, draft review generation — and my monthly infrastructure bill is $0. No server. No backend. No API key to protect.\n\nHere's the exact architecture, the real trade-offs, and the specific places where this approach breaks down so you don't find out the hard way.\n\nMost AI-powered extensions work like this:\n\n```\nUser → Extension → Your server → AI provider → Your server → Extension → User\n```\n\nYour server holds a master API key. Users pay you. You pay the AI provider out of that margin.\n\n**The problems:**\n\n**You're a proxy business now.** You're paying OpenAI $X, charging users $Y, and the difference is your margin. 
But you're also responsible for rate limiting, uptime, abuse prevention, and GDPR compliance for every request that touches your server.\n\n**Private code goes through your infra.** For a developer tool that reads GitHub diffs, this is the question users ask first: *\"is my code going to your server?\"* With a hosted backend, the honest answer is yes.\n\n**You're competing on price against companies with VC money.** CodeRabbit, GitHub Copilot, Linear, and a dozen others are running hosted AI with economies of scale you can't match as a solo developer.\n\nThere's a different architecture. It's not new — it's called BYOK (Bring Your Own Key), and it shifts the AI provider relationship from you to the user.\n\n```\nUser → Extension → AI provider (user's own key)\n```\n\nNo server in the middle. No margin math. No *\"is my code safe\"* question.\n\nThe core mechanic is simple: instead of your extension calling your server, it calls the AI provider directly from the browser using the user's own API key.\n\n```\n// The user pastes their API key during onboarding\n// You store it locally — never send it anywhere else\nawait chrome.storage.local.set({ \n  aiApiKey: userProvidedKey,\n  aiProvider: 'groq' // or 'openai', 'mistral', 'ollama'\n});\n\n// Every AI call uses their key, from their browser\nasync function callAI(prompt) {\n  const { aiApiKey, aiProvider } = await chrome.storage.local.get(['aiApiKey', 'aiProvider']);\n\n  const endpoint = getEndpoint(aiProvider);\n\n  const response = await fetch(endpoint, {\n    method: 'POST',\n    headers: {\n      'Authorization': `Bearer ${aiApiKey}`,\n      'Content-Type': 'application/json'\n    },\n    body: JSON.stringify({\n      model: getModel(aiProvider),\n      messages: [{ role: 'user', content: prompt }],\n      max_tokens: 500\n    })\n  });\n\n  return response.json();\n}\n```\n\nThe API key lives in `chrome.storage.local`. It never leaves the browser except to go directly to the AI provider. 
Your extension only reads it back to authenticate those requests; it is never sent anywhere else.\n\nFor direct API calls from a Chrome extension, declare host permissions for each provider you support:\n\n```\n{\n  \"manifest_version\": 3,\n  \"permissions\": [\n    \"storage\"\n  ],\n  \"host_permissions\": [\n    \"https://api.openai.com/*\",\n    \"https://api.groq.com/*\",\n    \"https://api.mistral.ai/*\",\n    \"http://localhost:*/*\"\n  ]\n}\n```\n\nThe `localhost` entry covers Ollama — for users who want a fully local model with zero API costs.\n\nImportant: In MV3, host permissions are scrutinized during review. Be specific. Don't use `<all_urls>` when you can name the exact domains. I've been through CWS review twice with this manifest — being explicit helps.\n\nAll four major providers use the OpenAI-compatible `/v1/chat/completions` format. One implementation, four providers:\n\n``` js\nconst AI_PROVIDERS = {\n  groq: {\n    endpoint: 'https://api.groq.com/openai/v1/chat/completions',\n    model: 'llama-3.3-70b-versatile',\n    maxTokens: 1024,\n    supportsStreaming: true,\n  },\n  openai: {\n    endpoint: 'https://api.openai.com/v1/chat/completions',\n    model: 'gpt-4o-mini',\n    maxTokens: 1024,\n    supportsStreaming: true,\n  },\n  mistral: {\n    endpoint: 'https://api.mistral.ai/v1/chat/completions',\n    model: 'mistral-small-latest',\n    maxTokens: 1024,\n    supportsStreaming: false,\n  },\n  ollama: {\n    endpoint: 'http://localhost:11434/v1/chat/completions',\n    model: 'llama3.2',\n    maxTokens: 1024,\n    supportsStreaming: true,\n  }\n};\n\nasync function getProviderConfig() {\n  const { aiProvider } = await chrome.storage.local.get('aiProvider');\n  return AI_PROVIDERS[aiProvider] || AI_PROVIDERS.groq;\n}\n```\n\nStore the model name here, not hardcoded in your fetch calls. 
When Groq deprecated an older Llama version, I pushed one config update and every user was on the new model automatically — no user action required.\n\nHere's the real cost of BYOK: **users have to get an API key before they can use your AI features.** Some users bounce at this step.\n\nWhat actually reduces friction:\n\n**1. Lead with Groq.** Groq's free tier covers [~14,400 requests per day](https://console.groq.com/settings/limits) for smaller models. For most individual developers, it's genuinely free. This changes the conversation from *\"go pay for an API key\"* to *\"go get a free API key in 2 minutes.\"*\n\n**2. Give the exact steps, not a vague instruction:**\n\n```\nStep 1: Go to console.groq.com/keys\nStep 2: Click \"Create API key\"\nStep 3: Paste the key here → [input]\n```\n\nThree lines. No ambiguity. I track where users drop off in onboarding — the step with the most abandonment is always the one where I said \"get your API key\" without saying exactly where.\n\n**3. Make core features work without AI.** If every feature is gated behind BYOK setup, the first session is a setup session — and many users don't return for a second. In PR Focus, multi-account GitHub, PR sorting, CSV export, and stale notifications all work without any API key. 
The AI features are additive.\n\nIf you want to stream AI responses token by token, you hit an MV3 constraint: service workers handle the API calls, but streaming requires a long-lived connection, and service workers can be terminated mid-stream.\n\nThe pattern that works — service worker handles the fetch, sends tokens to the popup via messages:\n\n```\n// Service worker — handles the streaming fetch\nchrome.runtime.onMessage.addListener((message, sender) => {\n  if (message.type === 'STREAM_AI') {\n    // sender.tab is undefined when the request comes from the popup\n    streamAIResponse(message.prompt, sender.tab?.id);\n  }\n});\n\n// Content scripts are reached by tab ID; the popup via runtime messaging\nfunction sendToUI(tabId, payload) {\n  const sent = tabId !== undefined\n    ? chrome.tabs.sendMessage(tabId, payload)\n    : chrome.runtime.sendMessage(payload);\n  sent.catch(() => {}); // Receiver may have closed mid-stream\n}\n\nasync function streamAIResponse(prompt, tabId) {\n  const config = await getProviderConfig();\n  const { aiApiKey } = await chrome.storage.local.get('aiApiKey');\n\n  const response = await fetch(config.endpoint, {\n    method: 'POST',\n    headers: {\n      'Authorization': `Bearer ${aiApiKey}`,\n      'Content-Type': 'application/json'\n    },\n    body: JSON.stringify({\n      model: config.model,\n      messages: [{ role: 'user', content: prompt }],\n      stream: true\n    })\n  });\n\n  const reader = response.body.getReader();\n  const decoder = new TextDecoder();\n  let buffer = '';\n\n  while (true) {\n    const { done, value } = await reader.read();\n    if (done) break;\n\n    // An SSE line can be split across chunks, so keep the unfinished tail\n    buffer += decoder.decode(value, { stream: true });\n    const lines = buffer.split('\\n');\n    buffer = lines.pop();\n\n    for (const line of lines) {\n      if (!line.startsWith('data: ')) continue;\n      const data = line.slice(6).trim();\n      if (data === '[DONE]') continue;\n\n      try {\n        const parsed = JSON.parse(data);\n        const token = parsed.choices[0]?.delta?.content || '';\n\n        sendToUI(tabId, { type: 'AI_TOKEN', token });\n      } catch (e) {\n        // Skip malformed chunks — they happen\n      }\n    }\n  }\n\n  sendToUI(tabId, { type: 'AI_DONE' });\n}\n```\n\nThe fetch keeps the service worker alive for the duration of the stream. 
Tokens go to the popup via messages. The popup accumulates them and renders progressively.\n\nThe most common support category with BYOK: users with wrong or misconfigured keys. Generic \"AI error\" messages generate follow-up tickets. Status-code-specific messages don't:\n\n``` js\nasync function validateApiKey(apiKey, provider) {\n  try {\n    const config = AI_PROVIDERS[provider];\n    const response = await fetch(config.endpoint, {\n      method: 'POST',\n      headers: { \n        'Authorization': `Bearer ${apiKey}`, \n        'Content-Type': 'application/json' \n      },\n      body: JSON.stringify({\n        model: config.model,\n        messages: [{ role: 'user', content: 'test' }],\n        max_tokens: 1\n      })\n    });\n\n    if (response.status === 401) \n      return { valid: false, error: 'Invalid API key — check you copied it completely, no trailing spaces.' };\n    if (response.status === 429) \n      return { valid: false, error: 'Rate limit hit — your key is valid but you\\'ve hit the free tier ceiling.' };\n    if (response.status === 403) \n      return { valid: false, error: 'Permission denied — this key may not have access to this model tier.' };\n    if (!response.ok) \n      return { valid: false, error: `Provider returned ${response.status} — try again in a moment.` };\n\n    return { valid: true };\n  } catch (e) {\n    return { valid: false, error: 'Network error — check your internet connection or try a different provider.' };\n  }\n}\n```\n\nA typical PR summary in PR Focus: ~800 tokens input (diff context + system prompt), ~150 tokens output. ~950 tokens per PR.\n\n| Provider | Tier | Cost per PR | 100 PRs/day |\n|---|---|---|---|\n| Groq (Llama 3.3 70B) | Free | $0 | $0 |\n| OpenAI GPT-4o-mini | Paid | ~$0.0001 | ~$0.01 |\n| Mistral Small | Paid | ~$0.00008 | ~$0.008 |\n| Ollama (local) | Free | $0 | $0 |\n\nThe cost argument for BYOK isn't just privacy — it's math. 
A hosted model charging $10/month makes pennies after AI costs and infrastructure. Users with their own Groq key pay nothing for individual use. That's a value proposition you can't match with a hosted backend.\n\n**Corporate users behind strict proxies.** Some enterprise environments block direct browser-to-external-API calls. You can't fix this. Be upfront about it, and point to Ollama as the local workaround.\n\n**Ollama requires a separate install.** It's not \"just paste a key\" — it's \"install Ollama, pull a model, run it locally, then configure the extension.\" Worth supporting for privacy-first users, but don't pitch it as the simple path.\n\n**You can't cache responses.** Each user's key means each user pays for their own calls. No cross-user caching. For most use cases this doesn't matter, but if you're building something where 1000 users asking the same question is likely, hosted with caching will be cheaper for them.\n\n**Yes, if:**\n\n**No, if:**\n\n```\nchrome.storage.local\n  ├── aiApiKey      ← user's own, never leaves browser except to provider\n  └── aiProvider    ← 'groq' | 'openai' | 'mistral' | 'ollama'\n\nPopup / content script\n  └── message → service worker: { type: 'RUN_AI', prompt }\n\nService worker\n  ├── reads key + provider from storage\n  ├── calls provider API directly (fetch)\n  └── streams tokens → popup via chrome.runtime.sendMessage\n\nInfrastructure cost: $0\nMonthly AI bill: $0\nTrust question (\"does my code go to your server?\"): No.\n```\n\nEverything in this article is running in **PR Focus Pro** — a Chrome extension that triages GitHub pull requests with AI summaries, hybrid risk scoring (0–100), and one-click draft reviews. 
Free to install; AI features activate with your own API key.\n\nThe full engineering decision log behind this architecture — including the options I rejected, what it cost in user friction, and whether I'd choose it again — is [Build Log #007](https://github.com/projekta2/build-logs/blob/main/build-logs/007-byok-chrome-extension-architecture.md) in my public Build Logs repo.\n\nIf you're building something similar and want a second pair of eyes on your implementation, the [Summer Review Swap](https://github.com/projekta2/build-logs/issues/1) is open — there's a PR waiting for a reviewer right now if you want to jump straight in.\n\n*What's your approach to AI in browser extensions? Running your own backend, BYOK, or something else entirely? Particularly curious whether anyone has found a cleaner solution to the streaming + service worker termination problem — drop it in the comments.*", "url": "https://wpnews.pro/news/i-built-an-ai-chrome-extension-with-zero-backend-cost-here-s-the-exact", "canonical_source": "https://dev.to/projekta2/i-built-an-ai-chrome-extension-with-zero-backend-cost-heres-the-exact-architecture-43j7", "published_at": "2026-06-28 08:41:02+00:00", "updated_at": "2026-06-28 09:03:53.472845+00:00", "lang": "en", "topics": ["artificial-intelligence", "developer-tools", "ai-products"], "entities": ["OpenAI", "Groq", "Mistral", "Ollama", "CodeRabbit", "GitHub Copilot", "Chrome Web Store"], "alternates": {"html": "https://wpnews.pro/news/i-built-an-ai-chrome-extension-with-zero-backend-cost-here-s-the-exact", "markdown": "https://wpnews.pro/news/i-built-an-ai-chrome-extension-with-zero-backend-cost-here-s-the-exact.md", "text": "https://wpnews.pro/news/i-built-an-ai-chrome-extension-with-zero-backend-cost-here-s-the-exact.txt", "jsonld": "https://wpnews.pro/news/i-built-an-ai-chrome-extension-with-zero-backend-cost-here-s-the-exact.jsonld"}}