{"slug": "you-have-a-free-ai-model-sitting-in-chrome-right-now", "title": "You Have a Free AI Model Sitting in Chrome Right Now", "summary": "Chrome 138 ships with a free, on-device AI model called Gemini Nano accessible through the new Prompt API, requiring no API keys or cloud round-trips. Developer Maneshwar built a single-file HTML playground that demonstrates the full API surface, including session management, streaming, structured output, and multimodal input. The playground is available on GitHub and runs directly in Chrome without any build steps or servers.", "body_md": "*Hello, I'm Maneshwar. I'm building git-lrc, a Micro AI code reviewer that runs on every commit. It is free and source-available on GitHub. Star git-lrc to help devs discover the project. Do give it a try and share your feedback for improving the project.*\n\nYou might not have noticed, but Chrome quietly started shipping a local AI model called **Gemini Nano** bundled right into the browser.\n\nNo API keys. No cloud round-trips. No per-token cost. It just runs on your machine.\n\nThe interface to talk to it is called the **Prompt API**, and it landed in Chrome 138.\n\nI spent some time going through the full API surface and built a [playground](https://github.com/lovestaco/gemini-brow) that lets you experiment with every feature (session management, streaming, structured output, multimodal input, response prefixing, and more) in one page.\n\nThis post walks you through all of it.\n\nOn-device AI flips the usual tradeoffs.\n\nThe catch is that Gemini Nano is a small model.\n\nIt's great for classification, summarization, Q&A on focused content, and structured extraction.\n\nIt won't replace GPT-4 for complex reasoning.\n\nThink of it as a smart, free, always-available layer you can add on top of your existing product.\n\nThe Prompt API isn't on by default in all Chrome builds. 
Enable two flags:\n\n**Step 1**: Go to `chrome://flags/#optimization-guide-on-device-model` and set it to **Enabled BypassPerfRequirement**.\n\n**Step 2**: Go to `chrome://flags/#prompt-api-for-gemini-nano` and enable both the base API and the multimodal option.\n\nRelaunch Chrome.\n\nThen visit `chrome://on-device-internals` to check the model download status. First use will trigger a download, and Gemini Nano is a few gigabytes.\n\nI put together a single-file HTML playground that covers the entire API surface.\n\nClone it and open `playground.html` directly in Chrome: no build step, no server.\n\n```\ngit clone https://github.com/lovestaco/gemini-brow\n```\n\nThen open `playground.html` in Chrome 138+.\n\nEverything starts with a **session**.\n\nYou create one with `LanguageModel.create()`, optionally passing a system prompt and expected input/output modalities.\n\n``` js\nconst session = await LanguageModel.create({\n  initialPrompts: [\n    { role: 'system', content: 'You are a helpful and friendly assistant.' 
}\n  ],\n  expectedInputs:  [{ type: 'text', languages: ['en'] }],\n  expectedOutputs: [{ type: 'text', languages: ['en'] }],\n});\n```\n\nAlways call `LanguageModel.availability()` with the **same options** you'll pass to `create()` before creating a session, because the model may not support certain modalities on every device.\n\n``` js\nconst avail = await LanguageModel.availability({\n  expectedInputs:  [{ type: 'text', languages: ['en'] }],\n  expectedOutputs: [{ type: 'text', languages: ['en'] }],\n});\n// 'available' | 'downloadable' | 'downloading' | 'unavailable'\n```\n\nEach session has a **context window**, a token budget that tracks everything in the conversation.\n\nWhen it fills up, the oldest prompt/response pairs are dropped (but the system prompt is never dropped).\n\nYou can monitor it:\n\n```\nconsole.log(`${session.contextUsage} / ${session.contextWindow} tokens used`);\n\nsession.addEventListener('contextoverflow', () => {\n  console.log('Oldest turns are being dropped to make room');\n});\n```\n\nThe playground shows a live progress bar for context usage, and a warning badge on overflow.\n\nYou can also **clone** a session to fork the conversation at a point in time; the clone is fully independent and won't see future messages sent to the original.\n\n``` js\nconst forkedSession = await session.clone();\n```\n\nAnd **destroy** it when you're done to free resources:\n\n```\nsession.destroy();\n```\n\nUse `prompt()` when you want the complete output before rendering:\n\n``` js\nconst result = await session.prompt('Write me a short haiku about coffee.');\nconsole.log(result);\n```\n\nPass an `AbortController` signal to add a stop button:\n\n``` js\nconst controller = new AbortController();\nstopBtn.onclick = () => controller.abort();\n\nconst result = await session.prompt('Write me a poem!', {\n  signal: controller.signal,\n});\n```\n\nUse `promptStreaming()` for longer responses.\n\nIt returns a `ReadableStream` where each chunk is a 
**delta** (the new tokens only).\n\nAccumulate them yourself:\n\n``` js\nconst stream = session.promptStreaming('Explain how a browser renders a web page.');\n\nlet fullText = '';\nfor await (const chunk of stream) {\n  fullText += chunk;\n  outputEl.textContent = fullText;\n}\n```\n\nThis is the right pattern: don't replace the display with each raw chunk or you'll get flickering (each chunk is only a word or two).\n\n`session.append()` lets you pre-load context into the session without triggering a response. This is useful when you want the model to process heavy inputs (like images) while the user is still typing their question.\n\n```\n// Pre-load context\nawait session.append([{\n  role: 'user',\n  content: 'Here is the document you will answer questions about: ...'\n}]);\n\n// Later, ask the question\nconst answer = await session.prompt('What are the key takeaways?');\n```\n\nThe promise from `append()` resolves once the input has been processed and is ready in the session's context.\n\nIf your device has a GPU with more than 4 GB VRAM, the model can process images.\n\n**Image input needs its own session**: you must declare `{ type: 'image' }` in `expectedInputs` at creation time, and check availability separately since not all devices support it.\n\n``` js\nconst imageAvail = await LanguageModel.availability({\n  expectedInputs:  [{ type: 'text', languages: ['en'] }, { type: 'image' }],\n  expectedOutputs: [{ type: 'text', languages: ['en'] }],\n});\n\nif (imageAvail === 'unavailable') {\n  // GPU requirement not met\n  return;\n}\n\nconst imageSession = await LanguageModel.create({\n  expectedInputs:  [{ type: 'text', languages: ['en'] }, { type: 'image' }],\n  expectedOutputs: [{ type: 'text', languages: ['en'] }],\n});\n\nconst imageBlob = await fetch('photo.jpg').then(r => r.blob());\n\nconst result = await imageSession.prompt([{\n  role: 'user',\n  content: [\n    { type: 'text',  value: 'What is in this image?' 
},\n    { type: 'image', value: imageBlob },\n  ],\n}]);\n```\n\nThe API accepts `Blob`, `HTMLImageElement`, `HTMLCanvasElement`, `ImageBitmap`, `ImageData`, and more. In the playground, you can upload any image file and ask the model about it.\n\nOne of my favourite features.\n\nYou can prefill the **start** of the assistant's response by passing an assistant-role message with `prefix: true`.\n\nThe model is forced to continue from that prefix.\n\nThis is a clean way to lock in an output format without relying on instruction-following:\n\n``` js\n// Force the model to start its response with ```toml\nconst result = await session.prompt([\n  {\n    role: 'user',\n    content: 'Create a character sheet for a gnome barbarian.',\n  },\n  {\n    role: 'assistant',\n    content: '```toml\\n',\n    prefix: true,\n  },\n]);\n// result continues from the prefix: ```toml\\n[character]\\nname = \"...\n```\n\nIn the playground, you can set any prefix string and watch the model continue from exactly that point.\n\nNeed a fast yes/no answer? 
Pass `{ type: 'boolean' }` as the `responseConstraint` and you'll always get back a raw `true` or `false`, with no free-text parsing and no prompt engineering around output format:\n\n``` js\nconst raw = await session.prompt(\n  `Is this post about pottery?\\n\\n\"${text}\"`,\n  { responseConstraint: { type: 'boolean' } }\n);\n\nconst result = JSON.parse(raw); // true or false\n```\n\nThis is great for content moderation, topic detection, or gating features based on page content.\n\nThe full power of `responseConstraint` is a complete **JSON Schema**.\n\nThe model is constrained to produce valid JSON that matches your schema: no hallucinated keys, no wrong types.\n\n``` js\nconst schema = {\n  type: 'object',\n  properties: {\n    sentiment: { type: 'string', enum: ['positive', 'negative', 'neutral'] },\n    score:     { type: 'number', minimum: 0, maximum: 10 },\n    summary:   { type: 'string' },\n  },\n  required: ['sentiment', 'score', 'summary'],\n};\n\nconst raw = await session.prompt(\n  `Analyze the sentiment of this review:\\n\\n\"${reviewText}\"`,\n  { responseConstraint: schema }\n);\n\nconst data = JSON.parse(raw);\n// { sentiment: 'positive', score: 8.5, summary: '...' }\n```\n\nNote: the schema itself uses some tokens from your context window. You can measure how many with `session.measureContextUsage({ responseConstraint: schema })`.\n\nHere's how these features combine in a real use case.\n\nSay you're building a Chrome Extension that summarises product reviews on any e-commerce page:\n\n``` js\n// 1. Check availability\nconst avail = await LanguageModel.availability({\n  expectedInputs:  [{ type: 'text', languages: ['en'] }],\n  expectedOutputs: [{ type: 'text', languages: ['en'] }],\n});\nif (avail === 'unavailable') return;\n\n// 2. 
Create a session with context\nconst session = await LanguageModel.create({\n  initialPrompts: [{\n    role: 'system',\n    content: 'You analyse product reviews and extract structured insights.',\n  }],\n  expectedInputs:  [{ type: 'text', languages: ['en'] }],\n  expectedOutputs: [{ type: 'text', languages: ['en'] }],\n});\n\n// 3. Pre-load the reviews while user looks at the page\nawait session.append([{\n  role: 'user',\n  content: `Here are the reviews:\\n\\n${scrapedReviews}`,\n}]);\n\n// 4. Get structured output\nconst schema = {\n  type: 'object',\n  properties: {\n    verdict:  { type: 'string', enum: ['buy', 'skip', 'depends'] },\n    pros:     { type: 'array', items: { type: 'string' } },\n    cons:     { type: 'array', items: { type: 'string' } },\n    summary:  { type: 'string' },\n  },\n  required: ['verdict', 'pros', 'cons', 'summary'],\n};\n\nconst result = JSON.parse(\n  await session.prompt('Summarise these reviews.', { responseConstraint: schema })\n);\n```\n\nZero API cost.\n\nRuns entirely on the user's machine.\n\nWorks offline after first load.\n\nThe full playground is one HTML file with no dependencies and no build step:\n\n[github.com/lovestaco/gemini-brow](https://github.com/lovestaco/gemini-brow)\n\nClone it, open `playground.html` in Chrome 138+, enable the flags above, and every feature in this post is wired up and ready to experiment with.\n\nThe Prompt API is still evolving: language support is limited (`en`, `ja`, `es` for now), mobile isn't supported yet, and the model is small.\n\nBut the fundamentals are solid, and the use cases where it shines (classification, summarization, extraction, Q&A on focused content) are genuinely useful without touching your server budget.\n\nAI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.\n\ngit-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. 
Completely free.\n\nAny feedback or contributors are welcome! It's online, source-available, and ready for anyone to use.\n\n⭐ Star it on GitHub: https://github.com/HexmosTech/git-lrc", "url": "https://wpnews.pro/news/you-have-a-free-ai-model-sitting-in-chrome-right-now", "canonical_source": "https://dev.to/lovestaco/you-have-a-free-ai-model-sitting-in-chrome-right-now-43c3", "published_at": "2026-05-30 18:47:06+00:00", "updated_at": "2026-05-30 19:11:56.019884+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "ai-tools", "ai-products"], "entities": ["Chrome", "Gemini Nano", "Prompt API", "Maneshwar", "git-lrc", "GPT-4", "Github", "Google"], "alternates": {"html": "https://wpnews.pro/news/you-have-a-free-ai-model-sitting-in-chrome-right-now", "markdown": "https://wpnews.pro/news/you-have-a-free-ai-model-sitting-in-chrome-right-now.md", 
"text": "https://wpnews.pro/news/you-have-a-free-ai-model-sitting-in-chrome-right-now.txt", "jsonld": "https://wpnews.pro/news/you-have-a-free-ai-model-sitting-in-chrome-right-now.jsonld"}}