Supercharge your web app with free AI that runs in your users' browser

Chrome now ships Gemini Nano, a language model accessible via the Prompt API, enabling on-device AI without API keys or inference costs. A developer integrated it into a free Mermaid diagram editor, using the model as a drafter and Mermaid's parser as a validator to ensure reliable output.

There is a class of feature that used to be impossible to ship for free: anything that needed a language model. You wired up an API key, you ate the per-token bill, and every prompt your users typed went off to someone else's server. For a small public tool, that math usually killed the idea before it started.

That changed. Recent versions of Chrome ship a language model, Gemini Nano, and expose it to any web page through the Prompt API. The model runs on the user's own machine. No API key. No inference bill. No data leaving the browser.

We put this into a real, live tool, a [free Mermaid diagram editor](https://bitvea.com/en/tools/mermaid-editor) where you describe a diagram in plain English and the browser writes the Mermaid code for you. This post is the developer's version of that story: how the API actually works, the code that makes a small on-device model trustworthy, and an honest accounting of what you gain and what you give up.

The important word is *built-in*. This is not WebGPU plus a 4 GB model you download and run yourself. The model ships with Chrome, and you talk to it through a small standard-track JavaScript API. As of Chrome 148, the Prompt API is stable for web pages (it had been available to extensions since Chrome 138). It is the general-purpose member of a growing family of built-in APIs:

- `LanguageModel`: general natural-language prompting, now multimodal (text, plus image and audio input).

The Prompt API is the one you reach for when you need something the task APIs don't cover, like "turn this description into Mermaid source." So that is the one this post focuses on.
Here is the whole happy path. Check availability, create a session, prompt it.

```js
// Feature-detect first. Old browsers won't have this at all.
if ('LanguageModel' in self) {
  const status = await LanguageModel.availability();
  if (status !== 'unavailable') {
    const session = await LanguageModel.create();
    const answer = await session.prompt('Explain event loops in one sentence.');
    console.log(answer);
    session.destroy();
  }
}
```

That is it. No keys, no SDK, no network call. The first time an origin uses the model, Chrome downloads it; after that it is local and works offline.

`availability()` is the gate you build your UI around. It returns one of four states:

- `"unavailable"`: the device can't run it (too little disk, no supported hardware, unsupported options).
- `"downloadable"`: supported, but the model needs downloading first. Requires a user gesture to start.
- `"downloading"`: a download is in progress.
- `"available"`: ready right now.

Mermaid is a tiny text language: `A --> B` becomes a flowchart. It's great once you know it, and forgettable if you only touch it monthly. The obvious fix is to let people describe the diagram and have the model write the Mermaid. The non-obvious part is making a small model's output trustworthy.

Gemini Nano is small. Prompt it for code and it will sometimes wrap the output in markdown fences, add a chatty preamble, or emit a diagram with a subtle syntax error. If you pipe that straight into your renderer, you ship a tool that breaks every fifth try.

The fix is to treat the model as a drafter and put a real validator in front of the user. Mermaid ships its own parser, so we use it as the source of truth:

```js
import mermaid from 'mermaid';

// Strip any markdown fences the model adds despite being told not to.
const clean = (s) => s.replace(/```(?:mermaid)?/g, '').trim();

async function describeToMermaid(description) {
  if ((await LanguageModel.availability()) === 'unavailable') return null;

  const session = await LanguageModel.create({
    initialPrompts: [{
      role: 'system',
      content: 'You write Mermaid diagram source. Output only valid Mermaid code. ' +
        'No prose, no explanations, no markdown fences.',
    }],
  });

  try {
    let code = clean(await session.prompt(`Create a Mermaid diagram: ${description}`));

    // Source of truth: Mermaid's own parser, not the model's confidence.
    try {
      await mermaid.parse(code);
    } catch (err) {
      // Exactly one self-correction pass. Hand the error back to the model.
      code = clean(await session.prompt(
        `That Mermaid failed to parse:\n${err.message}\n` +
        `Return corrected Mermaid only.`
      ));
      await mermaid.parse(code); // still broken? this throws, caller handles it
    }
    return code;
  } finally {
    session.destroy(); // free the model; sessions are not free to hold open
  }
}
```

That validate-and-retry loop is the difference between a demo and a tool. The model gets one chance to fix its own mistake. If it fails twice, we show a friendly message and leave the editor untouched rather than rendering garbage. The parser is the authority; the model is just a fast first draft.

For outputs that are structured, you don't have to hope. The Prompt API accepts a JSON Schema via `responseConstraint`, and the model is forced to match it:

```js
const schema = { type: 'boolean' };
const result = await session.prompt(
  `Is this text describing a sequence of steps?\n\n${input}`,
  { responseConstraint: schema }
);
console.log(JSON.parse(result)); // true | false
```

Mermaid source isn't cleanly expressible as JSON Schema, which is exactly why we lean on the parser instead. But for classification, extraction, or form-filling, structured output removes a whole category of cleanup code.

This is the part most people get wrong. On-device AI is a bonus for capable machines, not a baseline you can assume. So gate the feature, never the app. In our editor, the entire tool (the live preview, themes, export, sharing) works in any modern browser. The Generate-from-text box only appears when the model reports itself usable.
Everyone else sees a normal editor and never knows a feature was missing.

```js
async function setupAI(generateButton) {
  if (!('LanguageModel' in self)) return; // not Chrome, or too old
  const status = await LanguageModel.availability();
  if (status === 'unavailable') return;

  generateButton.hidden = false;
  generateButton.onclick = async () => {
    // Model download needs a user gesture; this click is it.
    const session = await LanguageModel.create({
      monitor(m) {
        m.addEventListener('downloadprogress', (e) => {
          // e.loaded is a fraction between 0 and 1.
          showProgress(Math.round(e.loaded * 100)); // multi-GB first time
        });
      },
    });
    // ...use the session...
  };
}
```

Two details that bite people:

- When the model still needs downloading, `create()` must run inside a real user gesture (a click, key press, tap). Calling it on page load throws. Check `navigator.userActivation.isActive` if you're unsure.
- Listen for `downloadprogress` and tell the user, or your "Generate" button looks frozen for minutes.

This is genuinely a new capability, and for the right feature it's hard to beat.

Equally honest, because this is where the "just use it everywhere" dream dies:

- … `QuotaExceededError` and the `contextoverflow` event, and as of Chrome 149 the language model targets English, Spanish, Japanese, German, and French.
- … `temperature` / `topK` are extension-only for now, and it isn't available in Web Workers yet.

The decision is mostly about whether the feature is essential or a bonus, and how sensitive the data is.

Reach for on-device AI when the feature can be progressive enhancement, when privacy is a real selling point, when the workload is small and frequent (classify, extract, rewrite, draft), and when you'd rather not run a backend at all. That describes a surprising amount of "nice to have" AI.

Stay server-side when the feature is core to every user, when you need a large or frontier model, when output quality must be consistent across all hardware, or when you need it on mobile and Safari today.
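
The two lists above can be read as a small decision rule. Here is a rough sketch of that rule as code, purely for illustration: `chooseInference` and every flag name on the `feature` object are hypothetical, not part of any browser API, and a real decision will involve more judgment than a handful of booleans.

```js
// Hypothetical helper: the flag names are illustrative, not a real API.
function chooseInference(feature) {
  // Any one of these is treated as a reason to stay server-side.
  const serverReasons = [
    [feature.coreToEveryUser, 'core to every user'],
    [feature.needsFrontierModel, 'needs a large or frontier model'],
    [feature.needsConsistentQuality, 'quality must be consistent across hardware'],
    [feature.needsMobileOrSafari, 'must work on mobile and Safari today'],
  ].filter(([flag]) => flag).map(([, reason]) => reason);

  if (serverReasons.length > 0) {
    return { where: 'server', reasons: serverReasons };
  }

  // Otherwise, collect the arguments in favour of on-device.
  const onDeviceReasons = [
    [feature.progressiveEnhancement, 'can be progressive enhancement'],
    [feature.privacyMatters, 'privacy is a selling point'],
    [feature.smallFrequentWork, 'workload is small and frequent'],
    [feature.noBackendWanted, 'no backend to run'],
  ].filter(([flag]) => flag).map(([, reason]) => reason);

  if (onDeviceReasons.length === 0) {
    return { where: 'either', reasons: [] };
  }
  return { where: 'on-device', reasons: onDeviceReasons };
}

// Example: a bonus feature where privacy matters.
const mermaidGenerator = {
  progressiveEnhancement: true,
  privacyMatters: true,
  smallFrequentWork: true,
};
console.log(chooseInference(mermaidGenerator).where); // 'on-device'
```

In this sketch a single server-side reason wins outright, which mirrors the point that an essential feature cannot depend on hardware you don't control.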
And you don't have to choose forever. A common pattern is hybrid: run on-device when available and fall back to a cloud model otherwise. Chrome's docs cover a polyfill and a Firebase AI Logic fallback for exactly this.

For our Mermaid editor the choice was easy. The diagram generator is a bonus, the people who can run it get something delightful and private, and everyone else gets a fully working editor. Nobody hits a wall.

One detail that cost us an afternoon and might save you one. Exporting the diagram to PNG meant drawing it onto a hidden canvas, and in Chrome it kept failing with `Tainted canvases may not be exported.` The cause: Mermaid was rendering text labels inside an embedded HTML element (a `foreignObject`), and the browser treats that as a security taint on the canvas, which blocks export. The fix was to configure Mermaid to render labels as real SVG