Supercharge your web app with free AI that runs in your users' browser

wpnews.pro

There is a class of feature that used to be impossible to ship for free: anything that needed a language model. You wired up an API key, you ate the per-token bill, and every prompt your users typed went off to someone else's server. For a small public tool, that math usually killed the idea before it started.

That changed. Recent versions of Chrome ship a language model, Gemini Nano, and expose it to any web page through the Prompt API. The model runs on the user's own machine. No API key. No inference bill. No data leaving the browser.

We put this into a real, live tool, a free Mermaid diagram editor where you describe a diagram in plain English and the browser writes the Mermaid code for you. This post is the developer's version of that story: how the API actually works, the code that makes a small on-device model trustworthy, and an honest accounting of what you gain and what you give up.

The important word is built-in. This is not WebGPU plus a 4 GB model you download and run yourself. The model ships with Chrome, and you talk to it through a small standards-track JavaScript API.

As of Chrome 148, the Prompt API is stable for web pages (it had been available to extensions since Chrome 138). It is the general-purpose member of a growing family of built-in APIs:

Prompt API (LanguageModel): general natural-language prompting, now multimodal (text, plus image and audio input).

The Prompt API is the one you reach for when you need something the task APIs don't cover, like "turn this description into Mermaid source." So that is the one this post focuses on.

Here is the whole happy path. Check availability, create a session, prompt it.

// Feature-detect first. Old browsers won't have this at all.
if ('LanguageModel' in self) {
  const status = await LanguageModel.availability();

  if (status !== 'unavailable') {
    const session = await LanguageModel.create();
    const answer = await session.prompt('Explain event loops in one sentence.');
    console.log(answer);
    session.destroy();
  }
}

That is it. No keys, no SDK, no network call. The first time an origin uses the model, Chrome downloads it; after that it is local and works offline.
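A slightly more defensive version of the same happy path, as a minimal sketch. The helper name `promptOnce` and its `lm` parameter are assumptions made for illustration: in a page you would pass `self.LanguageModel`, and passing it in keeps the function easy to exercise with a stand-in object.

```javascript
// Sketch: one-shot prompt with guaranteed cleanup.
// `lm` is anything exposing availability() and create();
// in a page, pass self.LanguageModel (undefined in browsers without the API).
async function promptOnce(lm, text) {
  if (!lm) return null;                        // API not present at all
  const status = await lm.availability();
  if (status === 'unavailable') return null;   // device can't run the model

  const session = await lm.create();
  try {
    return await session.prompt(text);
  } finally {
    session.destroy();                         // runs even if prompt() throws
  }
}

// In a page (not executed here):
// const answer = await promptOnce(self.LanguageModel, 'Explain event loops in one sentence.');
```

Returning `null` for "no model" lets the caller treat the AI path as optional, while real prompt failures still surface as exceptions.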

availability() is the gate you build your UI around. It returns one of four states:

"unavailable": the device can't run it (too little disk, no supported hardware, unsupported options).

"downloadable": supported, but the model needs downloading first. Requires a user gesture to start.

"downloading": a download is in progress.

"available": ready right now.

Mermaid is a tiny text language: A --> B becomes a flowchart. It's great once you know it, and forgettable if you only touch it monthly. The obvious fix is to let people describe the diagram and have the model write the Mermaid. The non-obvious part is making a small model's output trustworthy.

Gemini Nano is small. Prompt it for code and it will sometimes wrap the output in markdown fences, add a chatty preamble, or emit a diagram with a subtle syntax error. If you pipe that straight into your renderer, you ship a tool that breaks every fifth try.

The fix is to treat the model as a drafter and put a real validator in front of the user. Mermaid ships its own parser, so we use it as the source of truth:

const clean = (s) => s.replace(/```(?:mermaid)?/g, '').trim();

async function describeToMermaid(description) {
  if ((await LanguageModel.availability()) === 'unavailable') return null;

  const session = await LanguageModel.create({
    initialPrompts: [{
      role: 'system',
      content:
        'You write Mermaid diagram source. Output only valid Mermaid code. ' +
        'No prose, no explanations, no markdown fences.',
    }],
  });

  try {
    let code = clean(await session.prompt(`Create a Mermaid diagram: ${description}`));

    // Source of truth: Mermaid's own parser, not the model's confidence.
    try {
      await mermaid.parse(code);
    } catch (err) {
      // Exactly one self-correction pass. Hand the error back to the model.
      code = clean(await session.prompt(
        `That Mermaid failed to parse:\n${err.message}\n` +
        `Return corrected Mermaid only.`
      ));
      await mermaid.parse(code); // still broken? this throws, caller handles it
    }

    return code;
  } finally {
    session.destroy(); // free the model; sessions are not free to hold open
  }
}

That validate-and-retry loop is the difference between a demo and a tool. The model gets one chance to fix its own mistake. If it fails twice, we show a friendly message and leave the editor untouched rather than rendering garbage. The parser is the authority; the model is just a fast first draft.
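The same draft, validate, retry-once shape can be sketched independently of Mermaid. The names `draftWithOneRetry`, `generate`, and `validate` below are hypothetical, not part of the Prompt API or Mermaid. The sketch assumes `generate` wraps a single session (for example `(text) => session.prompt(text)`) so the correction request still has the first attempt in context, and that `validate` throws on bad input the way `mermaid.parse` does.

```javascript
// Sketch: the model drafts, the validator decides, the model gets one retry.
async function draftWithOneRetry(generate, validate, request) {
  let draft = await generate(request);
  try {
    await validate(draft);                 // e.g. (code) => mermaid.parse(code)
    return { ok: true, value: draft, attempts: 1 };
  } catch (firstError) {
    // Hand the validator's message back to the model exactly once.
    draft = await generate(
      `That output failed validation:\n${firstError.message}\n` +
      `Return a corrected version only.`
    );
  }
  try {
    await validate(draft);
    return { ok: true, value: draft, attempts: 2 };
  } catch (secondError) {
    // Two failures: report it so the caller can leave the editor untouched.
    return { ok: false, error: secondError, attempts: 2 };
  }
}
```

Returning a result object instead of throwing on the second failure is a design choice for this sketch; it makes the "show a friendly message" branch an ordinary `if (!result.ok)` in the caller.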

For outputs that are structured, you don't have to hope. The Prompt API accepts a JSON Schema via responseConstraint, and the model is forced to match it:

const schema = { type: 'boolean' };

const result = await session.prompt(
  `Is this text describing a sequence of steps?\n\n${input}`,
  { responseConstraint: schema }
);
console.log(JSON.parse(result)); // true | false

Mermaid source isn't cleanly expressible as JSON Schema, which is exactly why we lean on the parser instead. But for classification, extraction, or form-filling, structured output removes a whole category of cleanup code.
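For classification specifically, a slightly larger schema might look like the sketch below. The schema, the label set, and the function name `classifyDescription` are invented for illustration; only `responseConstraint` comes from the API described above. The shape check after `JSON.parse` is a cheap sanity test, not a full JSON Schema validator.

```javascript
// Sketch: classify a description into a small, fixed label set.
const labelSchema = {
  type: 'object',
  properties: {
    diagramType: { type: 'string', enum: ['flowchart', 'sequence', 'other'] },
    hasSteps: { type: 'boolean' },
  },
  required: ['diagramType', 'hasSteps'],
  additionalProperties: false,
};

async function classifyDescription(session, text) {
  const raw = await session.prompt(
    `Classify this diagram description:\n\n${text}`,
    { responseConstraint: labelSchema }
  );
  const parsed = JSON.parse(raw);            // throws if the output is not JSON

  // Defensive shape check before the value reaches the rest of the app.
  const allowed = labelSchema.properties.diagramType.enum;
  if (
    parsed === null ||
    !allowed.includes(parsed.diagramType) ||
    typeof parsed.hasSteps !== 'boolean'
  ) {
    throw new TypeError('Model output did not match the expected shape');
  }
  return parsed;
}
```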

This is the part most people get wrong. On-device AI is a bonus for capable machines, not a baseline you can assume. So gate the feature, never the app.

In our editor, the entire tool (live preview, themes, export, sharing) works in any modern browser. The Generate-from-text box only appears when the model reports itself usable. Everyone else sees a normal editor and never knows a feature was missing.

async function setupAI(generateButton) {
  if (!('LanguageModel' in self)) return; // not Chrome, or too old

  const status = await LanguageModel.availability();
  if (status === 'unavailable') return;

  generateButton.hidden = false;

  generateButton.onclick = async () => {
    // Model download needs a user gesture; this click is it.
    const session = await LanguageModel.create({
      monitor(m) {
        m.addEventListener('downloadprogress', (e) => {
          showProgress(Math.round(e.loaded * 100)); // multi-GB first time
        });
      },
    });
    // ...use the session...
  };
}
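The gate above treats every status except "unavailable" the same way. If the UI should also warn about a pending download, the four availability() statuses can be mapped to UI decisions in one place. This is a sketch only: the function names and the returned fields (`showButton`, `needsDownload`, `label`) are hypothetical, and the label text is made up.

```javascript
// Sketch: map each availability() status to what the UI should do.
function uiForStatus(status) {
  switch (status) {
    case 'available':
      return { showButton: true, needsDownload: false, label: 'Generate' };
    case 'downloadable':
      return { showButton: true, needsDownload: true, label: 'Generate (downloads the model first)' };
    case 'downloading':
      return { showButton: true, needsDownload: true, label: 'Generate (model is downloading)' };
    default: // 'unavailable' or any status this code does not recognize
      return { showButton: false, needsDownload: false, label: '' };
  }
}

// Turn a downloadprogress event's `loaded` fraction (0 to 1) into a whole percent,
// clamped so a stray value never renders as "104%".
function percent(loaded) {
  return Math.round(Math.min(1, Math.max(0, loaded)) * 100);
}
```

Defaulting unknown statuses to "hide the button" keeps the feature gated if the set of statuses ever changes.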

Two details that bite people:

If the model still has to download, create() must run inside a real user gesture (a click, key press, or tap); calling it on page load throws. Check navigator.userActivation.isActive if you're unsure.

Listen for downloadprogress and tell the user, or your "Generate" button looks frozen for minutes.

This is genuinely a new capability, and for the right feature it's hard to beat:

Equally honest, because this is where the "just use it everywhere" dream dies:

(… QuotaExceededError and the contextoverflow event), and as of Chrome 149 the language model targets English, Spanish, Japanese, German, and French.

temperature/topK are extension-only for now, and it isn't available in Web Workers yet.

The decision is mostly about whether the feature is essential or a bonus, and how sensitive the data is.

Reach for on-device AI when the feature can be progressive enhancement, when privacy is a real selling point, when the workload is small and frequent (classify, extract, rewrite, draft), and when you'd rather not run a backend at all. That describes a surprising amount of "nice to have" AI.

Stay server-side when the feature is core to every user, when you need a large or frontier model, when output quality must be consistent across all hardware, or when you need it on mobile and Safari today. And you don't have to choose forever: a common pattern is hybrid, where you run on-device when available and fall back to a cloud model otherwise. Chrome's docs cover a polyfill and a Firebase AI Logic fallback for exactly this.
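The hybrid pattern can be sketched in a few lines. This is not the polyfill or the Firebase AI Logic integration mentioned above; `generateHybrid` and `cloudGenerate` are hypothetical names, with `cloudGenerate` standing in for whatever function calls your own backend. The sketch only uses the local model when it is already "available", so the fallback path never triggers a large download on its own.

```javascript
// Sketch: on-device first, cloud as the fallback.
// `lm` is self.LanguageModel in a page (or undefined where the API is missing).
async function generateHybrid(prompt, { lm, cloudGenerate }) {
  if (lm && (await lm.availability()) === 'available') {
    const session = await lm.create();
    try {
      return { source: 'on-device', text: await session.prompt(prompt) };
    } catch (err) {
      // Any local failure falls through to the cloud path below.
    } finally {
      session.destroy();
    }
  }
  return { source: 'cloud', text: await cloudGenerate(prompt) };
}
```

Returning `source` alongside the text lets the UI say where the answer came from, which matters if on-device privacy is part of the pitch: the cloud branch does send the prompt off the machine.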

For our Mermaid editor the choice was easy. The diagram generator is a bonus, the people who can run it get something delightful and private, and everyone else gets a fully working editor. Nobody hits a wall.

One detail that cost us an afternoon and might save you one. Exporting the diagram to PNG meant drawing it onto a hidden canvas, and in Chrome it kept failing with Tainted canvases may not be exported.

The cause: Mermaid was rendering text labels inside an embedded HTML element (a foreignObject), and the browser treats that as a security taint on the canvas, which blocks export. The fix was to configure Mermaid to render labels as real SVG <text> instead of embedded HTML. Bonus: the text now survives PNG export cleanly and stays selectable in the SVG. If you ever see a tainted-canvas error on an export that looked entirely local, check for foreignObject first.
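The post does not name the exact setting, so the following is a guess at one way to do it rather than the configuration actually used: for flowcharts, Mermaid's `flowchart.htmlLabels` option switches labels between embedded HTML and SVG text. The helper `hasForeignObject` is a hypothetical check you could run on the rendered SVG markup before drawing it to a canvas.

```javascript
// Assumed configuration: ask Mermaid for SVG text labels instead of HTML labels.
// Other diagram types may need their own settings.
const mermaidConfig = {
  startOnLoad: false,
  flowchart: { htmlLabels: false },
};
// In a page (not executed here): mermaid.initialize(mermaidConfig);

// Returns true if the SVG markup still contains a <foreignObject> element,
// which is the thing to look for before exporting through a canvas.
function hasForeignObject(svgMarkup) {
  return /<foreignObject[\s>\/]/i.test(svgMarkup);
}
```

Running the check in development, and logging when it returns true, would catch a diagram type that ignores the setting before a user hits the export error.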

The Mermaid editor is live and free. If you're on a recent desktop Chrome, describe a diagram and watch the browser write it, with nothing leaving your machine. If you're not, you still get a fast editor with live preview, themes, and export.

The broader point: a meaningful slice of the AI features you've been quoting backend costs for can now run for free in the client, with better privacy than your server ever offered. It won't fit every case, the hardware bar and browser support see to that, but when it fits, it fits beautifully.

This is the pragmatic view of AI we bring to client work, too. We're far more interested in AI that quietly does a real job than AI as a headline, and in custom software built around how you actually work. If you've got a workflow that needs its own small, sharp tool, we like that kind of problem.

Built by Bitvea. You handle business. We handle IT.

source & further reading

dev.to: original article