Grammarly costs $12/mo — a local LLM does it for free (Chrome + Ollama)

I write a lot in the browser (email, GitHub comments, contact forms), and I wanted proofreading without sending every keystroke to a company's cloud. My workplace bans Grammarly for exactly that reason.

So I built inline-scribe: a Chrome extension that proofreads your text with an AI that runs on your own machine (Ollama). Nothing leaves your computer. And the fixes show up like Word's Track Changes — accept or reject each one individually with ✓ / ✕.

This post is about the two design decisions I found most interesting while building it. Both generalize to anyone wiring a local LLM into a product.

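1. Don't ask the model for JSON. Let it return plain corrected text and compute the diff in deterministic code.
2. Use declarativeNetRequest to strip the Origin header, so nobody has to set OLLAMA_ORIGINS.

If you write in a browser today, you pick one of three bad options: a cloud service that sees your text, a local rule-based checker that can't rewrite a sentence, or a local LLM tool without a per-fix diff.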

The thing is, the AI isn't the hard part anymore. Anyone can run a capable model locally with Ollama in two commands, for free. What's missing is the interface. The reason Grammarly was worth paying for was never the grammar engine — it was the friendly diff that lets you see and control each change.

That interface, on top of a model you own, is the whole product.

| tool | corrections | your text goes to | inline diff, per-fix accept/reject | price |
|---|---|---|---|---|
| Grammarly | cloud AI | their servers | ✅ (the reason people pay) | $12+/mo |
| Harper (10k★) | local, rule-based | nowhere ✅ | ❌ underlines typos only, can't rewrite a clumsy sentence | free |
| scramble / Typollama | local LLM ✅ | nowhere ✅ | ❌ whole-text replacement or popup | free |
| inline-scribe | local LLM ✅ | nowhere ✅ | ✅ | free |

This is the one I most want to share.

The intuitive move is to ask the model for structured output: "return the changes as JSON," something like [{ "delete": "...", "insert": "..." }, ...], and pipe it straight into the UI.

But small local models break when you do this. A model like llama3.2 (3B) is surprisingly good at fixing prose and terrible at structured output: it breaks the JSON, adds explanations, wraps everything in a code fence, renames your keys. A chatty 3B model means a broken UI.
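To make those failure modes concrete, here is a small sketch. The reply strings are made-up illustrations of what a chatty model might return, not captured logs, and `parseEdits` is a hypothetical strict parser for the JSON shape mentioned above.

```typescript
// Shape the UI would expect if the model were asked for structured edits.
interface Edit {
  delete: string;
  insert: string;
}

// Hypothetical strict parser: returns null unless the reply is exactly a JSON
// array of { delete, insert } objects with string values.
function parseEdits(reply: string): Edit[] | null {
  let data: unknown;
  try {
    data = JSON.parse(reply);
  } catch {
    return null; // not valid JSON at all (leading prose, code fences, ...)
  }
  if (!Array.isArray(data)) return null;
  const edits: Edit[] = [];
  for (const item of data) {
    if (typeof item !== "object" || item === null) return null;
    const rec = item as Record<string, unknown>;
    if (typeof rec["delete"] !== "string" || typeof rec["insert"] !== "string") {
      return null; // renamed or missing keys
    }
    edits.push({ delete: rec["delete"], insert: rec["insert"] });
  }
  return edits;
}

const fence = "`".repeat(3);
// Made-up replies, one per failure mode.
const clean = '[{"delete": "teh", "insert": "the"}]';
const chatty = "Sure! Here are the fixes:\n" + clean;
const fenced = fence + "json\n" + clean + "\n" + fence;
const renamed = '[{"remove": "teh", "add": "the"}]';

// Only the first reply survives the parser.
console.log([clean, chatty, fenced, renamed].map((r) => parseEdits(r) !== null));
```

Three of the four replies carry the right correction and still yield nothing the UI can use.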

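So I split the responsibilities:

- The model only returns the corrected text.
- The extension takes (original, corrected) and computes the diff with a deterministic algorithm.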

you press Alt+G in a text field
   │
   ▼
the extension sends your text to YOUR endpoint     ← default: Ollama on 127.0.0.1
(an OpenAI-compatible /chat/completions API)          model: llama3.2 (~2GB, free)
   │
   ▼
the model returns corrected prose — just text
   │
   ▼
inline-scribe computes a word-level diff           ← deterministic algorithm,
between your text and the correction                  NOT the LLM's opinion
   │
   ▼
review panel: accept ✓ / reject ✕ each change → Apply writes back only what you approved
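The request in the second step of that flow can be sketched as follows. This is only an illustration of the OpenAI-compatible chat format: the system prompt, the `temperature` value, and the helper names `buildChatRequest` and `extractReply` are assumptions, not taken from the extension's source.

```typescript
// Minimal types for an OpenAI-compatible chat completion request.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  temperature: number;
  stream: boolean;
}

// Hypothetical system prompt; the extension's real prompt may differ.
const PROOFREAD_PROMPT =
  "Fix spelling and grammar in the user's text. Reply with the corrected text only.";

// Build the JSON body for POST <endpoint>/chat/completions.
function buildChatRequest(
  text: string,
  model = "llama3.2",
  systemPrompt = PROOFREAD_PROMPT,
): ChatRequest {
  return {
    model,
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: text },
    ],
    temperature: 0, // assumption: a low temperature keeps corrections stable
    stream: false, // ask for one complete reply rather than chunks
  };
}

// Pull the reply text out of a parsed response, failing loudly if the shape is off.
function extractReply(response: unknown): string {
  const choices = (response as { choices?: unknown } | null)?.choices;
  if (Array.isArray(choices) && choices.length > 0) {
    const first = choices[0] as { message?: { content?: unknown } } | null;
    const content = first?.message?.content;
    if (typeof content === "string") return content;
  }
  throw new Error("unexpected chat completion response shape");
}
```

With a stock Ollama install, the body would be sent as JSON to http://127.0.0.1:11434/v1/chat/completions. Only the reply string goes on to the diff step.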

The diff tokenizes into words + whitespace + punctuation runs, then does an LCS (longest common subsequence) walk:

// Tokenize into words/whitespace/punctuation, preserving everything
export function tokenize(text: string): string[] {
  return text.match(/\s+|[^\s\w]+|\w+/gu) ?? [];
}

export function diffText(original: string, corrected: string): Hunk[] {
  const a = tokenize(original);
  const b = tokenize(corrected);
  // DP table of LCS lengths over a × b (Uint32Array rows)
  // Walk the table emitting equal / delete / insert, merging adjacent ops.
  // Collapse delete+insert neighbours into one `replace` so a phrase rewrite
  // reads as a single reviewable hunk instead of three.
  ...
}
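Since the body above is elided, here is one way the described steps could be filled in. It is a sketch under assumptions, not the extension's actual implementation: the `Hunk` shape is inferred from how hunks are used elsewhere in this post, and `diffTextSketch` is a made-up name.

```typescript
type Hunk = {
  kind: "equal" | "delete" | "insert" | "replace";
  original: string;
  corrected: string;
};

// Same tokenizer as above: whitespace, punctuation, and word runs.
function tokenize(text: string): string[] {
  return text.match(/\s+|[^\s\w]+|\w+/gu) ?? [];
}

function diffTextSketch(original: string, corrected: string): Hunk[] {
  const a = tokenize(original);
  const b = tokenize(corrected);
  const n = a.length;
  const m = b.length;

  // lcs[i][j] = LCS length of a[i..] and b[j..], filled from the bottom right.
  const lcs: Uint32Array[] = Array.from({ length: n + 1 }, () => new Uint32Array(m + 1));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  const hunks: Hunk[] = [];
  // Append one op, merging it into the previous hunk where possible.
  const push = (kind: "equal" | "delete" | "insert", orig: string, corr: string): void => {
    const last = hunks[hunks.length - 1];
    if (last && last.kind === kind) {
      last.original += orig; // same kind: extend the run
      last.corrected += corr;
    } else if (last && kind !== "equal" && last.kind !== "equal") {
      last.kind = "replace"; // delete next to insert: fold into one replace
      last.original += orig;
      last.corrected += corr;
    } else {
      hunks.push({ kind, original: orig, corrected: corr });
    }
  };

  // Walk the table from the top left, following the longer remaining LCS.
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (a[i] === b[j]) {
      push("equal", a[i], b[j]);
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      push("delete", a[i], "");
      i++;
    } else {
      push("insert", "", b[j]);
      j++;
    }
  }
  while (i < n) push("delete", a[i++], "");
  while (j < m) push("insert", "", b[j++]);
  return hunks;
}

const demo = diffTextSketch("I has a apple", "I have an apple");
console.log(demo.map((h) => h.kind).join(",")); // equal,replace,equal,replace,equal
```

Because whitespace runs are tokens too, joining every hunk's `original` reproduces the input exactly, and joining every `corrected` reproduces the model's text.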

This split has a lot of happy side effects. One of them: applying the user's decisions is trivial. Each hunk is { kind, original, corrected }. Accepted hunks take corrected, rejected take original, concatenate, and you're done.

export function applyDecisions(hunks: Hunk[], accepted: boolean[]): string {
  let result = '';
  hunks.forEach((h, idx) => {
    if (h.kind === 'equal') result += h.original;
    else result += accepted[idx] ? h.corrected : h.original;
  });
  return result;
}
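A worked example may help here. The snippet below restates `applyDecisions` in compact form so that it runs on its own, then feeds it hand-written hunks; the sentence and the `Hunk` type are illustrative assumptions.

```typescript
type Hunk = {
  kind: "equal" | "delete" | "insert" | "replace";
  original: string;
  corrected: string;
};

// Restated from above so this snippet is self-contained.
function applyDecisions(hunks: Hunk[], accepted: boolean[]): string {
  return hunks
    .map((h, idx) => (h.kind !== "equal" && accepted[idx] ? h.corrected : h.original))
    .join("");
}

// Hand-written hunks for "I has a apple" -> "I have an apple".
const hunks: Hunk[] = [
  { kind: "equal", original: "I ", corrected: "I " },
  { kind: "replace", original: "has", corrected: "have" },
  { kind: "equal", original: " ", corrected: " " },
  { kind: "replace", original: "a", corrected: "an" },
  { kind: "equal", original: " apple", corrected: " apple" },
];

// accepted[] is indexed like hunks[]; entries for equal hunks are ignored.
const acceptAll = applyDecisions(hunks, hunks.map(() => true));
const rejectAll = applyDecisions(hunks, hunks.map(() => false));
const onlyFirstFix = applyDecisions(hunks, [false, true, false, false, false]);

console.log(acceptAll); // I have an apple
console.log(rejectAll); // I has a apple
console.log(onlyFirstFix); // I have a apple
```

Rejecting everything returns the original text byte for byte, so the worst outcome of a bad model reply is a review panel full of ✕.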

Even with "return only text," small models still wrap output in fences or quotes. I gave up on prompting that away and instead strip the obvious wrappers in post — without touching real content:

export function stripWrapping(reply: string, original: string): string {
  let out = reply.replace(/\r\n/g, '\n');
  const fence = out.match(/^\s*```[a-z]*\n([\s\S]*?)\n```\s*$/);
  if (fence) out = fence[1];                       // strip ```...```
  out = out.replace(/^\s+|\s+$/g, '');
  if (/^".*"$/s.test(out) && !/^"/.test(original.trim())) out = out.slice(1, -1); // strip whole-reply quotes
  if (original.endsWith('\n') && !out.endsWith('\n')) out += '\n'; // preserve trailing-newline convention
  return out;
}

Takeaway: let small local models do what they're good at (return natural language) and keep the structure — diffs, JSON, state — in deterministic code. This isn't specific to proofreading; it's a general principle for putting a local LLM into a product.

This is the pothole every Chrome-extension × Ollama project hits.

Stock Ollama rejects requests carrying a chrome-extension://... Origin with a 403, a guard against cross-origin access from extensions. The official workaround is to have the user set the OLLAMA_ORIGINS env var. But asking for that means:

- every user has to set something like OLLAMA_ORIGINS=chrome-extension://<id-that-changes> before anything works

In other words, the moment your README documents an env var, you've lost. It should just work with a stock ollama serve.

The fix: use MV3's declarativeNetRequest (DNR) to strip the Origin header from requests to the configured endpoint with a dynamic rule. No Origin, no 403.

async function syncOriginRule(): Promise<void> {
  const stored = await chrome.storage.sync.get('config');
  const endpoint = stored.config?.endpoint ?? DEFAULT_CONFIG.endpoint;
  const host = new URL(endpoint).host; // e.g. 127.0.0.1:11434

  await chrome.declarativeNetRequest.updateDynamicRules({
    removeRuleIds: [1],
    addRules: [{
      id: 1,
      priority: 1,
      condition: { urlFilter: `||${host}/`, resourceTypes: ['xmlhttprequest'] },
      action: {
        type: 'modifyHeaders',
        requestHeaders: [{ header: 'origin', operation: 'remove' }],
      },
    }],
  });
}

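Key points:

- The rule only matches the configured endpoint (via urlFilter). It's not a dangerous "strip Origin everywhere" rule.
- The rule is re-synced through chrome.storage.onChanged, so it follows the endpoint when the user changes it.
- The permissions needed are declarativeNetRequest plus localhost host_permissions.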

// manifest.json (excerpt)
"permissions": ["storage", "activeTab", "declarativeNetRequest", "contextMenus"],
"host_permissions": ["http://127.0.0.1/*", "http://localhost/*"],

The result: the user's steps are "install Ollama, ollama serve, install the extension." Zero env vars.

One more thing. The actual fetch (the request to 127.0.0.1) happens in the service worker, not the content script:

// content → background message; background runs the check and replies
chrome.runtime.onMessage.addListener((msg, _sender, sendResponse) => {
  if (msg?.type !== 'inline-scribe:check') return undefined;
  (async () => {
    const config = { ...DEFAULT_CONFIG, ...(await chrome.storage.sync.get('config')).config };
    try {
      const corrected = await new OllamaChecker(config).check(msg.text);
      sendResponse({ ok: true, corrected, model: config.model });
    } catch (err) {
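      // Pass the CheckerError message (or any other failure) back to the content script
      sendResponse({ ok: false, error: err instanceof Error ? err.message : String(err) });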
    }
  })();
  return true; // keep the channel open for the async response
});

Two reasons:

- A fetch from a content script gets blocked when the page's Content-Security-Policy restricts connect-src. The service worker runs in the extension's context and is unaffected.
- A fetch from the service worker has the resource type xmlhttprequest, so the rule above applies cleanly.

The UI (review panel, the ✎ selection icon, in-place replacement) is rendered in a shadow DOM from the content script so it doesn't collide with the page's CSS.
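The content-script half of the message exchange is not shown in this post. A minimal sketch of it could look like this, assuming the response shapes from the background listener above. The `requestCheck` helper and the injected `send` parameter are hypothetical; in the extension, `send` would wrap `chrome.runtime.sendMessage`, which in MV3 can return a promise when no callback is passed.

```typescript
// Response shapes as sent by the background listener above.
type CheckResponse =
  | { ok: true; corrected: string; model: string }
  | { ok: false; error: string };

// The transport is injected so the logic can be exercised outside a browser.
// In the extension it would be: (msg) => chrome.runtime.sendMessage(msg)
type Send = (msg: { type: string; text: string }) => Promise<CheckResponse | undefined>;

// Ask the service worker to proofread `text` and return the corrected prose.
async function requestCheck(text: string, send: Send): Promise<string> {
  const response = await send({ type: "inline-scribe:check", text });
  if (!response) throw new Error("no response from the background worker");
  if (!response.ok) throw new Error(response.error);
  return response.corrected;
}
```

Keeping the transport behind a parameter is a design choice for this sketch only. It lets the error handling be tested with a fake `send`, without loading the extension.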

inline-scribe is, at its core, "Grammarly's diff UX on top of a local LLM." The design decisions that made it work:

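- The model returns plain text, and deterministic code computes the diff.
- declarativeNetRequest strips the Origin header: no OLLAMA_ORIGINS, zero config.

Swap the system prompt and the same diff UI becomes a translator or a tone-shifter. Source is MIT.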

If you're putting a local LLM into a product, the leverage is in deciding what the model does — and what it doesn't.

Source: dev.to (original article)
