Grammarly costs $12/mo: a local LLM does it for free (Chrome + Ollama)

A developer built inline-scribe, a Chrome extension that proofreads text using a local LLM via Ollama, ensuring privacy by keeping data on the user's machine. The extension provides an inline diff interface in the style of Word's Track Changes, allowing users to accept or reject each correction individually. The project highlights a design split where the LLM returns corrected prose as plain text, and a deterministic algorithm computes the diff, avoiding the unreliability of structured output from small models.

I write a lot in the browser (email, GitHub comments, contact forms), and I wanted proofreading without uploading every keystroke to a company's cloud. My workplace bans Grammarly for exactly that reason.

So I built [inline-scribe](https://github.com/mk668a/inline-scribe): a Chrome extension that proofreads your text with an AI that runs on your own machine (Ollama). Nothing leaves your computer. And the fixes show up like Word's Track Changes: accept or reject each one individually with ✓ / ✕.

This post is about the two design decisions I found most interesting while building it. Both generalize to anyone wiring a local LLM into a product. The second one involves `declarativeNetRequest` and `OLLAMA_ORIGINS`.

If you write in a browser today, you pick one of three bad options.

The thing is, the AI isn't the hard part anymore. Anyone can run a capable model locally with [Ollama](https://ollama.com) in two commands, for free. What's missing is the interface. The reason Grammarly was worth paying for was never the grammar engine. It was the friendly diff that lets you see and control each change. That interface, on top of a model you own, is the whole product.
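To make the "AI isn't the hard part" point concrete, here is a minimal sketch of asking a local model for corrected prose over Ollama's OpenAI-compatible API. It is an illustration, not inline-scribe's actual code: the port (Ollama's default, 11434), the prompt wording, and the helper names `buildProofreadRequest`, `extractReply`, and `proofread` are all assumptions.

```typescript
// Sketch only: prompt wording and helper names are illustrative assumptions,
// not taken from inline-scribe's source.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  temperature: number;
  stream: boolean;
}

// Ollama's OpenAI-compatible endpoint on its default port (assumed setup).
const DEFAULT_ENDPOINT = 'http://127.0.0.1:11434/v1/chat/completions';

// Build the request body: ask for corrected prose only, no JSON, no commentary.
function buildProofreadRequest(text: string, model = 'llama3.2'): ChatRequest {
  return {
    model,
    temperature: 0,
    stream: false,
    messages: [
      {
        role: 'system',
        content:
          'Fix spelling and grammar. Reply with the corrected text only, no explanations.',
      },
      { role: 'user', content: text },
    ],
  };
}

// Pull the reply text out of an OpenAI-style chat completion response.
function extractReply(response: unknown): string {
  const content = (response as any)?.choices?.[0]?.message?.content;
  if (typeof content !== 'string') throw new Error('Unexpected response shape');
  return content;
}

// Send the text to the local model and return its corrected prose.
async function proofread(text: string, endpoint = DEFAULT_ENDPOINT): Promise<string> {
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildProofreadRequest(text)),
  });
  if (!res.ok) throw new Error(`Model endpoint returned HTTP ${res.status}`);
  return extractReply(await res.json());
}
```

Calling `proofread('teh cat sat on teh mat')` against a running Ollama instance would resolve to whatever corrected text the model returns. Everything after that point (showing and controlling the changes) is interface work.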
| Tool | corrections | your text goes to | inline diff, per-fix accept/reject | price |
|---|---|---|---|---|
| Grammarly | cloud AI | their servers | ✅ the reason people pay | $12+/mo |
| Harper (10k★) | local, rule-based | nowhere ✅ | ❌ underlines typos only; can't rewrite a clumsy sentence | free |
| scramble / Typollama | local LLM ✅ | nowhere ✅ | ❌ whole-text replacement or popup | free |
| inline-scribe | local LLM ✅ | nowhere ✅ | ✅ | free |

This is the one I most want to share.

The intuitive move is to ask the model for structured output: "return the changes as JSON," something like `[{ "delete": "...", "insert": "..." }, ...]`, and pipe it straight into the UI.

But small local models break when you do this. A model like llama3.2 (3B) is surprisingly good at fixing prose and terrible at structured output: it breaks the JSON, adds explanations, wraps everything in a code fence, renames your keys. A chatty 3B model means a broken UI.

So I split the responsibilities: the model returns corrected prose as plain text, and the extension diffs original against corrected with a deterministic algorithm.

```
you press Alt+G in a text field
        │
        ▼
the extension sends your text to YOUR endpoint   ← default: Ollama on 127.0.0.1
(an OpenAI-compatible /chat/completions API)       model: llama3.2 (~2GB, free)
        │
        ▼
the model returns corrected prose (just text)
        │
        ▼
inline-scribe computes a word-level diff         ← deterministic algorithm,
between your text and the correction               NOT the LLM's opinion
        │
        ▼
review panel: accept ✓ / reject ✕ each change
  → Apply writes back only what you approved
```

The diff tokenizes into words + whitespace + punctuation runs, then does an LCS (longest common subsequence) walk:

```ts
// Tokenize into words/whitespace/punctuation, preserving everything
export function tokenize(text: string): string[] {
  return text.match(/\s+|[^\s\w]+|\w+/gu) ?? [];
}

export function diffText(original: string, corrected: string): Hunk[] {
  const a = tokenize(original);
  const b = tokenize(corrected);
  // DP table of LCS lengths over a × b (Uint32Array rows)
  // Walk the table emitting equal / delete / insert, merging adjacent ops.
  // Collapse delete+insert neighbours into one replace so a phrase rewrite
  // reads as a single reviewable hunk instead of three.
  // ...
}
```

This split has a lot of happy side effects. Each hunk carries `{ kind, original, corrected }`, so applying the user's decisions is trivial: accepted hunks take `corrected`, rejected take `original`, concatenate, done.

```ts
export function applyDecisions(hunks: Hunk[], accepted: boolean[]): string {
  let result = '';
  hunks.forEach((h, idx) => {
    // Unchanged text always passes through as-is.
    if (h.kind === 'equal') result += h.original;
    // Changed hunks use the correction only if the user accepted it.
    else result += accepted[idx] ? h.corrected : h.original;
  });
  return result;
}
```

Even with "return only text," small models still wrap output in fences or quotes. I gave up on prompting that away and instead strip the obvious wrappers in post, without touching real content:

```js
export function stripWrapping(reply: string, original: string): string {
  let out = reply.replace(/\r\n/g, '\n');
  const fence = out.match(/^\s*```[a-z]*\n([\s\S]*?)\n```\s*$/);
  if (fence) out = fence[1]; // strip ```...```
  out = out.replace(/^\s+|\s+$/g, '');
  if (/^".*"$/s.test(out) && !/^"/.test(original.trim()))
    out = out.slice(1, -1); // strip whole-reply quotes
  if (original.endsWith('\n') && !out.endsWith('\n'))
    out += '\n'; // preserve trailing-newline convention
  return out;
}
```

Takeaway: let small local models do what they're good at (return natural language) and keep the structure (diffs, JSON, state) in deterministic code. This isn't specific to proofreading; it's a general principle for putting a local LLM into a product.

This is the pothole every Chrome-extension × Ollama project hits.

Stock Ollama rejects requests carrying a `chrome-extension://...` Origin with a 403, a guard against cross-origin access from extensions. The official workaround is to have the user set the `OLLAMA_ORIGINS` env var. But asking for that means: `OLLAMA_ORIGINS=chrome-extension://`