{"slug": "grammarly-costs-12-mo-a-local-llm-does-it-for-free-chrome-ollama", "title": "Grammarly costs $12/mo — a local LLM does it for free (Chrome + Ollama)", "summary": "A developer built inline-scribe, a Chrome extension that proofreads text using a local LLM via Ollama, ensuring privacy by keeping data on the user's machine. The extension provides an inline diff interface similar to Grammarly's Track Changes, allowing users to accept or reject each correction individually. The project highlights a design split where the LLM returns corrected prose as plain text, and a deterministic algorithm computes the diff, avoiding the unreliability of structured output from small models.", "body_md": "I write a lot in the browser — email, GitHub comments, contact forms — and I wanted proofreading without uploading every keystroke to a company's cloud. My workplace bans Grammarly for exactly that reason.\n\nSo I built [inline-scribe](https://github.com/mk668a/inline-scribe): a Chrome extension that proofreads your text with an AI that runs **on your own machine** (Ollama). Nothing leaves your computer. And the fixes show up like Word's Track Changes — **accept or reject each one individually** with ✓ / ✕.\n\nThis post is about the two design decisions I found most interesting while building it. Both generalize to anyone wiring a local LLM into a product.\n\n`declarativeNetRequest`\n\n`OLLAMA_ORIGINS`\n\n.If you write in a browser today, you pick one of three bad options:\n\nThe thing is, the AI isn't the hard part anymore. Anyone can run a capable model locally with [Ollama](https://ollama.com) in two commands, for free. What's missing is the **interface**. 
The reason Grammarly was worth paying for was never the grammar engine — it was the *friendly diff* that lets you see and control each change.\n\nThat interface, on top of a model you own, is the whole product.\n\n| | corrections | your text goes to | inline diff, per-fix accept/reject | price |\n|---|---|---|---|---|\n| Grammarly | cloud AI | their servers | ✅ (the reason people pay) | $12+/mo |\n| Harper (10k★) | local, rule-based | nowhere ✅ | ❌ underlines typos only — can't rewrite a clumsy sentence | free |\n| scramble / Typollama | local LLM ✅ | nowhere ✅ | ❌ whole-text replacement or popup | free |\n| inline-scribe | local LLM ✅ | nowhere ✅ | ✅ | free |\n\nThis is the one I most want to share.\n\nThe intuitive move is to ask the model for structured output: \"return the changes as JSON,\" something like `[{ \"delete\": \"...\", \"insert\": \"...\" }, ...]`, and pipe it straight into the UI.\n\n**But small local models break when you do this.** A model like llama3.2 (3B) is surprisingly good at *fixing prose* and terrible at *structured output*: it breaks the JSON, adds explanations, wraps everything in a code fence, renames your keys. 
A chatty 3B model means a broken UI.\n\nSo I split the responsibilities:\n\n`(original, corrected)`\n\nwith a \n\n```\nyou press Alt+G in a text field\n   │\n   ▼\nthe extension sends your text to YOUR endpoint     ← default: Ollama on 127.0.0.1\n(an OpenAI-compatible /chat/completions API)          model: llama3.2 (~2GB, free)\n   │\n   ▼\nthe model returns corrected prose — just text\n   │\n   ▼\ninline-scribe computes a word-level diff           ← deterministic algorithm,\nbetween your text and the correction                  NOT the LLM's opinion\n   │\n   ▼\nreview panel: accept ✓ / reject ✕ each change → Apply writes back only what you approved\n```\n\nThe diff tokenizes into words + whitespace + punctuation runs, then does an **LCS (longest common subsequence)** walk:\n\n```\n// Tokenize into words/whitespace/punctuation, preserving everything\nexport function tokenize(text: string): string[] {\n  return text.match(/\\s+|[^\\s\\w]+|\\w+/gu) ?? [];\n}\n\nexport function diffText(original: string, corrected: string): Hunk[] {\n  const a = tokenize(original);\n  const b = tokenize(corrected);\n  // DP table of LCS lengths over a × b (Uint32Array rows)\n  // Walk the table emitting equal / delete / insert, merging adjacent ops.\n  // Collapse delete+insert neighbours into one `replace` so a phrase rewrite\n  // reads as a single reviewable hunk instead of three.\n  ...\n}\n```\n\nThis split has a lot of happy side effects:\n\n`{ kind, original, corrected }`\n\n. Accepted hunks take `corrected`\n\n, rejected take `original`\n\n, concatenate — done.\n\n```\nexport function applyDecisions(hunks: Hunk[], accepted: boolean[]): string {\n  let result = '';\n  hunks.forEach((h, idx) => {\n    if (h.kind === 'equal') result += h.original;\n    else result += accepted[idx] ? h.corrected : h.original;\n  });\n  return result;\n}\n```\n\nEven with \"return only text,\" small models still wrap output in fences or quotes. 
I gave up on prompting that away and instead strip the obvious wrappers in post — without touching real content:\n\n``` js\nexport function stripWrapping(reply: string, original: string): string {\n  let out = reply.replace(/\\r\\n/g, '\\n');\n  const fence = out.match(/^\\s*```[a-z]*\\n([\\s\\S]*?)\\n```\\s*$/);\n  if (fence) out = fence[1];                       // strip ```...```\n  out = out.replace(/^\\s+|\\s+$/g, '');\n  if (/^\".*\"$/s.test(out) && !/^\"/.test(original.trim())) out = out.slice(1, -1); // strip whole-reply quotes\n  if (original.endsWith('\\n') && !out.endsWith('\\n')) out += '\\n'; // preserve trailing-newline convention\n  return out;\n}\n```\n\n**Takeaway: let small local models do what they're good at (return natural language) and keep the structure — diffs, JSON, state — in deterministic code.** This isn't specific to proofreading; it's a general principle for putting a local LLM into a product.\n\nThis is the pothole every Chrome-extension × Ollama project hits.\n\nStock Ollama rejects requests carrying a `chrome-extension://...`\n\nOrigin with a **403** — a guard against cross-origin access from extensions. The official workaround is to have the user set the `OLLAMA_ORIGINS`\n\nenv var. But asking for that means:\n\n`OLLAMA_ORIGINS=chrome-extension://<id-that-changes>`\n\nIn other words, **the moment your README documents an env var, you've lost.** It should just work with a stock `ollama serve`\n\n.\n\nThe fix: use MV3's **declarativeNetRequest (DNR)** to **strip the Origin header** from requests to the configured endpoint with a dynamic rule. No Origin, no 403.\n\n``` js\nasync function syncOriginRule(): Promise<void> {\n  const stored = await chrome.storage.sync.get('config');\n  const endpoint = stored.config?.endpoint ?? DEFAULT_CONFIG.endpoint;\n  const host = new URL(endpoint).host; // e.g. 
127.0.0.1:11434\n\n  await chrome.declarativeNetRequest.updateDynamicRules({\n    removeRuleIds: [1],\n    addRules: [{\n      id: 1,\n      priority: 1,\n      condition: { urlFilter: `||${host}/`, resourceTypes: ['xmlhttprequest'] },\n      action: {\n        type: 'modifyHeaders',\n        requestHeaders: [{ header: 'origin', operation: 'remove' }],\n      },\n    }],\n  });\n}\n```\n\nKey points:\n\n`urlFilter`\n\n). It's not a dangerous \"strip Origin everywhere\" rule.`chrome.storage.onChanged`\n\nand `declarativeNetRequest`\n\nplus localhost `host_permissions`\n\n.\n\n```\n// manifest.json (excerpt)\n\"permissions\": [\"storage\", \"activeTab\", \"declarativeNetRequest\", \"contextMenus\"],\n\"host_permissions\": [\"http://127.0.0.1/*\", \"http://localhost/*\"],\n```\n\nThe result: the user's steps are \"install Ollama, `ollama serve`\n\n, install the extension.\" Zero env vars.\n\nOne more thing. The actual `fetch`\n\n(the request to 127.0.0.1) happens **in the service worker, not the content script**:\n\n```\n// content → background message; background runs the check and replies\nchrome.runtime.onMessage.addListener((msg, _sender, sendResponse) => {\n  if (msg?.type !== 'inline-scribe:check') return undefined;\n  (async () => {\n    const config = { ...DEFAULT_CONFIG, ...(await chrome.storage.sync.get('config')).config };\n    try {\n      const corrected = await new OllamaChecker(config).check(msg.text);\n      sendResponse({ ok: true, corrected, model: config.model });\n    } catch (err) {\n      sendResponse({ ok: false, error: /* CheckerError message */ });\n    }\n  })();\n  return true; // keep the channel open for the async response\n});\n```\n\nTwo reasons:\n\n`fetch`\n\nfrom a content script gets blocked when the page's Content-Security-Policy restricts `connect-src`\n\n. 
The service worker runs in the extension's context and is unaffected.\n\n`xmlhttprequest`, so the rule above applies cleanly.\n\nThe UI (review panel, the ✎ selection icon, in-place replacement) is rendered in a shadow DOM from the content script so it doesn't collide with the page's CSS.\n\ninline-scribe is, at its core, \"Grammarly's diff UX on top of a local LLM.\" The design decisions that made it work:\n\n`OLLAMA_ORIGINS`, zero config.\n\nSwap the system prompt and the same diff UI becomes a translator or a tone-shifter. Source is MIT.\n\nIf you're putting a local LLM into a product, the leverage is in deciding what the model does — and what it doesn't.", "url": "https://wpnews.pro/news/grammarly-costs-12-mo-a-local-llm-does-it-for-free-chrome-ollama", "canonical_source": "https://dev.to/mk668a/grammarly-costs-12mo-a-local-llm-does-it-for-free-chrome-ollama-128a", "published_at": "2026-06-15 13:15:18+00:00", "updated_at": "2026-06-15 13:36:55.143537+00:00", "lang": "en", "topics": ["large-language-models", "developer-tools", "ai-products"], "entities": ["Grammarly", "Ollama", "inline-scribe", "Chrome", "llama3.2", "Harper", "scramble", "Typollama"], "alternates": {"html": "https://wpnews.pro/news/grammarly-costs-12-mo-a-local-llm-does-it-for-free-chrome-ollama", "markdown": "https://wpnews.pro/news/grammarly-costs-12-mo-a-local-llm-does-it-for-free-chrome-ollama.md", "text": "https://wpnews.pro/news/grammarly-costs-12-mo-a-local-llm-does-it-for-free-chrome-ollama.txt", "jsonld": "https://wpnews.pro/news/grammarly-costs-12-mo-a-local-llm-does-it-for-free-chrome-ollama.jsonld"}}