# Grammarly costs $12/mo — a local LLM does it for free (Chrome + Ollama)

> Source: <https://dev.to/mk668a/grammarly-costs-12mo-a-local-llm-does-it-for-free-chrome-ollama-128a>
> Published: 2026-06-15 13:15:18+00:00

I write a lot in the browser — email, GitHub comments, contact forms — and I wanted proofreading without uploading every keystroke to a company's cloud. My workplace bans Grammarly for exactly that reason.

So I built [inline-scribe](https://github.com/mk668a/inline-scribe): a Chrome extension that proofreads your text with an AI that runs **on your own machine** (Ollama). Nothing leaves your computer. And the fixes show up like Word's Track Changes — **accept or reject each one individually** with ✓ / ✕.

This post is about the two design decisions I found most interesting while building it. Both generalize to anyone wiring a local LLM into a product.

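- Keep the model's job to returning corrected prose, and compute the diff in deterministic code.
- Strip the Origin header with `declarativeNetRequest`, so nobody has to set `OLLAMA_ORIGINS`.

If you write in a browser today, you pick one of three bad options:

- a cloud proofreader that sends your text to someone else's servers and charges a subscription;
- a local rule-based checker that underlines typos but can't rewrite a clumsy sentence;
- a local-LLM tool that replaces the whole text or answers in a popup, with no per-fix review.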

The thing is, the AI isn't the hard part anymore. Anyone can run a capable model locally with [Ollama](https://ollama.com) in two commands, for free. What's missing is the **interface**. The reason Grammarly was worth paying for was never the grammar engine — it was the *friendly diff* that lets you see and control each change.

That interface, on top of a model you own, is the whole product.

| tool | corrections | your text goes to | inline diff, per-fix accept/reject | price |
|---|---|---|---|---|
| Grammarly | cloud AI | their servers | ✅ (the reason people pay) | $12+/mo |
| Harper (10k★) | local, rule-based | nowhere ✅ | ❌ underlines typos only; can't rewrite a clumsy sentence | free |
| scramble / Typollama | local LLM ✅ | nowhere ✅ | ❌ whole-text replacement or popup | free |
| inline-scribe | local LLM ✅ | nowhere ✅ | ✅ | free |

The first decision, keeping the diff out of the model's hands, is the one I most want to share.

The intuitive move is to ask the model for structured output: "return the changes as JSON," something like `[{ "delete": "...", "insert": "..." }, ...]`, and pipe it straight into the UI.

**But small local models break when you do this.** A model like llama3.2 (3B) is surprisingly good at *fixing prose* and terrible at *structured output*: it breaks the JSON, adds explanations, wraps everything in a code fence, renames your keys. A chatty 3B model means a broken UI.

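So I split the responsibilities:

- The model only returns the corrected text, as plain prose.
- The extension derives the changes from the pair `(original, corrected)` with a deterministic diff algorithm.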

```
you press Alt+G in a text field
   │
   ▼
the extension sends your text to YOUR endpoint     ← default: Ollama on 127.0.0.1
(an OpenAI-compatible /chat/completions API)          model: llama3.2 (~2GB, free)
   │
   ▼
the model returns corrected prose — just text
   │
   ▼
inline-scribe computes a word-level diff           ← deterministic algorithm,
between your text and the correction                  NOT the LLM's opinion
   │
   ▼
review panel: accept ✓ / reject ✕ each change → Apply writes back only what you approved
```
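The second step of the flow above, the call to an OpenAI-compatible `/chat/completions` endpoint, could look roughly like the sketch below. It is not the extension's actual checker: `CheckerConfig`, `buildChatRequest`, `extractReply`, `requestCorrection`, and the system prompt wording are all illustrative, and the example base URL assumes Ollama's OpenAI-compatible API under `http://127.0.0.1:11434/v1`.

```ts
// Hypothetical config shape; the extension's real config may differ.
export interface CheckerConfig {
  endpoint: string; // base URL of an OpenAI-compatible API, e.g. http://127.0.0.1:11434/v1
  model: string;    // e.g. "llama3.2"
}

export interface ChatRequest {
  model: string;
  stream: boolean;
  temperature: number;
  messages: { role: 'system' | 'user'; content: string }[];
}

// Assumed prompt wording, not the extension's actual prompt.
const SYSTEM_PROMPT =
  'Fix grammar and spelling. Reply with the corrected text only, no commentary.';

export function buildChatRequest(config: CheckerConfig, text: string): ChatRequest {
  return {
    model: config.model,
    stream: false,   // ask for one JSON reply instead of a token stream
    temperature: 0,  // keep corrections as repeatable as the model allows
    messages: [
      { role: 'system', content: SYSTEM_PROMPT },
      { role: 'user', content: text },
    ],
  };
}

// Pull the reply text out of a chat-completions response body.
export function extractReply(body: unknown): string {
  const content = (body as { choices?: { message?: { content?: unknown } }[] } | null)
    ?.choices?.[0]?.message?.content;
  if (typeof content !== 'string') throw new Error('unexpected response shape');
  return content;
}

export async function requestCorrection(config: CheckerConfig, text: string): Promise<string> {
  const url = `${config.endpoint.replace(/\/+$/, '')}/chat/completions`;
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildChatRequest(config, text)),
  });
  if (!res.ok) throw new Error(`endpoint returned HTTP ${res.status}`);
  return extractReply(await res.json());
}
```

The only thing read from the model's reply is `choices[0].message.content`; nothing downstream depends on the model producing well-formed structure.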

The diff tokenizes into words + whitespace + punctuation runs, then does an **LCS (longest common subsequence)** walk:

```
// Tokenize into words/whitespace/punctuation, preserving everything
export function tokenize(text: string): string[] {
  return text.match(/\s+|[^\s\w]+|\w+/gu) ?? [];
}

export function diffText(original: string, corrected: string): Hunk[] {
  const a = tokenize(original);
  const b = tokenize(corrected);
  // DP table of LCS lengths over a × b (Uint32Array rows)
  // Walk the table emitting equal / delete / insert, merging adjacent ops.
  // Collapse delete+insert neighbours into one `replace` so a phrase rewrite
  // reads as a single reviewable hunk instead of three.
  ...
}
```
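One way to fill in the elided body, as a minimal sketch rather than inline-scribe's actual implementation. The `Hunk` shape follows the `kind` / `original` / `corrected` fields used elsewhere in this post; the exact merging strategy (collapsing any run of adjacent deletes and inserts into one hunk) is an assumption.

```ts
export type HunkKind = 'equal' | 'delete' | 'insert' | 'replace';
export interface Hunk { kind: HunkKind; original: string; corrected: string }

export function tokenize(text: string): string[] {
  return text.match(/\s+|[^\s\w]+|\w+/gu) ?? [];
}

export function diffText(original: string, corrected: string): Hunk[] {
  const a = tokenize(original);
  const b = tokenize(corrected);

  // lcs[i][j] = length of the LCS of a[i..] and b[j..], filled bottom-up.
  const lcs: Uint32Array[] = [];
  for (let i = 0; i <= a.length; i++) lcs.push(new Uint32Array(b.length + 1));
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  const hunks: Hunk[] = [];
  const push = (kind: 'equal' | 'delete' | 'insert', token: string): void => {
    const del = kind === 'insert' ? '' : token; // text taken from the original
    const ins = kind === 'delete' ? '' : token; // text taken from the correction
    const last = hunks[hunks.length - 1];
    if (last && last.kind === 'equal' && kind === 'equal') {
      last.original += token;
      last.corrected += token;
    } else if (last && last.kind !== 'equal' && kind !== 'equal') {
      // Adjacent edits collapse into a single reviewable hunk.
      last.original += del;
      last.corrected += ins;
      last.kind = last.original && last.corrected ? 'replace' : last.original ? 'delete' : 'insert';
    } else {
      hunks.push({ kind, original: del, corrected: ins });
    }
  };

  // Walk the table from the top-left, emitting one operation per token.
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) { push('equal', a[i]); i++; j++; }
    else if (lcs[i + 1][j] >= lcs[i][j + 1]) { push('delete', a[i]); i++; }
    else { push('insert', b[j]); j++; }
  }
  while (i < a.length) push('delete', a[i++]);
  while (j < b.length) push('insert', b[j++]);
  return hunks;
}
```

Whatever the merging rule, two invariants are worth keeping: concatenating every hunk's `original` gives back the input text, and concatenating every `corrected` gives back the model's text.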

This split has a lot of happy side effects. For one, applying the review becomes trivial: every hunk is just `{ kind, original, corrected }`. Accepted hunks take `corrected`, rejected hunks take `original`, concatenate, done.

```
export function applyDecisions(hunks: Hunk[], accepted: boolean[]): string {
  let result = '';
  hunks.forEach((h, idx) => {
    if (h.kind === 'equal') result += h.original;
    else result += accepted[idx] ? h.corrected : h.original;
  });
  return result;
}
```

Even with "return only text," small models still wrap output in fences or quotes. I gave up on prompting that away and instead strip the obvious wrappers in post — without touching real content:

``` ts
export function stripWrapping(reply: string, original: string): string {
  let out = reply.replace(/\r\n/g, '\n');
  const fence = out.match(/^\s*```[a-z]*\n([\s\S]*?)\n```\s*$/);
  if (fence) out = fence[1];                       // strip ```...```
  out = out.replace(/^\s+|\s+$/g, '');
  if (/^".*"$/s.test(out) && !/^"/.test(original.trim())) out = out.slice(1, -1); // strip whole-reply quotes
  if (original.endsWith('\n') && !out.endsWith('\n')) out += '\n'; // preserve trailing-newline convention
  return out;
}
```

**Takeaway: let small local models do what they're good at (return natural language) and keep the structure — diffs, JSON, state — in deterministic code.** This isn't specific to proofreading; it's a general principle for putting a local LLM into a product.

The second decision deals with the pothole every Chrome-extension × Ollama project hits.

Stock Ollama rejects requests carrying a `chrome-extension://...` Origin with a **403**, a guard against cross-origin access from extensions. The official workaround is to have the user set the `OLLAMA_ORIGINS` env var. But asking for that means every user has to set something like `OLLAMA_ORIGINS=chrome-extension://<id-that-changes>` by hand before the extension works.

In other words, **the moment your README documents an env var, you've lost.** It should just work with a stock `ollama serve`.

The fix: use MV3's **declarativeNetRequest (DNR)** to **strip the Origin header** from requests to the configured endpoint with a dynamic rule. No Origin, no 403.

``` ts
async function syncOriginRule(): Promise<void> {
  const stored = await chrome.storage.sync.get('config');
  const endpoint = stored.config?.endpoint ?? DEFAULT_CONFIG.endpoint;
  const host = new URL(endpoint).host; // e.g. 127.0.0.1:11434

  await chrome.declarativeNetRequest.updateDynamicRules({
    removeRuleIds: [1],
    addRules: [{
      id: 1,
      priority: 1,
      condition: { urlFilter: `||${host}/`, resourceTypes: ['xmlhttprequest'] },
      action: {
        type: 'modifyHeaders',
        requestHeaders: [{ header: 'origin', operation: 'remove' }],
      },
    }],
  });
}
```

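Key points:

- The rule only matches the configured endpoint (scoped by `urlFilter`). It's not a dangerous "strip Origin everywhere" rule.
- It's a dynamic rule, so it can be re-synced from `chrome.storage.onChanged` whenever the endpoint setting changes.
- The manifest needs `declarativeNetRequest` plus localhost `host_permissions`.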

```
// manifest.json (excerpt)
"permissions": ["storage", "activeTab", "declarativeNetRequest", "contextMenus"],
"host_permissions": ["http://127.0.0.1/*", "http://localhost/*"],
```
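Keeping the rule in step with the configured endpoint via `chrome.storage.onChanged` might be wired up as in the sketch below. `endpointChanged` and `watchEndpoint` are hypothetical helpers, `syncOriginRule` is passed in as a parameter so the snippet stands alone, and the extra refresh on `chrome.runtime.onInstalled` is an assumption about when else the rule should be rebuilt.

```ts
// `chrome` is provided by the browser at runtime; it is declared loosely here
// only so this sketch type-checks on its own (drop this line with @types/chrome).
declare const chrome: any;

interface StorageChange { oldValue?: unknown; newValue?: unknown }

// Hypothetical helper: did this storage event change the configured endpoint?
export function endpointChanged(
  changes: Record<string, StorageChange>,
  areaName: string,
): boolean {
  if (areaName !== 'sync' || !('config' in changes)) return false;
  const before = (changes.config.oldValue as { endpoint?: string } | undefined)?.endpoint;
  const after = (changes.config.newValue as { endpoint?: string } | undefined)?.endpoint;
  return before !== after;
}

// Hypothetical wiring, meant to be called once from the service worker.
export function watchEndpoint(syncOriginRule: () => Promise<void>): void {
  chrome.storage.onChanged.addListener(
    (changes: Record<string, StorageChange>, areaName: string) => {
      if (endpointChanged(changes, areaName)) void syncOriginRule();
    },
  );
  // Dynamic rules persist across browser sessions, so refresh on install/update too.
  chrome.runtime.onInstalled.addListener(() => void syncOriginRule());
}
```

Comparing the old and new endpoint avoids rewriting the rule when an unrelated setting, such as the model name, is what changed.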

The result: the user's steps are "install Ollama, `ollama serve`, install the extension." Zero env vars.

One more thing. The actual `fetch` (the request to 127.0.0.1) happens **in the service worker, not the content script**:

```
// content → background message; background runs the check and replies
chrome.runtime.onMessage.addListener((msg, _sender, sendResponse) => {
  if (msg?.type !== 'inline-scribe:check') return undefined;
  (async () => {
    const config = { ...DEFAULT_CONFIG, ...(await chrome.storage.sync.get('config')).config };
    try {
      const corrected = await new OllamaChecker(config).check(msg.text);
      sendResponse({ ok: true, corrected, model: config.model });
    } catch (err) {
      sendResponse({ ok: false, error: err instanceof Error ? err.message : String(err) }); // CheckerError message
    }
  })();
  return true; // keep the channel open for the async response
});
```

Two reasons:

- A `fetch` from a content script gets blocked when the page's Content-Security-Policy restricts `connect-src`. The service worker runs in the extension's context and is unaffected.
- A request made from the service worker has the resource type `xmlhttprequest`, so the rule above applies cleanly.

The UI (review panel, the ✎ selection icon, in-place replacement) is rendered in a shadow DOM from the content script so it doesn't collide with the page's CSS.
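For completeness, the content-script half of the message exchange with the service worker could be sketched like this. `CheckReply`, `Send`, and `requestCheck` are hypothetical names; the message type and reply fields mirror the listener shown earlier, and the `send` parameter exists only so the function can be exercised without a browser.

```ts
// `chrome` is provided by the browser at runtime; it is declared loosely here
// only so this sketch type-checks on its own (drop this line with @types/chrome).
declare const chrome: any;

type CheckReply =
  | { ok: true; corrected: string; model: string }
  | { ok: false; error: string };

type Send = (message: { type: string; text: string }) => Promise<CheckReply | undefined>;

// Default transport: hand the message to the extension's background service worker.
const sendToBackground: Send = (message) => chrome.runtime.sendMessage(message);

export async function requestCheck(text: string, send: Send = sendToBackground): Promise<string> {
  const reply = await send({ type: 'inline-scribe:check', text });
  if (!reply) throw new Error('no response from the background service worker');
  if (!reply.ok) throw new Error(reply.error);
  return reply.corrected;
}
```

The content script never touches the network itself; it only sends the text and receives either corrected prose or an error message to show in the panel.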

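inline-scribe is, at its core, "Grammarly's diff UX on top of a local LLM." The design decisions that made it work:

- The model returns only corrected text; the diff, the hunks, and the accept/reject state live in deterministic code.
- A `declarativeNetRequest` rule strips the Origin header, so there's no `OLLAMA_ORIGINS`, zero config.

Swap the system prompt and the same diff UI becomes a translator or a tone-shifter. Source is MIT.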

If you're putting a local LLM into a product, the leverage is in deciding what the model does — and what it doesn't.