[ARTICLE · art-27993] src=dev.to ↗ pub= topic=large-language-models verified=true sentiment=↑ positive

Grammarly costs $12/mo — a local LLM does it for free (Chrome + Ollama)

A developer built inline-scribe, a Chrome extension that proofreads text using a local LLM via Ollama, ensuring privacy by keeping data on the user's machine. The extension provides an inline diff interface similar to Grammarly's Track Changes, allowing users to accept or reject each correction individually. The project highlights a design split where the LLM returns corrected prose as plain text, and a deterministic algorithm computes the diff, avoiding the unreliability of structured output from small models.

6 min read · published Jun 15, 2026

I write a lot in the browser (email, GitHub comments, contact forms), and I wanted proofreading without sending every keystroke to a company's cloud. My workplace bans Grammarly for exactly that reason.

So I built inline-scribe: a Chrome extension that proofreads your text with an AI that runs on your own machine (Ollama). Nothing leaves your computer. And the fixes show up like Word's Track Changes — accept or reject each one individually with ✓ / ✕.

This post is about the two design decisions I found most interesting while building it. Both generalize to anyone wiring a local LLM into a product.

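The first: the LLM only returns corrected text, and deterministic code computes the diff. The second: a declarativeNetRequest rule removes the Origin header, so stock Ollama works without the user setting OLLAMA_ORIGINS.

If you write in a browser today, you pick one of three bad options: send your text to a cloud service, settle for a local rule-based checker that only underlines typos, or use a local LLM tool that replaces your whole text without a per-fix review.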

The thing is, the AI isn't the hard part anymore. Anyone can run a capable model locally with Ollama in two commands, for free. What's missing is the interface. The reason Grammarly was worth paying for was never the grammar engine — it was the friendly diff that lets you see and control each change.

That interface, on top of a model you own, is the whole product.

| Tool | Corrections | Your text goes to | Inline diff, per-fix accept/reject | Price |
|---|---|---|---|---|
| Grammarly | cloud AI | their servers | ✅ (the reason people pay) | $12+/mo |
| Harper (10k★) | local, rule-based | nowhere ✅ | ❌ underlines typos only; can't rewrite a clumsy sentence | free |
| scramble / Typollama | local LLM ✅ | nowhere ✅ | ❌ whole-text replacement or popup | free |
| inline-scribe | local LLM ✅ | nowhere ✅ | ✅ | free |

This is the one I most want to share.

The intuitive move is to ask the model for structured output: "return the changes as JSON," something like [{ "delete": "...", "insert": "..." }, ...], and pipe it straight into the UI.

But small local models break when you do this. A model like llama3.2 (3B) is surprisingly good at fixing prose and terrible at structured output: it breaks the JSON, adds explanations, wraps everything in a code fence, renames your keys. A chatty 3B model means a broken UI.
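To make that failure mode concrete, here is a minimal sketch of strict parsing against the kinds of replies described above. The reply strings are invented for illustration, not captured model output, and tryParseEdits and the Edit shape are hypothetical names.

```typescript
// Hypothetical shape for the structured output you might ask the model for.
interface Edit { delete: string; insert: string }

// Strict parsing: returns null unless the reply is a JSON array of edits.
function tryParseEdits(reply: string): Edit[] | null {
  try {
    const parsed: unknown = JSON.parse(reply);
    if (!Array.isArray(parsed)) return null;
    const ok = parsed.every(
      (e) => typeof e === 'object' && e !== null &&
        typeof (e as Edit).delete === 'string' &&
        typeof (e as Edit).insert === 'string',
    );
    return ok ? (parsed as Edit[]) : null;
  } catch {
    return null; // not valid JSON at all
  }
}

// Illustrative replies (made up for this sketch).
const clean = '[{"delete":"teh","insert":"the"}]';
const fenced = '```json\n[{"delete":"teh","insert":"the"}]\n```';
const chatty = 'Sure! Here are the changes: [{"delete":"teh","insert":"the"}]';
const renamed = '[{"remove":"teh","add":"the"}]';

const results = [clean, fenced, chatty, renamed].map((r) => tryParseEdits(r) !== null);
console.log(results); // [ true, false, false, false ]
```

Only the first reply survives. Every other variant would have to be repaired or rejected before the UI could render anything.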

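So I split the responsibilities: the LLM returns the corrected prose as plain text, and the extension computes the diff from the (original, corrected) pair with a deterministic algorithm.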

you press Alt+G in a text field
   │
   ▼
the extension sends your text to YOUR endpoint     ← default: Ollama on 127.0.0.1
(an OpenAI-compatible /chat/completions API)          model: llama3.2 (~2GB, free)
   │
   ▼
the model returns corrected prose — just text
   │
   ▼
inline-scribe computes a word-level diff           ← deterministic algorithm,
between your text and the correction                  NOT the LLM's opinion
   │
   ▼
review panel: accept ✓ / reject ✕ each change → Apply writes back only what you approved
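The second step of that flow could look roughly like the sketch below. It assumes Ollama's OpenAI-compatible API under http://127.0.0.1:11434/v1. The config shape, the system prompt wording, and the names buildChatRequest and requestCorrection are assumptions made for illustration, not the extension's actual code.

```typescript
// Hypothetical config shape; the field names are assumptions for this sketch.
interface CheckerConfig { endpoint: string; model: string }

// Assumed default: Ollama's OpenAI-compatible API on the local machine.
const SKETCH_CONFIG: CheckerConfig = {
  endpoint: 'http://127.0.0.1:11434/v1',
  model: 'llama3.2',
};

// Illustrative system prompt, not the extension's actual wording.
const SYSTEM_PROMPT =
  'Fix spelling and grammar. Return only the corrected text, with no commentary.';

// Build the URL and fetch options for a /chat/completions call.
function buildChatRequest(config: CheckerConfig, text: string) {
  return {
    url: `${config.endpoint.replace(/\/+$/, '')}/chat/completions`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: config.model,
        stream: false,
        temperature: 0,
        messages: [
          { role: 'system', content: SYSTEM_PROMPT },
          { role: 'user', content: text },
        ],
      }),
    },
  };
}

// Send the request and pull the corrected prose out of the response.
async function requestCorrection(config: CheckerConfig, text: string): Promise<string> {
  const { url, init } = buildChatRequest(config, text);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`endpoint returned HTTP ${res.status}`);
  const data = (await res.json()) as { choices?: { message?: { content?: unknown } }[] };
  const content = data.choices?.[0]?.message?.content;
  if (typeof content !== 'string') throw new Error('unexpected response shape');
  return content;
}
```

Note that the model is asked for prose only. Nothing in the request asks for JSON, positions, or edit operations.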

The diff tokenizes into words + whitespace + punctuation runs, then does an LCS (longest common subsequence) walk:

// Tokenize into words/whitespace/punctuation, preserving everything
export function tokenize(text: string): string[] {
  return text.match(/\s+|[^\s\w]+|\w+/gu) ?? [];
}

export function diffText(original: string, corrected: string): Hunk[] {
  const a = tokenize(original);
  const b = tokenize(corrected);
  // DP table of LCS lengths over a × b (Uint32Array rows)
  // Walk the table emitting equal / delete / insert, merging adjacent ops.
  // Collapse delete+insert neighbours into one `replace` so a phrase rewrite
  // reads as a single reviewable hunk instead of three.
  ...
}
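Since the body above is elided, here is one way the LCS walk could be filled in. This is a sketch under assumptions rather than the extension's implementation: the Hunk shape is inferred from the { kind, original, corrected } fields the article mentions, and deletions and insertions are folded straight into a single replace kind instead of being emitted separately and collapsed afterwards.

```typescript
// Assumed hunk shape, inferred from the fields named in the article.
type Hunk = { kind: 'equal' | 'replace'; original: string; corrected: string };

function tokenize(text: string): string[] {
  return text.match(/\s+|[^\s\w]+|\w+/gu) ?? [];
}

function diffText(original: string, corrected: string): Hunk[] {
  const a = tokenize(original);
  const b = tokenize(corrected);
  const n = a.length, m = b.length;

  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs: Uint32Array[] = [];
  for (let i = 0; i <= n; i++) lcs.push(new Uint32Array(m + 1));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  const hunks: Hunk[] = [];
  // Append to the last hunk when the kind matches, so adjacent ops merge.
  const push = (kind: Hunk['kind'], del: string, ins: string): void => {
    const last = hunks[hunks.length - 1];
    if (last && last.kind === kind) {
      last.original += del;
      last.corrected += ins;
    } else {
      hunks.push({ kind, original: del, corrected: ins });
    }
  };

  let i = 0, j = 0;
  while (i < n || j < m) {
    if (i < n && j < m && a[i] === b[j]) {
      push('equal', a[i], b[j]); i++; j++;
    } else if (i < n && (j === m || lcs[i + 1][j] >= lcs[i][j + 1])) {
      push('replace', a[i], ''); i++; // token only in the original
    } else {
      push('replace', '', b[j]); j++; // token only in the correction
    }
  }
  return hunks;
}

console.log(diffText('teh cat', 'the cat'));
// [ { kind: 'replace', original: 'teh', corrected: 'the' },
//   { kind: 'equal', original: ' cat', corrected: ' cat' } ]
```

Concatenating every hunk's original field reproduces the input exactly, and concatenating every corrected field reproduces the model's reply. That invariant is what makes per-hunk accept/reject safe.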

This split has a lot of happy side effects:

Each hunk is a plain { kind, original, corrected } record. Accepted hunks take corrected, rejected take original, concatenate, and you're done.

export function applyDecisions(hunks: Hunk[], accepted: boolean[]): string {
  let result = '';
  hunks.forEach((h, idx) => {
    if (h.kind === 'equal') result += h.original;
    else result += accepted[idx] ? h.corrected : h.original;
  });
  return result;
}
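A short usage sketch shows what a mixed accept/reject decision produces. The hunks are written by hand for illustration, the Hunk type is an assumed shape, and applyDecisions is repeated only so the snippet runs on its own.

```typescript
// Assumed hunk shape, inferred from the fields named in the article.
type Hunk = { kind: 'equal' | 'replace'; original: string; corrected: string };

// Same logic as applyDecisions above, repeated so the sketch is standalone.
function applyDecisions(hunks: Hunk[], accepted: boolean[]): string {
  let result = '';
  hunks.forEach((h, idx) => {
    if (h.kind === 'equal') result += h.original;
    else result += accepted[idx] ? h.corrected : h.original;
  });
  return result;
}

// Hand-written hunks for "I has went home." -> "I have gone home."
const exampleHunks: Hunk[] = [
  { kind: 'equal', original: 'I ', corrected: 'I ' },
  { kind: 'replace', original: 'has', corrected: 'have' },
  { kind: 'equal', original: ' ', corrected: ' ' },
  { kind: 'replace', original: 'went', corrected: 'gone' },
  { kind: 'equal', original: ' home.', corrected: ' home.' },
];

// Accept the first fix, reject the second. Entries for equal hunks are ignored.
const decided = applyDecisions(exampleHunks, [false, true, false, false, false]);
console.log(decided); // "I have went home."
```

Because the accepted array is indexed by hunk position, the review panel only needs to track one boolean per hunk.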

Even with "return only text," small models still wrap output in fences or quotes. I gave up on prompting that away and instead strip the obvious wrappers in post — without touching real content:

export function stripWrapping(reply: string, original: string): string {
  let out = reply.replace(/\r\n/g, '\n');
  const fence = out.match(/^\s*```[a-z]*\n([\s\S]*?)\n```\s*$/);
  if (fence) out = fence[1];                       // strip ```...```
  out = out.replace(/^\s+|\s+$/g, '');
  if (/^".*"$/s.test(out) && !/^"/.test(original.trim())) out = out.slice(1, -1); // strip whole-reply quotes
  if (original.endsWith('\n') && !out.endsWith('\n')) out += '\n'; // preserve trailing-newline convention
  return out;
}
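The behaviour is easiest to see on a few inputs. The sketch below restates stripWrapping so it runs standalone, with two cosmetic changes that are mine, not the author's: the fence marker is built at runtime, and [\s\S] stands in for the dotAll flag. The sample replies are invented.

```typescript
// Built at runtime so this sketch contains no literal fence marker.
const FENCE = '`'.repeat(3);
const FENCED_REPLY = new RegExp(
  '^\\s*' + FENCE + '[a-z]*\\n([\\s\\S]*?)\\n' + FENCE + '\\s*$',
);

// Restatement of stripWrapping from above so the sketch runs standalone.
function stripWrapping(reply: string, original: string): string {
  let out = reply.replace(/\r\n/g, '\n');
  const fence = out.match(FENCED_REPLY);
  if (fence) out = fence[1];                          // drop a surrounding code fence
  out = out.replace(/^\s+|\s+$/g, '');
  // Drop whole-reply quotes unless the original itself started with a quote.
  if (/^"[\s\S]*"$/.test(out) && !/^"/.test(original.trim())) out = out.slice(1, -1);
  if (original.endsWith('\n') && !out.endsWith('\n')) out += '\n';
  return out;
}

const cases = {
  fenced: stripWrapping(FENCE + 'text\nHello world.\n' + FENCE, 'Helo world.'),
  quoted: stripWrapping('"Hello world."', 'Helo world.'),
  realQuotes: stripWrapping('"Hello world."', '"Helo world."'),
  trailingNewline: stripWrapping('Hello world.', 'Helo world.\n'),
};
console.log(cases);
// fenced and quoted both come back as 'Hello world.'; realQuotes keeps its
// quotes because the original had them; trailingNewline regains its '\n'.
```

The realQuotes case is the "without touching real content" guard: quotes are only treated as wrapping when the user's own text did not start with one.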

Takeaway: let small local models do what they're good at (return natural language) and keep the structure — diffs, JSON, state — in deterministic code. This isn't specific to proofreading; it's a general principle for putting a local LLM into a product.

This is the pothole every Chrome-extension × Ollama project hits.

Stock Ollama rejects requests carrying a chrome-extension://... Origin with a 403, a guard against cross-origin access from extensions. The official workaround is to have the user set the OLLAMA_ORIGINS env var. But asking for that means documenting a setting like this, tied to an extension ID that can change:

OLLAMA_ORIGINS=chrome-extension://<id-that-changes>

In other words, the moment your README documents an env var, you've lost. It should just work with a stock ollama serve.

The fix: use MV3's declarativeNetRequest (DNR) to strip the Origin header from requests to the configured endpoint with a dynamic rule. No Origin, no 403.

async function syncOriginRule(): Promise<void> {
  const stored = await chrome.storage.sync.get('config');
  const endpoint = stored.config?.endpoint ?? DEFAULT_CONFIG.endpoint;
  const host = new URL(endpoint).host; // e.g. 127.0.0.1:11434

  await chrome.declarativeNetRequest.updateDynamicRules({
    removeRuleIds: [1],
    addRules: [{
      id: 1,
      priority: 1,
      condition: { urlFilter: `||${host}/`, resourceTypes: ['xmlhttprequest'] },
      action: {
        type: 'modifyHeaders',
        requestHeaders: [{ header: 'origin', operation: 'remove' }],
      },
    }],
  });
}

Key points:

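The rule is scoped to the configured endpoint only (via urlFilter). It's not a dangerous "strip Origin everywhere" rule.

The rule is re-synced whenever the endpoint setting changes, via chrome.storage.onChanged.

Permission-wise, this needs declarativeNetRequest plus localhost host_permissions.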
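The chrome.storage.onChanged wiring mentioned in the key points is not shown in the article, so here is one possible sketch. buildOriginRule and ORIGIN_RULE_ID are hypothetical names, the stored value is assumed to live under a config key with an endpoint field (as in syncOriginRule above), and chrome is declared loosely so the snippet type-checks without @types/chrome.

```typescript
// Ambient declaration so the sketch type-checks without @types/chrome.
declare const chrome: any;

const ORIGIN_RULE_ID = 1;

// Build a rule that removes the Origin header only for the configured host.
function buildOriginRule(endpoint: string) {
  const host = new URL(endpoint).host; // e.g. "127.0.0.1:11434"
  return {
    id: ORIGIN_RULE_ID,
    priority: 1,
    condition: { urlFilter: `||${host}/`, resourceTypes: ['xmlhttprequest'] },
    action: {
      type: 'modifyHeaders',
      requestHeaders: [{ header: 'origin', operation: 'remove' }],
    },
  };
}

// Replace the dynamic rule whenever the stored endpoint changes.
// Guarded so the file can also be loaded outside an extension context.
if (typeof chrome !== 'undefined' && chrome.storage?.onChanged) {
  chrome.storage.onChanged.addListener((changes: any, areaName: string) => {
    const endpoint = changes.config?.newValue?.endpoint;
    if (areaName !== 'sync' || typeof endpoint !== 'string') return;
    void chrome.declarativeNetRequest.updateDynamicRules({
      removeRuleIds: [ORIGIN_RULE_ID],
      addRules: [buildOriginRule(endpoint)],
    });
  });
}
```

Keeping the rule construction in a pure function makes the scoping easy to check: the urlFilter always contains exactly the host and port the user configured.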

// manifest.json (excerpt)
"permissions": ["storage", "activeTab", "declarativeNetRequest", "contextMenus"],
"host_permissions": ["http://127.0.0.1/*", "http://localhost/*"],

The result: the user's steps are "install Ollama, ollama serve, install the extension." Zero env vars.

One more thing. The actual fetch (the request to 127.0.0.1) happens in the service worker, not the content script:

// content → background message; background runs the check and replies
chrome.runtime.onMessage.addListener((msg, _sender, sendResponse) => {
  if (msg?.type !== 'inline-scribe:check') return undefined;
  (async () => {
    const config = { ...DEFAULT_CONFIG, ...(await chrome.storage.sync.get('config')).config };
    try {
      const corrected = await new OllamaChecker(config).check(msg.text);
      sendResponse({ ok: true, corrected, model: config.model });
    } catch (err) {
      sendResponse({ ok: false, error: err instanceof Error ? err.message : String(err) }); // CheckerError message
    }
  })();
  return true; // keep the channel open for the async response
});

Two reasons:

A fetch from a content script gets blocked when the page's Content-Security-Policy restricts connect-src. The service worker runs in the extension's context and is unaffected.

A fetch from the service worker is classified as xmlhttprequest, so the rule above applies cleanly.

The UI (review panel, the ✎ selection icon, in-place replacement) is rendered in a shadow DOM from the content script so it doesn't collide with the page's CSS.

inline-scribe is, at its core, "Grammarly's diff UX on top of a local LLM." The design decisions that made it work:

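The LLM returns plain corrected text, and deterministic code computes the diff.

A declarativeNetRequest rule removes the Origin header: no OLLAMA_ORIGINS, zero config.

Swap the system prompt and the same diff UI becomes a translator or a tone-shifter. Source is MIT.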

If you're putting a local LLM into a product, the leverage is in deciding what the model does — and what it doesn't.
