PocketCFO: a private personal-finance brain that runs entirely in your browser

PocketCFO is a single-page web application that uses the Gemma 4 AI model to analyze personal finances entirely within the user's browser, so no financial data ever leaves the local machine. It processes paper receipts, bank statements, and natural-language questions by splitting the work: the model handles categorization, and a separate analytics module handles the arithmetic. To balance privacy, latency, and performance, the app offers a choice of three Gemma 4 model tiers; the default E2B variant is a practical compromise at roughly 1.5GB.

Snap a paper receipt, drop a bank statement, or just ask a question. Gemma 4 does the rest, without a single byte ever leaving your tab.

Live demo: https://gemma-challenge.vercel.app/

Code: https://github.com/chintanonweb/gemma-challenge

Personal-finance apps are a usability disaster for privacy-minded people. To get useful insights from your bank statement (categorizing spend, spotting forgotten subscriptions, asking "how much did I spend on coffee?") you have to upload your full transaction history to a third party. Often more: bank credentials via Plaid, receipts via your phone gallery, voice memos via some cloud transcription API.

Most people don't. Most people shouldn't. So the insight never gets generated, and the forgotten subscription keeps charging.

I wanted to know if there was now a way to fix this: a personal finance tool that does real work but where every byte stays on the user's machine. Until 2026 the answer was "almost, but not quite." With Gemma 4 E2B the answer is finally yes, in a browser tab.

PocketCFO is a single-page web app where you snap a paper receipt, drop a bank statement, or ask a question in plain language.

Everything runs in the browser. The only thing the server does is host static files. Open your devtools network tab during use: after the initial ~1.5GB model download (cached after the first load), nothing leaves your machine.
Instead of hardcoding a single model, PocketCFO ships a picker covering three deployment tiers of the same model family. The "intentional model selection" judging criterion isn't just a justification I write into the post; it's a user feature that exposes the actual trade-off.

The default is E2B because most users meet PocketCFO for the first time on a normal connection and don't want to wait through a 2.5GB download before seeing anything. The big win is that every option uses the same Gemma 4 family with the same 128K context, so the product behaves the same; what changes is the user's chosen balance between privacy, latency, and quality.

PocketCFO has four non-negotiable constraints, and E2B is the smallest Gemma 4 variant that meets all four.

The 31B Dense and 26B MoE Gemma 4 variants are too large for browser inference today. The E4B variant is more capable but is a ~2.5GB download, which is painfully slow for a first-time user trying the demo. E2B hits the sweet spot: same multimodality, same 128K context, roughly half the cold-load time. Respecting the user's bandwidth turned out to matter more for product feel than the marginal reasoning gain of the bigger model.

Crucially, the multimodality is what makes the model choice non-trivial. Without the vision encoder, the receipt-snap flow doesn't work and the project collapses to a text-only tool that Gemma 3 could have handled. With it, every receipt scan and statement question runs through the same ~1.5GB of weights: downloaded once, never uploaded.

The single most important architectural decision in PocketCFO is this: the LLM categorizes and reasons, and a boring analytics/ module does all the math. When a finance tool says "you spent $487 on subscriptions this year," that number had better be right.
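
A minimal sketch of that division of labour, not PocketCFO's actual code. The names LabeledTransaction, totalByCategory, and formatCents are hypothetical, the sample merchants are made up, and holding amounts as integer cents is an assumption (one common way to keep sums exact). The model's only contribution is the category string; everything numeric happens in plain functions.

```typescript
// Hypothetical shape: a transaction after the model has labelled it.
interface LabeledTransaction {
  merchant: string;
  category: string; // label supplied by the model (the engine/ side)
  amountCents: number; // integer cents (assumed), so addition stays exact
}

// The analytics/ side: a pure function with no model call in it.
function totalByCategory(txs: LabeledTransaction[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const t of txs) {
    totals[t.category] = (totals[t.category] ?? 0) + t.amountCents;
  }
  return totals;
}

// Cents become a dollar string only at display time (non-negative input assumed).
function formatCents(cents: number): string {
  const dollars = Math.floor(cents / 100);
  const rest = cents % 100;
  return "$" + dollars + "." + (rest < 10 ? "0" : "") + rest;
}

const totals = totalByCategory([
  { merchant: "Corner Cafe", category: "Coffee", amountCents: 1400 },
  { merchant: "Corner Cafe", category: "Coffee", amountCents: 999 },
  { merchant: "StreamCo", category: "Subscriptions", amountCents: 1549 },
]);
console.log(formatCents(totals["Coffee"])); // $23.99
```

Because the totals never pass through the model, a wrong label can put a charge in the wrong bucket, but it cannot change what the charges add up to.
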
LLMs hallucinate sums constantly (even good ones, even with explicit chain-of-thought), and they do it most often in exactly the situations where you'd put one in front of a user: long contexts, lots of numbers, "summarize this for me". I would not ship a demo that adds $14 + $9.99 and shows $24.

So the split is:

- engine/ (Gemma 4) outputs labels: a category word, a merchant name, a free-form answer.
- analytics/ (pure functions, 100% test coverage) outputs numbers: totals, percentages, recurring-payment detection, month-over-month deltas.

The recurring-charge detection in particular is purely deterministic: group by merchant, compute the gap distribution, snap to weekly/monthly/quarterly/yearly. Three unit tests cover the cadence math, two cover the edge cases (single-month input, income exclusion). The LLM never enters that code path. The number on the dashboard is correct by construction.

1. Transformers.js needs to be on v4.0.1+ for Gemma 4. I started on v3.5 and got "Unsupported model type: gemma4" the first time the model tried to load. Gemma 4 support didn't land in @huggingface/transformers until v4.0.1. Easy fix once you know, but a reminder that a ^ semver range never crosses a major version, so on bleeding-edge libraries it can silently leave you behind. The TypeScript types for pipeline are still too complex for the compiler to resolve cleanly, so I wrapped the call in a narrow cast; runtime behavior is fine.

2. Cross-Origin-Isolation is a Vercel-deploy footgun. Multi-threaded WebAssembly inside Transformers.js needs SharedArrayBuffer, which needs the Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp headers. These are easy to add in next.config.ts but easy to forget: the build will pass, the deploy will succeed, and the demo will silently fall back to slow single-threaded WASM. Test in incognito after deploying.

3. Streaming Q&A makes the demo feel real; per-token categorization doesn't. The Q&A panel uses TextStreamer so the answer types out character by character, which feels alive.
For categorization (60 transactions × short outputs), sequential non-streamed calls with UI pills lighting up one at a time also feel alive. Both feel like the model is working, but they need different engineering. Pick the streaming hill you actually want to die on.

If you build something on Gemma 4 too, I'd love to see it.

Chintan
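
P.S. The recurring-charge detection described earlier (group by merchant, compute the gaps, snap to a cadence) can be sketched in a few lines. This is an illustration under stated assumptions, not the code from the repo: the Charge type, the detectRecurring name, the choice of the median as the "typical" gap, the tolerance windows, and the convention that non-positive amounts are income are all hypothetical.

```typescript
// Hypothetical input shape; the real PocketCFO types may differ.
interface Charge {
  merchant: string;
  date: string; // ISO date, e.g. "2026-01-15"
  amountCents: number; // assumed convention: positive = money out, otherwise income
}

type Cadence = "weekly" | "monthly" | "quarterly" | "yearly";

// Assumed nominal gap and tolerance, in days, for each cadence.
const CADENCES: { name: Cadence; days: number; tolerance: number }[] = [
  { name: "weekly", days: 7, tolerance: 1 },
  { name: "monthly", days: 30, tolerance: 3 },
  { name: "quarterly", days: 91, tolerance: 7 },
  { name: "yearly", days: 365, tolerance: 10 },
];

const DAY_MS = 24 * 60 * 60 * 1000;

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

function detectRecurring(charges: Charge[]): Record<string, Cadence> {
  // 1. Group charge timestamps by merchant, skipping income.
  const byMerchant: Record<string, number[]> = {};
  for (const c of charges) {
    if (c.amountCents <= 0) continue;
    const times = byMerchant[c.merchant] ?? [];
    times.push(Date.parse(c.date));
    byMerchant[c.merchant] = times;
  }

  const result: Record<string, Cadence> = {};
  for (const merchant of Object.keys(byMerchant)) {
    const times = byMerchant[merchant].sort((a, b) => a - b);
    if (times.length < 2) continue; // a single charge has no gap to measure

    // 2. Gaps between consecutive charges, in whole days.
    const gaps = times
      .slice(1)
      .map((t, i) => Math.round((t - times[i]) / DAY_MS));

    // 3. Snap the typical gap to the first cadence it falls within.
    const typical = median(gaps);
    for (const c of CADENCES) {
      if (Math.abs(typical - c.days) <= c.tolerance) {
        result[merchant] = c.name;
        break;
      }
    }
  }
  return result;
}
```

A merchant charged on the 15th of January, February, and March has gaps of 31 and 28 days, a median of 29.5, and lands in the monthly window; a merchant seen only once produces no gaps and is left out, which is how a single-month statement ends up with no monthly subscriptions flagged.
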