I Built ContextFabric: One Private Memory Layer Across Claude, ChatGPT, Cursor, and More with Local Gemma 4

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

AI tools remember now, but they remember in separate silos.

Claude has projects, ChatGPT has personalization, Cursor indexes your codebase, and somehow you still end up re-explaining the same decisions, constraints, preferences, and project state every time you move between tools.

That felt backwards to me.

If memory is becoming part of the AI operating system, then personal context should not be trapped inside one vendor's product. It should be portable, permissioned, local-first, and owned by the user.

So I built ContextFabric: a local AI memory layer powered by Gemma 4.

## What I Built

ContextFabric is a desktop app, local daemon, memory graph, and browser extension bridge that lets AI tools share approved context without sending your personal memory to a cloud memory server.

The idea is simple:

- Import your real project context: repos, folders, markdown, PDFs, ChatGPT exports, Claude exports, notes, and documents.
- Gemma 4 runs locally through Ollama and extracts structured memory nodes.
- ContextFabric stores those nodes in a local SQLite graph.
- External tools request access.
- You approve the request.
- The browser extension injects the right context into Claude, ChatGPT, Cursor, Gemini, Perplexity, and other AI tools.

The five core memory node types are:

- `project`: what you are building
- `decision`: choices already made and why
- `preference`: stable working preferences
- `style`: how you communicate, design, or code
- `person`: collaborators and relevant human context

This is not meant to replace Claude projects, ChatGPT memory, or Cursor indexing. It solves a different problem: your context should be portable across them.
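To make the five node types concrete, here is a minimal sketch of how a memory node could be typed and validated in TypeScript. The names `NODE_TYPES`, `MemoryNode`, and `isMemoryNode` are hypothetical and chosen for illustration; the fields are a subset of the extraction schema shown later in this post, and the real project types may differ.

```typescript
// Hypothetical names for illustration; not taken from the ContextFabric codebase.
const NODE_TYPES = ['project', 'decision', 'preference', 'style', 'person'] as const
type NodeType = (typeof NODE_TYPES)[number]

interface MemoryNode {
  type: NodeType
  title: string
  summary: string
  confidence: number // assumed to be a score between 0 and 1
}

// True only for one of the five known node type strings.
function isNodeType(value: unknown): value is NodeType {
  return typeof value === 'string' && (NODE_TYPES as readonly string[]).includes(value)
}

// Runtime guard: checks the shape of untrusted input before treating it as a node.
function isMemoryNode(value: unknown): value is MemoryNode {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  return (
    isNodeType(v.type) &&
    typeof v.title === 'string' &&
    typeof v.summary === 'string' &&
    typeof v.confidence === 'number' &&
    v.confidence >= 0 &&
    v.confidence <= 1
  )
}

const example: MemoryNode = {
  type: 'decision',
  title: 'Store memory in local SQLite',
  summary: 'Memory nodes are stored in a local SQLite graph rather than a cloud service.',
  confidence: 0.9,
}
```

A guard like this is one way to keep anything that is not one of the five types out of the graph, whatever the model returns.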
## Demo

The demo, including browser extension injection, shows the full loop:

- paste messy project context
- Gemma 4 extracts structured memory nodes
- nodes are saved locally with confidence scores
- AI Query answers with sources
- a permission request controls external access
- the browser extension injects approved context into an AI chat tool

## Code

GitHub: https://github.com/Boweii22/ContextFabric

Live Site: https://boweii22.github.io/ContextFabric/

The project is built with Electron, React, TypeScript, SQLite, Express, Ollama, and a Manifest V3 browser extension.

The local app exposes two loopback APIs:

- `127.0.0.1:47821` for the desktop app permission/token API
- `127.0.0.1:7749` for the simple demo daemon UI and compatibility endpoints

Both are bound to loopback, not `0.0.0.0`. That matters because the privacy claim is not just a paragraph in a README. The architecture does not expose a public server for your memory graph.

## How I Built It

The architecture has six parts:

```text
User-owned sources
repos, exports, docs, notes, PDFs
        |
        v
Local ingestion
chunking + metadata
        |
        v
Gemma 4 via Ollama
extract + reason
        |
        v
SQLite memory graph
nodes + embeddings
        |
        v
Permissioned daemon
localhost only
        |
        v
Browser extension
injects context
```

The first hard problem was extraction. I did not want a generic summary. I wanted durable memory. That means the model has to decide whether a piece of text contains a project fact, a decision, a preference, a style signal, or a person.

Here is the actual extraction schema prompt from the project:

```ts
export const CONTEXT_NODE_TYPES = ['project', 'style', 'decision', 'preference', 'person'] as const

export const CONTEXT_EXTRACTION_SYSTEM_PROMPT = `You are ContextFabric's local Gemma 4 context extractor.
Your job is to read one piece of user-owned context and output ONLY valid JSON.
No markdown. No prose. No comments. No trailing commas.

Extract durable context nodes that another AI assistant should remember later.
Use only facts supported by the input. Do not invent people, projects, tools, or decisions.

Allowed node types:
- project: what the user is building, maintaining, researching, or planning.
- style: how the user writes, communicates, designs, codes, or prefers answers to be shaped.
- decision: a choice already made, including why, tradeoffs, rejected alternatives, or reversibility.
- preference: a stable working preference, constraint, tool choice, privacy preference, format preference, or habit.
- person: a collaborator, stakeholder, user, client, author, or named human with relevant relationship/role context.

Return this exact JSON shape:
{
  "nodes": [
    {
      "type": "project" | "style" | "decision" | "preference" | "person",
      "title": "short human-readable title",
      "summary": "one factual sentence, max 220 characters",
      "confidence": 0.0,
      "evidence": "short direct evidence phrase from the input, max 180 characters",
      "entities": ["important names, tools, projects, people"],
      "tags": ["lowercase-keywords"]
    }
  ]
}`
```

The parser is intentionally defensive. Gemma 4 is good at structured output, but production code still needs repair paths.

```ts
// parseJsonObject and normalizeNode are project helpers that are not shown in this post.
export function parseContextExtraction(raw: string): ContextExtractionParseResult {
  const errors: string[] = []
  const parsed = parseJsonObject(raw)

  // Reject anything that is not a plain JSON object.
  if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) {
    return { ok: false, result: { nodes: [] }, errors: ['Output is not a JSON object.'] }
  }

  const root = parsed as Record<string, unknown>
  if (!Array.isArray(root.nodes)) {
    return { ok: false, result: { nodes: [] }, errors: ['Missing nodes array.'] }
  }

  const nodes: ExtractedContextNode[] = []
  for (const [index, value] of root.nodes.entries()) {
    const node = normalizeNode(value, index, errors)
    if (node) nodes.push(node)
  }

  // Cap the result at six nodes per extraction.
  return { ok: errors.length === 0, result: { nodes: nodes.slice(0, 6) }, errors }
}
```

The second hard problem was assembling context for different tools.
Claude, ChatGPT, and Cursor do not want the same payload. Claude benefits from concise prose sections. ChatGPT works well with a compact bullet brief. Cursor needs engineering-focused context.

So ContextFabric asks Gemma 4 to assemble app-aware context briefs:

```ts
export const PAYLOAD_ASSEMBLY_SYSTEM_PROMPT = `You are ContextFabric's local Gemma 4 payload assembler.

Goal:
Turn user-approved local memory nodes into one coherent context brief for another AI tool.

Rules:
- Use ONLY the supplied memory nodes. Do not invent facts, names, features, dates, metrics, or claims.
- Prefer stable project, decision, style, preference, and person nodes over raw conversation/code snippets.
- Write a useful brief, not a JSON dump.
- Include source node ids inline as node:id after concrete claims.
- If the nodes do not support a requested claim, omit it.
- Respect the requested app format.
- Stay under the requested maximum word count.

App formats:
- claude: concise prose with sections "Context", "Decisions", "Working Style", "How to Use This".
- chatgpt: short bullet-oriented brief with "Known Context", "Preferences", "Relevant Sources".
- cursor: engineering-focused brief with "Project", "Architecture / Decisions", "Coding Preferences", "Files / Sources".
- generic: compact neutral brief with clear source ids.

Return JSON only:
{
  "payload": "the final context brief",
  "usedNodeIds": ["node-id"],
  "warnings": ["optional warning when data is thin or uncertain"]
}`
```

The third hard problem was making the local model usable on normal hardware. I hit memory issues while testing Gemma locally, so ContextFabric creates a constrained Ollama profile called `cf-gemma4`.
```ts
// Excerpt from a class method: creates the constrained profile from the source model.
const res = await fetch(`${this.baseUrl}/api/create`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: this.constrainedModelName,
    from: sourceModel,
    parameters: {
      num_ctx: this.runtimeContext,
      num_predict: 64,
      num_batch: 4,
    },
    stream: false,
  }),
  signal: ctrl.signal,
})
```

This was not about making the model weaker. It was about making the demo run on real laptops, not just on a perfect GPU workstation.

For the local HTTP daemon, I added a small API that judges can test without understanding the whole Electron app:

```ts
compat.post('/extract', async (req: Request, res: Response) => {
  const { text, title = 'HTTP Extract', inputType = 'api', save = false } = req.body
  if (!text?.trim()) {
    res.status(400).json({ error: 'text is required' })
    return
  }
  const result = await extractNodesFromText(db, ollama, text, title, inputType, Boolean(save))
  res.json({
    ok: true,
    saved: Boolean(save),
    savedCount: result.savedCount,
    nodes: result.nodes.map(nodeToPublicJson),
  })
})

compat.get('/context', async (req: Request, res: Response) => {
  const appId = String(req.query.app || req.query.appId || 'generic')
  const query = String(req.query.query || 'current project context, writing style, technical decisions, preferences')
  const nodes = selectTokenNodes(db.getNodes(800), query, 16)
  const assembly = await assembleTokenPayloadWithTimeout(ollama, { appId, query, nodes, maxWords: 800 })
  res.json({ ok: true, appFormat: assembly.appFormat, payload: assembly.payload })
})
```

That endpoint is what makes the browser extension bridge simple. The extension does not need to know how the graph works. It asks the local daemon for approved context and inserts it into the active AI chat box.

## Why Gemma 4

Gemma 4 is not a decorative dependency here. It is the part of the system that turns ContextFabric from a searchable note bucket into a memory protocol.
I chose Gemma 4 E2B as the target model profile because ContextFabric is supposed to run where personal context actually lives: on laptops, desktops, and eventually smaller edge devices.

A cloud model would have defeated the core privacy constraint. If your private context graph has to leave the machine for extraction, then the product becomes a privacy policy promise instead of a privacy-preserving architecture.

A much larger local model could produce stronger answers, but it would make the product less usable for the people who need it most. The challenge specifically highlights small Gemma 4 models for edge and local use, and that is exactly the design space ContextFabric lives in.

Gemma 4 plays three roles:

### 1. Context extraction

It reads messy user-owned text and converts it into typed, durable memory nodes.

This is different from summarization. A summary says "what was this text about?" Context extraction asks "what should another AI assistant remember later?"

### 2. Conflict detection

If a new memory contradicts an existing one, Gemma 4 can mark the conflict or uncertainty. That matters because memory should not silently rot.

For example, if an old preference says "prefer short answers" and a new note says "prefer detailed long answers", ContextFabric should surface that conflict instead of pretending both are equally true forever.

### 3. Payload assembly

When Claude, ChatGPT, or Cursor asks for context, Gemma 4 turns relevant graph nodes into a coherent brief with citations and a word limit.

This is where the model's reasoning is useful: not to invent project facts, but to decide how to package approved facts for another tool.

The architecture also keeps Gemma 4 on the correct side of the trust boundary. The normal challenge path uses Ollama locally. The daemon binds to loopback. The database is local. The browser extension talks to `localhost`. There is no ContextFabric cloud memory service receiving your data.
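The loopback claim can also be enforced in code rather than left to configuration. The sketch below is one possible guard, not ContextFabric's actual implementation: `isLoopbackHost` and `assertLoopbackBind` are hypothetical helpers, and treating the name `localhost` as loopback is an assumption about the host's resolver.

```typescript
// Hypothetical helpers for illustration; not taken from the ContextFabric codebase.

// Returns true when the host string names a loopback address.
function isLoopbackHost(host: string): boolean {
  const h = host.trim().toLowerCase()
  // Assumption: "localhost" resolves to loopback on this machine.
  if (h === 'localhost' || h === '::1') return true
  // The whole 127.0.0.0/8 block is loopback in IPv4.
  const match = /^127\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(h)
  if (!match) return false
  return match.slice(1).every((octet) => Number(octet) <= 255)
}

// Throws instead of letting the memory API start on a public interface.
function assertLoopbackBind(host: string): string {
  if (!isLoopbackHost(host)) {
    throw new Error(`Refusing to bind memory API to non-loopback host: ${host}`)
  }
  return host
}
```

With Express, a guard like this could wrap the host argument, for example `app.listen(47821, assertLoopbackBind('127.0.0.1'))`, so that a misconfigured host such as `0.0.0.0` fails at startup instead of exposing the memory graph on the network.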
That is the difference between "we care about privacy" and "the data path cannot reach our server because there is no server in the path."

## The Bigger Picture

I do not think the long-term version of this idea is just an app. I think it is protocol infrastructure.

HTTP made documents portable across servers. SMTP made email portable across providers. ContextFabric is an early sketch of what a personal AI context protocol could look like.

Today, every AI company is building memory as a product feature. That makes sense. Memory improves retention. But as developers, we should ask a harder question:

Should personal AI context belong to the tool, or to the user?

My answer is the user.

That is why ContextFabric has permission requests, scoped grants, source citations, local storage, and a browser extension bridge. The extension is the adoption wedge: it makes the protocol useful before any AI company agrees to support it natively.

That was the "I never thought of it that way" moment for me. The future of AI memory should not be one giant memory per vendor. It should be a user-controlled context layer that tools can request access to.

The browser extension is the wedge. The protocol is the point.

## Challenges I Ran Into

The hardest challenge was not building a chat UI. It was keeping the system honest.

Early versions returned raw code chunks when I asked project-level questions. That was technically "retrieval", but it was bad memory. I had to improve ranking so durable nodes like `project`, `decision`, `style`, and `preference` win over random bundled JavaScript or CSS.

The second challenge was local model reliability. Gemma 4 needs enough free memory, and normal laptops are messy. People have Chrome, VS Code, Docker, Discord, and ten other things open. That led to the constrained Ollama profile, shorter prompts, fallback parsing, and clearer error messages.

The third challenge was browser injection. Claude, ChatGPT, Cursor, and Perplexity do not share one DOM structure.
The extension has to find active inputs, avoid stale text areas, handle single-page-app navigation, and never crash the page if the daemon is offline.

The fourth challenge was packaging. A project that only works on my machine is not a challenge submission. I added a one-command startup path, release assets, Chrome extension packaging, screenshots, and a GitHub Pages landing page.

## What's Next

The next version is about turning the prototype into a real protocol.

My roadmap:

- publish the Chrome Web Store listing after review
- add native macOS and Windows installers
- improve LAN sync between devices
- add richer conflict resolution workflows
- publish a formal context payload schema
- build SDKs so indie AI tools can request ContextFabric memory directly
- explore a standard token format for scoped context grants

The browser extension is useful now, but the bigger win is native integration. I want AI tools to request context the way apps request OAuth scopes, except the resource is not your Google Drive or GitHub account. It is your personal working context.

## Try It Yourself

Repo: https://github.com/Boweii22/ContextFabric

Live Site: https://boweii22.github.io/ContextFabric/

Install Ollama, then run:

```bash
git clone https://github.com/Boweii22/ContextFabric.git
cd ContextFabric
npm run start
```

On macOS, use Node 20 or 22:

```bash
nvm install 20
nvm use 20
npm run start
```

On Windows, use Node 20 via `fnm` or `nvm-windows`:

```bash
fnm use 20
npm run start
```

Open the local demo UI:

http://127.0.0.1:7749/ui

The fastest test:

- Paste some project context into **Extract Context**.
- Click **Extract and save**.
- Watch Gemma 4 create typed memory nodes.
- Open the Claude context preview.
- Try the browser extension bridge.

Built by Bowei Tombri for the DEV Gemma 4 Challenge.
If you build with AI tools every day, I am curious: would you rather each tool keep its own memory, or would you prefer a local memory layer that every tool has to request permission from?