{"slug": "i-built-contextfabric-one-private-memory-layer-across-claude-chatgpt-cursor-and", "title": "I Built ContextFabric: One Private Memory Layer Across Claude, ChatGPT, Cursor, and More with Local Gemma 4", "summary": "The article describes the creation of **ContextFabric**, a local AI memory layer powered by Gemma 4 that allows users to share portable, permissioned context across different AI tools like Claude and ChatGPT. The system uses a desktop app, local daemon, and browser extension to extract and store durable memory nodes (such as projects and decisions) in a local SQLite graph, ensuring user data remains private and is not sent to a cloud server. The project is built with technologies including Electron, React, and Ollama, and is submitted for the Gemma 4 Challenge.", "body_md": "*This is a submission for the Gemma 4 Challenge: Build with Gemma 4*\n\nAI tools remember now, but they remember in separate silos. Claude has projects, ChatGPT has personalization, Cursor indexes your codebase, and somehow you still end up re-explaining the same decisions, constraints, preferences, and project state every time you move between tools.\n\nThat felt backwards to me.\n\nIf memory is becoming part of the AI operating system, then personal context should not be trapped inside one vendor's product. 
It should be portable, permissioned, local-first, and owned by the user.\n\nSo I built **ContextFabric**: a local AI memory layer powered by **Gemma 4**.\n\n## What I Built\n\nContextFabric is a desktop app, local daemon, memory graph, and browser extension bridge that lets AI tools share approved context without sending your personal memory to a cloud memory server.\n\nThe idea is simple:\n\n- Import your real project context: repos, folders, markdown, PDFs, ChatGPT exports, Claude exports, notes, and documents.\n- Gemma 4 runs locally through Ollama and extracts structured memory nodes.\n- ContextFabric stores those nodes in a local SQLite graph.\n- External tools request access.\n- You approve the request.\n- The browser extension injects the right context into Claude, ChatGPT, Cursor, Gemini, Perplexity, and other AI tools.\n\nThe five core memory node types are:\n\n- `project`: what you are building\n- `decision`: choices already made and why\n- `preference`: stable working preferences\n- `style`: how you communicate, design, or code\n- `person`: collaborators and relevant human context\n\nThis is not meant to replace Claude projects, ChatGPT memory, or Cursor indexing.\n\nIt solves a different problem: **your context should be portable across them**.\n\n## Demo\n\nBrowser extension injection:\n\nThe demo shows the full loop:\n\n- paste messy project context\n- Gemma 4 extracts structured memory nodes\n- nodes are saved locally with confidence scores\n- AI Query answers with sources\n- a permission request controls external access\n- the browser extension injects approved context into an AI chat tool\n\n## Code\n\nGitHub:\n\n[https://github.com/Boweii22/ContextFabric](https://github.com/Boweii22/ContextFabric)\n\nLive Site:\n\n[https://boweii22.github.io/ContextFabric/](https://boweii22.github.io/ContextFabric/)\n\nThe project is built with Electron, React, TypeScript, SQLite, Express, Ollama, and a Manifest V3 browser extension.\n\nThe 
local app exposes two loopback APIs:\n\n- `127.0.0.1:47821` for the desktop app permission/token API\n- `127.0.0.1:7749` for the simple demo daemon UI and compatibility endpoints\n\nBoth are bound to loopback, not `0.0.0.0`.\n\nThat matters because the privacy claim is not just a paragraph in a README. The architecture does not expose a public server for your memory graph.\n\n## How I Built It\n\nThe architecture has six parts:\n\n```\nUser-owned sources\nrepos, exports, docs, notes, PDFs\n        |\n        v\nLocal ingestion\nchunking + metadata\n        |\n        v\nGemma 4 via Ollama\nextract + reason\n        |\n        v\nSQLite memory graph\nnodes + embeddings\n        |\n        v\nPermissioned daemon\nlocalhost only\n        |\n        v\nBrowser extension\ninjects context\n```\n\nThe first hard problem was extraction.\n\nI did not want a generic summary. I wanted durable memory. That means the model has to decide whether a piece of text contains a project fact, a decision, a preference, a style signal, or a person.\n\nHere is the actual extraction schema prompt from the project:\n\n``` js\nexport const CONTEXT_NODE_TYPES = ['project', 'style', 'decision', 'preference', 'person'] as const\n\nexport const CONTEXT_EXTRACTION_SYSTEM_PROMPT = `You are ContextFabric's local Gemma 4 context extractor.\n\nYour job is to read one piece of user-owned context and output ONLY valid JSON.\nNo markdown. No prose. No comments. No trailing commas.\n\nExtract durable context nodes that another AI assistant should remember later.\nUse only facts supported by the input. 
Do not invent people, projects, tools, or decisions.\n\nAllowed node types:\n- project: what the user is building, maintaining, researching, or planning.\n- style: how the user writes, communicates, designs, codes, or prefers answers to be shaped.\n- decision: a choice already made, including why, tradeoffs, rejected alternatives, or reversibility.\n- preference: a stable working preference, constraint, tool choice, privacy preference, format preference, or habit.\n- person: a collaborator, stakeholder, user, client, author, or named human with relevant relationship/role context.\n\nReturn this exact JSON shape:\n{\n  \"nodes\": [\n    {\n      \"type\": \"project\" | \"style\" | \"decision\" | \"preference\" | \"person\",\n      \"title\": \"short human-readable title\",\n      \"summary\": \"one factual sentence, max 220 characters\",\n      \"confidence\": 0.0,\n      \"evidence\": \"short direct evidence phrase from the input, max 180 characters\",\n      \"entities\": [\"important names, tools, projects, people\"],\n      \"tags\": [\"lowercase-keywords\"]\n    }\n  ]\n}`\n```\n\nThe parser is intentionally defensive. 
Gemma 4 is good at structured output, but production code still needs repair paths.\n\n```\nexport function parseContextExtraction(raw: string): ContextExtractionParseResult {\n  const errors: string[] = []\n  const parsed = parseJsonObject(raw)\n\n  if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) {\n    return { ok: false, result: { nodes: [] }, errors: ['Output is not a JSON object.'] }\n  }\n\n  const root = parsed as Record<string, unknown>\n  if (!Array.isArray(root.nodes)) {\n    return { ok: false, result: { nodes: [] }, errors: ['Missing nodes array.'] }\n  }\n\n  const nodes: ExtractedContextNode[] = []\n  for (const [index, value] of root.nodes.entries()) {\n    const node = normalizeNode(value, index, errors)\n    if (node) nodes.push(node)\n  }\n\n  return { ok: errors.length === 0, result: { nodes: nodes.slice(0, 6) }, errors }\n}\n```\n\nThe second hard problem was assembling context for different tools.\n\nClaude, ChatGPT, and Cursor do not want the same payload. Claude benefits from concise prose sections. ChatGPT works well with a compact bullet brief. Cursor needs engineering-focused context.\n\nSo ContextFabric asks Gemma 4 to assemble app-aware context briefs:\n\n``` js\nexport const PAYLOAD_ASSEMBLY_SYSTEM_PROMPT = `You are ContextFabric's local Gemma 4 payload assembler.\n\nGoal:\nTurn user-approved local memory nodes into one coherent context brief for another AI tool.\n\nRules:\n- Use ONLY the supplied memory nodes. 
Do not invent facts, names, features, dates, metrics, or claims.\n- Prefer stable project, decision, style, preference, and person nodes over raw conversation/code snippets.\n- Write a useful brief, not a JSON dump.\n- Include source node ids inline as [node:id] after concrete claims.\n- If the nodes do not support a requested claim, omit it.\n- Respect the requested app format.\n- Stay under the requested maximum word count.\n\nApp formats:\n- claude: concise prose with sections \"Context\", \"Decisions\", \"Working Style\", \"How to Use This\".\n- chatgpt: short bullet-oriented brief with \"Known Context\", \"Preferences\", \"Relevant Sources\".\n- cursor: engineering-focused brief with \"Project\", \"Architecture / Decisions\", \"Coding Preferences\", \"Files / Sources\".\n- generic: compact neutral brief with clear source ids.\n\nReturn JSON only:\n{\n  \"payload\": \"the final context brief\",\n  \"usedNodeIds\": [\"node-id\"],\n  \"warnings\": [\"optional warning when data is thin or uncertain\"]\n}`\n```\n\nThe third hard problem was making the local model usable on normal hardware.\n\nI hit memory issues while testing Gemma locally, so ContextFabric creates a constrained Ollama profile called `cf-gemma4`.\n\n``` js\nconst res = await fetch(`${this.baseUrl}/api/create`, {\n  method: 'POST',\n  headers: { 'Content-Type': 'application/json' },\n  body: JSON.stringify({\n    model: this.constrainedModelName,\n    from: sourceModel,\n    parameters: {\n      num_ctx: this.runtimeContext,\n      num_predict: 64,\n      num_batch: 4,\n    },\n    stream: false,\n  }),\n  signal: ctrl.signal,\n})\n```\n\nThis was not about making the model weaker.\n\nIt was about making the demo run on real laptops, not just on a perfect GPU workstation.\n\nFor the local HTTP daemon, I added a small API that judges can test without understanding the whole Electron app:\n\n``` js\ncompat.post('/extract', async (req: Request, res: Response) => {\n  const { text, title = 'HTTP 
Extract', inputType = 'api', save = false } = req.body\n  if (!text?.trim()) {\n    res.status(400).json({ error: 'text is required' })\n    return\n  }\n\n  const result = await extractNodesFromText(db, ollama, text, title, inputType, Boolean(save))\n  res.json({\n    ok: true,\n    saved: Boolean(save),\n    savedCount: result.savedCount,\n    nodes: result.nodes.map(nodeToPublicJson),\n  })\n})\n\ncompat.get('/context', async (req: Request, res: Response) => {\n  const appId = String(req.query.app || req.query.appId || 'generic')\n  const query = String(req.query.query || 'current project context, writing style, technical decisions, preferences')\n  const nodes = selectTokenNodes(db.getNodes(800), query, 16)\n  const assembly = await assembleTokenPayloadWithTimeout(ollama, { appId, query, nodes, maxWords: 800 })\n  res.json({ ok: true, appFormat: assembly.appFormat, payload: assembly.payload })\n})\n```\n\nThat endpoint is what makes the browser extension bridge simple. The extension does not need to know how the graph works. It asks the local daemon for approved context and inserts it into the active AI chat box.\n\n## Why Gemma 4\n\nGemma 4 is not a decorative dependency here.\n\nIt is the part of the system that turns ContextFabric from a searchable note bucket into a memory protocol.\n\nI chose **Gemma 4 E2B** as the target model profile because ContextFabric is supposed to run where personal context actually lives: on laptops, desktops, and eventually smaller edge devices.\n\nA cloud model would have defeated the core privacy constraint. If your private context graph has to leave the machine for extraction, then the product becomes a privacy policy promise instead of a privacy-preserving architecture.\n\nA much larger local model could produce stronger answers, but it would make the product less usable for the people who need it most. 
The challenge specifically highlights small Gemma 4 models for edge and local use, and that is exactly the design space ContextFabric lives in.\n\nGemma 4 plays three roles:\n\n**1. Context extraction**\n\nIt reads messy user-owned text and converts it into typed, durable memory nodes.\n\nThis is different from summarization. A summary says \"what was this text about?\" Context extraction asks \"what should another AI assistant remember later?\"\n\n**2. Conflict detection**\n\nIf a new memory contradicts an existing one, Gemma 4 can mark the conflict or uncertainty. That matters because memory should not silently rot.\n\nFor example, if an old preference says \"prefer short answers\" and a new note says \"prefer detailed long answers\", ContextFabric should surface that conflict instead of pretending both are equally true forever.\n\n**3. Payload assembly**\n\nWhen Claude, ChatGPT, or Cursor asks for context, Gemma 4 turns relevant graph nodes into a coherent brief with citations and a word limit.\n\nThis is where the model's reasoning is useful: not to invent project facts, but to decide how to package approved facts for another tool.\n\nThe architecture also keeps Gemma 4 on the correct side of the trust boundary.\n\nThe normal challenge path uses Ollama locally. The daemon binds to loopback. The database is local. The browser extension talks to `localhost`. There is no ContextFabric cloud memory service receiving your data.\n\nThat is the difference between \"we care about privacy\" and \"the data path cannot reach our server because there is no server in the path.\"\n\n## The Bigger Picture\n\nI do not think the long-term version of this idea is just an app.\n\nI think it is protocol infrastructure.\n\nHTTP made documents portable across servers. SMTP made email portable across providers. ContextFabric is an early sketch of what a personal AI context protocol could look like.\n\nToday, every AI company is building memory as a product feature. 
That makes sense. Memory improves retention.\n\nBut as developers, we should ask a harder question:\n\nShould personal AI context belong to the tool, or to the user?\n\nMy answer is the user.\n\nThat is why ContextFabric has permission requests, scoped grants, source citations, local storage, and a browser extension bridge. The extension is the adoption wedge: it makes the protocol useful before any AI company agrees to support it natively.\n\nThat was the \"I never thought of it that way\" moment for me.\n\nThe future of AI memory should not be one giant memory per vendor. It should be a user-controlled context layer that tools can request access to.\n\nThe browser extension is the wedge.\n\nThe protocol is the point.\n\n## Challenges I Ran Into\n\nThe hardest challenge was not building a chat UI.\n\nIt was keeping the system honest.\n\nEarly versions returned raw code chunks when I asked project-level questions. That was technically \"retrieval\", but it was bad memory. I had to improve ranking so durable nodes like `project`, `decision`, `style`, and `preference` win over random bundled JavaScript or CSS.\n\nThe second challenge was local model reliability. Gemma 4 needs enough free memory, and normal laptops are messy. People have Chrome, VS Code, Docker, Discord, and ten other things open.\n\nThat led to the constrained Ollama profile, shorter prompts, fallback parsing, and clearer error messages.\n\nThe third challenge was browser injection. Claude, ChatGPT, Cursor, and Perplexity do not share one DOM structure. The extension has to find active inputs, avoid stale text areas, handle single-page-app navigation, and never crash the page if the daemon is offline.\n\nThe fourth challenge was packaging. A project that only works on my machine is not a challenge submission. 
I added a one-command startup path, release assets, Chrome extension packaging, screenshots, and a GitHub Pages landing page.\n\n## What's Next\n\nThe next version is about turning the prototype into a real protocol.\n\nMy roadmap:\n\n- publish the Chrome Web Store listing after review\n- add native macOS and Windows installers\n- improve LAN sync between devices\n- add richer conflict resolution workflows\n- publish a formal context payload schema\n- build SDKs so indie AI tools can request ContextFabric memory directly\n- explore a standard token format for scoped context grants\n\nThe browser extension is useful now, but the bigger win is native integration.\n\nI want AI tools to request context the way apps request OAuth scopes, except the resource is not your Google Drive or GitHub account. It is your personal working context.\n\n## Try It Yourself\n\nRepo:\n\n[https://github.com/Boweii22/ContextFabric](https://github.com/Boweii22/ContextFabric)\n\nLive Site:\n\n[https://boweii22.github.io/ContextFabric/](https://boweii22.github.io/ContextFabric/)\n\nInstall Ollama:\n\nThen run:\n\n```\ngit clone https://github.com/Boweii22/ContextFabric.git\ncd ContextFabric\nnpm run start\n```\n\nOn macOS, use Node 20 or 22:\n\n```\nnvm install 20\nnvm use 20\nnpm run start\n```\n\nOn Windows, use Node 20 via `fnm` or `nvm-windows`:\n\n```\nfnm use 20\nnpm run start\n```\n\nOpen the local demo UI:\n\n```\nhttp://127.0.0.1:7749/ui\n```\n\nThe fastest test:\n\n- Paste some project context into **Extract Context**.\n- Click **Extract and save**.\n
- Watch Gemma 4 create typed memory nodes.\n- Open the Claude context preview.\n- Try the browser extension bridge.\n\nBuilt by **Bowei Tombri** for the DEV Gemma 4 Challenge.\n\nIf you build with AI tools every day, I am curious: would you rather each tool keep its own memory, or would you prefer a local memory layer that every tool has to request permission from?", "url": "https://wpnews.pro/news/i-built-contextfabric-one-private-memory-layer-across-claude-chatgpt-cursor-and", "canonical_source": "https://dev.to/_boweii/i-built-contextfabric-one-private-memory-layer-across-claude-chatgpt-cursor-and-more-with-local-i1b", "published_at": "2026-05-22 22:26:45+00:00", "updated_at": "2026-05-22 23:03:53.735077+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "open-source", "developer-tools", "products"], "entities": ["ContextFabric", "Claude", "ChatGPT", "Cursor", "Gemma 4", "Ollama", "Electron", "SQLite"], "alternates": {"html": "https://wpnews.pro/news/i-built-contextfabric-one-private-memory-layer-across-claude-chatgpt-cursor-and", "markdown": "https://wpnews.pro/news/i-built-contextfabric-one-private-memory-layer-across-claude-chatgpt-cursor-and.md", "text": "https://wpnews.pro/news/i-built-contextfabric-one-private-memory-layer-across-claude-chatgpt-cursor-and.txt", "jsonld": "https://wpnews.pro/news/i-built-contextfabric-one-private-memory-layer-across-claude-chatgpt-cursor-and.jsonld"}}