{"slug": "how-i-tracked-down-a-36gb-memory-leak-in-a-claude-code-memory-server", "title": "How I tracked down a 36GB memory leak in a Claude Code memory server", "summary": "A developer tracked down a 36GB memory leak in a Claude Code memory server caused by sql.js WebAssembly filesystem (MEMFS) not being cleaned up. The leak occurred because garbage-collecting the JavaScript Database wrapper did not free the MEMFS file created inside the WASM module, leaving 11MB database images per open operation. The fix involved adding an RSS watchdog to the proxy to restart the child process when memory crosses a threshold.", "body_md": "A debugging story about heap snapshots, native memory that `--max-old-space-size` can't touch, and a WebAssembly filesystem quietly hoarding files.\n\nI run a small service that gives a team of Claude Code users one shared memory store. Mechanically it's a Node/Express proxy that wraps a stdio MCP server (`ruflo`) and exposes it over HTTP. You don't need the product to follow the bug — just one fact: a long-lived Node process serves memory operations, and underneath it uses **sql.js** (SQLite compiled to WebAssembly) to hold the store.\n\nOne instance in production kept growing. Not spiking — *creeping*. ~36 GB RSS over six weeks, then the cgroup OOM-killer would reap it and the clock reset. Classic leak shape.\n\nThe proxy and the wrapped MCP child are separate processes. `ps` settled it fast: the proxy sat flat at ~60 MB; the `ruflo mcp start` child was the one ballooning. So the leak was below my code, in the wrapped process. Good — narrower problem.\n\nFirst instinct on a Node leak is the V8 heap. So I looked at `process.memoryUsage()` on the live child:\n\n```\nrss            1385 MB\nheapTotal        24 MB\nheapUsed         21 MB\nexternal       1286 MB\narrayBuffers    995 MB\n```\n\nThis is the whole story in five numbers. `heapTotal` — the V8 JS heap — is flat at 24 MB. 
The growth is entirely in **external / arrayBuffers**: native memory backing `ArrayBuffer`s. That immediately kills two \"obvious\" fixes:\n\n`--max-old-space-size`\n\nSo: what holds ~1 GB of `ArrayBuffer`s?\n\nI opened the inspector on the live process (`kill -USR1 <pid>`, then connected over the WebSocket — Node 22 has a global `WebSocket`, so a 30-line script does it) and took a `HeapProfiler.takeHeapSnapshot`. The snapshot was only ~18 MB, which is itself a clue: if the leak were *hundreds of thousands of small* JS objects, the graph would be huge. A small graph holding a lot of bytes means **a few big buffers**.\n\nParsing the snapshot (the format is just `nodes` / `edges` / `strings` arrays), the top retained objects were unambiguous:\n\n```\n203 × native:system / JSArrayBufferData @ 11.0 MB = 2233 MB\n```\n\n203 buffers, **11 MB each**. And 11 MB was exactly the size of the on-disk `memory.db`. The retainer chain:\n\n```\nJSArrayBufferData (11 MB)\n  <- ArrayBuffer\n  <- Buffer\n  <- (MEMFS file node).contents\n  <- FS.nodes  (an Array)\n  <- Context  (the sql.js Emscripten module — has WebAssembly.Memory, HEAPF32, createNode, /dev/tty…)\n  <- SqlJsBackend.db\n```\n\nThat `Context` with `createNode`, `/dev/tty`, and a `WebAssembly.Memory` is the tell: it's **Emscripten's in-memory filesystem (MEMFS)**. The file names confirmed it — each buffer was a MEMFS file called `dbfile_<random>`, and there were ~200 of them, each a full copy of the database.\n\nHere's the mechanism. sql.js's `Database` constructor writes its input bytes into a MEMFS file (`dbfile_<random>`) via `FS.createDataFile`. `Database.prototype.close()` is what removes it (`FS.unlink`). 
And the sql.js module is a **process-wide singleton** — one MEMFS shared by every `Database` you ever open.\n\nThe backend opened the database like this, per operation path, with no caching:\n\n```\nthis.db = new SQL.Database(fs.readFileSync(path)); // loads the whole 11MB image\n// ...used, then the wrapper goes out of scope\n```\n\nWhen that JS `Database` wrapper is dropped, V8 garbage-collects the *wrapper object* — but **GC has no idea about the MEMFS file** it created inside the WASM module. Only an explicit `close()` unlinks it. No `close()` → the 11 MB `dbfile_<random>` lives in MEMFS forever. One leaked DB image per open. Multiply by traffic and you get 36 GB.\n\nThis is the trap in one sentence: **garbage-collecting a JS handle does not free native/WASM memory it allocated.** The GC sees a tiny wrapper; the cost is in a buffer the GC doesn't manage.\n\n**Containment (ship today).** I added an RSS watchdog to the proxy: it reads the child's RSS from `/proc/<pid>/status`, and when it crosses a threshold it gracefully respawns the child once it's idle (reusing an existing single-flight reconnect path — kill the old child, spawn a fresh one). A respawn drops the entire bloated MEMFS at once. Symptomatic, but it bounds memory with zero dropped requests.\n\n**Root cause (fix it properly).** Cache the backend per database path so the DB opens **once** and is reused, instead of a fresh `SQL.Database` per call. No repeated loads → no new `dbfile_*`. I baked this into the image as a build-time patch and filed it upstream with the snapshot.\n\nThe earlier hard OOM-kills had interrupted a sql.js write mid-flight and left one `memory.db` corrupted — `database disk image is malformed`, busted overflow pages in the B-tree. 
Recovery turned into its own adventure:\n\n`.recover` (SQLite's salvage mode) reconstructed the bulk of the rows by walking the B-tree fragments.\n\nThe remaining rows were split: some lived only in the WAL (`-wal`), which `.recover` doesn't replay, and some sat on the corrupted pages. I ended up parsing WAL frames by hand (apply page images by page number) and carving SQLite leaf-page records directly to recover the rest.\n\nLesson burned in: **a WAL-mode SQLite backup is three files** — `db` + `-wal` + `-shm`. Copy only the `.db` and you get exactly that \"malformed\" error, because the latest committed state is still in the WAL.\n\n`heapTotal` flat + `external`/`arrayBuffers` rising = native leak. Don't reach for `--max-old-space-size`; it can't help.\n\n`JSArrayBufferData` nodes and their retainer chain pointed straight at the owning structure. A small snapshot holding big bytes = few large buffers.\n\nUpstream writeup with the full retainer trace: [ruvnet/ruflo#2432](https://github.com/ruvnet/ruflo/issues/2432). The wrapper itself, if you're curious: `jazz-max/ruflo-hub`", "url": "https://wpnews.pro/news/how-i-tracked-down-a-36gb-memory-leak-in-a-claude-code-memory-server", "canonical_source": "https://dev.to/jazzmax/how-i-tracked-down-a-36gb-memory-leak-in-a-claude-code-memory-server-27bo", "published_at": "2026-06-22 05:45:29+00:00", "updated_at": "2026-06-22 06:09:46.127380+00:00", "lang": "en", "topics": ["developer-tools", "large-language-models", "ai-agents"], "entities": ["Claude Code", "sql.js", "Emscripten", "MEMFS", "Node.js", "WebAssembly", "V8", "ruflo"], "alternates": {"html": "https://wpnews.pro/news/how-i-tracked-down-a-36gb-memory-leak-in-a-claude-code-memory-server", "markdown": "https://wpnews.pro/news/how-i-tracked-down-a-36gb-memory-leak-in-a-claude-code-memory-server.md", "text": "https://wpnews.pro/news/how-i-tracked-down-a-36gb-memory-leak-in-a-claude-code-memory-server.txt", "jsonld": 
"https://wpnews.pro/news/how-i-tracked-down-a-36gb-memory-leak-in-a-claude-code-memory-server.jsonld"}}