{"slug": "out-of-stealth-kinda", "title": "Out of Stealth (Kinda)", "summary": "Egoist Machines, Inc. launched LodeDB, an open-source embedded vector database for local retrieval-augmented generation (RAG) that runs the same on-disk index on GPU when available, achieving up to 50,000 queries per second on an L40S. The in-process, on-disk database requires no server, API key, or telemetry of raw data, and supports incremental persistence and compact 2/4-bit vector storage.", "body_md": "**A fast, exact embedded vector database for local RAG: in-process, on-disk, no server.**\n\n*Built by Egoist Machines, Inc. - efficient full-stack infrastructure\nfor reliable AI systems.*\n\nMost embedded vector databases stop at the CPU. LodeDB runs the same on-disk index on the\nGPU when you have one: batched search hits **24k queries/sec on an A10 and 50k qps on an L40S**,\n2.8× to 4.8× the all-CPU ceiling, with recall unchanged. It also persists changed rows\nincrementally, so a commit stays **sub-millisecond even at 1M vectors**.\n\nFast on a laptop. Faster on a GPU. Exact every time. 
Never phones home.\n\n- **GPU-resident batch search**: an fp16 copy of the index lives on the GPU, scored with a tiled GEMM plus a streaming top-k (`[gpu]`, Linux/CUDA). [How it works](#gpu-resident-index).\n- **O(changed) persistence**: commits only the rows that changed, 173× to 1,308× faster than a full rewrite. [How it works](#delta-persistence).\n- **Compact storage**: the MIT [TurboVec](#turbovec) core packs vectors into 2/4-bit codes and scans them with SIMD CPU kernels.\n- **In-process, on-disk** (`.tvim` / `.tvd` / `.jsd`): no daemon, no account, no API key.\n- **Private by default**: text, ids, and vectors stay local; telemetry is metrics-only (counts, bytes, latency), never raw payloads.\n- **Local embeddings**: `sentence-transformers` on CUDA, MPS, or CPU.\n- **Batteries included**: a `lodedb` CLI, a loopback dev server, an MCP server, and a LangChain `VectorStore` adapter.\n\n🏢 Enterprise\n\nThe LodeDB core is Apache-2.0 and free to use. Enterprise licensing is available for commercial support, managed and at-scale serving, and on-prem / BYOC deployment. Contact [sales@egoistmachines.com](mailto:sales@egoistmachines.com).\n\n```\npip install lodedb\n```\n\nThat's it. Prebuilt wheels cover Linux, macOS (Apple Silicon and Intel), and Windows on\nPython 3.11+, and bundle the TurboVec (Rust) core, so there's nothing to compile. Confirm\nthe install with `lodedb doctor`. Optional extras:\n\n```\npip install \"lodedb[gpu]\"            # GPU-resident scan (Linux/CUDA)\npip install \"lodedb[mcp,langchain]\"  # MCP server + LangChain adapter\n```\n\n**Build from source** (contributors, or a platform without a wheel)\n\nNeeds a Rust toolchain and a CBLAS provider (Accelerate on macOS, `libopenblas-dev` on\nLinux). 
[uv](https://docs.astral.sh/uv/) builds and bundles the core for you:\n\n```\ngit clone https://github.com/Egoist-Machines/LodeDB && cd LodeDB\nuv sync                                 # builds + bundles the TurboVec core via maturin\nuv sync --extra mcp --extra langchain   # + MCP server, LangChain adapter\nuv sync --extra gpu                     # + GPU-resident scan (Linux/CUDA)\n```\n\nRun with `uv run` (e.g. `uv run lodedb doctor`).\n\n```python\nfrom lodedb import LodeDB\n\ndb = LodeDB(path=\"./data\", model=\"minilm\")   # \"minilm\" (fast) | \"bge\" (quality)\n\nfox = db.add(\"the quick brown fox jumps\", metadata={\"topic\": \"animals\"})\ndb.add(\"a lazy dog sleeps all day\", metadata={\"topic\": \"animals\"})\n\nfor score, doc_id, meta in db.search(\"fox\", k=5):\n    print(score, doc_id, meta)\n\nfor hits in db.search_many([\"fox\", \"dog\"], k=5):   # batched; the GPU can serve this\n    print([(h.score, h.id, h.metadata) for h in hits])\n\ndb.get(fox)     # -> \"the quick brown fox jumps\"  (text retained by default)\ndb.persist()    # durable .tvim/.tvd/.jsd snapshot; replays on reopen\n```\n\nReopen with `LodeDB(path=\"./data\")`; no migration step. Original text is kept in a\n`.tvtext` sidecar for `db.get`; pass `store_text=False` to keep none. Presets are `minilm`\n(384-dim) and `bge` (768-dim), with weights pulled from Hugging Face on first use. More in\n[examples/](/Egoist-Machines/LodeDB/blob/main/examples).\n\nWith the `[gpu]` extra on a CUDA host, LodeDB reconstructs the compact index into an fp16\nmatrix resident on the GPU and scores batched `search_many` with a tiled GEMM plus a\nstreaming top-k. It is opt-in and lazy: single queries, non-CUDA hosts, and GPU-memory\nrejection fall back to the CPU scan, which stays the source of truth.\n\nGPU throughput climbs with batch size while the CPU scan is flat. Same 4-bit index (d=1536, 100K), same host, only the scoring step differs. 
Crossover is around batch 50:\n\n| query batch | A10 GPU (q/s) | L40S GPU (q/s) |\n|---|---|---|\n| 1 | 261 | 432 |\n| 16 | 3,531 | 5,562 |\n| 64 | 11,463 | 18,175 |\n| 256 | 19,998 | 39,449 |\n| 1024 | 24,037 | 50,326 |\n\nVanilla TurboVec CPU (all threads) on the same boxes: 8,497 q/s (A10 host), 10,420 q/s (L40S host). At batch 1024 the GPU is 2.8× / 4.8× that, and it scales with GPU class.\n\nRecall is unchanged: the GPU scores the exact 4-bit reconstruction, so R@1 tracks the CPU scan across datasets and bit-widths, and edges ahead on GloVe-200 where quantization error is largest.\n\nOther in-process vector databases stay CPU-bound. Alibaba's\n[zvec](https://github.com/alibaba/zvec) reports about 8.4k q/s (VectorDBBench, 16-vCPU CPU,\nCohere 768-dim): the same class as the TurboVec CPU scan, and a different regime from ours,\nso read it as the CPU-class baseline. The GPU-resident path is what clears it.\n\n**Scope.** GPU search is Linux/CUDA-only and opt-in (`[gpu]`). macOS scans on the CPU (the\nMPS scan is experimental). See [docs/benchmarks.md](/Egoist-Machines/LodeDB/blob/main/docs/benchmarks.md) and\n[docs/architecture.md](/Egoist-Machines/LodeDB/blob/main/docs/architecture.md).\n\nMost embedded indexes rewrite the whole file on every change (O(N)). LodeDB writes only the rows that changed (O(changed)), so a 1,000-row commit stays sub-millisecond at any size:\n\n| corpus | full rewrite | delta export | speedup |\n|---|---|---|---|\n| 100K | 42.4 ms | 0.25 ms | 173× |\n| 500K | 190.4 ms | 0.24 ms | 782× |\n| 1M | 404.9 ms | 0.31 ms | 1,308× |\n\nThe GPU path makes reads fast; the delta makes writes cheap. The on-disk format stays a plain snapshot that replays on reopen.\n\nAll artifacts are metrics-only (counts, bytes, latency), never payloads. 
Full methodology\nand the complete figure set are in [docs/benchmarks.md](/Egoist-Machines/LodeDB/blob/main/docs/benchmarks.md); each\n[benchmarks/](/Egoist-Machines/LodeDB/blob/main/benchmarks) folder has a README and a one-line reproduction command.\n\nLocal is the common case. On an Apple M1 (MiniLM, 20K docs) the CPU scan is ~0.25 ms p50, and end-to-end single-query latency is 5.7 ms p50.\n\n```\nlodedb doctor      # capability report: embedding / GPU / TurboVec backend\nlodedb index ...   # build / add to an on-disk index\nlodedb query ...   # search\nlodedb serve       # loopback dev server (127.0.0.1, no auth)\nlodedb mcp         # stdio MCP server for agent memory\nlodedb benchmark   # local, metrics-only benchmark\n```\n\n- **Exact scan, no ANN.** Built for small-to-mid corpora where exact recall matters, not billion-scale.\n- **GPU is Linux/CUDA-only and opt-in** (`[gpu]`). macOS scans on the CPU; the MPS scan is experimental and was slower than NEON on the hardware tested.\n- **Single queries run on the CPU**; the GPU serves batched `search_many`.\n- **Model weights download from Hugging Face** on first use, then cache locally.\n\nThe compact core is the upstream **MIT** [TurboVec](https://github.com/RyanCodrai/turbovec)\nproject (© Ryan Codrai), vendored under [third_party/turbovec/](/Egoist-Machines/LodeDB/blob/main/third_party/turbovec)\nwith its license preserved. LodeDB's lifecycle patches (encoded-row export/import,\n`upsert_with_ids`, calibration) are Apache-2.0. See [`NOTICE`](/Egoist-Machines/LodeDB/blob/main/NOTICE).\n\nApache-2.0 ([LICENSE](/Egoist-Machines/LodeDB/blob/main/LICENSE)). The bundled TurboVec core is MIT ([`NOTICE`](/Egoist-Machines/LodeDB/blob/main/NOTICE), [`third_party/turbovec/LICENSE`](/Egoist-Machines/LodeDB/blob/main/third_party/turbovec/LICENSE)). \"LodeDB\" and \"[Egoist Machines](https://egoistmachines.com)\" are trademarks; Apache-2.0 grants no trademark rights (§6).\n\nEnterprise licensing and commercial support are available from\n[Egoist Machines, Inc.](https://egoistmachines.com): contact\n[sales@egoistmachines.com](mailto:sales@egoistmachines.com).\n\nPRs welcome; see [CONTRIBUTING.md](/Egoist-Machines/LodeDB/blob/main/CONTRIBUTING.md). Report security issues\n**privately** per [`SECURITY.md`](/Egoist-Machines/LodeDB/blob/main/SECURITY.md), not in public issues. Other bugs and requests go to the\n[issue tracker](https://github.com/Egoist-Machines/LodeDB/issues).", "url": "https://wpnews.pro/news/out-of-stealth-kinda", "canonical_source": "https://github.com/Egoist-Machines/LodeDB", "published_at": "2026-06-20 00:41:37+00:00", "updated_at": "2026-06-20 01:07:53.431917+00:00", "lang": "en", "topics": ["artificial-intelligence", "ai-tools", "ai-infrastructure", "ai-products", "ai-startups"], "entities": ["Egoist Machines, Inc.", "LodeDB", "TurboVec", "LangChain", "CUDA", "A10", "L40S", "Hugging Face"], "alternates": {"html": "https://wpnews.pro/news/out-of-stealth-kinda", "markdown": "https://wpnews.pro/news/out-of-stealth-kinda.md", "text": "https://wpnews.pro/news/out-of-stealth-kinda.txt", "jsonld": "https://wpnews.pro/news/out-of-stealth-kinda.jsonld"}}