{"slug": "how-lume-works-the-retrieval-primitives", "title": "How Lume Works: The Retrieval Primitives", "summary": "Steve Harris and the author have built Lume, an open-source Rust hybrid search engine designed for agentic systems, which combines field-aware BM25, dense vectors, and an entity graph to provide auditable, inspectable retrieval. The engine indexes Markdown, source code, and PDFs, and runs lexical search and entity graph locally while allowing dense vector calls to a local endpoint. Lume's design prioritizes local-first operation, layered scoring signals, and full auditability of every ranking decision.", "body_md": "[Lume](https://github.com/DeepBlueDynamics/lume) is a Rust hybrid search engine that [Steve Harris](https://github.com/jsclosures) and I have been building in the open at `github.com/DeepBlueDynamics/lume`. It’s a small CLI plus an MCP server, BSD-3 licensed, and built around a stubborn idea: when an agent asks a question, every step from query to evidence should be inspectable.\n\nLume indexes Markdown, source code, and PDFs (via a small Python extractor) and ranks over them with three independent primitives: field-aware BM25, dense GTR-T5 vectors via [Shivvr](https://shivvr.nuts.services), and a significance-scored entity graph. The lexical core and the graph run entirely on your machine; only the dense vectors call out, and that endpoint defaults to `localhost`. There is no opaque “search box that returns a ranking”: every score has a name, a file, and a knob.\n\nThis post walks through Lume’s retrieval core end to end, with line-level references to the current tree. If you’re building agentic systems and are tired of treating retrieval as a magic step, this is for you.\n\nA few principles up front, because they explain the design:\n\n- **Local-first.** Lexical search and the entity graph run entirely on your machine. Dense vectors are fetched from Shivvr through `SHIVVR_BASE_URL`, which defaults to a local endpoint.\n- **Layered, not monolithic.** BM25, semantic, and graph are independent signals with their own scores. The blend is one line; each input is replaceable.\n- **Auditable.** The engine prints what it pruned, what it ranked, and why it rejected the rest.\n\n## 0. The unit of retrieval: a Section\n\nLume indexes Markdown, cut into **sections** at `#` headers (`parse_markdown` in `src/bm25.rs:211`). A `Section` (`src/bm25.rs:106`) is the atom everything ranks over:\n\n```rust\npub struct Section {\n    pub title: String,\n    pub body: String,\n    pub line_number: usize,\n    pub filename: Option<String>,\n    pub entities: Vec<String>,   // resolved named entities, for the graph\n}\n```\n\nTitle and body are **separate fields** with separate statistics, and that distinction shows up immediately in scoring. The whole index lives in memory as a `Bm25Index` (`src/bm25.rs:147`): per-field term-frequency maps, document frequencies, field lengths, **roaring-bitmap posting lists**, prime/Gödel signature filters, and the entity posting lists that feed the graph.\n\n## 1. Primitive: field-aware BM25\n\nThe lexical core is a field-aware BM25 with three selectable variants. The tuning defaults (`Bm25Params` in `src/bm25.rs:125`) are deliberately classic:\n\n```rust\nSelf { k1: 1.2, b: 0.75, delta: 1.0, title_weight: 2.0, body_weight: 1.0 }\n```\n\n`k1` controls term-frequency saturation; `b` controls length normalization. The one opinionated choice is **`title_weight: 2.0`**: a title hit contributes twice as much as a body hit before the coordination factor is applied. That is useful, but it can overweight chapter titles when a query token is broad. 
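
To make the field weighting concrete, here is a minimal Rust sketch of classic BM25 scored per field and combined with the documented defaults. The names `Params`, `FieldStats`, `idf`, `field_score`, and `term_score` are hypothetical and are not Lume's API. The IDF shown is one common smoothed form floored at zero; Lume's exact smoothing may differ.

```rust
/// Hypothetical parameter bundle mirroring the documented defaults
/// (k1 = 1.2, b = 0.75, title_weight = 2.0, body_weight = 1.0).
pub struct Params {
    pub k1: f64,
    pub b: f64,
    pub title_weight: f64,
    pub body_weight: f64,
}

/// Statistics for one term in one field of one section.
pub struct FieldStats {
    pub tf: f64,      // term frequency in this field
    pub len: f64,     // length of this field, in tokens
    pub avg_len: f64, // average length of this field across the index
}

/// One common smoothed IDF, floored at zero so very common terms never subtract.
pub fn idf(n_docs: f64, doc_freq: f64) -> f64 {
    ((n_docs - doc_freq + 0.5) / (doc_freq + 0.5)).ln().max(0.0)
}

/// Classic BM25 contribution of a term within a single field.
pub fn field_score(f: &FieldStats, term_idf: f64, p: &Params) -> f64 {
    let len_normalization = 1.0 - p.b + p.b * (f.len / f.avg_len);
    term_idf * (f.tf * (p.k1 + 1.0)) / (f.tf + p.k1 * len_normalization)
}

/// Field-weighted sum for one term, before any coordination factor.
pub fn term_score(title: &FieldStats, body: &FieldStats, term_idf: f64, p: &Params) -> f64 {
    p.title_weight * field_score(title, term_idf, p) + p.body_weight * field_score(body, term_idf, p)
}
```

When term frequency and field length are equal, a title-only hit scores exactly `title_weight / body_weight` (2×) as much as a body-only hit. With `tf = 1` at average length, the classic per-field score reduces to the IDF itself.
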
Treat `title_weight` as a knob, not a law.\n\nIDF is the standard smoothed form, floored at zero. Each term’s contribution is computed per field, then summed with the field weights (`calculate_bm25_term_score` in `src/bm25.rs:728`):\n\n```rust\nlet len_normalization = 1.0 - b + b * (doc_len / avgdl);\nmatch variant {\n    SearchVariant::Classic => idf * (tf * (k1 + 1.0)) / (tf + k1 * len_normalization),\n    SearchVariant::Plus    => idf * ((tf*(k1+1.0))/(tf + k1*len_normalization) + params.delta),\n    SearchVariant::L       => { let s = tf / len_normalization;\n                                idf * (s*(k1+1.0))/(s + k1) },\n}\n// total_score += title_weight * title_score + body_weight * body_score;  (src/bm25.rs:635)\n```\n\n- **Classic** is textbook BM25.\n- **Plus** adds a `delta` floor, so a matched term always contributes at least `idf * delta` rather than nearly nothing. This counters BM25’s over-penalty of long documents.\n- **L** moves length normalization inside the saturation (`s = tf / len_normalization`), which smooths scores for very long docs.\n\nLume runs `Classic` by default (`src/main.rs:1430`).\n\n## 2. Two-stage pruning: roaring union, then Gödel signatures\n\nYou don’t want to BM25-score all 1,926 sections of a book for every query. Lume’s `search` (`src/bm25.rs:445`) is **two-stage**.\n\n**Stage 1: candidate gather.** Union the roaring-bitmap posting lists of the query terms. 
This takes only a handful of bitset ORs, and it immediately narrows the corpus to sections that contain *any* query term:\n\n```rust\n// src/bm25.rs:460\nlet mut candidate_set = MiniRoaring::new();\nlet mut first = true;\nfor q_tok in &query_tokens {\n    if let Some(list) = self.posting_lists.get(&q_tok.bytes) {\n        if first {\n            candidate_set = list.clone();\n            first = false;\n        } else {\n            candidate_set = candidate_set.union(list);\n        }\n    }\n}\n```\n\n**Stage 1b: Gödel tag-signature pruning.** If the query tagger recognizes entities, each candidate section is checked against a **prime-factored signature filter** (`PrimeFilter::test_tag_prime` in `src/fast_retrieval.rs:449`, evaluated in `src/bm25.rs:538`). Each known tag maps to a prime. A section’s tag signature is the product of its tag primes, so inclusion is checked by divisibility. Unknown query tags deliberately receive a dummy prime and therefore fail closed. Candidates that fail the check are dropped as `TagSignatureMismatch` before heavier scoring.\n\n**Stage 2: heavy scoring** runs only on the survivors. The engine also *tells you* the shape of the funnel on stderr (`src/bm25.rs:557`):\n\n```\n[Two-Stage Pruning] Pruned candidate space from 1926 to 302 (roaring generated: 609) sections in 54.70µs\nCandidates: 609\nRanked: 302\nRejected:\n  TagSignatureMismatch: 307\n  ...\n```\n\nThat accounting is not decoration: it’s the first thing you read when a query returns the wrong thing.\n\n## 3. Query hygiene: stopwords and coordination\n\nTwo small primitives have outsized effects on quality.\n\n**Query-side stopword filtering** (`filter_query_stopwords` in `src/bm25.rs:98`). Function and question words are stripped from the **query only**, never from the index. Without this step, “how does Dantès know Mercédès” is dominated by *how/does/know*, which match unrelated sections (for example, a chapter literally titled *“How a Gardener…”*). 
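
The filtering rule fits in a few lines, including its all-stopword fallback. This is a hypothetical sketch: `filter_query_terms` is not Lume's function. The stopword list is passed in as a predicate because Lume's actual list is not shown in this post.

```rust
/// Hypothetical query-side stopword filter. Only the query tokens are filtered;
/// the index is never touched.
pub fn filter_query_terms<'a>(tokens: &[&'a str], is_stopword: impl Fn(&str) -> bool) -> Vec<&'a str> {
    let kept: Vec<&'a str> = tokens.iter().copied().filter(|t| !is_stopword(*t)).collect();
    if kept.is_empty() {
        // Every token was a stopword: fall back to the original query tokens.
        tokens.to_vec()
    } else {
        kept
    }
}
```

If the predicate flags how/does/know, the example query keeps only `dantès` and `mercédès`. A query made entirely of stopwords comes back unchanged.
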
The safety net: if *every* token is a stopword (“how are you”), the originals are kept so you still get results.\n\n**The coordination factor** (`src/bm25.rs:638`). A document matching more of the *distinct* query terms should beat one that repeats a single common term. Lume multiplies the score by a coverage-based factor:\n\n```rust\nlet coverage = matched_terms.len() as f64 / num_distinct as f64;\nlet coord = COORD_FLOOR + (1.0 - COORD_FLOOR) * coverage;   // COORD_FLOOR = 0.5\ntotal_score *= coord;\n```\n\nSo a section matching all three query terms keeps 100% of its score, while one matching a single term out of three keeps 0.5 + 0.5 × ⅓ ≈ ⅔ of it. For single-term queries `coverage == 1.0`, so ordinary lookups are untouched. It’s a gentle nudge, not a hard AND.\n\n## 4. Primitive: dense vectors (local GTR-T5)\n\nLexical search can’t bridge a vocabulary gap such as “starved to death” vs. “gastroenteritis.” That’s the semantic primitive’s job.\n\nLume embeds text as **768-dimensional GTR-T5** vectors via **Shivvr**. The default base URL is `http://localhost:8085` (`src/hybrid.rs:777`), and requests still require a service token (`src/hybrid.rs:784`). There’s no dedicated embed endpoint, so `embed_text` (`src/hybrid.rs:43`) ingests into a throwaway scratch store and reads the vector straight off the response:\n\n```rust\n// 768-d GTR-T5 (\"organize\") vector, asserted on the way out:\nif emb.len() != 768 { return Err(format!(\"Expected 768-d GTR-T5 vector, got {}\", emb.len())); }\n```\n\nAt index time, sections are pushed into a **semantic session** (`ensure_semantic_session` in `src/hybrid.rs:581`). The clever part is incrementality. Each section gets a content hash (`section_hash` in `src/hybrid.rs:407`) that serves as the remote chunk id, and line numbers are deliberately excluded from that hash, so **moving** a section doesn’t force a re-embed. Re-indexing diffs by hash and tops up only what changed; a matching corpus fingerprint is a no-op. 
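
Here is a minimal sketch of the hash-and-diff idea. It assumes a simplified `SectionLite` type and Rust's standard `DefaultHasher`; Lume's actual `section_hash` may use a different hash function and different inputs. The property that matters is that `line_number` is left out of the hash, so a moved section keeps its id and is not re-embedded.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a section, with only the fields this sketch needs.
pub struct SectionLite {
    pub title: String,
    pub body: String,
    pub line_number: usize, // deliberately NOT part of the hash
}

/// Content hash over title and body only, so moving a section keeps its id.
/// DefaultHasher is not guaranteed to be stable across Rust releases, so a
/// persisted index would want a fixed algorithm instead.
pub fn content_hash(s: &SectionLite) -> u64 {
    let mut h = DefaultHasher::new();
    s.title.hash(&mut h);
    s.body.hash(&mut h);
    h.finish()
}

/// Indices of sections whose hash is not already stored remotely, i.e. the only
/// ones that need a fresh embedding.
pub fn sections_to_embed(sections: &[SectionLite], stored: &HashSet<u64>) -> Vec<usize> {
    sections
        .iter()
        .enumerate()
        .filter(|(_, s)| !stored.contains(&content_hash(s)))
        .map(|(i, _)| i)
        .collect()
}
```
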
Query results are cached too (`.lume-semantic-cache.json`).\n\nAt query time, `query_semantic_search` (`src/hybrid.rs:637`) asks Shivvr for `n=60` neighbors. If the index was built without semantic vectors, or the token is missing, search degrades cleanly to lexical BM25 and *says so*: an `alpha > 0` request against a lexical-only index is told that it got lexical-only results (`src/main.rs:1389`).\n\n## 5. Primitive: the Semantic Knowledge Graph (significance, not co-counts)\n\nThe third signal is structural. Lume builds an **entity co-occurrence graph** from pairwise roaring-bitset intersections, “counting the counts of things” (`src/graph_search.rs:1`). On its own that graph is a write-only export; `graph_search` turns it into a **query-time ranking signal**, fully local.\n\nThe subtle, important bit is **how edges are weighted**. Raw co-occurrence and even Jaccard reward *promiscuous hubs*: an entity that appears everywhere co-occurs with everything. Lume defaults to a **significance** score instead (`cooccurrence_relatedness` in `src/semantic_mesh.rs:558`). Of `n` docs, A appears in `a`, B in `b`, and both in `k`. Lume compares the observed `k` against the independence expectation `E = a·b/n` as a z-score, **log-compresses** it to preserve dynamic range, then squashes it with `tanh`:\n\n```rust\nlet expected  = a * b / n;\nlet variance  = expected * (1.0 - a/n) * (1.0 - b/n);\nlet z         = (k - expected) / variance.sqrt();\nlet compressed = z.signum() * (1.0 + z.abs()).ln();   // keep z=10 vs z=100 distinct\n(compressed / 3.0).tanh()                              // -> [-1, 1]; negative = avoidance\n```\n\nThe result lives in `[-1, 1]`: positive for real association, **negative for avoidance** (entities that co-occur *below* chance). This is Trey Grainger’s foreground-vs-background SKG relatedness, reduced to the pairwise case. 
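
Wrapping the excerpt above in a standalone function makes it testable. This is a sketch, not Lume's `cooccurrence_relatedness`. The zero-variance guard is an assumption added here: as written, the excerpt would divide by zero when an entity appears in every document. `edge_weight` is a hypothetical helper that mirrors the clamp to `0` applied by the edge `weight()`.

```rust
/// Sketch of pairwise significance relatedness, following the formula above.
/// n = total docs, a / b = docs containing A / B, k = docs containing both.
pub fn relatedness(n: f64, a: f64, b: f64, k: f64) -> f64 {
    let expected = a * b / n;
    let variance = expected * (1.0 - a / n) * (1.0 - b / n);
    if variance <= 0.0 {
        // Guard added for this sketch: an entity present in every doc (or none)
        // carries no signal, and the z-score would be undefined.
        return 0.0;
    }
    let z = (k - expected) / variance.sqrt();
    let compressed = z.signum() * (1.0 + z.abs()).ln(); // log-compress, keep the sign
    (compressed / 3.0).tanh() // squash into [-1, 1]
}

/// Hypothetical edge weight: negative significance (avoidance) is clamped to 0.
pub fn edge_weight(n: f64, a: f64, b: f64, k: f64) -> f64 {
    relatedness(n, a, b, k).max(0.0)
}
```

Take 1,000 documents. Two entities that each appear in 20 documents and co-occur in 18 score about 0.81. A hub that appears in 900 documents and co-occurs 18 times with the same entity scores 0, because independence predicts exactly 18 co-occurrences.
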
A regression test pins the behavior: a true pairing (edmond–dantès) outranks a promiscuous hub even when the hub has perfect Jaccard overlap (`src/graph_search.rs:305`). You can still pick legacy Jaccard with `--scoring jaccard`. The edge `weight()` function (`src/semantic_mesh.rs:532`) selects between the two scores and clamps negative significance to `0`, so avoidance never *boosts* a result.\n\nThe walk itself (`compute_skg_scores` in `src/graph_search.rs:154`):\n\n1. **Resolve** the query’s entities by sliding n-grams longest-first (`max_ngram = 4`), so “edmond dantes” matches the stored “edmond dantès”.\n2. **Seed** them at weight `1.0`, then walk **one hop** to the top-`k` (8) strongest neighbors, each carrying `decay * weight` (`decay = 0.5`). Take the **max** across seeds, so a shared hub neighbor can’t be summed into dominance.\n3. **Accumulate** per-section mass from the entity posting lists and **normalize it to `[0, 1]`**.\n\n## 6. The blend: one multiplicative line\n\nThree signals (lexical, semantic, and graph) fuse in `blend_hybrid_scores` (`src/hybrid.rs:673`). The default is **multiplicative**, so the lexical match leads and the other two *lift* it:\n\n```\nhybrid = bm25 * (1.0 + alpha * semantic + beta * skg);\n```\n\nBecause the semantic and graph terms only scale the BM25 score, a section with no lexical match gets nothing from this line. Recall expansion covers those sections through two paths. Semantic-only hits are admitted with `bm25_score = 0`, and SKG-only sections are admitted only when their normalized graph mass reaches `SKG_EXPAND_MIN = 0.5` (`src/graph_search.rs:22`). In the lexical-only path, those SKG-only sections are scored `beta*skg`, which ranks them below comparable lexical matches (`apply_skg_boost` in `src/graph_search.rs:218`). 
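
Taken together, the default path and its two recall-expansion branches can be sketched as a single function. This is a simplified, hypothetical combination of the behavior described above. It is not `blend_hybrid_scores`, and it leaves out the `LUME_BLEND_NORM` mode. The `SKG_EXPAND_MIN` value mirrors the documented threshold.

```rust
/// Mirrors the documented graph-only admission threshold (SKG_EXPAND_MIN = 0.5).
const SKG_EXPAND_MIN: f64 = 0.5;

/// Hypothetical sketch of the default blend for one candidate section.
/// Returns None when the section has no lexical or semantic score and too little graph mass.
pub fn blend(bm25: f64, sem: f64, skg: f64, alpha: f64, beta: f64) -> Option<f64> {
    if bm25 > 0.0 {
        // Lexical match leads; with non-negative sem and skg the other terms only lift it.
        Some(bm25 * (1.0 + alpha * sem + beta * skg))
    } else if sem > 0.0 || skg >= SKG_EXPAND_MIN {
        // Recall expansion: semantic-only hits, or graph-only hits with enough mass.
        // With sem = 0 this reduces to beta * skg.
        Some(sem + beta * skg)
    } else {
        None
    }
}
```

With `alpha = 0.5` and `beta = 0.4`, a BM25 score of 2.0 with semantic 0.8 and graph 0.5 becomes 2.0 × (1 + 0.4 + 0.2) = 3.2.
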
Set `beta = 0` and the graph term is removed.\n\nThere’s an alternate mode for when you want semantic and graph signals to be able to *overtake* lexical ones. It is gated behind `LUME_BLEND_NORM=1` and puts all three signals on a comparable `[0,1]` scale (`src/hybrid.rs:745`):\n\n```rust\n// src/hybrid.rs:745\nlet hybrid_score = if normalize {\n    (bm25_score / bm25_max) + alpha * sem_score + beta * skg_score\n} else if bm25_score > 0.0 {\n    bm25_score * (1.0 + alpha * sem_score + beta * skg_score)\n} else {\n    sem_score + beta * skg_score // fallback for semantic-only recall expansion\n};\n```\n\n## 7. The tuning surface: what each knob actually does\n\nEverything above is exposed on `lume search` (`handle_search` in `src/main.rs:1295`). Here’s the practical guide:\n\n| Knob | Flag / env | Default | What it changes |\n|---|---|---|---|\n| Lexical ↔ semantic | `-a, --alpha` / `ALPHA` | `0.5` in `lume search` | `0.0` = pure BM25; higher values increase the weight of the GTR-T5 term. Raise it when the answer uses different words than the query. |\n| Graph boost | `-g, --graph` / `GRAPH_ALPHA` | `0.4` | Weight of the SKG term. Raise it to pull in entity-related sections; `0` disables the graph (pure lexical+semantic). |\n| Edge scoring | `--scoring` | `relatedness` | `relatedness` (significance, hub-resistant) vs. `jaccard` (raw overlap). Use significance unless you’re reproducing old numbers. |\n| Spell correction | `-c, --spell-check` | off | Corrects each query word against the index vocabulary (`correct_query` in `src/main.rs:1273`) before searching. |\n| BM25 length / saturation | `Bm25Params` | `k1=1.2, b=0.75` | Lower `b` for corpora with wildly varying section lengths; raise `k1` to reward repeated terms more. |\n| Field weight | `title_weight` | `2.0` | How much a title hit beats a body hit. Lower it if chapter/section titles are hijacking results. |\n| Query inversion (debug) | `LUME_QUERY_INVERSION=1` | off | Round-trips the query’s GTR-T5 vector back to text (`invert_vector` in `src/inversion.rs:30`) so you can inspect the embedding. It costs an extra embed request plus an invert request and does not affect ranking. |\n\nA worked example of the inversion trick: if semantic search keeps missing, set `LUME_QUERY_INVERSION=1` and read what your query embeds *back* to. If “how does the prisoner escape” inverts to something about gardens, your embedding isn’t anchored on “escape”. Reword the query, or lean lexical with a lower `alpha`.\n\n## 8. Case study: the retrieval bug that confused the agent\n\nTo see how these primitives interact, consider a real query run during testing over the full text of *The Count of Monte Cristo*: **“How does Edmond Dantès’s father die?”**\n\nOn Lume’s initial run, the answering agent returned: *“The provided passages do not contain information regarding how Edmond Dantès’s father died.”*\n\nHowever, the corpus describes the death in full. We traced the failure to three compounding issues at the query and retrieval-parameter boundary:\n\n- **Proper-noun bias.** The query planner generated keyword queries like `\"Dantès father death\"`. In Chapter 26, the father is referred to as “the old man” or “the elder Dantès”, and his death is described without the word “death” (*“died of starvation”*). Meanwhile, the literal chapter title *“Father and Son”* and broad entity matches on `Dantès` pulled retrieval toward passages about the father while he was still alive, pushing the target scene out of the evidence window.\n- **Hardcoded retrieval depth.** The number of passages fed to the LLM (`n_feed`) was hardcoded to 10. When proper-noun bias pushed the target passage down the ranking, it was clipped before the model could evaluate it.\n- **Context pressure.** Local model calls ran without an explicit `num_ctx`, so the model runtime could truncate prompts containing multiple passages.\n\n### The fix\n\nWe corrected the behavior by adjusting the boundaries of our primitives:\n\n- **Query diversification.** We updated the planner (`plan_queries` in `src/answer.rs:49`) to generate multiple query angles: one targeting explicit entities, one using event synonyms (e.g. `\"died of starvation grief\"`), and one using secondary characters or details.\n- **Dynamic feed depth.** We made `n_feed` scale with the number of requested candidates: `candidates.clamp(10, 20)` in `src/main.rs:1797`.\n- **Explicit context bounds.** We set `\"num_ctx\": 16384` explicitly in `ollama_chat` (`src/answer.rs:21`) and widened snippets to 180 words.\n\nWith these changes, the planner’s query `\"died of starvation grief\"` surfaced the death scene in the retrieval set. 
The agent could then judge the evidence sufficient and generate a cited answer.\n\n## How the pieces fit\n\nFor a single query, Lume executes:\n\n1. **Tokenize & filter.** Tokenize the query with `tokenize` (`src/bm25.rs:452`) and strip its stopwords with `filter_query_stopwords` (`src/bm25.rs:98`).\n2. **Stage 1 (prune).** Take the roaring-bitmap union of the term posting lists, then check Gödel tag signatures with `test_tag_prime` (`src/fast_retrieval.rs:449`).\n3. **Stage 2 (lexical).** Compute field-aware BM25 scores over the survivors with `calculate_bm25_term_score` (`src/bm25.rs:728`) and apply the coordination factor (`src/bm25.rs:644`).\n4. **Semantic primitive.** Fetch local GTR-T5 dense-neighbor scores with `query_semantic_search` (`src/hybrid.rs:637`).\n5. **Graph primitive.** Walk the significance-weighted graph with `compute_skg_scores` (`src/graph_search.rs:154`).\n6. **Blend.** Merge the signals with `blend_hybrid_scores` (`src/hybrid.rs:673`) and return the candidates.\n\nThe result is a ranking pipeline in which each component can be inspected, turned down, or disabled without rewriting the rest of the system.\n\nLume is built by [Steve Harris](https://github.com/jsclosures) and [Kord Campbell](https://github.com/kordless). Source: [github.com/DeepBlueDynamics/lume](https://github.com/DeepBlueDynamics/lume). 
Line numbers above are against the current tree and drift as code moves. If they’ve shifted, grep for the symbol names (`Bm25Index::search`, `blend_hybrid_scores`, `compute_skg_scores`, `cooccurrence_relatedness`, `handle_search`).\n\n// transmission ends", "url": "https://wpnews.pro/news/how-lume-works-the-retrieval-primitives", "canonical_source": "https://deepbluedynamics.com/blog/lume-retrieval-primitives", "published_at": "2026-06-20 17:44:50+00:00", "updated_at": "2026-06-20 18:07:18.686363+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "ai-research", "developer-tools", "machine-learning"], "entities": ["Lume", "Steve Harris", "Rust", "Shivvr", "BM25", "GTR-T5", "DeepBlueDynamics", "MCP"], "alternates": {"html": "https://wpnews.pro/news/how-lume-works-the-retrieval-primitives", "markdown": "https://wpnews.pro/news/how-lume-works-the-retrieval-primitives.md", "text": "https://wpnews.pro/news/how-lume-works-the-retrieval-primitives.txt", "jsonld": "https://wpnews.pro/news/how-lume-works-the-retrieval-primitives.jsonld"}}