{"slug": "how-i-found-out-52-of-my-knowledge-graph-was-duplicates-and-what-i-did-about-it", "title": "How I Found Out 52% of My Knowledge Graph Was Duplicates (and What I Did About It)", "summary": "A developer building ANIMUS, an autonomous Rust system for persistent LLM memory, discovered that 52% of its knowledge graph nodes were duplicates. An audit revealed an overly aggressive filter trapped the system in a loop re-exploring the same topics, inflating node count without adding new knowledge. The fix involved correcting a search function and migrating the inference engine to a local quantized Gemma 4 E2B model.", "body_md": "I've spent the last several months building [ANIMUS](https://github.com/ernestoariasdiaz/animus-ai), an autonomous system in Rust that gives a local LLM persistent memory. The idea is simple: a knowledge graph that grows on its own, cycle after cycle, as the system reads documents, detects gaps in its knowledge, and fills them in.\n\nFor months, the metric I watched most closely was the node count of the graph. It kept climbing. I felt good about that.\n\nUntil I ran a full audit and found out that **52% of those nodes were undetected duplicates**. Of 1,892 reported nodes, only 911 were actually unique.\n\nANIMUS's autonomous loop actively looks for \"gaps\" — holes in its knowledge that the system decides to fill on its own. The problem: an overly aggressive filter was excluding certain categories from the gap pool, which trapped the system in a loop of re-exploring the same ~40 topics for thousands of cycles. Each pass generated content that was *similar* but not identical to the last — different enough to avoid triggering any exact-duplicate check, but substantially the same information rephrased.\n\nThe node count kept climbing. 
Actual knowledge, not so much.\n\nThe fix wasn't magic, it was audit work, including a correction to the search function (`Brain::search`): it walked the graph from node 0 with `.take(2)`, which meant it almost always returned stale content from earlier versions of the system. A simple `.rev()`, so the walk starts from the newest nodes, fixed it.\n\nAlong the way, I also migrated the inference engine: from a Python wrapper to a `llama-server.exe` launched directly from Rust, and from the original model to a quantized Gemma 4 E2B, running at ~77 tokens/second on a consumer GPU (RTX 3050, 4 GB). None of this required the cloud or paid APIs; everything runs locally.\n\nThe most valuable part of this whole episode wasn't fixing the bug. It was realizing that **a metric that only goes up never warns you that something is wrong**. Node count was a proxy for \"the system is learning,\" but optimizing that one proxy, with nothing to balance it, ended up producing the opposite: inflated content, not new knowledge.\n\nANIMUS now runs on several cross-checked signals (verified uniqueness, recency-weighted relevance, source validation) instead of one vanity metric. If two signals start to diverge, the system stops and re-audits instead of continuing to generate.\n\nIf you're curious about the full picture (architecture, benchmarks, comparison against a simple vector RAG baseline), the technical paper is open access with a DOI: [10.5281/zenodo.20674981](https://doi.org/10.5281/zenodo.20674981). 
Code is on [GitHub](https://github.com/ernestoariasdiaz/animus-ai).\n\n*ANIMUS is an independent project, developed in Santo Domingo, Dominican Republic.*", "url": "https://wpnews.pro/news/how-i-found-out-52-of-my-knowledge-graph-was-duplicates-and-what-i-did-about-it", "canonical_source": "https://dev.to/ernesto_arias_148b35bc25d/-how-i-found-out-52-of-my-knowledge-graph-was-duplicates-and-what-i-did-about-it-3coh", "published_at": "2026-06-25 00:50:17+00:00", "updated_at": "2026-06-25 01:13:01.600268+00:00", "lang": "en", "topics": ["large-language-models", "artificial-intelligence", "ai-agents", "developer-tools"], "entities": ["ANIMUS", "Rust", "Gemma 4 E2B", "RTX 3050", "Santo Domingo"], "alternates": {"html": "https://wpnews.pro/news/how-i-found-out-52-of-my-knowledge-graph-was-duplicates-and-what-i-did-about-it", "markdown": "https://wpnews.pro/news/how-i-found-out-52-of-my-knowledge-graph-was-duplicates-and-what-i-did-about-it.md", "text": "https://wpnews.pro/news/how-i-found-out-52-of-my-knowledge-graph-was-duplicates-and-what-i-did-about-it.txt", "jsonld": "https://wpnews.pro/news/how-i-found-out-52-of-my-knowledge-graph-was-duplicates-and-what-i-did-about-it.jsonld"}}