{"slug": "your-net-rag-stack-hides-a-python-sidecar-i-built-the-engine-that-removes-it", "title": "Your .NET RAG stack hides a Python sidecar. I built the engine that removes it.", "summary": "A developer ported the DocNest document-chunking engine from Python to idiomatic C#/.NET 8, eliminating the need for a Python sidecar in .NET RAG stacks. The engine preserves document structure (headings, tables) to prevent information loss during ingestion, and produces byte-compatible .udf files that work across both Python and .NET runtimes. The solution uses local ONNX MiniLM embeddings and optional LLMs, and is available as NuGet packages.", "body_md": "TL;DR: Every .NET RAG project quietly ships a Python sidecar to do one job: chunk documents. I got rid of mine. DocNest .NET is an idiomatic C# / .NET 8 port of my [DocNest] engine — embeddings run locally (ONNX MiniLM, no key, offline), the LLM is optional (factual questions answered at zero tokens), and the `.udf` knowledge base it writes is byte-compatible with the Python version. Ingest in Python, query in C#. It's on NuGet today. [Repo] · [NuGet]\n\nYou're building on .NET. The product needs to answer questions over a pile of PDFs, contracts, spreadsheets — real retrieval-augmented generation. So you go looking for tooling, and you find the same thing I did:\n\n**It's all Python.**\n\nLangChain, LlamaIndex, every RAG tutorial worth reading — Python, Python, Python. So you do the thing nobody admits to in the architecture review: you stand up a little Python service on the side. A second runtime to containerize, deploy, version, monitor, and wake up to at 3 a.m. when it OOMs. All so it can split a document into chunks and hand them back to your *actual* app.\n\nA whole extra language in production to chop up a PDF. I stared at that diagram one too many times and decided it had to go.\n\nSo I ported DocNest to C#. Not a wrapper shelling out to `python.exe` — a real, idiomatic .NET port. 
`async`/`await` end to end, every dependency behind an interface, shipped as proper NuGet packages. Nothing Python left in the runtime.\n\nBut to explain *why* DocNest is worth porting, I have to tell you about the bug that started the whole thing.\n\nA RAG app I'd built gave a client a confidently wrong number. Not \"I don't know\" — a clean, specific, *wrong* answer, delivered with total confidence. I spent three days assuming my retrieval ranking was off, tuning embeddings and `k` values and similarity thresholds.\n\nThe ranking was fine. The problem happened **before** any of that — at ingestion. Here's how almost every pipeline reads a document:\n\n```\nPDF → extract text → split every 512 chars → embed → store → hope\n```\n\nWatch what that does to a revenue table:\n\n```\nchunk_1: \"45.2%  Q3  Europe  38.1%  Q2  Europe  41.7%  Q3\"\nchunk_2: \"Asia   29.3%  Q2  Asia  Americas  52.1%  Q3  Ame\"\n```\n\nThe headers are gone. The rows are shredded across a chunk boundary. The model receives a bag of loose numbers with no idea which is revenue, which is a quarter, which region they belong to — and fills the gap with a confident guess. That's not a model problem or a retrieval problem. **It's an ingestion problem.** You destroyed the meaning before the model ever saw the data.\n\nA person doesn't read a report as one long character stream. They see headings, sections, a table with columns. DocNest does the same: it reads the document's **structure first**. Every heading becomes a navigable `§section`. Every table is preserved as structured data — never flattened:\n\n```\n{\n  \"section\": \"§4.2 Revenue by Region\",\n  \"table\": {\n    \"headers\": [\"Region\", \"Q2\", \"Q3\", \"Change\"],\n    \"rows\": [\n      [\"Europe\", \"38.1%\", \"45.2%\", \"+7.1pp\"],\n      [\"Asia\",   \"29.3%\", \"41.7%\", \"+12.4pp\"]\n    ]\n  }\n}\n```\n\nSame numbers, same model, same question — but now the answer is right, and it comes with a citation. 
The document is normalised **once** into a portable `.udf` file: a self-contained ZIP holding the section index, key numbers, keywords, section text, and quantised embeddings. Parse once, query forever.\n\nHere's the part I'm proud of. The `.udf` format is an open spec, and the .NET writer produces files that are **byte-compatible with the Python engine**. That one constraint unlocks something genuinely useful: ingest in Python, query in C#.\n\nOne ingestion ecosystem, two languages, the *same* artifact moving between them. Nothing in the codebase is allowed to break that cross-ecosystem contract — it's the whole point.\n\nWhen I describe this, two questions come back every time. They're actually two independent choices:\n\n**1. Embeddings run locally.** A small ONNX MiniLM model (~90 MB) downloads once and caches. No API key, fully offline. There's an optional ONNX cross-encoder reranker for dense PDFs.\n\n**2. The LLM is optional.** Answer Layers 0–1 resolve factual questions deterministically — **zero tokens, no key**. You only bring an LLM for synthesis, and when you do, \"OpenAI\" means the *answer* model, not embeddings. 
The two never get coupled.\n\n```\ndotnet add package DocNest.Core\ndotnet add package DocNest.Parsers\ndotnet add package DocNest.Retrieval\ndotnet add package DocNest.Query\n```\n\n```\nusing DocNest;\nusing DocNest.Parsers;\nusing DocNest.Pipeline;\nusing DocNest.Query;\nusing DocNest.Retrieval;\nusing DocNest.Udf;\n\n// Parse → normalise → write a portable .udf\nvar raw = await new ParserFactory().Get(\"report.pdf\").ParseAsync(\"report.pdf\");\nvar doc = new DocNestPipeline().Process(raw);\nawait new UdfWriter().WriteAsync(doc, \"report.udf\");\n\n// Load it back and ask — deterministic layers, no LLM\nvar document = (await UdfReader.LoadAsync(\"report.udf\")).ToDocument();\n\nusing var retriever = new HybridRetriever(\".docnest_cache\");\nvar engine = new DocNestQueryEngine(retriever);   // no LLM → Layers 0–1 only\nvar result = await engine.AnswerAsync(document, \"What was Q3 revenue?\", allowLlm: false);\n\nConsole.WriteLine(result.Answer);     // \"Q3 revenue: $38M (source: §3.1)\"\nConsole.WriteLine(result.TokensUsed); // 0\n```\n\nPrefer the terminal?\n\n```\ndotnet tool install -g DocNest.Cli\ndocnest convert report.pdf -o report.udf\ndocnest query report.udf \"What was Q3 revenue?\"\n```\n\n`OpenAiCompatibleLlmProvider` talks to OpenAI, Groq, Cerebras, Together, OpenRouter and local servers (Ollama, LM Studio) — change the base URL and model. 
Anthropic has its own provider.\n\n```\nILlmProvider llm = new OpenAiCompatibleLlmProvider(\n    apiKey:  Environment.GetEnvironmentVariable(\"GROQ_API_KEY\")!,\n    model:   \"llama-3.3-70b-versatile\",\n    baseUrl: \"https://api.groq.com/openai/v1\");\n\nvar engine = new DocNestQueryEngine(retriever, llm);\nvar result = await engine.AnswerAsync(document, \"Summarise the key risks.\", allowLlm: true);\nConsole.WriteLine(string.Join(\", \", result.Citations));  // §5.2, §5.3\n```\n\n```\nfile  → IParser → DocNestPipeline (normalise · key-numbers · keywords) → Document → .udf\nquery → HybridRetriever (BM25 + dense + cross-encoder rerank + RRF + 1-hop graph) → top-k\n      → DocNestQueryEngine (5 layers) → answer + citations + tokens + confidence\n```\n\n| Layer | Mechanism | Tokens |\n|---|---|---|\n| 0 | Pre-computed key-numbers / summary | 0 |\n| 1 | Extractive from the top section | 0 |\n| 2 | Single-section LLM | ~300 |\n| 3 | Multi-section synthesis (reranked context) | ~900 |\n| 4 | Broad fallback over retrieved sections | ~1,500 |\n\nThe engine climbs this ladder only when a cheaper rung isn't confident. Layers 0–1 handle a surprising share of real factual questions at zero cost — you pay tokens only for genuine synthesis.\n\nA multi-format eval — 10 documents, 88 questions, 5 formats (the same set as the Python reference), dense + cross-encoder rerank, `gpt-oss-120b` narrator, `qwen2.5` judge:\n\n| Format | Score | Hit-rate (≥7) |\n|---|---|---|\n| XLSX | 8.7 / 10 | 93% |\n| MD | 8.7 / 10 | 100% |\n| DOCX | 7.0 / 10 | 79% |\n| HTML | 4.8 / 10 | 50% |\n| PDF | 6.8 / 10 | 70% |\n| Overall | ~7.1 / 10 | ~78% |\n\nThe Python reference sits at **8.5/10**. This .NET port is at **7.1** and closing the gap slice by slice — the cross-encoder reranker alone lifted PDFs from **5.1 → 6.8** (hit-rate 47% → 70%). HTML is clearly my weakest format right now, and it's the next thing I'm fixing.\n\nI could have cherry-picked a kinder run and quoted a bigger number. 
I'd rather ship the reproducible one with the eval harness sitting right next to it in the repo. If you don't trust a benchmark you can't re-run, neither do I.\n\n| Package | Role |\n|---|---|\n| `DocNest.Abstractions` | Domain records + wrapper interfaces |\n| `DocNest.Core` | Pipeline, normaliser, `.udf` reader/writer, quantizer |\n| `DocNest.Parsers` | md / html / csv / docx / xlsx / pdf |\n| `DocNest.Embeddings` | ONNX MiniLM embedder + ms-marco cross-encoder reranker |\n| `DocNest.Retrieval` | Hybrid retriever (FTS5 BM25 + dense + rerank + RRF + graph) |\n| `DocNest.Query` | 5-layer answer engine + LLM providers |\n| `DocNest.Storage` | `.udf` ZIP storage backend |\n| `DocNest.Cli` | `docnest` dotnet tool |\n\nParsers cover PDF (PdfPig), DOCX/XLSX (OpenXML), HTML (AngleSharp), CSV/TSV and Markdown. Every external dependency lives behind a DocNest interface, so swapping any of them is a one-line change.\n\nThis is **pre-1.0**, built slice-by-slice under a gated protocol: understand → plan → design + ADR → tests-first → full suite green → sign-off, per phase. The core pipeline, hybrid retrieval, cross-encoder reranking and the 5-layer engine are implemented and tested. Cloud embedding providers (OpenAI embeddings and friends) exist in the Python engine but aren't ported yet — embeddings here are local-only by design.\n\n```\ndotnet add package DocNest.Core\n# or\ndotnet tool install -g DocNest.Cli\n```\n\nPython engine: `pip install docnest-ai`\n\nIf you've ever stood up a Python sidecar just to chunk a PDF for a .NET app, I'd genuinely like to know whether this kills that step for you — tell me in the comments. 
And if it does, a star on the repo helps other .NET folks find it.", "url": "https://wpnews.pro/news/your-net-rag-stack-hides-a-python-sidecar-i-built-the-engine-that-removes-it", "canonical_source": "https://dev.to/gunjantailor/your-net-rag-stack-hides-a-python-sidecar-i-built-the-engine-that-removes-it-5190", "published_at": "2026-06-16 05:09:08+00:00", "updated_at": "2026-06-16 05:17:09.047807+00:00", "lang": "en", "topics": ["large-language-models", "developer-tools", "natural-language-processing", "ai-products"], "entities": ["DocNest", ".NET", "Python", "ONNX MiniLM", "NuGet", "LangChain", "LlamaIndex"], "alternates": {"html": "https://wpnews.pro/news/your-net-rag-stack-hides-a-python-sidecar-i-built-the-engine-that-removes-it", "markdown": "https://wpnews.pro/news/your-net-rag-stack-hides-a-python-sidecar-i-built-the-engine-that-removes-it.md", "text": "https://wpnews.pro/news/your-net-rag-stack-hides-a-python-sidecar-i-built-the-engine-that-removes-it.txt", "jsonld": "https://wpnews.pro/news/your-net-rag-stack-hides-a-python-sidecar-i-built-the-engine-that-removes-it.jsonld"}}