Your .NET RAG stack hides a Python sidecar. I built the engine that removes it.

A developer ported the DocNest document-chunking engine from Python to idiomatic C#/.NET 8, eliminating the need for a Python sidecar in .NET RAG stacks. The engine preserves document structure (headings, tables) to prevent information loss during ingestion, and produces byte-compatible .udf files that work across both Python and .NET runtimes. The solution uses local ONNX MiniLM embeddings and optional LLMs, and is available as NuGet packages.

TL;DR: Every .NET RAG project quietly ships a Python sidecar to do one job: chunk documents. I got rid of mine. DocNest .NET is an idiomatic C# / .NET 8 port of my DocNest engine. Embeddings run locally (ONNX MiniLM, no key, offline), the LLM is optional (factual questions are answered at zero tokens), and the `.udf` knowledge base it writes is byte-compatible with the Python version. Ingest in Python, query in C#. It's on NuGet today.

You're building on .NET. The product needs to answer questions over a pile of PDFs, contracts, spreadsheets: real retrieval-augmented generation. So you go looking for tooling, and you find the same thing I did: it's all Python. LangChain, LlamaIndex, every RAG tutorial worth reading: Python, Python, Python.

So you do the thing nobody admits to in the architecture review: you stand up a little Python service on the side. A second runtime to containerize, deploy, version, monitor, and wake up to at 3 a.m. when it OOMs. All so it can split a document into chunks and hand them back to your actual app.

A whole extra language in production to chop up a PDF. I stared at that diagram one too many times and decided it had to go.

So I ported DocNest to C#. Not a wrapper shelling out to `python.exe`, but a real, idiomatic .NET port: `async`/`await` end to end, every dependency behind an interface, shipped as proper NuGet packages. Nothing Python left in the runtime.

But to explain why DocNest is worth porting, I have to tell you about the bug that started the whole thing.

A RAG app I'd built gave a client a confidently wrong number. Not "I don't know", but a clean, specific, wrong answer, delivered with total confidence. I spent three days assuming my retrieval ranking was off, tuning embeddings and k values and similarity thresholds. The ranking was fine. The problem happened before any of that, at ingestion.

Here's how almost every pipeline reads a document:

```
PDF → extract text → split every 512 chars → embed → store → hope
```

Watch what that does to a revenue table:

```
chunk 1: "45.2% Q3 Europe 38.1% Q2 Europe 41.7% Q3"
chunk 2: "Asia 29.3% Q2 Asia Americas 52.1% Q3 Ame"
```

The headers are gone. The rows are shredded across a chunk boundary. The model receives a bag of loose numbers with no idea which is revenue, which is a quarter, or which region they belong to, and it fills the gap with a confident guess.

That's not a model problem or a retrieval problem. It's an ingestion problem. You destroyed the meaning before the model ever saw the data.

A person doesn't read a report as one long character stream. They see headings, sections, a table with columns. DocNest does the same: it reads the document's structure first. Every heading becomes a navigable §section. Every table is preserved as structured data, never flattened:

```json
{
  "section": "§4.2 Revenue by Region",
  "table": {
    "headers": ["Region", "Q2", "Q3", "Change"],
    "rows": [
      ["Europe", "38.1%", "45.2%", "+7.1pp"],
      ["Asia", "29.3%", "41.7%", "+12.4pp"]
    ]
  }
}
```

Same numbers, same model, same question, but now the answer is right, and it comes with a citation.

The document is normalised once into a portable `.udf` file: a self-contained ZIP holding the section index, key numbers, keywords, section text, and quantised embeddings. Parse once, query forever.

Here's the part I'm proud of. The `.udf` format is an open spec, and the .NET writer produces files that are byte-compatible with the Python engine. That one constraint unlocks something genuinely useful: ingest in Python, query in C#. One ingestion ecosystem, two languages, the same artifact moving between them. Nothing in the codebase is allowed to break that cross-ecosystem contract; it's the whole point.

When I describe this, two questions come back every time. They're actually two independent choices:

1. Embeddings run locally. A small ONNX MiniLM model (~90 MB) downloads once and caches.
No API key, fully offline. There's an optional ONNX cross-encoder reranker for dense PDFs.
2. The LLM is optional. Answer Layers 0–1 resolve factual questions deterministically (zero tokens, no key). You only bring an LLM for synthesis, and when you do, "OpenAI" means the answer model, not embeddings. The two never get coupled.

```bash
dotnet add package DocNest.Core
dotnet add package DocNest.Parsers
dotnet add package DocNest.Retrieval
dotnet add package DocNest.Query
```

```csharp
using DocNest;
using DocNest.Parsers;
using DocNest.Pipeline;
using DocNest.Query;
using DocNest.Retrieval;
using DocNest.Udf;

// Parse → normalise → write a portable .udf
var raw = await new ParserFactory().Get("report.pdf").ParseAsync("report.pdf");
var doc = new DocNestPipeline().Process(raw);
await new UdfWriter().WriteAsync(doc, "report.udf");

// Load it back and ask: deterministic layers, no LLM
var document = (await UdfReader.LoadAsync("report.udf")).ToDocument();
using var retriever = new HybridRetriever(".docnest_cache");
var engine = new DocNestQueryEngine(retriever); // no LLM → Layers 0–1 only

var result = await engine.AnswerAsync(document, "What was Q3 revenue?", allowLlm: false);
Console.WriteLine(result.Answer);     // "Q3 revenue: $38M (source: §3.1)"
Console.WriteLine(result.TokensUsed); // 0
```

Prefer the terminal?

```bash
dotnet tool install -g DocNest.Cli
docnest convert report.pdf -o report.udf
docnest query report.udf "What was Q3 revenue?"
```

`OpenAiCompatibleLlmProvider` talks to OpenAI, Groq, Cerebras, Together, OpenRouter and local servers (Ollama, LM Studio): change the base URL and model. Anthropic has its own provider.

```csharp
ILlmProvider llm = new OpenAiCompatibleLlmProvider(
    apiKey: Environment.GetEnvironmentVariable("GROQ_API_KEY"),
    model: "llama-3.3-70b-versatile",
    baseUrl: "https://api.groq.com/openai/v1");

var engine = new DocNestQueryEngine(retriever, llm);
var result = await engine.AnswerAsync(document, "Summarise the key risks.", allowLlm: true);
Console.WriteLine(string.Join(", ", result.Citations)); // §5.2, §5.3
```

```
file  → IParser → DocNestPipeline (normalise · key-numbers · keywords) → Document → .udf
query → HybridRetriever (BM25 + dense + cross-encoder rerank + RRF + 1-hop graph) → top-k
      → DocNestQueryEngine (5 layers) → answer + citations + tokens + confidence
```

| Layer | Mechanism | Tokens |
|---|---|---|
| 0 | Pre-computed key-numbers / summary | 0 |
| 1 | Extractive from the top section | 0 |
| 2 | Single-section LLM | ~300 |
| 3 | Multi-section synthesis (reranked context) | ~900 |
| 4 | Broad fallback over retrieved sections | ~1,500 |

The engine climbs this ladder only when a cheaper rung isn't confident. Layers 0–1 handle a surprising share of real factual questions at zero cost, so you pay tokens only for genuine synthesis.

A multi-format eval: 10 documents, 88 questions, 5 formats (the same set as the Python reference), dense + cross-encoder rerank, gpt-oss-120b narrator, qwen2.5 judge:

| Format | Score | Hit-rate (≥7) |
|---|---|---|
| XLSX | 8.7 / 10 | 93% |
| MD | 8.7 / 10 | 100% |
| DOCX | 7.0 / 10 | 79% |
| HTML | 4.8 / 10 | 50% |
| PDF | 6.8 / 10 | 70% |
| Overall | ~7.1 / 10 | ~78% |

The Python reference sits at 8.5/10. This .NET port is at ~7.1 and closing the gap slice by slice: the cross-encoder reranker alone dragged PDFs from 5.1 → 6.8 (hit-rate 47% → 70%). HTML is clearly my weakest format right now, and it's the next thing I'm fixing.

I could have cherry-picked a kinder run and quoted a bigger number. I'd rather ship the reproducible one with the eval harness sitting right next to it in the repo. If you don't trust a benchmark you can't re-run, neither do I.

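Of the retrieval stages named in the architecture sketch above, RRF (reciprocal rank fusion) is the easiest to show in isolation: each candidate scores the sum of 1 / (k + rank) across the ranked lists it appears in. What follows is a minimal sketch of that standard formula, not DocNest's actual code. The `FuseRrf` helper, the constant k = 60 (a common default, assumed here) and the section ids are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical section ids; the two lists stand in for BM25 and dense results,
// each ordered best-first.
var bm25 = new[] { "§4.2", "§3.1", "§5.2" };
var dense = new[] { "§4.2", "§5.2", "§2.0" };

var fused = FuseRrf(new IReadOnlyList<string>[] { bm25, dense });
foreach (var (sectionId, rrfScore) in fused)
    Console.WriteLine($"{sectionId}  {rrfScore:F4}");

// Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank(d)),
// where rank starts at 1. k = 60 is an assumed default, not a DocNest setting.
static List<(string Id, double Score)> FuseRrf(
    IEnumerable<IReadOnlyList<string>> rankings, int k = 60)
{
    var scores = new Dictionary<string, double>();
    foreach (var ranking in rankings)
    {
        for (var i = 0; i < ranking.Count; i++)
        {
            // A missing key leaves `current` at 0.0, so the first hit starts the sum.
            scores.TryGetValue(ranking[i], out var current);
            scores[ranking[i]] = current + 1.0 / (k + i + 1);
        }
    }

    // Highest fused score first; ties broken by id so the order is deterministic.
    return scores
        .OrderByDescending(pair => pair.Value)
        .ThenBy(pair => pair.Key, StringComparer.Ordinal)
        .Select(pair => (Id: pair.Key, Score: pair.Value))
        .ToList();
}
```

With these inputs §4.2 comes first, and §5.2, which appears in both lists, outranks §3.1, which ranks higher but in only one. The appeal of the method is that it needs only ranks, so BM25 scores and cosine similarities never have to be put on a common scale.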
| Package | Role |
|---|---|
| `DocNest.Abstractions` | Domain records + wrapper interfaces |
| `DocNest.Core` | Pipeline, normaliser, `.udf` reader/writer, quantizer |
| `DocNest.Parsers` | md / html / csv / docx / xlsx / pdf |
| `DocNest.Embeddings` | ONNX MiniLM embedder + ms-marco cross-encoder reranker |
| `DocNest.Retrieval` | Hybrid retriever (FTS5 BM25 + dense + rerank + RRF + graph) |
| `DocNest.Query` | 5-layer answer engine + LLM providers |
| `DocNest.Storage` | `.udf` ZIP storage backend |
| `DocNest.Cli` | `docnest` dotnet tool |

Parsers cover PDF (PdfPig), DOCX/XLSX (OpenXML), HTML (AngleSharp), CSV/TSV and Markdown. Every external dependency lives behind a DocNest interface, so swapping any of them is a one-line change.

This is pre-1.0, built slice-by-slice under a gated protocol: understand → plan → design + ADR → tests-first → full suite green → sign-off, per phase. The core pipeline, hybrid retrieval, cross-encoder reranking and the 5-layer engine are implemented and tested. Cloud embedding providers (OpenAI embeddings and friends) exist in the Python engine but aren't ported yet; embeddings here are local-only by design.

To try it: `dotnet add package DocNest.Core` or `dotnet tool install -g DocNest.Cli` on .NET, `pip install docnest-ai` on Python.

If you've ever stood up a Python sidecar just to chunk a PDF for a .NET app, I'd genuinely like to know whether this kills that step for you. Tell me in the comments. And if it does, a star on the repo helps other .NET folks find it.