Source: dev.to · Topic: large-language-models

Your .NET RAG stack hides a Python sidecar. I built the engine that removes it.

A developer ported the DocNest document-chunking engine from Python to idiomatic C#/.NET 8, eliminating the need for a Python sidecar in .NET RAG stacks. The engine preserves document structure (headings, tables) to prevent information loss during ingestion, and produces byte-compatible .udf files that work across both Python and .NET runtimes. The solution uses local ONNX MiniLM embeddings and optional LLMs, and is available as NuGet packages.

7 min read · Published Jun 16, 2026

TL;DR: Every .NET RAG project quietly ships a Python sidecar to do one job: chunk documents. I got rid of mine. DocNest .NET is an idiomatic C# / .NET 8 port of my DocNest engine: embeddings run locally (ONNX MiniLM, no key, offline), the LLM is optional (factual questions answered at zero tokens), and the .udf knowledge base it writes is byte-compatible with the Python version. Ingest in Python, query in C#. It's on NuGet today. [Repo] · [NuGet]

You're building on .NET. The product needs to answer questions over a pile of PDFs, contracts, and spreadsheets: real retrieval-augmented generation. So you go looking for tooling, and you find the same thing I did:

It's all Python.

LangChain, LlamaIndex, every RAG tutorial worth reading: Python, Python, Python. So you do the thing nobody admits to in the architecture review: you stand up a little Python service on the side. A second runtime to containerize, deploy, version, monitor, and wake up to at 3 a.m. when it OOMs. All so it can split a document into chunks and hand them back to your actual app.

A whole extra language in production to chop up a PDF. I stared at that diagram one too many times and decided it had to go.

So I ported DocNest to C#. Not a wrapper shelling out to python.exe, but a real, idiomatic .NET port: async/await end to end, every dependency behind an interface, shipped as proper NuGet packages. Nothing Python left in the runtime.

But to explain why DocNest is worth porting, I have to tell you about the bug that started the whole thing.

A RAG app I'd built gave a client a confidently wrong number. Not "I don't know" but a clean, specific, wrong answer, delivered with total confidence. I spent three days assuming my retrieval ranking was off, tuning embeddings, k values, and similarity thresholds.

The ranking was fine. The problem happened before any of that, at ingestion. Here's how almost every pipeline reads a document:

PDF → extract text → split every 512 chars → embed → store → hope

Watch what that does to a revenue table:

chunk_1: "45.2%  Q3  Europe  38.1%  Q2  Europe  41.7%  Q3"
chunk_2: "Asia   29.3%  Q2  Asia  Americas  52.1%  Q3  Ame"

The headers are gone. The rows are shredded across a chunk boundary. The model receives a bag of loose numbers with no idea which is revenue, which is a quarter, or which region they belong to, and it fills the gap with a confident guess. That's not a model problem or a retrieval problem. It's an ingestion problem. You destroyed the meaning before the model ever saw the data.
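To see the failure in miniature, here is a minimal sketch of a fixed-size splitter. The input string, the 24-character window (standing in for the 512 above), and the SplitFixed helper are all made up for illustration; none of this is DocNest code.

```csharp
using System;
using System.Collections.Generic;

// Naive splitter: cuts every `size` characters and ignores document structure.
static List<string> SplitFixed(string text, int size)
{
    var parts = new List<string>();
    for (int i = 0; i < text.Length; i += size)
        parts.Add(text.Substring(i, Math.Min(size, text.Length - i)));
    return parts;
}

// Hypothetical flattened table text, roughly what a PDF text extractor might emit.
string flattened = "Region Q2 Q3 Europe 38.1% 45.2% Asia 29.3% 41.7%";

var chunks = SplitFixed(flattened, 24);
foreach (var chunk in chunks)
    Console.WriteLine($"[{chunk}]");
// [Region Q2 Q3 Europe 38.1]
// [% 45.2% Asia 29.3% 41.7%]
```

The second chunk carries no headers at all, and the boundary lands in the middle of a value, which is exactly the situation the model is later asked to reason over.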

A person doesn't read a report as one long character stream. They see headings, sections, a table with columns. DocNest does the same: it reads the document's structure first. Every heading becomes a navigable §section. Every table is preserved as structured data, never flattened:

{
  "section": "§4.2 Revenue by Region",
  "table": {
    "headers": ["Region", "Q2", "Q3", "Change"],
    "rows": [
      ["Europe", "38.1%", "45.2%", "+7.1pp"],
      ["Asia",   "29.3%", "41.7%", "+12.4pp"]
    ]
  }
}

Same numbers, same model, same question, but now the answer is right, and it comes with a citation. The document is normalised once into a portable .udf file: a self-contained ZIP holding the section index, key numbers, keywords, section text, and quantised embeddings. Parse once, query forever.
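Why the structured form makes a cited answer possible can be sketched in a few lines. This is an illustration only: the arrays mirror the JSON example above, and TryLookup is a hypothetical helper, not the DocNest API or its .udf layout.

```csharp
using System;

// The table from the JSON example, kept as headers + rows instead of flat text.
string section = "§4.2 Revenue by Region";
string[] headers = { "Region", "Q2", "Q3", "Change" };
string[][] rows =
{
    new[] { "Europe", "38.1%", "45.2%", "+7.1pp" },
    new[] { "Asia",   "29.3%", "41.7%", "+12.4pp" },
};

// Hypothetical helper: resolve a cell by row label and column name.
bool TryLookup(string rowLabel, string column, out string value)
{
    value = "";
    int col = Array.IndexOf(headers, column);
    if (col < 0) return false;
    foreach (var row in rows)
    {
        if (row[0] == rowLabel) { value = row[col]; return true; }
    }
    return false;
}

bool found = TryLookup("Asia", "Q3", out string cell);
Console.WriteLine(found
    ? $"Asia Q3: {cell} (source: {section})"
    : "not found");
// Asia Q3: 41.7% (source: §4.2 Revenue by Region)
```

Because every cell stays attached to its header and its section, the lookup needs no guessing and the citation comes for free.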

Here's the part I'm proud of. The .udf format is an open spec, and the .NET writer produces files that are byte-compatible with the Python engine. That one constraint unlocks something genuinely useful: a .udf written by one runtime can be read by the other, so you can ingest in Python and query in C#. One ingestion ecosystem, two languages, the same artifact moving between them. Nothing in the codebase is allowed to break that cross-ecosystem contract; it's the whole point.

When I describe this, two questions come back every time. They're actually two independent choices:

1. Embeddings run locally. A small ONNX MiniLM model (~90 MB) downloads once and caches. No API key, fully offline. There's an optional ONNX cross-encoder reranker for dense PDFs.

2. The LLM is optional. Answer Layers 0–1 resolve factual questions deterministically: zero tokens, no key. You only bring an LLM for synthesis, and when you do, "OpenAI" means the answer model, not embeddings. The two never get coupled.

dotnet add package DocNest.Core
dotnet add package DocNest.Parsers
dotnet add package DocNest.Retrieval
dotnet add package DocNest.Query

using DocNest;
using DocNest.Parsers;
using DocNest.Pipeline;
using DocNest.Query;
using DocNest.Retrieval;
using DocNest.Udf;

// Parse → normalise → write a portable .udf
var raw = await new ParserFactory().Get("report.pdf").ParseAsync("report.pdf");
var doc = new DocNestPipeline().Process(raw);
await new UdfWriter().WriteAsync(doc, "report.udf");

// Load it back and ask: deterministic layers, no LLM
var document = (await UdfReader.LoadAsync("report.udf")).ToDocument();

using var retriever = new HybridRetriever(".docnest_cache");
var engine = new DocNestQueryEngine(retriever);   // no LLM → Layers 0–1 only
var result = await engine.AnswerAsync(document, "What was Q3 revenue?", allowLlm: false);

Console.WriteLine(result.Answer);     // "Q3 revenue: $38M (source: §3.1)"
Console.WriteLine(result.TokensUsed); // 0

Prefer the terminal?

dotnet tool install -g DocNest.Cli
docnest convert report.pdf -o report.udf
docnest query report.udf "What was Q3 revenue?"

OpenAiCompatibleLlmProvider talks to OpenAI, Groq, Cerebras, Together, OpenRouter and local servers (Ollama, LM Studio): change the base URL and model. Anthropic has its own provider.

ILlmProvider llm = new OpenAiCompatibleLlmProvider(
    apiKey:  Environment.GetEnvironmentVariable("GROQ_API_KEY")!,
    model:   "llama-3.3-70b-versatile",
    baseUrl: "https://api.groq.com/openai/v1");

var engine = new DocNestQueryEngine(retriever, llm);
var result = await engine.AnswerAsync(document, "Summarise the key risks.", allowLlm: true);
Console.WriteLine(string.Join(", ", result.Citations));  // ["§5.2", "§5.3"]
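
Why "change the base URL and model" is enough: OpenAI-compatible servers expose the same chat-completions route under their own base URL. Here is a rough sketch of that idea using plain HttpClient types. BuildChatRequest is a hypothetical helper, not DocNest's provider; the API key is a placeholder, and the local URL and model name assume a default Ollama install. The request is built but never sent.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

// Build (but do not send) a chat-completions request for an OpenAI-compatible server.
static HttpRequestMessage BuildChatRequest(string baseUrl, string apiKey, string model, string prompt)
{
    var payload = new
    {
        model,
        messages = new[] { new { role = "user", content = prompt } }
    };
    var request = new HttpRequestMessage(HttpMethod.Post, $"{baseUrl.TrimEnd('/')}/chat/completions")
    {
        Content = new StringContent(JsonSerializer.Serialize(payload), Encoding.UTF8, "application/json")
    };
    request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
    return request;
}

// Same code path, two different backends: only the base URL and model change.
var groq = BuildChatRequest("https://api.groq.com/openai/v1", "placeholder-key",
                            "llama-3.3-70b-versatile", "Summarise the key risks.");
var local = BuildChatRequest("http://localhost:11434/v1", "unused",
                             "llama3", "Summarise the key risks.");

Console.WriteLine(groq.RequestUri);
Console.WriteLine(local.RequestUri);
```
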

file  → IParser → DocNestPipeline (normalise · key-numbers · keywords) → Document → .udf
query → HybridRetriever (BM25 + dense + cross-encoder rerank + RRF + 1-hop graph) → top-k
      → DocNestQueryEngine (5 layers) → answer + citations + tokens + confidence

| Layer | Mechanism | Tokens |
|---|---|---|
| 0 | Pre-computed key-numbers / summary | 0 |
| 1 | Extractive from the top section | 0 |
| 2 | Single-section LLM | ~300 |
| 3 | Multi-section synthesis (reranked context) | ~900 |
| 4 | Broad fallback over retrieved sections | ~1,500 |

The engine climbs this ladder only when a cheaper rung isn't confident. Layers 0–1 handle a surprising share of real factual questions at zero cost; you pay tokens only for genuine synthesis.
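
The climbing logic can be sketched as a confidence-gated loop. Everything here is an assumption made for illustration: the three stub layers, their confidence values, the 0.8 threshold, and the Climb helper are hypothetical and say nothing about how DocNestQueryEngine is actually implemented.

```csharp
using System;
using System.Collections.Generic;

// Each hypothetical layer returns an answer, a confidence in [0, 1], and the tokens it spent.
var layers = new List<Func<string, (string Answer, double Confidence, int Tokens)>>
{
    q => ("", 0.0, 0),                                  // Layer 0: no pre-computed key number matched
    q => ("Q3 revenue: $38M (source: §3.1)", 0.9, 0),   // Layer 1: extractive hit, no tokens
    q => ("(LLM answer)", 0.95, 300),                   // Layer 2: stand-in for an LLM call
};

(string Answer, int Layer, int Tokens) Climb(string question, double threshold, bool allowLlm)
{
    int spent = 0;
    for (int i = 0; i < layers.Count; i++)
    {
        if (i >= 2 && !allowLlm) break;                 // layers 2 and up need an LLM
        var (answer, confidence, tokens) = layers[i](question);
        spent += tokens;
        if (confidence >= threshold) return (answer, i, spent);
    }
    return ("not confident enough to answer", -1, spent);
}

var outcome = Climb("What was Q3 revenue?", threshold: 0.8, allowLlm: false);
Console.WriteLine($"{outcome.Answer} [layer {outcome.Layer}, tokens {outcome.Tokens}]");
// Q3 revenue: $38M (source: §3.1) [layer 1, tokens 0]
```

The cheap layers run first and the loop stops at the first confident answer, so the expensive rungs are only reached when the free ones fall short.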

A multi-format eval: 10 documents, 88 questions, 5 formats (the same set as the Python reference), dense + cross-encoder rerank, gpt-oss-120b narrator, qwen2.5 judge:

| Format | Score | Hit-rate (≥7) |
|---|---|---|
| XLSX | 8.7 / 10 | 93% |
| MD | 8.7 / 10 | 100% |
| DOCX | 7.0 / 10 | 79% |
| HTML | 4.8 / 10 | 50% |
| PDF | 6.8 / 10 | 70% |
| Overall | ~7.1 / 10 | ~78% |

The Python reference sits at 8.5/10. This .NET port is at 7.1 and closing the gap slice by slice: the cross-encoder reranker alone dragged PDFs from 5.1 → 6.8 (hit-rate 47% → 70%). HTML is clearly my weakest format right now, and it's the next thing I'm fixing.
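
For anyone reading the table cold, here is how I'd expect the two columns to be derived, assuming the score is the mean judge rating and the hit-rate is the share of questions rated 7 or higher. The ratings below are invented for the example and are not taken from the eval.

```csharp
using System;
using System.Linq;

// Hypothetical judge ratings (0-10) for one format; not the article's data.
int[] ratings = { 9, 8, 10, 6, 7, 3, 9, 8 };

double mean = ratings.Average();                                    // the "Score" column
double hitRate = ratings.Count(r => r >= 7) / (double)ratings.Length; // the "Hit-rate (≥7)" column

Console.WriteLine($"mean = {mean}, hit-rate = {hitRate}");
```

With these eight ratings the mean is 7.5 and six of eight clear the bar, so the hit-rate is 0.75. The two numbers can diverge: a format with a few very low scores can keep a decent hit-rate while its mean drops.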

I could have cherry-picked a kinder run and quoted a bigger number. I'd rather ship the reproducible one with the eval harness sitting right next to it in the repo. If you don't trust a benchmark you can't re-run, neither do I.

| Package | Role |
|---|---|
| DocNest.Abstractions | Domain records + wrapper interfaces |
| DocNest.Core | Pipeline, normaliser, .udf reader/writer, quantizer |
| DocNest.Parsers | md / html / csv / docx / xlsx / pdf |
| DocNest.Embeddings | ONNX MiniLM embedder + ms-marco cross-encoder reranker |
| DocNest.Retrieval | Hybrid retriever (FTS5 BM25 + dense + rerank + RRF + graph) |
| DocNest.Query | 5-layer answer engine + LLM providers |
| DocNest.Storage | .udf ZIP storage backend |
| DocNest.Cli | docnest dotnet tool |

Parsers cover PDF (PdfPig), DOCX/XLSX (OpenXML), HTML (AngleSharp), CSV/TSV and Markdown. Every external dependency lives behind a DocNest interface, so swapping any of them is a one-line change.

This is pre-1.0, built slice-by-slice under a gated protocol: understand → plan → design + ADR → tests-first → full suite green → sign-off, per phase. The core pipeline, hybrid retrieval, cross-encoder reranking and the 5-layer engine are implemented and tested. Cloud embedding providers (OpenAI embeddings and friends) exist in the Python engine but aren't ported yet; embeddings here are local-only by design.

dotnet add package DocNest.Core
dotnet tool install -g DocNest.Cli

The Python engine: pip install docnest-ai

If you've ever stood up a Python sidecar just to chunk a PDF for a .NET app, I'd genuinely like to know whether this kills that step for you: tell me in the comments. And if it does, a star on the repo helps other .NET folks find it.
