# Your .NET RAG stack hides a Python sidecar. I built the engine that removes it.

> Source: <https://dev.to/gunjantailor/your-net-rag-stack-hides-a-python-sidecar-i-built-the-engine-that-removes-it-5190>
> Published: 2026-06-16 05:09:08+00:00

TL;DR: Every .NET RAG project quietly ships a Python sidecar to do one job: chunk documents. I got rid of mine. DocNest .NET is an idiomatic C# / .NET 8 port of my [DocNest] engine: embeddings run locally (ONNX MiniLM, no key, offline), the LLM is optional (factual questions answered at zero tokens), and the `.udf` knowledge base it writes is byte-compatible with the Python version. Ingest in Python, query in C#. It's on NuGet today. [Repo] · [NuGet]

You're building on .NET. The product needs to answer questions over a pile of PDFs, contracts, spreadsheets — real retrieval-augmented generation. So you go looking for tooling, and you find the same thing I did:

**It's all Python.**

LangChain, LlamaIndex, every RAG tutorial worth reading — Python, Python, Python. So you do the thing nobody admits to in the architecture review: you stand up a little Python service on the side. A second runtime to containerize, deploy, version, monitor, and wake up to at 3 a.m. when it OOMs. All so it can split a document into chunks and hand them back to your *actual* app.

A whole extra language in production to chop up a PDF. I stared at that diagram one too many times and decided it had to go.

So I ported DocNest to C#. Not a wrapper shelling out to `python.exe`, but a real, idiomatic .NET port: `async`/`await` end to end, every dependency behind an interface, shipped as proper NuGet packages. Nothing Python left in the runtime.

But to explain *why* DocNest is worth porting, I have to tell you about the bug that started the whole thing.

A RAG app I'd built gave a client a confidently wrong number. Not "I don't know", but a clean, specific, *wrong* answer. I spent three days assuming my retrieval ranking was off, tuning embeddings and `k` values and similarity thresholds.

The ranking was fine. The problem happened **before** any of that — at ingestion. Here's how almost every pipeline reads a document:

```
PDF → extract text → split every 512 chars → embed → store → hope
```

Watch what that does to a revenue table:

```
chunk_1: "45.2%  Q3  Europe  38.1%  Q2  Europe  41.7%  Q3"
chunk_2: "Asia   29.3%  Q2  Asia  Americas  52.1%  Q3  Ame"
```

The headers are gone. The rows are shredded across a chunk boundary. The model receives a bag of loose numbers with no idea which is revenue, which is a quarter, which region they belong to — and fills the gap with a confident guess. That's not a model problem or a retrieval problem. **It's an ingestion problem.** You destroyed the meaning before the model ever saw the data.
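To make the failure concrete, here is a minimal C# sketch of the fixed-width splitter that pipeline implies. It is not DocNest code: `NaiveChunker` is a hypothetical name, and the sample row and the 24-character width are chosen only to keep the example small (the pipeline above uses 512).

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: the kind of splitter a "split every N chars" pipeline uses.
// NaiveChunker is a hypothetical name, not part of DocNest.
public static class NaiveChunker
{
    // Cuts the text every `size` characters, with no regard for
    // headings, table rows, or even cell boundaries.
    public static List<string> Split(string text, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

        var chunks = new List<string>();
        for (int i = 0; i < text.Length; i += size)
        {
            // The last chunk may be shorter than `size`.
            chunks.Add(text.Substring(i, Math.Min(size, text.Length - i)));
        }
        return chunks;
    }
}
```

Calling `NaiveChunker.Split("Region Q2 Q3 Europe 38.1% 45.2% Asia 29.3% 41.7%", 24)` yields `"Region Q2 Q3 Europe 38.1"` and `"% 45.2% Asia 29.3% 41.7%"`. The cut lands inside a cell, and the second chunk carries numbers with no headers at all.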

A person doesn't read a report as one long character stream. They see headings, sections, a table with columns. DocNest does the same: it reads the document's **structure first**. Every heading becomes a navigable `§section`. Every table is preserved as structured data, never flattened:

```
{
  "section": "§4.2 Revenue by Region",
  "table": {
    "headers": ["Region", "Q2", "Q3", "Change"],
    "rows": [
      ["Europe", "38.1%", "45.2%", "+7.1pp"],
      ["Asia",   "29.3%", "41.7%", "+12.4pp"]
    ]
  }
}
```

Same numbers, same model, same question, but now the answer is right, and it comes with a citation. The document is normalised **once** into a portable `.udf` file: a self-contained ZIP holding the section index, key numbers, keywords, section text, and quantised embeddings. Parse once, query forever.
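Because a `.udf` is a plain ZIP, you can look inside one with nothing but the base class library. The sketch below only lists whatever entries it finds. The entry names inside a real `.udf` are not given in this post, so it assumes none, and `UdfInspector` is a hypothetical helper, not a DocNest type.

```csharp
using System.Collections.Generic;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper: lists the entries of a .udf by treating it as an ordinary ZIP.
// It makes no assumption about which entries a real .udf contains.
public static class UdfInspector
{
    public static List<(string Name, long Bytes)> ListEntries(string udfPath)
    {
        // ZipFile.OpenRead opens the archive read-only; disposing it releases the file handle.
        using ZipArchive archive = ZipFile.OpenRead(udfPath);
        return archive.Entries
            .Select(e => (Name: e.FullName, Bytes: e.Length)) // Length = uncompressed size
            .ToList();
    }
}
```

Printing the result of `UdfInspector.ListEntries("report.udf")` is a quick sanity check that a file written by one engine has the shape the other expects.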

Here's the part I'm proud of. The `.udf` format is an open spec, and the .NET writer produces files that are **byte-compatible with the Python engine**. That one constraint unlocks something genuinely useful: ingest a document in Python, then query the same `.udf` in C#.

One ingestion ecosystem, two languages, the *same* artifact moving between them. Nothing in the codebase is allowed to break that cross-ecosystem contract; it's the whole point.

When I describe this, two questions come back every time. They're actually two independent choices:

**1. Embeddings run locally.** A small ONNX MiniLM model (~90 MB) downloads once and caches. No API key, fully offline. There's an optional ONNX cross-encoder reranker for dense PDFs.

**2. The LLM is optional.** Answer Layers 0–1 resolve factual questions deterministically — **zero tokens, no key**. You only bring an LLM for synthesis, and when you do, "OpenAI" means the *answer* model, not embeddings. The two never get coupled.

```
dotnet add package DocNest.Core
dotnet add package DocNest.Parsers
dotnet add package DocNest.Retrieval
dotnet add package DocNest.Query
```

```
using DocNest;
using DocNest.Parsers;
using DocNest.Pipeline;
using DocNest.Query;
using DocNest.Retrieval;
using DocNest.Udf;

// Parse → normalise → write a portable .udf
var raw = await new ParserFactory().Get("report.pdf").ParseAsync("report.pdf");
var doc = new DocNestPipeline().Process(raw);
await new UdfWriter().WriteAsync(doc, "report.udf");

// Load it back and ask — deterministic layers, no LLM
var document = (await UdfReader.LoadAsync("report.udf")).ToDocument();

using var retriever = new HybridRetriever(".docnest_cache");
var engine = new DocNestQueryEngine(retriever);   // no LLM → Layers 0–1 only
var result = await engine.AnswerAsync(document, "What was Q3 revenue?", allowLlm: false);

Console.WriteLine(result.Answer);     // "Q3 revenue: $38M (source: §3.1)"
Console.WriteLine(result.TokensUsed); // 0
```

Prefer the terminal?

```
dotnet tool install -g DocNest.Cli
docnest convert report.pdf -o report.udf
docnest query report.udf "What was Q3 revenue?"
```

`OpenAiCompatibleLlmProvider` talks to OpenAI, Groq, Cerebras, Together, OpenRouter and local servers (Ollama, LM Studio); just change the base URL and model. Anthropic has its own provider.

```
ILlmProvider llm = new OpenAiCompatibleLlmProvider(
    apiKey:  Environment.GetEnvironmentVariable("GROQ_API_KEY")!,
    model:   "llama-3.3-70b-versatile",
    baseUrl: "https://api.groq.com/openai/v1");

var engine = new DocNestQueryEngine(retriever, llm);
var result = await engine.AnswerAsync(document, "Summarise the key risks.", allowLlm: true);
Console.WriteLine(string.Join(", ", result.Citations));  // §5.2, §5.3
```

```
file  → IParser → DocNestPipeline (normalise · key-numbers · keywords) → Document → .udf
query → HybridRetriever (BM25 + dense + cross-encoder rerank + RRF + 1-hop graph) → top-k
      → DocNestQueryEngine (5 layers) → answer + citations + tokens + confidence
```
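The retrieval line in the diagram names RRF (reciprocal rank fusion) as the step that merges the BM25 and dense rankings. As a rough illustration of the textbook technique, not DocNest's actual implementation, each ranked list contributes `1 / (k + rank)` to a section's score. The `RankFusion` class and the constant `k = 60` are assumptions; the post does not say what DocNest uses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of standard reciprocal rank fusion. RankFusion is a hypothetical name,
// and k = 60 is a commonly used default, not a value taken from DocNest.
public static class RankFusion
{
    // Each ranking lists section ids best-first (e.g. "§5.2").
    public static List<string> Fuse(IEnumerable<IReadOnlyList<string>> rankings, int k = 60)
    {
        var scores = new Dictionary<string, double>();
        foreach (IReadOnlyList<string> ranking in rankings)
        {
            for (int i = 0; i < ranking.Count; i++)
            {
                // Ranks are 1-based, so the top hit contributes 1 / (k + 1).
                scores.TryGetValue(ranking[i], out double current);
                scores[ranking[i]] = current + 1.0 / (k + i + 1);
            }
        }

        // Highest fused score first; ties broken by id so the order is deterministic.
        return scores
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal)
            .Select(p => p.Key)
            .ToList();
    }
}
```

With a BM25 ranking of `A, B, C` and a dense ranking of `B, C, A`, the fused order is `B, A, C`. `B` wins because it is near the top of both lists, even though neither retriever ranked it first and last like `A`.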

| Layer | Mechanism | Tokens |
|---|---|---|
| 0 | Pre-computed key-numbers / summary | 0 |
| 1 | Extractive from the top section | 0 |
| 2 | Single-section LLM | ~300 |
| 3 | Multi-section synthesis (reranked context) | ~900 |
| 4 | Broad fallback over retrieved sections | ~1,500 |

The engine climbs this ladder only when a cheaper rung isn't confident. Layers 0–1 handle a surprising share of real factual questions at zero cost — you pay tokens only for genuine synthesis.
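The escalation logic can be sketched in a few lines. This is an assumption-laden illustration, not the engine's code: `LayerResult`, `AnswerLadder`, the confidence threshold and the `maxLayer` cap are all hypothetical names and parameters. Capping at layer 1 is meant to mirror what `allowLlm: false` does in the example earlier.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical types for illustration; DocNest's real result type is not shown in this post.
public sealed record LayerResult(string Answer, double Confidence, int TokensUsed);

public static class AnswerLadder
{
    // layers are ordered cheapest-first; a layer returns null when it cannot answer.
    public static LayerResult? Climb(
        IReadOnlyList<Func<string, LayerResult?>> layers,
        string question,
        double minConfidence,
        int maxLayer)
    {
        LayerResult? best = null;
        for (int i = 0; i < layers.Count && i <= maxLayer; i++)
        {
            LayerResult? attempt = layers[i](question);
            if (attempt is null) continue;

            // A cheap rung that is confident enough ends the climb immediately,
            // so the more expensive layers are never invoked.
            if (attempt.Confidence >= minConfidence) return attempt;

            if (best is null || attempt.Confidence > best.Confidence) best = attempt;
        }

        // Nothing cleared the bar: return the most confident attempt, or null if none answered.
        return best;
    }
}
```

The design point is the early return: the LLM-backed layers sit behind delegates that only run when every cheaper rung has already failed the confidence check.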

A multi-format eval: 10 documents, 88 questions, 5 formats (the same set as the Python reference), dense + cross-encoder rerank, `gpt-oss-120b` narrator, `qwen2.5` judge:

| Format | Score | Hit-rate (≥7) |
|---|---|---|
| XLSX | 8.7 / 10 | 93% |
| MD | 8.7 / 10 | 100% |
| DOCX | 7.0 / 10 | 79% |
| HTML | 4.8 / 10 | 50% |
| PDF | 6.8 / 10 | 70% |
| **Overall** | **~7.1 / 10** | **~78%** |

The Python reference sits at **8.5/10**. This .NET port is at **7.1** and closing the gap slice by slice — the cross-encoder reranker alone dragged PDFs from **5.1 → 6.8** (hit-rate 47% → 70%). HTML is clearly my weakest format right now, and it's the next thing I'm fixing.

I could have cherry-picked a kinder run and quoted a bigger number. I'd rather ship the reproducible one with the eval harness sitting right next to it in the repo. If you don't trust a benchmark you can't re-run, neither do I.

| Package | Role |
|---|---|
| `DocNest.Abstractions` | Domain records + wrapper interfaces |
| `DocNest.Core` | Pipeline, normaliser, `.udf` reader/writer, quantizer |
| `DocNest.Parsers` | md / html / csv / docx / xlsx / pdf |
| `DocNest.Embeddings` | ONNX MiniLM embedder + ms-marco cross-encoder reranker |
| `DocNest.Retrieval` | Hybrid retriever (FTS5 BM25 + dense + rerank + RRF + graph) |
| `DocNest.Query` | 5-layer answer engine + LLM providers |
| `DocNest.Storage` | `.udf` ZIP storage backend |
| `DocNest.Cli` | `docnest` dotnet tool |

Parsers cover PDF (PdfPig), DOCX/XLSX (OpenXML), HTML (AngleSharp), CSV/TSV and Markdown. Every external dependency lives behind a DocNest interface, so swapping any of them is a one-line change.

This is **pre-1.0**, built slice-by-slice under a gated protocol: understand → plan → design + ADR → tests-first → full suite green → sign-off, per phase. The core pipeline, hybrid retrieval, cross-encoder reranking and the 5-layer engine are implemented and tested. Cloud embedding providers (OpenAI embeddings and friends) exist in the Python engine but aren't ported yet — embeddings here are local-only by design.

```
dotnet add package DocNest.Core
# or
dotnet tool install -g DocNest.Cli
```

The Python engine installs with `pip install docnest-ai`.

If you've ever stood up a Python sidecar just to chunk a PDF for a .NET app, I'd genuinely like to know whether this kills that step for you. Tell me in the comments. And if it does, a star on the repo helps other .NET folks find it.
