Your .NET RAG stack hides a Python sidecar. I built the engine that removes it.

TL;DR: Every .NET RAG project quietly ships a Python sidecar to do one job: chunk documents. I got rid of mine. DocNest .NET is an idiomatic C# / .NET 8 port of my [DocNest] engine: embeddings run locally (ONNX MiniLM, no key, offline), the LLM is optional (factual questions answered at zero tokens), and the `.udf` knowledge base it writes is byte-compatible with the Python version. Ingest in Python, query in C#. It's on NuGet today. [Repo] · [NuGet]

You're building on .NET. The product needs to answer questions over a pile of PDFs, contracts, spreadsheets — real retrieval-augmented generation. So you go looking for tooling, and you find the same thing I did:

It's all Python.

LangChain, LlamaIndex, every RAG tutorial worth reading — Python, Python, Python. So you do the thing nobody admits to in the architecture review: you stand up a little Python service on the side. A second runtime to containerize, deploy, version, monitor, and wake up to at 3 a.m. when it OOMs. All so it can split a document into chunks and hand them back to your actual app.

A whole extra language in production to chop up a PDF. I stared at that diagram one too many times and decided it had to go.

So I ported DocNest to C#. Not a wrapper shelling out to `python.exe`, but a real, idiomatic .NET port: `async`/`await` end to end, every dependency behind an interface, shipped as proper NuGet packages. Nothing Python left in the runtime.

But to explain why DocNest is worth porting, I have to tell you about the bug that started the whole thing.

A RAG app I'd built gave a client a confidently wrong number. Not "I don't know", but a clean, specific, wrong answer. I spent three days assuming my retrieval ranking was off, tuning embeddings, `k` values, and similarity thresholds.

The ranking was fine. The problem happened before any of that — at ingestion. Here's how almost every pipeline reads a document:

PDF → extract text → split every 512 chars → embed → store → hope

Watch what that does to a revenue table:

chunk_1: "45.2%  Q3  Europe  38.1%  Q2  Europe  41.7%  Q3"
chunk_2: "Asia   29.3%  Q2  Asia  Americas  52.1%  Q3  Ame"

The headers are gone. The rows are shredded across a chunk boundary. The model receives a bag of loose numbers with no idea which is revenue, which is a quarter, which region they belong to — and fills the gap with a confident guess. That's not a model problem or a retrieval problem. It's an ingestion problem. You destroyed the meaning before the model ever saw the data.
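To make the failure concrete, here is a minimal sketch of that kind of fixed-width splitter. The `ChunkFixed` helper and the sample string are illustrative assumptions, not DocNest code; the chunk size is shrunk to 20 characters so the effect is visible on a single table row.

```csharp
using System;
using System.Collections.Generic;

// Naive splitter: cuts every `size` characters and ignores document structure.
// Hypothetical helper for illustration only; it is not part of DocNest.
static List<string> ChunkFixed(string text, int size)
{
    var chunks = new List<string>();
    for (var i = 0; i < text.Length; i += size)
    {
        chunks.Add(text.Substring(i, Math.Min(size, text.Length - i)));
    }
    return chunks;
}

// Assumed sample: a small table after a text extractor has flattened it.
var flattened = "Region Q2 Q3 Europe 38.1% 45.2% Asia 29.3% 41.7%";

foreach (var chunk in ChunkFixed(flattened, 20))
{
    Console.WriteLine($"[{chunk}]");
}
```

Only the first chunk still carries the column headers, and the Asia row is cut in the middle of a number. Every later chunk is a list of percentages with nothing to say what they measure.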

A person doesn't read a report as one long character stream. They see headings, sections, a table with columns. DocNest does the same: it reads the document's structure first. Every heading becomes a navigable `§section`. Every table is preserved as structured data, never flattened:

{
  "section": "§4.2 Revenue by Region",
  "table": {
    "headers": ["Region", "Q2", "Q3", "Change"],
    "rows": [
      ["Europe", "38.1%", "45.2%", "+7.1pp"],
      ["Asia",   "29.3%", "41.7%", "+12.4pp"]
    ]
  }
}

Same numbers, same model, same question, but now the answer is right, and it comes with a citation. The document is normalised once into a portable `.udf` file: a self-contained ZIP holding the section index, key numbers, keywords, section text, and quantised embeddings. Parse once, query forever.
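Because a `.udf` is a ZIP, you can look inside one with nothing but the standard library. This is a small sketch, assuming only that the file is a regular ZIP archive; `ListEntries` is a hypothetical helper, and it makes no assumption about what the entries are called.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: returns the name and uncompressed size of every entry
// in a ZIP-based file, without knowing anything about the layout inside.
static List<(string Name, long Size)> ListEntries(string path)
{
    var entries = new List<(string Name, long Size)>();
    using var archive = ZipFile.OpenRead(path);
    foreach (var entry in archive.Entries)
    {
        entries.Add((entry.FullName, entry.Length));
    }
    return entries;
}

// Usage: pass a path on the command line, or fall back to report.udf.
var path = args.Length > 0 ? args[0] : "report.udf";
if (File.Exists(path))
{
    foreach (var (name, size) in ListEntries(path))
    {
        Console.WriteLine($"{name}  ({size} bytes)");
    }
}
```

For real work, the reader shipped with DocNest is the right tool. This only shows that the artifact is inspectable with ordinary ZIP tooling.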

Here's the part I'm proud of. The `.udf` format is an open spec, and the .NET writer produces files that are byte-compatible with the Python engine. That one constraint unlocks something genuinely useful: you can ingest in Python and query in C#.

One ingestion ecosystem, two languages, the same artifact moving between them. Nothing in the codebase is allowed to break that cross-ecosystem contract; it's the whole point.

When I describe this, two questions come back every time. They're actually two independent choices:

1. Embeddings run locally. A small ONNX MiniLM model (~90 MB) downloads once and caches. No API key, fully offline. There's an optional ONNX cross-encoder reranker for dense PDFs.

2. The LLM is optional. Answer Layers 0–1 resolve factual questions deterministically — zero tokens, no key. You only bring an LLM for synthesis, and when you do, "OpenAI" means the answer model, not embeddings. The two never get coupled.

dotnet add package DocNest.Core
dotnet add package DocNest.Parsers
dotnet add package DocNest.Retrieval
dotnet add package DocNest.Query

using DocNest;
using DocNest.Parsers;
using DocNest.Pipeline;
using DocNest.Query;
using DocNest.Retrieval;
using DocNest.Udf;

// Parse → normalise → write a portable .udf
var raw = await new ParserFactory().Get("report.pdf").ParseAsync("report.pdf");
var doc = new DocNestPipeline().Process(raw);
await new UdfWriter().WriteAsync(doc, "report.udf");

// Load it back and ask — deterministic layers, no LLM
var document = (await UdfReader.LoadAsync("report.udf")).ToDocument();

using var retriever = new HybridRetriever(".docnest_cache");
var engine = new DocNestQueryEngine(retriever);   // no LLM → Layers 0–1 only
var result = await engine.AnswerAsync(document, "What was Q3 revenue?", allowLlm: false);

Console.WriteLine(result.Answer);     // "Q3 revenue: $38M (source: §3.1)"
Console.WriteLine(result.TokensUsed); // 0

Prefer the terminal?

dotnet tool install -g DocNest.Cli
docnest convert report.pdf -o report.udf
docnest query report.udf "What was Q3 revenue?"

`OpenAiCompatibleLlmProvider` talks to OpenAI, Groq, Cerebras, Together, OpenRouter and local servers (Ollama, LM Studio): change the base URL and model. Anthropic has its own provider.

ILlmProvider llm = new OpenAiCompatibleLlmProvider(
    apiKey:  Environment.GetEnvironmentVariable("GROQ_API_KEY")!,
    model:   "llama-3.3-70b-versatile",
    baseUrl: "https://api.groq.com/openai/v1");

var engine = new DocNestQueryEngine(retriever, llm);
var result = await engine.AnswerAsync(document, "Summarise the key risks.", allowLlm: true);
Console.WriteLine(string.Join(", ", result.Citations));  // e.g. §5.2, §5.3

file  → IParser → DocNestPipeline (normalise · key-numbers · keywords) → Document → .udf
query → HybridRetriever (BM25 + dense + cross-encoder rerank + RRF + 1-hop graph) → top-k
      → DocNestQueryEngine (5 layers) → answer + citations + tokens + confidence
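The RRF step in that retriever line is Reciprocal Rank Fusion: each result list contributes `1 / (k + rank)` per document, and the sums decide the final order. Below is a minimal sketch of the general technique, not DocNest's implementation. `FuseRrf`, the sample rankings, and `k = 60` (a commonly used default) are all assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reciprocal Rank Fusion over any number of ranked id lists.
// Hypothetical helper; ranks are 1-based, so index i scores 1 / (k + i + 1).
static List<string> FuseRrf(IEnumerable<IReadOnlyList<string>> rankings, int k = 60)
{
    var scores = new Dictionary<string, double>();
    foreach (var ranking in rankings)
    {
        for (var i = 0; i < ranking.Count; i++)
        {
            scores.TryGetValue(ranking[i], out var current); // 0 when unseen
            scores[ranking[i]] = current + 1.0 / (k + i + 1);
        }
    }

    // Highest fused score first; ties broken by id so the order is stable.
    return scores
        .OrderByDescending(pair => pair.Value)
        .ThenBy(pair => pair.Key, StringComparer.Ordinal)
        .Select(pair => pair.Key)
        .ToList();
}

// Assumed sample rankings from a keyword pass and a dense pass.
var bm25Order = new[] { "§4.2", "§3.1", "§5.2" };
var denseOrder = new[] { "§4.2", "§2.1", "§3.1" };

var fused = FuseRrf(new IReadOnlyList<string>[] { bm25Order, denseOrder });
Console.WriteLine(string.Join(", ", fused));
```

A section that both passes rank highly wins over one that only a single pass liked. That is why fusing by rank works without having to make BM25 scores and cosine similarities comparable.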

Layer	Mechanism	Tokens
0	Pre-computed key-numbers / summary	0
1	Extractive from the top section	0
2	Single-section LLM	~300
3	Multi-section synthesis (reranked context)	~900
4	Broad fallback over retrieved sections	~1,500

The engine climbs this ladder only when a cheaper rung isn't confident. Layers 0–1 handle a surprising share of real factual questions at zero cost — you pay tokens only for genuine synthesis.
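The escalation logic can be sketched in a few lines. This is an illustration of the idea only, under assumptions of my own: each layer is modelled as a function returning an answer, a confidence, and a token cost, and the `Climb` helper and the 0.7 threshold are hypothetical rather than the engine's actual internals.

```csharp
using System;
using System.Collections.Generic;

// Try layers from cheapest to most expensive and stop at the first one
// that clears the confidence threshold. Hypothetical sketch, not DocNest code.
static (string Answer, int Layer, int Tokens) Climb(
    IReadOnlyList<Func<string, (string Answer, double Confidence, int Tokens)>> layers,
    string question,
    double threshold = 0.7)
{
    var spent = 0;
    for (var i = 0; i < layers.Count; i++)
    {
        var (answer, confidence, tokens) = layers[i](question);
        spent += tokens;
        if (confidence >= threshold)
        {
            return (answer, i, spent);
        }
    }
    return ("No confident answer.", -1, spent);
}

// Assumed stand-ins for three rungs of the ladder.
var layers = new List<Func<string, (string Answer, double Confidence, int Tokens)>>
{
    q => ("", 0.0, 0),                                 // Layer 0: no key number matched
    q => ("Q3 revenue: $38M (source: §3.1)", 0.9, 0),  // Layer 1: extractive hit
    q => throw new InvalidOperationException("LLM layer should not be reached"),
};

var outcome = Climb(layers, "What was Q3 revenue?");
Console.WriteLine($"{outcome.Answer} | layer {outcome.Layer} | tokens {outcome.Tokens}");
```

The LLM-backed rung is never invoked here, because a zero-token rung already answered with enough confidence.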

A multi-format eval: 10 documents, 88 questions, 5 formats (the same set as the Python reference), dense + cross-encoder rerank, `gpt-oss-120b` narrator, `qwen2.5` judge:

Format	Score	Hit-rate (≥7)
XLSX	8.7 / 10	93%
MD	8.7 / 10	100%
DOCX	7.0 / 10	79%
HTML	4.8 / 10	50%
PDF	6.8 / 10	70%
Overall	~7.1 / 10	~78%

The Python reference sits at 8.5/10. This .NET port is at 7.1 and closing the gap slice by slice — the cross-encoder reranker alone dragged PDFs from 5.1 → 6.8 (hit-rate 47% → 70%). HTML is clearly my weakest format right now, and it's the next thing I'm fixing.

I could have cherry-picked a kinder run and quoted a bigger number. I'd rather ship the reproducible one with the eval harness sitting right next to it in the repo. If you don't trust a benchmark you can't re-run, neither do I.

Package	Role
`DocNest.Abstractions`	Domain records + wrapper interfaces
`DocNest.Core`	Pipeline, normaliser, `.udf` reader/writer, quantizer
`DocNest.Parsers`	md / html / csv / docx / xlsx / pdf
`DocNest.Embeddings`	ONNX MiniLM embedder + ms-marco cross-encoder reranker
`DocNest.Retrieval`	Hybrid retriever (FTS5 BM25 + dense + rerank + RRF + graph)
`DocNest.Query`	5-layer answer engine + LLM providers
`DocNest.Storage`	`.udf` ZIP storage backend
`DocNest.Cli`	`docnest` dotnet tool

Parsers cover PDF (PdfPig), DOCX/XLSX (OpenXML), HTML (AngleSharp), CSV/TSV and Markdown. Every external dependency lives behind a DocNest interface, so swapping any of them is a one-line change.

This is pre-1.0, built slice-by-slice under a gated protocol: understand → plan → design + ADR → tests-first → full suite green → sign-off, per phase. The core pipeline, hybrid retrieval, cross-encoder reranking and the 5-layer engine are implemented and tested. Cloud embedding providers (OpenAI embeddings and friends) exist in the Python engine but aren't ported yet — embeddings here are local-only by design.

dotnet add package DocNest.Core
dotnet tool install -g DocNest.Cli

pip install docnest-ai

If you've ever stood up a Python sidecar just to chunk a PDF for a .NET app, I'd genuinely like to know whether this kills that step for you. Tell me in the comments. And if it does, a star on the repo helps other .NET folks find it.


Source: dev.to (original article)
