{"slug": "three-design-decisions-that-shaped-the-enterprise-rag-retrieval-pipeline", "title": "Three Design Decisions That Shaped the Enterprise RAG Retrieval Pipeline", "summary": "Three key structural decisions made when building an enterprise RAG retrieval pipeline. First, the author chose deterministic lexical retrieval over semantic search for local validation to ensure reliable evaluation sets, accepting the trade-off of missing semantic similarity. Second, the Streamlit dashboard connects to the FastAPI API layer rather than directly to the database to avoid credential distribution issues in cloud environments, adding only a negligible network hop. Third, the evaluation runner is exposed as a standard API endpoint, enabling continuous integration by testing the live query pipeline with each dashboard interaction.", "body_md": "Enterprise RAG: A practitioner's build log | Post 3 of 6\nA retrieval pipeline has more design surface than it appears. The technology choices (vector search, LLM provider, storage engine) get most of the attention. The structural choices (where filtering happens, how evaluation is wired, what the dashboard connects to) determine whether the system actually works correctly in a production environment.\nThis post documents three structural decisions I made in Enterprise RAG, the constraint that drove each one, and the cost I accepted.\nThe default retrieval implementation uses token cosine similarity against a local SQLite chunk store (RAG_RETRIEVAL_PROVIDER=local). Not vector embeddings. Not a managed search index. Lexical scoring.\nThis was a sequencing decision, not a technology preference.\nThe constraint: Access control validation requires a deterministic retrieval baseline. If retrieval results vary across runs (because embedding models update, because vector indices are rebuilt, because approximate nearest neighbor algorithms introduce non-determinism), the evaluation set becomes unreliable. 
A restricted_leak_count of zero means nothing if retrieval is non-deterministic and the same query might return different chunks tomorrow.\nLexical retrieval is fully deterministic. Given the same document corpus and the same query, it returns the same ranked chunk list every time. That makes the evaluation set a reliable regression test rather than a probabilistic snapshot.\nThe accepted cost: Lexical scoring does not capture semantic similarity. A question about \"headcount reduction\" will not retrieve a chunk that uses the phrase \"workforce restructuring\" unless there is token overlap. Semantic retrieval closes that gap, at the cost of determinism in the local validation environment.\nThe Azure AI Search adapter (RAG_RETRIEVAL_PROVIDER=azure_ai_search) is implemented for production use, where semantic and hybrid query modes are available. The retrieval provider is a configuration switch, not a code change. Switching from local to Azure AI Search does not alter the access control layer, the evaluation runner, or the API surface.\nThe Streamlit dashboard (dashboard/app.py) connects to the FastAPI API layer, not the database directly. Every dashboard operation (querying documents, fetching metrics, running evaluations, reviewing the citation log) goes through an authenticated API call.\nThis was not a minor implementation choice. It was a deliberate architectural boundary.\nThe constraint: A dashboard that reads the database directly cannot be deployed in a containerized or cloud environment without granting the dashboard container database credentials. That creates a credential distribution problem: every new environment where the dashboard runs needs database access, which widens the credential surface.\nAn API-backed dashboard has a single credential requirement: the DASHBOARD_API_URL and optionally DASHBOARD_ADMIN_TOKEN. The dashboard container never holds database credentials. It holds only the API location and the management token. 
The API enforces authorization. The database credentials stay with the API container.\nThe accepted cost: Every dashboard operation adds one network hop compared to direct database access. For a local development setup this is negligible. For a cloud-deployed dashboard querying an API on the same virtual network, it is also negligible. The cost is only relevant if the dashboard is running in a significantly different network zone from the API, which would itself be an unusual deployment topology.\nThe secondary benefit: the API-backed dashboard tests the public API surface on every dashboard interaction. If the dashboard shows correct data, the API is returning correct data. That is a form of continuous integration that direct database access cannot provide.\nThe evaluation runner is exposed as POST /eval/run, a standard API endpoint that runs the evaluation set against the live query pipeline and returns metrics directly.\nMost RAG evaluation setups I have seen are offline scripts: pull a golden set, run retrieval, compare results, write a report. The script does not call the production API. It calls the retrieval components directly, often with mocked or simplified versions of the access control layer.\nThe constraint: If the evaluation script bypasses the access control layer, it cannot detect access control failures. A restricted_leak_count computed by calling the retriever directly, without going through the role filter, will always be zero, regardless of whether the filter is actually working in production.\nBy routing evaluation through POST /eval/run, which calls POST /query internally, the evaluation runner tests the entire pipeline: authentication handling, role filter, retrieval, generation, and citation assembly. Every evaluation case exercises the same code path that a real user request exercises.\nThe accepted cost: Live evaluation runs against the production database. 
In a high-traffic environment, running a large evaluation set could add query load. The mitigation is to run evaluations during low-traffic windows or against a staging environment, not to move evaluation back to a disconnected script.\nThe current evaluation set is small and optimized for repeatable access-control checks. Extending it with larger golden sets, human relevance labels, and answer correctness checks is a documented roadmap item.\nRole metadata is currently embedded in document front matter: each markdown document has an allowed_roles field that specifies which roles can retrieve it. This is correct for a local deterministic environment where document metadata is under engineering control.\nIn production, role context should come from the identity provider (Entra ID claims or OIDC bearer token attributes), not from request body parameters or document-embedded metadata alone. I did not implement full Entra ID role claim integration because it requires a live Azure tenant to validate. The configuration path is documented and the AUTH_PROVIDER=entra setting is implemented. The end-to-end test of role-from-identity-claim requires a real identity provider.\nThat is a known gap. It is in the production considerations section of docs/security.md, not hidden in implementation comments.\nThe POST /eval/run endpoint requires the ADMIN_TOKEN when management protection is enabled. Evaluation runs in protected environments require the admin credential.\nAdd one document to the corpus with allowed_roles: [\"finance\"], run POST /eval/run, and verify that the new document appears in the blocked count for non-finance evaluation cases. That single test confirms the role filter is reading document metadata correctly and applying it before scoring.\nDoes your internal RAG evaluation pipeline call the same API endpoints that production queries use, or does it call retrieval components directly? 
If it bypasses the access control layer, does your restricted_leak_count metric actually measure anything?\nNext post: The evaluation metrics that matter for enterprise RAG, and why pass rate alone is not enough to validate a system that handles restricted documents.", "url": "https://wpnews.pro/news/three-design-decisions-that-shaped-the-enterprise-rag-retrieval-pipeline", "canonical_source": "https://dev.to/manjunath_d35c391da339e5b/three-design-decisions-that-shaped-the-enterprise-rag-retrieval-pipeline-50p2", "published_at": "2026-05-21 16:36:00+00:00", "updated_at": "2026-05-21 17:05:14.254957+00:00", "lang": "en", "topics": ["enterprise-software", "data", "large-language-models", "artificial-intelligence", "machine-learning"], "entities": ["Enterprise RAG", "SQLite", "LLM"], "alternates": {"html": "https://wpnews.pro/news/three-design-decisions-that-shaped-the-enterprise-rag-retrieval-pipeline", "markdown": "https://wpnews.pro/news/three-design-decisions-that-shaped-the-enterprise-rag-retrieval-pipeline.md", "text": "https://wpnews.pro/news/three-design-decisions-that-shaped-the-enterprise-rag-retrieval-pipeline.txt", "jsonld": "https://wpnews.pro/news/three-design-decisions-that-shaped-the-enterprise-rag-retrieval-pipeline.jsonld"}}