{"slug": "what-enterprise-rag-is-ready-for-today-and-what-production-deployment-actually", "title": "What Enterprise RAG Is Ready For Today and What Production Deployment Actually Requires", "summary": "The local implementation of this Enterprise RAG system is fully functional, featuring pre-retrieval access control, citation-backed answers, and a specialized evaluation set, all running without external dependencies. However, deploying it to production requires specific upgrades, including integrating Entra ID for role derivation, switching to semantic or hybrid retrieval via Azure AI Search, and implementing distributed rate limiting, PII classification, and tenant isolation. The gap between the local demo and a production-ready system is defined by these concrete infrastructure and security requirements, though the architecture is designed to scale with only environment variable changes.", "body_md": "Enterprise RAG — A practitioner's build log | Post 6 of 6\nThis series has documented a system built to a specific standard: one where access control is enforced before retrieval scoring, where every answer includes traceable citations, and where the evaluation set measures restricted document leakage rather than retrieval relevance alone.\nThis final post answers the question that matters most for teams considering this as a foundation: what works today, what needs to be in place before this handles real internal documents in a production environment, and what the gap between those two states actually looks like.\nEvery item below runs locally without external dependencies or provider credentials.\nDocument pipeline:\n(POST /ingest)\nQuery and access control:\nRBAC_blocked_count logged per query — tracks how many chunks were filtered\nX-API-Key header, preventing request-body role elevation\nAPI and authentication:\n(POST /query) with health probes at GET /health\n(POST /auth/register)\n(POST /api-keys, POST 
/api-keys/{id}/revoke)\nADMIN_TOKEN\nEvaluation:\nPOST /eval/run — calls live query pipeline, not a mocked path\nOperational controls:\n(GET /audit-logs)\n(JSON_LOGS=true)\n(RATE_LIMIT_PER_MINUTE)\nInfrastructure:\nThe local runtime maps directly to an Azure deployment topology:\nEmployee → Microsoft Entra ID\n↓\nAzure Container Apps: API + Dashboard\n↓\nAzure AI Search (retrieval)\nAzure OpenAI (answer generation)\nAzure PostgreSQL or Cosmos DB (metadata + audit logs)\nAzure Blob Storage (source documents)\n↓\nAzure Key Vault (secrets)\nApplication Insights (logs + metrics)\nSwitching from local to Azure requires environment variable changes only. No code changes. No schema migrations between SQLite and PostgreSQL — the SQLAlchemy layer handles both. Azure mode fails fast when required AZURE_* settings are missing rather than silently degrading to a local fallback.\nEntra ID or OIDC role derivation from identity claims. The local implementation derives role from API key registration. Production deployment should derive role from authenticated identity token claims — not from request parameters or static key registration. The AUTH_PROVIDER=entra configuration path is implemented. End-to-end validation requires a live Azure tenant.\nSemantic or hybrid retrieval. The local lexical retriever is deterministic and validates access control correctly. It does not match the retrieval quality of embedding-based semantic search for queries without token overlap with document chunks. Azure AI Search vector and hybrid query modes are the planned production retrieval path.\nDistributed rate limiting. The in-memory rate limiter does not share state across multiple API instances. Horizontal scaling requires Redis-backed or API gateway rate limiting.\nPII classification and retention policies. The reference document corpus is synthetic. 
Before ingesting real internal documents — HR records, finance reports, incident logs — the ingestion pipeline should classify content for PII, apply sensitivity labels, and enforce explicit data retention policies for stored queries and generated answers.\nTenant isolation. The current implementation is single-organization. A deployment serving multiple business units with strict data isolation between them requires a tenant isolation layer at the data model and query pipeline level.\nBroader evaluation set. The current evaluation set is calibrated for access-control validation across a small synthetic corpus. A production evaluation set requires human relevance labels, answer correctness checks, and a regression threshold integrated into the CI workflow.\nEnterprise RAG demonstrates the architecture that matters for internal knowledge systems: pre-retrieval access control, citation-backed answers, and an evaluation standard that measures restricted document leakage. The local implementation is complete, testable, and fully reproducible without provider credentials.\nThe gap to production is real and specific. Entra ID integration, semantic retrieval, distributed rate limiting, PII handling, and tenant isolation are well-understood engineering problems with clear solutions. None of them require rethinking the core pipeline — the access control order, the citation model, and the evaluation structure remain intact.\nFor a team building an internal document Q&A system: the architecture here is worth adopting. The hardening list above is the production backlog, not a reason to start from scratch.\nThe highest-impact single item is Entra ID role derivation in production. The entire value of pre-retrieval access control depends on the role being trustworthy. In a local environment with API key role binding, that trust is reasonable. 
In a production environment with hundreds of employees, role must come from an authenticated identity provider — not from a manually registered key that may become stale when someone changes teams or leaves the organization.\nThe concrete step: configure AUTH_PROVIDER=entra, map Entra group claims to retrieval roles, and validate that the role filter receives the correct role from the token rather than from the request body. That single change makes the access control guarantee durable against organizational changes.\nWhen an employee changes roles or leaves your organization, how quickly does your internal knowledge system stop serving them documents from their previous role? Is that enforced at the identity provider level or at the document system level?\nThis concludes the Enterprise RAG build log series.", "url": "https://wpnews.pro/news/what-enterprise-rag-is-ready-for-today-and-what-production-deployment-actually", "canonical_source": "https://dev.to/manjunath_d35c391da339e5b/what-enterprise-rag-is-ready-for-today-and-what-production-deployment-actually-requires-24jh", "published_at": "2026-05-22 19:31:00+00:00", "updated_at": "2026-05-22 20:04:24.927958+00:00", "lang": "en", "topics": ["enterprise-software", "artificial-intelligence", "large-language-models", "data", "cloud-computing"], "entities": ["Microsoft Entra ID", "Azure Container Apps", "Azure AI Search", "Azure OpenAI", "Azure PostgreSQL", "Cosmos DB", "Azure Blob Storage", "RBAC"], "alternates": {"html": "https://wpnews.pro/news/what-enterprise-rag-is-ready-for-today-and-what-production-deployment-actually", "markdown": "https://wpnews.pro/news/what-enterprise-rag-is-ready-for-today-and-what-production-deployment-actually.md", "text": "https://wpnews.pro/news/what-enterprise-rag-is-ready-for-today-and-what-production-deployment-actually.txt", "jsonld": "https://wpnews.pro/news/what-enterprise-rag-is-ready-for-today-and-what-production-deployment-actually.jsonld"}}