{"slug": "why-i-chose-mcp-over-rag-for-live-infrastructure-auditing", "title": "Why I chose MCP over RAG for live infrastructure auditing", "summary": "A developer building a live infrastructure auditing system abandoned RAG (Retrieval-Augmented Generation) after discovering it reported a device as compliant when the underlying data snapshot was two days old and the device had since drifted. The engineer switched to a Model Context Protocol (MCP) approach, exposing a live SQLite inventory database and structured policy file as callable tools—get_inventory(), query_policy(), and flag_violation()—eliminating embedding pipelines and staleness issues. The system also includes a FastAPI gateway for rate limiting and intent routing, plus a secondary \"Judge\" LLM that independently verifies the agent's reasoning against the policy file, catching errors the main agent missed.", "body_md": "I've been working on a project to audit distributed hardware infrastructure — devices spread across multiple sites, each running firmware that needs to stay compliant with a central policy. Pretty standard enterprise ops problem.\n\nMy first instinct was RAG. Everyone reaches for RAG. You embed your documents, stand up a vector store, and your agent can reason over your data. I've built RAG pipelines before, they work well, so I started there.\n\nThree days in, I switched direction.\n\nI was testing the agent against a scenario where a device had failed a firmware check at 2am. The agent reported it as compliant.\n\nThe problem wasn't the model. The problem was that the data the agent was reasoning over was from an embedded snapshot I'd generated two days earlier. The device had drifted since then. The vector store didn't know — it can't know. It's a snapshot by design.\n\nThat works fine for a documentation assistant. 
For infrastructure audit it's a problem, because you need to know what's happening now, not what was true when you last ran the embedding pipeline.\n\nHere's the reframe that changed how I thought about this.\n\nRAG answers the question: what documents are relevant to this query?\n\nWhat I actually needed to answer was: what is the current state of device X right now?\n\nThose are different questions. One is a search problem. The other is a database query. I was using the wrong tool.\n\nThe inventory — firmware versions, device health, site assignments — lives in a SQLite database. The compliance policy lives in a structured text file. Neither of these is a document in any meaningful sense. Chunking them and embedding them into a vector store was me forcing square data into a round hole because that's what I knew how to do.\n\nInstead, I built an MCP server that exposes both as tools the agent can call:\n\n• get_inventory() — returns live device state, current to the second\n\n• query_policy() — reads the policy file and returns the requirements\n\n• flag_violation() — marks a device non-compliant with structured metadata\n\nThe agent calls these the same way your application code calls an API. No embedding pipeline. No staleness problem. No guessing at similarity scores for what is fundamentally a structured query.\n\nOne thing I'd push back on in most agent tutorials — they wire the LLM directly to the frontend and call it done.\n\nI put a FastAPI gateway in between, and I'd do it again every time.\n\nThe practical reason: NVIDIA NIM credits aren't free. A misconfigured client or a runaway loop can drain your quota in minutes if there's nothing between the UI and the model. The gateway enforces rate limits per IP before a single token is generated. Saved me actual money during development.\n\nThe better reason: not every query needs the full audit agent. Simple questions — how many nodes are in Bellevue? 
— don't need a multi-step LangGraph agent burning Gemini 2.5 tokens. The gateway classifies intent and routes accordingly. Simple queries go to a lighter NIM worker. Full compliance audits go to the Gemini agent.\n\nIt also centralises auth and logging in one place, which matters when you need to show a security team exactly what the agent did and when.\n\nThis is the piece I'm most glad I built, and the one I almost skipped.\n\nEvery response — whether it came from the NIM worker or the Gemini agent — passes through a secondary LLM before it reaches the user. I call it the Judge. Its only job is to read the agent's output, check it independently against the policy file, and decide whether the reasoning holds up.\n\nDuring testing, the Judge caught something the main agent missed. The agent had correctly identified a non-compliant firmware version, but applied a remediation rule that belonged to a different device category. The logic was sound — it just used the wrong rule. The Judge caught it because it reads the policy independently, without inheriting whatever context the main agent had accumulated during its reasoning loop.\n\nThat independence is the point. If the Judge just re-reads the agent's own context, it's not really checking anything. You want it reading from the source, fresh.\n\nThe agent can suggest remediation — here's the CLI command to fix the firmware drift on node 7. It cannot run it.\n\nThere's a hard gate in the LangGraph state machine. Suggest remediation and execute remediation are separate nodes, and the only path between them runs through a human decision in the UI. An architect clicks Approve. Then and only then does the write operation touch the database.\n\nFor infrastructure this felt like the right call. 
The cost of a false positive — a remediation that runs when it shouldn't — is much higher than the cost of an extra approval click.\n\nTwo things I'd do differently.\n\nI'd instrument RAGAS metrics from day one. I ended up retrofitting evaluation on the agent's audit outputs and found gaps I'd been manually poking at for weeks. Faithfulness and context relevancy scores would have surfaced those faster.\n\nAnd I'd write the red-team report in parallel, not after. I know what failure modes the Judge catches now, but I reconstructed most of that knowledge from memory rather than documenting it as I found it. A live failure log from the start would've made that report much sharper.\n\nRAG is the right tool for knowledge retrieval over static content. It's a less natural fit when your agent needs to query live structured data and act on what it finds.\n\nMCP let me give the agent real database access through a typed tool interface — no embedding pipeline, no staleness, no similarity search on what is fundamentally a relational query. For infrastructure audit, that was the right call.\n\nCode is on GitHub if you want to dig into the architecture. 
Happy to go deeper on the LangGraph state machine or the Judge design in the comments.", "url": "https://wpnews.pro/news/why-i-chose-mcp-over-rag-for-live-infrastructure-auditing", "canonical_source": "https://dev.to/dnyandeo/why-i-chose-mcp-over-rag-for-live-infrastructure-auditing-1ce8", "published_at": "2026-05-28 22:41:53+00:00", "updated_at": "2026-05-28 22:55:10.494227+00:00", "lang": "en", "topics": ["ai-agents", "ai-infrastructure", "large-language-models", "artificial-intelligence", "ai-tools"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/why-i-chose-mcp-over-rag-for-live-infrastructure-auditing", "markdown": "https://wpnews.pro/news/why-i-chose-mcp-over-rag-for-live-infrastructure-auditing.md", "text": "https://wpnews.pro/news/why-i-chose-mcp-over-rag-for-live-infrastructure-auditing.txt", "jsonld": "https://wpnews.pro/news/why-i-chose-mcp-over-rag-for-live-infrastructure-auditing.jsonld"}}