Stop Getting 'It Depends' Answers About RAG Architecture

RAG Readiness is a tool designed to eliminate the vague "it depends" answers that plague Retrieval-Augmented Generation (RAG) architecture decisions. Instead of providing comparison tables, the tool applies rule-based constraint filters (such as GDPR compliance) and returns a single, specific recommendation per component with full reasoning. It also offers diagnosis, cost analysis, and evaluation dataset generation features to help teams build and optimize RAG systems.

Ask five AI engineers which vector database to use for your RAG system. You'll get five different answers, and they'll all start with "it depends." It depends on your data volume. It depends on your query patterns. It depends on whether you need GDPR compliance. It depends on your team's infra maturity. It depends on your budget. It depends on whether you're doing hybrid search.

The "it depends" answer is technically correct and operationally useless. It turns an architecture decision into an unbounded research project. I built RAG Readiness to make one specific recommendation per component and explain why.

The Design Principle: Opinions, Not Options

Most RAG tooling and documentation presents you with a comparison table. Pinecone vs. Weaviate vs. Qdrant vs. Chroma. BM25 vs. dense vs. hybrid. ada-002 vs. text-embedding-3-large. Comparison tables are useful if you already know which dimensions matter for your use case. They're paralyzing if you don't.

RAG Readiness is opinionated by design. You describe your use case, your data, your constraints. The tool returns one choice per component, with full reasoning. If GDPR applies, managed cloud vector databases are eliminated from consideration before the LLM is even called. That's a rule, not an LLM judgment. The recommendation you receive is already constraint-filtered.

Six Modes, One Tool

Architecture Recommendation

The core mode. Answer a structured set of questions about your use case: document types, query patterns, scale, compliance requirements, team capabilities. Get back:

- Vector database: one specific choice with rationale
- Embedding model: one specific choice
- Chunking strategy: one specific approach with parameters
- Retrieval method: dense / BM25 / hybrid, one answer
- Reranker: whether you need one, and which

    python main.py audit --interactive

or from file:

    python main.py audit --file examples/usecase_legal_contracts.json --with-cost

Architecture Diagnosis

You already have a RAG system.
It's not working. This mode takes your existing architecture and the problems you're seeing, and returns a root-cause analysis per component with severity levels and one specific fix. Not "improve your chunking" but "switch from fixed 512-token chunks to parent-child hierarchical chunking with 512-token child nodes. Your documents have multi-clause structure that fixed chunks split mid-sentence."

    python main.py diagnose --file examples/diagnosis_pinecone_fixed.json

Example output:

    overall severity: critical

    chunking strategy: critical
      "Fixed 512-token chunks split mid-clause in long legal documents"
      Fix: Parent-child hierarchical chunking, 512-token child nodes

    retrieval method: high
      "Dense-only misses exact terms like dollar amounts and clause references"
      Fix: Hybrid BM25 + dense with RRF fusion

    quick fix: Enable 10% token overlap today. Takes 20 minutes, reduces the
      worst failures while you implement the full fix.

Multi-Use-Case Session

Run up to 5 parallel audits in a single request. This is useful when you're scoping a RAG platform that needs to serve multiple internal teams. The output includes cross-cutting insights: which components can be shared across use cases, where requirements conflict (the legal team needs GDPR-compliant storage; the sales team wants managed cloud), and which use case to build first for the highest return on the shared infrastructure investment.

Implementation Bundle

Once you have an architecture you trust, generate a complete implementation starter kit:

    python main.py bundle <session-id>

Output: a requirements.txt, docker-compose.yml, .env.example, and migration guide tailored to the recommended architecture. If you have an existing stack, you get ordered migration steps with rollback notes.

Cost Estimation

Rule-based monthly cost breakdown per component, with no LLM call. Lookup tables for vector DB pricing tiers, embedding API costs, reranker inference, and LLM costs at your estimated query volume.

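To make the lookup-table idea concrete, here is a minimal sketch of how a rule-based cost estimate could be computed. It is not the tool's actual code: the table names, tier names, and every price below are placeholders invented for illustration, not vendor pricing.

```python
from dataclasses import dataclass

# Placeholder prices in USD, invented for illustration only.
VECTOR_DB_MONTHLY_USD = {"managed_starter": 70.0, "self_hosted_small": 40.0}
EMBEDDING_USD_PER_1K_QUERIES = {"api_model": 0.5, "self_hosted_model": 0.0}
LLM_USD_PER_1K_QUERIES = {"hosted_llm": 3.0}


@dataclass
class CostLine:
    component: str
    choice: str
    monthly_usd: float


def estimate_monthly_cost(vector_db, embedding, llm, queries_per_month):
    """Return one CostLine per component using table lookups only (no LLM call)."""
    thousands = queries_per_month / 1000
    return [
        CostLine("vector_db", vector_db, VECTOR_DB_MONTHLY_USD[vector_db]),
        CostLine("embedding", embedding,
                 EMBEDDING_USD_PER_1K_QUERIES[embedding] * thousands),
        CostLine("llm", llm, LLM_USD_PER_1K_QUERIES[llm] * thousands),
    ]


lines = estimate_monthly_cost("managed_starter", "api_model", "hosted_llm", 100_000)
for line in lines:
    print(f"{line.component:<10} {line.choice:<18} ${line.monthly_usd:,.2f}")
print(f"total ${sum(line.monthly_usd for line in lines):,.2f}")
```

Because every figure comes from a table, the same inputs always produce the same breakdown, and an optimization tip can be derived by re-running the lookup with a different choice (for example, swapping the hypothetical "api_model" for "self_hosted_model") and comparing totals.
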

    python main.py cost <session-id>

Returns a line-item breakdown, optimization tips (e.g., "switching to a self-hosted embedding model saves ~$800/month at this query volume"), and a hosting model classification (managed vs. self-hosted trade-off at your scale).

RAGAS Eval Dataset Generation

Generate evaluation questions grounded in your actual use case and query patterns, not generic retrieval questions.

    python main.py eval-dataset <session-id> --num-questions 20

Output includes easy/medium/hard distribution, RAGAS metric mapping (which questions test faithfulness vs. answer relevancy vs. context precision), an annotation guide, and a time estimate for human review.

Session Persistence and Refinement

Every audit persists to SQLite. You can refine against new constraints:

    python main.py refine <session-id> --feedback "Qdrant was too heavy for our infra team"

The tool re-runs with the feedback as an additional constraint. Refinement history is tracked, so you can see how the recommendation evolved across iterations.

A Complete Quickstart

    git clone https://github.com/swapnanil/rag-readiness
    cd rag-readiness
    cp .env.example .env   # add your ANTHROPIC_API_KEY
    docker-compose up api

    # New architecture audit (interactive)
    python main.py audit --interactive

    # Diagnose a broken stack
    python main.py diagnose --interactive

    # Multi-use-case session
    python main.py multi-audit examples/multi_usecase_lexvault.json

    # List sessions and refine
    python main.py sessions
    python main.py refine <session-id> --feedback "need self-hosted only"

    # Cost breakdown and eval dataset
    python main.py cost <session-id>
    python main.py eval-dataset <session-id> --num-questions 20

The Pre-Scoring Layer

Before any LLM call, a rule-based pre-scorer computes a complexity score (1–10) from the use case inputs. This has two effects:

- It calibrates the LLM prompt: a complexity-1 use case gets a simpler, more direct recommendation; a complexity-9 use case gets a recommendation with more explicit trade-off reasoning.
- It runs conflict detection: if your inputs contain contradictory constraints (e.g., "GDPR compliant" + "use Pinecone"), the conflict is flagged before the LLM is called, not discovered in the output.

Who This Is For

- AI engineers starting a new RAG project who want a structured starting point rather than a blank page
- Engineering leads who need to scope a RAG system for a business use case and justify the architecture choices to non-technical stakeholders
- Teams with an existing RAG system that isn't performing as expected and need a systematic diagnosis, not a hunch

The tool is open-source, runs locally, and persists everything to SQLite. Your use case details don't leave your environment beyond the single LLM API call per audit.
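
For readers curious how the rule-based layer described earlier (GDPR filtering, conflict detection, and the 1-10 complexity score) might fit together, here is a rough sketch. Everything in it is an assumption made for illustration: the `UseCase` fields, the deployment mode assigned to each candidate, and the scoring weights are hypothetical and are not taken from the tool's source.

```python
from dataclasses import dataclass
from typing import List, Optional

# Deployment mode per candidate as ASSUMED for this sketch only.
CANDIDATES = {
    "pinecone": "managed_cloud",
    "qdrant": "self_hosted",
    "weaviate": "self_hosted",
    "chroma": "self_hosted",
}


@dataclass
class UseCase:  # hypothetical input shape
    gdpr: bool = False
    requested_vector_db: Optional[str] = None
    doc_count: int = 0
    hybrid_search: bool = False
    doc_types: int = 1


def allowed_vector_dbs(uc: UseCase) -> List[str]:
    """Hard rule: drop managed cloud options when GDPR applies."""
    if uc.gdpr:
        return [name for name, mode in CANDIDATES.items() if mode != "managed_cloud"]
    return list(CANDIDATES)


def detect_conflicts(uc: UseCase) -> List[str]:
    """Flag contradictory inputs before any LLM call would be made."""
    conflicts = []
    req = uc.requested_vector_db
    if uc.gdpr and req is not None and CANDIDATES.get(req) == "managed_cloud":
        conflicts.append(f"GDPR compliance conflicts with requested vector DB '{req}'")
    return conflicts


def complexity_score(uc: UseCase) -> int:
    """Illustrative weights only; the result is clamped to the range 1-10."""
    score = 1
    score += 2 if uc.gdpr else 0
    score += 2 if uc.hybrid_search else 0
    if uc.doc_count > 1_000_000:
        score += 3
    elif uc.doc_count > 10_000:
        score += 1
    score += min(max(uc.doc_types - 1, 0), 2)
    return max(1, min(score, 10))


uc = UseCase(gdpr=True, requested_vector_db="pinecone",
             doc_count=50_000, hybrid_search=True, doc_types=2)
print(allowed_vector_dbs(uc))
print(detect_conflicts(uc))
print(complexity_score(uc))
```

The point of keeping these checks as plain functions is that they are deterministic and cheap: the candidate list handed to the LLM is already filtered, and a contradiction surfaces as an explicit message rather than as an odd recommendation.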