# Stop Getting 'It Depends' Answers About RAG Architecture

> Source: <https://dev.to/swapnanilsaha/stop-getting-it-depends-answers-about-rag-architecture-1em7>
> Published: 2026-05-21 05:09:30+00:00

Ask five AI engineers which vector database to use for your RAG system. You'll get five different answers, and they'll all start with "it depends."

It depends on your data volume. It depends on your query patterns. It depends on whether you need GDPR compliance. It depends on your team's infra maturity. It depends on your budget. It depends on whether you're doing hybrid search.

The "it depends" answer is technically correct and operationally useless. It turns an architecture decision into an unbounded research project.

I built **RAG Readiness** to make one specific recommendation per component — and explain why.

## The Design Principle: Opinions, Not Options

Most RAG tooling and documentation presents you with a comparison table. Pinecone vs. Weaviate vs. Qdrant vs. Chroma. BM25 vs. dense vs. hybrid. ada-002 vs. text-embedding-3-large.

Comparison tables are useful if you already know which dimensions matter for your use case. They're paralyzing if you don't.

RAG Readiness is opinionated by design. You describe your use case, your data, your constraints. The tool returns **one choice per component** — with full reasoning.

If GDPR applies, managed cloud vector databases are eliminated from consideration before the LLM is even called. That's a rule, not an LLM judgment. The recommendation you receive is already constraint-filtered.
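The rule-before-LLM idea can be sketched in a few lines. This is a minimal illustration, not the tool's actual code: the `VectorDB` type, the `CANDIDATES` list, and the `filter_candidates` function are hypothetical names, and the hosting flags are simplified for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorDB:
    name: str
    self_hostable: bool  # simplified flag, assumed for illustration


# Hypothetical candidate catalogue; the real tool's list is not shown in this post.
CANDIDATES = [
    VectorDB("Pinecone", self_hostable=False),
    VectorDB("Weaviate", self_hostable=True),
    VectorDB("Qdrant", self_hostable=True),
    VectorDB("Chroma", self_hostable=True),
]


def filter_candidates(candidates, gdpr_required):
    """Apply hard constraints before any LLM call sees the options."""
    if not gdpr_required:
        return list(candidates)
    # Rule, not judgment: managed-only services are dropped outright.
    return [c for c in candidates if c.self_hostable]


allowed = filter_candidates(CANDIDATES, gdpr_required=True)
print([c.name for c in allowed])
```

Because the filter runs first, the prompt only ever contains options that already satisfy the constraint, so the model cannot recommend something the rule forbids.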

## Six Modes, One Tool

### Architecture Recommendation

The core mode. Answer a structured set of questions about your use case — document types, query patterns, scale, compliance requirements, team capabilities. Get back:

- **Vector database**: one specific choice with rationale
- **Embedding model**: one specific choice
- **Chunking strategy**: one specific approach with parameters
- **Retrieval method**: dense / BM25 / hybrid (one answer)
- **Reranker**: whether you need one and which

```
python main.py audit --interactive
# or from file:
python main.py audit --file examples/usecase_legal_contracts.json --with-cost
```

### Architecture Diagnosis

You already have a RAG system. It's not working. This mode takes your existing architecture and the problems you're seeing, and returns a root-cause analysis per component with severity levels and one specific fix.

Not "improve your chunking" — "switch from fixed 512-token chunks to parent-child hierarchical chunking with 512-token child nodes. Your documents have multi-clause structure that fixed chunks split mid-sentence."
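To make the suggested fix concrete, here is a rough sketch of parent-child chunking. It assumes that parents are blank-line-separated sections and that whitespace-split words stand in for tokens; a real pipeline would use the embedding model's tokenizer. The function name `parent_child_chunks` is hypothetical.

```python
def parent_child_chunks(document, child_tokens=512):
    """Split a document into parent sections and fixed-size child chunks.

    Children are what you embed and search; each one keeps a pointer to
    its parent so the full section can be handed to the LLM as context.
    """
    parents = [p.strip() for p in document.split("\n\n") if p.strip()]
    children = []
    for parent_id, parent in enumerate(parents):
        words = parent.split()  # crude stand-in for real tokenization
        for start in range(0, len(words), child_tokens):
            children.append({
                "parent_id": parent_id,
                "text": " ".join(words[start:start + child_tokens]),
            })
    return parents, children


doc = "Clause 1 covers payment terms.\n\nClause 2 covers termination."
parents, children = parent_child_chunks(doc, child_tokens=3)
hit = children[1]  # pretend this child matched the query
print(parents[hit["parent_id"]])  # the whole clause, not the fragment
```

The point of the pattern is that a match on a small fragment still returns the complete clause, so a chunk boundary no longer cuts the context the model sees.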

```
python main.py diagnose --file examples/diagnosis_pinecone_fixed.json
```

**Example output:**

```
overall_severity: critical

chunking_strategy — critical
  "Fixed 512-token chunks split mid-clause in long legal documents"
  Fix: Parent-child hierarchical chunking, 512-token child nodes

retrieval_method — high
  "Dense-only misses exact terms like dollar amounts and clause references"
  Fix: Hybrid BM25 + dense with RRF fusion

quick_fix: Enable 10% token overlap today. Takes 20 minutes, reduces
           the worst failures while you implement the full fix.
```
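For readers unfamiliar with the "RRF fusion" named in that fix: reciprocal rank fusion merges ranked lists by summing `1 / (k + rank)` for each document across the lists. The sketch below is a generic version, not code from the tool; the chunk IDs are made up, and `k=60` is the commonly used default rather than a value the tool is known to use.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["c3", "c1", "c2"]   # exact-term matches (e.g. "$2,500,000")
dense_hits = ["c1", "c4", "c3"]  # semantic matches
print(rrf_fuse([bm25_hits, dense_hits]))
```

RRF only needs ranks, not raw scores, which is why it works well for combining BM25 and embedding similarity: the two scoring scales never have to be normalized against each other.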

### Multi-Use-Case Session

Run up to 5 parallel audits in a single request — useful when you're scoping a RAG platform that needs to serve multiple internal teams.

The output includes cross-cutting insights: which components can be shared across use cases, where requirements conflict (the legal team needs GDPR-compliant storage; the sales team wants managed cloud), and which use case to build first for the highest return on the shared infrastructure investment.

### Implementation Bundle

Once you have an architecture you trust, generate a complete implementation starter kit:

```
python main.py bundle <session-id>
```

Output: a `requirements.txt`, `docker-compose.yml`, `.env.example`, and a migration guide tailored to the recommended architecture. If you have an existing stack, you get ordered migration steps with rollback notes.

### Cost Estimation

Rule-based monthly cost breakdown per component — **no LLM call**. Lookup tables for vector DB pricing tiers, embedding API costs, reranker inference, and LLM costs at your estimated query volume.

```
python main.py cost <session-id>
```

Returns a line-item breakdown, optimization tips (e.g., "switching to a self-hosted embedding model saves ~$800/month at this query volume"), and a hosting model classification (managed vs. self-hosted trade-off at your scale).
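A lookup-table estimator of this kind might look roughly like the following. Everything here is an assumption for illustration: the `PRICING` table, its tier names, and above all the dollar figures are placeholders, not real vendor prices and not the tool's actual tables.

```python
# Placeholder numbers only; NOT real vendor pricing.
PRICING = {
    "vector_db_flat_monthly": {"self_hosted": 40.0, "managed": 70.0},
    "embedding_per_1k_queries": {"self_hosted": 0.0, "api": 0.25},
    "llm_per_1k_queries": {"default": 1.5},
}


def estimate_monthly_cost(queries_per_month, vector_db_tier,
                          embedding_tier, llm_tier="default"):
    """Return a line-item monthly estimate (USD) from static lookup tables."""
    thousands = queries_per_month / 1000
    items = {
        "vector_db": PRICING["vector_db_flat_monthly"][vector_db_tier],
        "embedding": thousands * PRICING["embedding_per_1k_queries"][embedding_tier],
        "llm": thousands * PRICING["llm_per_1k_queries"][llm_tier],
    }
    items["total"] = sum(items.values())
    return items


print(estimate_monthly_cost(100_000, "managed", "api"))
```

Since the estimate is pure arithmetic over a table, the same inputs always produce the same breakdown, which is harder to guarantee when an LLM is asked to quote prices.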

### RAGAS Eval Dataset Generation

Generate evaluation questions grounded in your actual use case and query patterns — not generic retrieval questions.

```
python main.py eval-dataset <session-id> --num-questions 20
```

Output includes easy/medium/hard distribution, RAGAS metric mapping (which questions test faithfulness vs. answer relevancy vs. context precision), an annotation guide, and a time estimate for human review.

## Session Persistence and Refinement

Every audit persists to SQLite. You can refine against new constraints:

```
python main.py refine <session-id> --feedback "Qdrant was too heavy for our infra team"
```

The tool re-runs with the feedback as an additional constraint. Refinement history is tracked — you can see how the recommendation evolved across iterations.
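One plausible way to track that history in SQLite is a single table keyed by session and iteration. The schema, the function names, and the example recommendations below are all hypothetical; the tool's real storage layout is not described in this post.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store with one row per audit iteration."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS iterations (
               session_id     TEXT NOT NULL,
               iteration      INTEGER NOT NULL,
               feedback       TEXT,
               recommendation TEXT NOT NULL,
               PRIMARY KEY (session_id, iteration)
           )"""
    )
    return conn


def save_iteration(conn, session_id, recommendation, feedback=None):
    """Append the next iteration for a session and return its number."""
    row = conn.execute(
        "SELECT COALESCE(MAX(iteration), 0) FROM iterations WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    iteration = row[0] + 1
    conn.execute(
        "INSERT INTO iterations VALUES (?, ?, ?, ?)",
        (session_id, iteration, feedback, json.dumps(recommendation)),
    )
    conn.commit()
    return iteration


def history(conn, session_id):
    """Return (iteration, feedback, recommendation) tuples in order."""
    rows = conn.execute(
        "SELECT iteration, feedback, recommendation FROM iterations "
        "WHERE session_id = ? ORDER BY iteration",
        (session_id,),
    ).fetchall()
    return [(i, fb, json.loads(rec)) for i, fb, rec in rows]


conn = open_store()
save_iteration(conn, "s1", {"vector_db": "Qdrant"})
save_iteration(conn, "s1", {"vector_db": "Chroma"},
               feedback="Qdrant was too heavy for our infra team")
for entry in history(conn, "s1"):
    print(entry)
```

Storing the feedback next to each result is what makes the evolution readable later: every change in the recommendation sits beside the constraint that caused it.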

## A Complete Quickstart

```
git clone https://github.com/swapnanil/rag-readiness
cd rag-readiness
cp .env.example .env  # add your ANTHROPIC_API_KEY
docker-compose up -d api  # -d runs the API in the background so the commands below work in the same shell

# New architecture audit (interactive)
python main.py audit --interactive

# Diagnose a broken stack
python main.py diagnose --interactive

# Multi-use-case session
python main.py multi-audit examples/multi_usecase_lexvault.json

# List sessions and refine
python main.py sessions
python main.py refine <session-id> --feedback "need self-hosted only"

# Cost breakdown and eval dataset
python main.py cost <session-id>
python main.py eval-dataset <session-id> --num-questions 20
```

## The Pre-Scoring Layer

Before any LLM call, a rule-based pre-scorer computes a complexity score (1–10) from the use case inputs. This has two effects:

- It calibrates the LLM prompt — a complexity-1 use case gets a simpler, more direct recommendation; a complexity-9 use case gets a recommendation with more explicit trade-off reasoning.
- It runs conflict detection — if your inputs contain contradictory constraints (e.g., "GDPR compliant" + "use Pinecone"), the conflict is flagged before the LLM is called, not discovered in the output.
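The two effects above can be sketched as a pair of plain functions. The input field names, the scoring weights, and the `MANAGED_ONLY` set are assumptions made up for this example; the post does not specify which signals the real pre-scorer uses or how it weights them.

```python
# Hypothetical set of services treated as managed-cloud-only.
MANAGED_ONLY = {"pinecone"}


def complexity_score(use_case):
    """Rule-based complexity score clamped to 1-10 (weights are illustrative)."""
    score = 1
    if use_case.get("gdpr"):
        score += 2
    if use_case.get("hybrid_search"):
        score += 2
    if use_case.get("num_documents", 0) > 1_000_000:
        score += 3
    if len(use_case.get("document_types", [])) > 2:
        score += 2
    return max(1, min(10, score))


def detect_conflicts(use_case):
    """Return human-readable conflicts found before any LLM call."""
    conflicts = []
    preferred = (use_case.get("preferred_vector_db") or "").lower()
    if use_case.get("gdpr") and preferred in MANAGED_ONLY:
        conflicts.append(
            f"GDPR compliance requested, but '{preferred}' is managed-cloud only"
        )
    return conflicts


use_case = {"gdpr": True, "preferred_vector_db": "Pinecone",
            "num_documents": 2_000_000}
print(complexity_score(use_case), detect_conflicts(use_case))
```

Catching the contradiction here means the user can resolve it up front, instead of receiving a recommendation that silently ignores one of the two constraints.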

## Who This Is For

- **AI engineers** starting a new RAG project who want a structured starting point rather than a blank page
- **Engineering leads** who need to scope a RAG system for a business use case and justify the architecture choices to non-technical stakeholders
- **Teams with an existing RAG system** that isn't performing as expected and need a systematic diagnosis, not a hunch

The tool is open-source, runs locally, and persists everything to SQLite. Your use case details don't leave your environment beyond the single LLM API call per audit.
