# I indexed 936 Lex Fridman episodes into a RAG that cites its sources

> Source: <https://github.com/aranajhonny/omnipod>
> Published: 2026-06-15 04:24:28+00:00

**Chat with 936 podcast episodes. Every answer cites its source.**

Ask "What did Karpathy say about neural networks?" — get an answer with the exact transcript chunk it came from. No hallucinations. No guessing.

Most RAG chatbots hallucinate. You ask about a podcast, they invent quotes.

OmniPod doesn't. Every response is **grounded** — verified against the actual transcript before it reaches you. If the source doesn't support the answer, it says so.

**Three query types, one pipeline:**

| Type | Example | Strategy |
|---|---|---|
| Factual | "What did Huberman say about sleep?" | Retrieve → Generate → Verify |
| Synthetic | "Compare AI safety views across guests" | Map-Reduce → Deduplicate → Synthesize |
| Generative | "Write an essay on consciousness from these episodes" | Plan → Draft → Ground |
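As a rough illustration of how one entry point can serve all three query types, here is a minimal routing sketch. The keyword patterns and the `STRATEGIES` table are assumptions made for this example; the project's actual `classify_intent()` may work quite differently (for instance, by asking the LLM).

```python
import re

# Hypothetical cue words; not taken from the OmniPod source.
_GENERATIVE = re.compile(r"\b(write|draft|compose|essay)\b", re.IGNORECASE)
_SYNTHETIC = re.compile(r"\b(compare|contrast|across)\b", re.IGNORECASE)

# Strategy steps copied from the table above, keyed by intent.
STRATEGIES = {
    "factual": ["retrieve", "generate", "verify"],
    "synthetic": ["map_reduce", "deduplicate", "synthesize"],
    "generative": ["plan", "draft", "ground"],
}


def classify_intent(query: str) -> str:
    """Return 'generative', 'synthetic', or 'factual' for a user query."""
    if _GENERATIVE.search(query):
        return "generative"
    if _SYNTHETIC.search(query):
        return "synthetic"
    # Anything without a recognised cue falls back to plain retrieval.
    return "factual"


def route(query: str) -> tuple[str, list[str]]:
    """Pair the detected intent with the steps its handler would run."""
    intent = classify_intent(query)
    return intent, STRATEGIES[intent]
```

Keyword rules are cheap and predictable but brittle; they are shown here only because they make the routing idea easy to see.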

```
You ask a question
        │
        ▼
  ┌─────────────┐
  │   Router     │  classify_intent() — routes to the right handler
  │  LRU cache   │  avoids re-embedding repeated queries
  │  Semaphore   │  caps concurrent LLM calls at 5
  └──────┬──────┘
         │
         ▼
  ┌─────────────┐
  │  Retrieval   │  bge-small-en-v1.5 (384d) → Qdrant cosine
  │  19,140      │  chunks from 936 Lex Fridman episodes
  │  chunks      │  Guest filtering via known-guests index
  └──────┬──────┘
         │
         ▼
  ┌─────────────┐
  │  Generate +  │  DeepSeek V4 Flash via OpenCode API
  │  Verify      │  verify_groundedness() — rejects ungrounded answers
  └──────┬──────┘
         │
         ▼
  Cited answer in Chainlit UI (localhost:8000)
```

```
git clone https://github.com/aranajhonny/omnipod && cd omnipod
python3.13 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
echo "OPENCODE_API_KEY=sk-your-key" > .env
docker run -d --name qdrant -p 6333:6333 qdrant/qdrant
python ingest.py --rebuild
chainlit run app.py
# → http://localhost:8000
```
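The router box in the diagram mentions an LRU cache for query embeddings and a semaphore that caps concurrent LLM calls at 5. One way to get both with the standard library is sketched below. The function names, the cache size, and the placeholder embedding and LLM call are all assumptions for illustration, not the project's code.

```python
import asyncio
from functools import lru_cache

MAX_CONCURRENT_LLM_CALLS = 5  # the cap stated in the diagram


@lru_cache(maxsize=1024)  # maxsize is an assumed value
def embed_query(text: str) -> tuple[float, ...]:
    """Cache embeddings so a repeated query is not re-embedded.

    Placeholder body: a real version would call the embedding model.
    A tuple is returned so callers cannot mutate the cached value.
    """
    return tuple(float(ord(c)) for c in text[:4])


async def call_llm(prompt: str, semaphore: asyncio.Semaphore) -> str:
    """Run one LLM request while holding a semaphore slot."""
    async with semaphore:
        await asyncio.sleep(0.01)  # stand-in for the network round trip
        return f"answer to: {prompt}"


async def answer_many(prompts: list[str]) -> list[str]:
    """Fan out requests; at most five are in flight at any moment."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_LLM_CALLS)
    return await asyncio.gather(*(call_llm(p, semaphore) for p in prompts))
```

Calling `asyncio.run(answer_many(prompts))` with more than five prompts still completes all of them; the extra ones simply wait for a free slot.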

| Metric | Value |
|---|---|
| Episodes indexed | 936 Lex Fridman |
| Chunks | 19,140 (512 chars, 128 overlap) |
| Embedding dim | 384 (bge-small-en-v1.5, MPS GPU) |
| Query embedding | ~100ms |
| Vector search | ~50ms (cosine, 19K points) |
| Full answer | ~2s on M1 Pro |
| Full ingest | ~8 min |
| Codebase | 1,138 lines Python, 9 files |
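The chunk settings in the table (512 characters with 128 of overlap) correspond to a sliding window that advances 384 characters at a time. A minimal sketch of that windowing follows; the function name and signature are assumptions, and the project's `ingest.py` may split on sentence or speaker boundaries instead.

```python
def chunk_text(text: str, size: int = 512, overlap: int = 128) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # 384 with the defaults
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once a window has reached the end, to avoid a redundant tail chunk.
        if start + size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of storing some text twice.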

No YouTube API key needed. Two sources:

- **lexfridman.com**: scrapes official transcript pages (requests + BeautifulSoup)
- **YouTube**: uses the free proxy at `youtubetranscript.pro` for auto-captions

```
cd lex_podcast
pip install requests beautifulsoup4
python run.py pipeline  # scrapes all 936 episodes
```

Output lands in `data/transcripts/`.

```
"What did Andrej Karpathy say about neural networks?"
"Compare views on AI safety across all guests"
"Write a short essay on human consciousness based on these episodes"
"Summarize what Andrew Huberman says about sleep"
```

**Why `bge-small-en-v1.5`?** 384-dim embeddings are fast to search and good enough for conversational podcast text. Runs locally on MPS GPU.

**Why Qdrant over Chroma?** Cosine search at 19K points in ~50ms. Filterable by guest metadata out of the box.

**Why intent routing?** Factual, synthetic, and generative queries need fundamentally different retrieval and generation strategies. A one-prompt-fits-all approach fails at scale.

**Why groundedness verification?** LLMs default to confident BS. `verify_groundedness()` forces the model to check its answer against the retrieved context before showing it to the user.
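To make the verification step concrete, here is a hedged sketch of a check-then-answer gate. It is not the project's `verify_groundedness()`: the signature, the prompt wording, the `SUPPORTED`/`UNSUPPORTED` verdict format, and the refusal message are all assumptions, and the `judge` argument stands in for whatever LLM call the real code makes.

```python
from typing import Callable

# Hypothetical refusal text shown when the context does not back the answer.
REFUSAL = "The retrieved transcripts do not support an answer to this question."


def build_verification_prompt(answer: str, chunks: list[str]) -> str:
    """Ask the judge model whether the answer is backed by the chunks."""
    context = "\n---\n".join(chunks)
    return (
        "Context:\n" + context + "\n\n"
        "Answer:\n" + answer + "\n\n"
        "Reply SUPPORTED if every claim in the answer appears in the context, "
        "otherwise reply UNSUPPORTED."
    )


def verify_groundedness(
    answer: str, chunks: list[str], judge: Callable[[str], str]
) -> bool:
    """Return True only when the judge says the context supports the answer."""
    if not chunks:
        return False  # nothing retrieved, so nothing can be grounded
    verdict = judge(build_verification_prompt(answer, chunks))
    return verdict.strip().upper().startswith("SUPPORTED")


def answer_or_refuse(
    answer: str, chunks: list[str], judge: Callable[[str], str]
) -> str:
    """Pass a grounded answer through; otherwise say the source is silent."""
    return answer if verify_groundedness(answer, chunks, judge) else REFUSAL
```

Passing the judge in as a callable keeps the gate testable with a stub and independent of any particular model provider.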

MIT
