I indexed 936 Lex Fridman episodes into a RAG that cites its sources

A developer built OmniPod, a RAG chatbot that indexes 936 Lex Fridman podcast episodes into 19,140 chunks and grounds every answer in verified transcripts, eliminating hallucinations. The system uses intent routing, bge-small-en-v1.5 embeddings, Qdrant vector search, and a groundedness verification step to provide cited answers for factual, synthetic, and generative queries.
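The benchmark table in this post lists 512-character chunks with a 128-character overlap. Below is a minimal sketch of what sliding-window chunking with those parameters could look like. The function name `chunk_text` and its exact boundary handling are assumptions for illustration, not code taken from the OmniPod repository.

```python
def chunk_text(text: str, size: int = 512, overlap: int = 128) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper: size and overlap mirror the figures quoted in
    the post, but the real ingest script may split differently.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")

    step = size - overlap  # each window starts 384 chars after the previous one
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once a window has reached the end of the text, so no
        # trailing chunk consists only of already-covered overlap.
        if start + size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "".join(chr(ord("a") + i % 26) for i in range(1000))
    pieces = chunk_text(sample)
    print(len(pieces), [len(p) for p in pieces])
```

The overlap means a sentence cut at a chunk boundary still appears whole in the neighboring chunk, at the cost of storing some text twice.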

Chat with 936 podcast episodes. Every answer cites its source.

Ask "What did Karpathy say about neural networks?" and get an answer with the exact transcript chunk it came from. No hallucinations. No guessing.

Most RAG chatbots hallucinate. You ask about a podcast, they invent quotes. OmniPod doesn't. Every response is grounded: verified against the actual transcript before it reaches you. If the source doesn't support the answer, it says so.

Three query types, one pipeline:

| Type | Example | Strategy |
|---|---|---|
| Factual | "What did Huberman say about sleep?" | Retrieve → Generate → Verify |
| Synthetic | "Compare AI safety views across guests" | Map-Reduce → Deduplicate → Synthesize |
| Generative | "Write an essay on consciousness from these episodes" | Plan → Draft → Ground |

```
You ask a question
       │
       ▼
┌─────────────┐
│   Router    │  classify_intent: routes to the right handler
│  LRU cache  │  avoids re-embedding repeated queries
│  Semaphore  │  caps concurrent LLM calls at 5
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Retrieval  │  bge-small-en-v1.5 (384d) → Qdrant (cosine)
│   19,140    │  chunks from 936 Lex Fridman episodes
│   chunks    │  Guest filtering via known-guests index
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Generate +  │  DeepSeek V4 Flash via OpenCode API
│   Verify    │  verify_groundedness: rejects ungrounded answers
└──────┬──────┘
       │
       ▼
Cited answer in Chainlit UI (localhost:8000)
```

```bash
git clone https://github.com/aranajhonny/omnipod && cd omnipod
python3.13 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
echo "OPENCODE_API_KEY=sk-your-key" > .env
docker run -d --name qdrant -p 6333:6333 qdrant/qdrant
python ingest.py --rebuild
chainlit run app.py  # → http://localhost:8000
```

| Metric | Value |
|---|---|
| Episodes indexed | 936 (Lex Fridman) |
| Chunks | 19,140 (512 chars, 128 overlap) |
| Embedding dim | 384 (bge-small-en-v1.5, MPS GPU) |
| Query embedding | ~100ms |
| Vector search | ~50ms (cosine, 19K points) |
| Full answer | ~2s (on M1 Pro) |
| Full ingest | ~8 min |
| Codebase | 1,138 lines Python, 9 files |

No YouTube API key needed. Two sources:

- lexfridman.com: scrapes official transcript pages (requests + BeautifulSoup)
- YouTube: uses the free proxy at youtubetranscript.pro for auto-captions

```bash
cd lex_podcast
pip install requests beautifulsoup4
python run.py pipeline  # scrapes all 936 episodes
```

Output lands in `data/transcripts/`.

Example queries:

- "What did Andrej Karpathy say about neural networks?"
- "Compare views on AI safety across all guests"
- "Write a short essay on human consciousness based on these episodes"
- "Summarize what Andrew Huberman says about sleep"

**Why `bge-small-en-v1.5`?** 384-dim embeddings are fast to search and good enough for conversational podcast text. Runs locally on MPS GPU.

**Why Qdrant over Chroma?** Cosine search at 19K points in ~50ms. Filterable by guest metadata out of the box.

**Why intent routing?** Factual, synthetic, and generative queries need fundamentally different retrieval and generation strategies. One-prompt-fits-all fails at scale.

**Why groundedness verification?** LLMs default to confident BS. `verify_groundedness` forces the model to check its answer against the retrieved context before showing it to the user.

License: MIT
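
The groundedness check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's implementation: the prompt wording, the YES/NO protocol, the `REFUSAL` message, and the `judge` callable (a stand-in for whatever LLM client the app uses) are all hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical fallback shown when the answer cannot be verified.
REFUSAL = "The retrieved transcripts do not support an answer to this question."


def build_verification_prompt(answer: str, chunks: Sequence[str]) -> str:
    """Assemble a prompt asking a model to check the answer against context."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        f"Context:\n{context}\n\n"
        f"Answer:\n{answer}\n\n"
        "Is every claim in the answer supported by the context? "
        "Reply with YES or NO."
    )


def verify_groundedness(
    answer: str, chunks: Sequence[str], judge: Callable[[str], str]
) -> bool:
    """Return True only if the judge model confirms the answer is supported.

    `judge` is an assumed interface: it takes a prompt and returns the
    model's text reply.
    """
    if not chunks:
        return False  # nothing retrieved, so nothing can support the answer
    verdict = judge(build_verification_prompt(answer, chunks))
    return verdict.strip().upper().startswith("YES")


def answer_or_refuse(
    answer: str, chunks: Sequence[str], judge: Callable[[str], str]
) -> str:
    """Pass the draft answer through only when it is judged grounded."""
    return answer if verify_groundedness(answer, chunks, judge) else REFUSAL


if __name__ == "__main__":
    # A fake judge keeps the example runnable without an API key.
    always_no = lambda prompt: "NO"
    print(answer_or_refuse("Sleep is optional.", ["Sleep matters a lot."], always_no))
```

Passing the judge in as a callable keeps the check testable offline. A real deployment would wrap the model call there and would likely need more robust parsing than a YES/NO prefix.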