Chat with 936 podcast episodes. Every answer cites its source.
Ask "What did Karpathy say about neural networks?" → get an answer with the exact transcript chunk it came from. No hallucinations. No guessing.
Most RAG chatbots hallucinate. You ask about a podcast, they invent quotes.
OmniPod doesn't. Every response is grounded: verified against the actual transcript before it reaches you. If the source doesn't support the answer, it says so.
Three query types, one pipeline:
| Type | Example | Strategy |
|---|---|---|
| Factual | "What did Huberman say about sleep?" | Retrieve → Generate → Verify |
| Synthetic | "Compare AI safety views across guests" | Map-Reduce → Deduplicate → Synthesize |
| Generative | "Write an essay on consciousness from these episodes" | Plan → Draft → Ground |
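The routing in the table can be sketched roughly as below. This is a minimal illustration, not OmniPod's actual `classify_intent()`: the keyword cues, the `Intent` enum, and the `STRATEGIES` mapping are assumptions made for the example, and the real classifier may well use an LLM or embeddings instead.

```python
from enum import Enum


class Intent(Enum):
    FACTUAL = "factual"
    SYNTHETIC = "synthetic"
    GENERATIVE = "generative"


# Hypothetical keyword cues; a real classifier could ask an LLM instead.
GENERATIVE_CUES = ("write", "essay", "draft", "compose")
SYNTHETIC_CUES = ("compare", "across", "contrast", "all guests")

# Step names mirror the Strategy column of the table above.
STRATEGIES = {
    Intent.FACTUAL: ["retrieve", "generate", "verify"],
    Intent.SYNTHETIC: ["map_reduce", "deduplicate", "synthesize"],
    Intent.GENERATIVE: ["plan", "draft", "ground"],
}


def classify_intent(query: str) -> Intent:
    """Pick an intent from simple keyword cues, defaulting to FACTUAL."""
    q = query.lower()
    if any(cue in q for cue in GENERATIVE_CUES):
        return Intent.GENERATIVE
    if any(cue in q for cue in SYNTHETIC_CUES):
        return Intent.SYNTHETIC
    return Intent.FACTUAL
```

Generative cues are checked first so that a request like "Write an essay comparing..." goes to the generative handler rather than the synthetic one.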
You ask a question
       │
       ▼
┌─────────────┐
│   Router    │  classify_intent() → routes to the right handler
│  LRU cache  │  avoids re-embedding repeated queries
│  Semaphore  │  caps concurrent LLM calls at 5
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Retrieval  │  bge-small-en-v1.5 (384d) → Qdrant cosine
│   19,140    │  chunks from 936 Lex Fridman episodes
│   chunks    │  Guest filtering via known-guests index
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Generate +  │  DeepSeek V4 Flash via OpenCode API
│   Verify    │  verify_groundedness() → rejects ungrounded answers
└──────┬──────┘
       │
       ▼
Cited answer in Chainlit UI (localhost:8000)
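The two guards in the router stage (an LRU cache for query embeddings and a semaphore around LLM calls) might look something like this. It is a sketch under assumptions: `embed_query`, `call_llm`, `run_batch`, and the cache size are hypothetical names and values, and both the embedder and the LLM call are stubbed so the example runs offline. Only the limit of 5 comes from the diagram.

```python
import asyncio
from functools import lru_cache

MAX_CONCURRENT_LLM_CALLS = 5  # cap stated in the diagram


@lru_cache(maxsize=1024)  # cache size is an assumption
def embed_query(text: str) -> tuple[float, ...]:
    """Stand-in embedder that returns a fake vector.

    A real version would run bge-small-en-v1.5 here; repeated
    queries are served from the cache instead of being re-embedded.
    """
    return tuple(float(ord(c) % 7) for c in text[:8])


async def call_llm(prompt: str, sem: asyncio.Semaphore, stats: dict) -> str:
    """Placeholder LLM call; the semaphore limits how many run at once."""
    async with sem:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # simulate network latency
        stats["active"] -= 1
        return f"answer to: {prompt}"


async def run_batch(prompts: list[str]) -> tuple[list[str], int]:
    """Run all prompts concurrently and report the peak concurrency seen."""
    sem = asyncio.Semaphore(MAX_CONCURRENT_LLM_CALLS)
    stats = {"active": 0, "peak": 0}
    answers = await asyncio.gather(*(call_llm(p, sem, stats) for p in prompts))
    return list(answers), stats["peak"]
```

Calling `asyncio.run(run_batch(prompts))` with more than five prompts starts every task at once, but at most five are ever inside the `async with sem` block at the same time.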
git clone https://github.com/aranajhonny/omnipod && cd omnipod
python3.13 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
echo "OPENCODE_API_KEY=sk-your-key" > .env
docker run -d --name qdrant -p 6333:6333 qdrant/qdrant
python ingest.py --rebuild
chainlit run app.py
| Metric | Value |
|---|---|
| Episodes indexed | 936 Lex Fridman |
| Chunks | 19,140 (512 chars, 128 overlap) |
| Embedding dim | 384 (bge-small-en-v1.5, MPS GPU) |
| Query embedding | ~100ms |
| Vector search | ~50ms (cosine, 19K points) |
| Full answer | ~2s on M1 Pro |
| Full ingest | ~8 min |
| Codebase | 1,138 lines Python, 9 files |
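The chunking parameters in the table (512 characters with 128 overlap) imply a stride of 512 − 128 = 384 characters between chunk starts. A minimal sliding-window sketch follows; `chunk_text` is a hypothetical helper, and whether `ingest.py` also respects sentence or speaker boundaries is not stated here.

```python
def chunk_text(text: str, size: int = 512, overlap: int = 128) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap  # 384 with the defaults
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

Each chunk shares its last 128 characters with the start of the next one, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.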
No YouTube API key needed. Two sources:
- **lexfridman.com**: scrapes official transcript pages (requests + BeautifulSoup)
- **YouTube**: uses the free proxy at youtubetranscript.pro for auto-captions
cd lex_podcast
pip install requests beautifulsoup4
python run.py pipeline # scrapes all 936 episodes
Output lands in `data/transcripts/`.
"What did Andrej Karpathy say about neural networks?"
"Compare views on AI safety across all guests"
"Write a short essay on human consciousness based on these episodes"
"Summarize what Andrew Huberman says about sleep"
- **Why `bge-small-en-v1.5`?** 384-dim embeddings are fast to search and good enough for conversational podcast text. Runs locally on MPS GPU.
- **Why Qdrant over Chroma?** Cosine search at 19K points in ~50ms. Filterable by guest metadata out of the box.
- **Why intent routing?** Factual, synthetic, and generative queries need fundamentally different retrieval and generation strategies. A one-prompt-fits-all approach fails at scale.
- **Why groundedness verification?** LLMs default to confident BS. `verify_groundedness()` forces the model to check its answer against the retrieved context before showing it to the user.
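The groundedness check described above could be shaped roughly like this. The function name `verify_groundedness` comes from this README, but the signature, the prompt wording, the SUPPORTED/UNSUPPORTED protocol, and the `answer_or_fallback` helper are all assumptions for illustration. The `llm` argument is any callable that maps a prompt string to a reply string, so the sketch runs with a stub.

```python
from typing import Callable

# Hypothetical message shown when the check fails.
FALLBACK = "The retrieved transcripts do not support an answer to this question."


def build_verification_prompt(answer: str, context: str) -> str:
    """Ask the model for a one-word verdict on whether context supports answer."""
    return (
        "Context:\n" + context + "\n\n"
        "Answer:\n" + answer + "\n\n"
        "Is every claim in the answer supported by the context? "
        "Reply with exactly SUPPORTED or UNSUPPORTED."
    )


def verify_groundedness(answer: str, context: str, llm: Callable[[str], str]) -> bool:
    """Return True only when the model's verdict is exactly SUPPORTED."""
    verdict = llm(build_verification_prompt(answer, context)).strip().upper()
    return verdict == "SUPPORTED"


def answer_or_fallback(answer: str, context: str, llm: Callable[[str], str]) -> str:
    """Show the answer only if it passes the groundedness check."""
    return answer if verify_groundedness(answer, context, llm) else FALLBACK
```

Treating anything other than an exact SUPPORTED verdict as a failure is a deliberately strict choice: an ambiguous or malformed reply from the verifier results in the fallback message rather than an unverified answer.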
MIT