{"slug": "54-60-days-system-design-questions", "title": "54/60 Days System Design Questions", "summary": "A developer describes a common problem with RAG pipelines: data drift causes stale search results even when no code changes are made. They present four solutions for keeping a 400GB FAISS index fresh: scheduled full rebuild, incremental upserts, embedding version registry, and approximate staleness detection. The post asks readers to choose the best approach given a 50M-chunk corpus and model updates every six weeks.", "body_md": "You built a RAG pipeline. Works great in dev.\n\n6 months later, your users complain: \"The search results are garbage.\"\n\nYou haven't changed a line of code.\n\nHere's what happened:\n\nYour product evolved. New features, new docs, new support tickets. The data drifted, but your embedding index didn't.\n\nNow you're serving a 400GB FAISS index that was last rebuilt in January. Your chunks are stale. Your nearest-neighbor results point to deprecated docs. Your LLM is confidently hallucinating from outdated context.\n\nYou need to fix this. 4 engineers each propose a solution:\n\nA) Scheduled full rebuild\n\nEvery Sunday, re-embed the entire corpus from scratch. Replace the index atomically. Slow (4h+ at scale), expensive, but always fresh.\n\nB) Incremental upserts + soft delete\n\nOn every document change, re-embed only the affected chunks. Mark deleted chunks as tombstoned. Keep a version field on each vector. Index size grows over time; compact quarterly.\n\nC) Embedding version registry + hot swap\n\nTrack which embedding model version produced each vector. When the model changes (fine-tuned or upgraded), invalidate the mismatched vectors and rebuild only those. Two indexes run in parallel during migration. Route traffic by model version.\n\nD) Approximate staleness detection\n\nRun a nightly job that samples 1% of your corpus, re-embeds it, and measures cosine distance against the stored vector. 
If drift exceeds a threshold, trigger a full rebuild. Otherwise, skip it. Cheap monitoring, reactive rebuilds.\n\nReal constraint: your corpus is 50M chunks. Full rebuild = 4 hours + ~$800 in embedding API cost. You deploy model updates every 6 weeks.\n\nPick one — A, B, C, or D — and tell me why. Full breakdown in the comments.", "url": "https://wpnews.pro/news/54-60-days-system-design-questions", "canonical_source": "https://dev.to/thejoud1997/5460-days-system-design-questions-4ojp", "published_at": "2026-06-29 16:20:56+00:00", "updated_at": "2026-06-29 16:49:24.358997+00:00", "lang": "en", "topics": ["machine-learning", "large-language-models", "ai-infrastructure", "developer-tools"], "entities": ["FAISS", "LLM"], "alternates": {"html": "https://wpnews.pro/news/54-60-days-system-design-questions", "markdown": "https://wpnews.pro/news/54-60-days-system-design-questions.md", "text": "https://wpnews.pro/news/54-60-days-system-design-questions.txt", "jsonld": "https://wpnews.pro/news/54-60-days-system-design-questions.jsonld"}}