Day 38/60: System Design Questions

A developer poses a system design question about processing a 150K-word document with a 128K-token LLM, listing four strategies: fixed-size chunking, sliding window, progressive summarization, and truncation. They ask which approach would be best for a 200-page legal contract where the answer could be anywhere, hinting that one method has a hidden failure mode.

Your LLM has 128K tokens. Your document has 150K words. Something has to give. What do you do?

A. Chunk the document into fixed-size pieces and embed each one, then retrieve the top-k at query time.
B. Use a sliding window: process the document in overlapping chunks and stitch the outputs together.
C. Summarize each section progressively, feeding the running summary forward as context.
D. Truncate to the most recent tokens and hope the answer is near the end.

Three of these are real strategies teams ship to production. One of them will silently give you wrong answers on a predictable class of questions.

Pick one, and tell me which you'd actually use on a 200-page legal contract where the answer can be anywhere.

I'll drop the full breakdown in the comments, including the failure mode most engineers don't see until they're in production.

Drop your answer 👇
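If you want to see the mechanics before answering, here is a minimal sketch of why the document doesn't fit and how options A and B differ at the splitting step. Everything in it is an assumption for illustration: the helper names `estimate_tokens` and `chunk_words` are hypothetical, the ratio of roughly 1.33 tokens per English word is a rule of thumb (real tokenizers vary by model and text), and it splits on words rather than real tokens.

```python
import math


def estimate_tokens(word_count, tokens_per_word=1.33):
    """Rough token estimate. The 1.33 ratio is an assumed rule of thumb
    for English prose, not a property of any specific tokenizer."""
    return math.ceil(word_count * tokens_per_word)


def chunk_words(words, size, overlap=0):
    """Split a list of words into chunks of at most `size` words.

    overlap=0 gives fixed-size chunks (option A's splitting step).
    overlap>0 gives a sliding window (option B): each chunk repeats
    the last `overlap` words of the previous one.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        # Stop once a chunk has reached the end of the document.
        if start + size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    context_limit = 128_000
    needed = estimate_tokens(150_000)
    print("fits in context:", needed <= context_limit)

    words = "the quick brown fox jumps over the lazy dog again".split()
    print(chunk_words(words, size=4))             # fixed-size chunks
    print(chunk_words(words, size=4, overlap=2))  # sliding window
```

The only difference between the two calls is the step size. With no overlap, a sentence that straddles a chunk boundary is cut in two; with overlap, it appears whole in at least one chunk, at the cost of processing some text twice.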