How We Translate Entire Books with LLMs Without Losing Context

LectuLibre developed a chunking strategy to translate entire books using large language models while preserving narrative coherence. The pipeline parses documents into logical units like chapters, splits them at sentence boundaries, and uses overlapping context windows to maintain continuity across chunks. This approach keeps translations consistent across thousands of pages while respecting token limits and reducing API costs.

Our chunking strategy keeps chapters coherent, respects context windows, and handles multilingual books.

At LectuLibre (https://lectulibre.com), we translate entire books (novels, technical manuals, poetry) using large language models. It sounds simple: feed each paragraph to an LLM, concatenate the results, done. But the moment we tried a 300-page EPUB, chaos ensued. Chapters bled into each other, sentences were chopped mid-word, and the translation of chapter 5 had no idea what happened in chapter 4.

LLMs have limited context windows. Even the massive 200K-token window of Claude 3 can't hold a whole 150K-word book once you account for the prompt and the translated output. And even if it could, the cost and latency would be absurd. We needed a way to split the book into manageable chunks while preserving enough context that the translation remains coherent across thousands of pages.

Here's how we designed a chunking pipeline that respects your wallet, the context window, and the book's narrative flow.

Naively splitting by character count is a recipe for disaster. Instead, we first parse the document to understand its logical units: chapters, sections, headings. For EPUB, we use `ebooklib`; for PDF, `pdfplumber`. Both give us a stream of items (paragraphs, headings) that we then organize into a tree of chapters and sub-sections.
```python
import ebooklib
from ebooklib import epub


def get_chapters(epub_path):
    book = epub.read_epub(epub_path)
    chapters = []
    for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
        # Simplified: each document is a chapter
        content = item.get_content().decode('utf-8')
        chapters.append(content)
    return chapters
```

In practice, we use BeautifulSoup to extract