{"slug": "how-we-translate-entire-books-with-llms-without-losing-context", "title": "How We Translate Entire Books with LLMs Without Losing Context", "summary": "LectuLibre developed a chunking strategy to translate entire books using large language models while preserving narrative coherence. The pipeline parses documents into logical units like chapters, splits them at sentence boundaries, and uses overlapping context windows to maintain continuity across chunks. This approach keeps translations consistent across thousands of pages while respecting token limits and reducing API costs.", "body_md": "*Our chunking strategy that keeps chapters coherent, respects context windows, and handles multilingual books.*\n\nAt [LectuLibre](https://lectulibre.com), we translate entire books (novels, technical manuals, poetry) using large language models. It sounds simple: feed each paragraph to an LLM, concatenate results, done. But the moment we tried a 300‑page EPUB, chaos ensued. Chapters bled into each other, sentences were chopped mid‑word, and the translation of chapter 5 had no idea what happened in chapter 4.\n\nLLMs have limited context windows. Even the massive 200K‑token window of Claude 3 can’t hold a whole 150K‑word book. And even if it could, the cost and latency would be absurd. We needed a way to split the book into manageable chunks while preserving enough context so that the translation remains coherent across thousands of pages.\n\nHere’s how we designed a chunking pipeline that respects your wallet, the context window, and the book’s narrative flow.\n\nNaively splitting by character count is a recipe for disaster. Instead, we first parse the document to understand its logical units: chapters, sections, headings. For EPUB, we use `ebooklib`; for PDF, `pdfplumber`. 
Both give us a stream of items (paragraphs, headings) that we then organize into a tree of chapters and sub‑sections.\n\n``` python\nimport ebooklib\nfrom ebooklib import epub\n\ndef get_chapters(epub_path):\n    book = epub.read_epub(epub_path)\n    chapters = []\n    for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):\n        # Simplified: each document is a chapter\n        content = item.get_content().decode('utf-8')\n        chapters.append(content)\n    return chapters\n```\n\nIn practice, we use `BeautifulSoup` to extract `<body>` text and identify heading tags (`<h1>`–`<h6>`) to build a table of contents. This way, even if a chapter is 20,000 tokens, we keep it together as a single unit until later splitting.\n\nA chapter still needs to be broken down to fit the model’s context window. But we never split mid‑sentence. We use `spaCy` to tokenize the text into sentences, then greedily group them until we hit a token limit.\n\nWhy not simple character‑based splitting? Because sentences carry semantic boundaries. 
Breaking inside a sentence occasionally produces artefacts like “He walked to the sta‑” / “‑tion.” LLMs are forgiving but not that forgiving.\n\n``` python\nimport spacy\nfrom transformers import AutoTokenizer  # for token counts\n\nnlp = spacy.load(\"en_core_web_sm\")\n# Placeholder id: load a tokenizer that matches (or approximates) your model's\ntokenizer = AutoTokenizer.from_pretrained(\"claude-tokenizer\")\n\ndef sentence_split(text):\n    doc = nlp(text)\n    return [sent.text for sent in doc.sents]\n\ndef chunk_sentences(sentences, max_tokens=1800, overlap_sentences=5):\n    chunks = []\n    current_chunk = []\n    current_token_count = 0\n\n    for sent in sentences:\n        sent_tokens = len(tokenizer.encode(sent))\n        # Close the chunk only if it already holds text, so we never emit an empty chunk\n        if current_chunk and current_token_count + sent_tokens > max_tokens:\n            chunks.append(current_chunk)\n            # Overlap: carry over the last `overlap_sentences` of the chunk just closed\n            # (these sentences then appear in both chunks)\n            current_chunk = current_chunk[-overlap_sentences:] if overlap_sentences > 0 else []\n            current_token_count = sum(len(tokenizer.encode(s)) for s in current_chunk)\n        current_chunk.append(sent)\n        current_token_count += sent_tokens\n    if current_chunk:\n        chunks.append(current_chunk)\n    return chunks\n```\n\nWe set `max_tokens` to 1800, leaving room for the system prompt, context from previous chunks, and the model’s response. That’s our setting for Claude 3 Haiku. Its 200K‑token context window would allow much larger chunks, but keeping chunks small also means faster, cheaper API calls.\n\nThe real magic is what we do *between* chunks. A standalone translation of chunk #5 has no clue that the protagonist just entered a dark cave in chunk #4. Two techniques solved this: the last few sentences of the previous chunk, passed along for immediate context, and a running summary of the story so far, for long‑range context.\n\n``` python\ndef build_prompt(chunk, previous_context_sentences, summary_so_far):\n    context_left = \" \".join(previous_context_sentences)\n    prompt = f\"\"\"You are translating a book. 
Here is a summary of the story so far:\n    {summary_so_far}\n\n    And the previous text (for immediate context):\n    \"{context_left}\"\n\n    Now translate the following text to Spanish, preserving tone and style:\n    {chunk}\"\"\"\n    return prompt\n```\n\nThe summary is generated using a separate, cheap call (we use DeepSeek for summaries, even if the main translation uses Claude). This keeps the context token usage minimal while still giving long‑range coherence.\n\nWhy not just include the entire previous chunk? That doubles the token count per call. On a 200K‑word book, that adds up to hundreds of dollars. Summaries cut that cost by ~80% with negligible quality loss.\n\nThe translation loop then looks like this:\n\n``` python\n# `call_llm` and `all_chunks_by_chapter` come from the rest of the pipeline\noverall_summary = \"\"\nprevious_context = []\nfull_translation = []\n\nfor chapter_chunks in all_chunks_by_chapter:\n    chapter_summary = \"\"\n    for chunk in chapter_chunks:\n        source_text = \" \".join(chunk)\n        # Earlier chapters first, then the current chapter so far\n        summary_so_far = (overall_summary + \"\\n\" + chapter_summary).strip()\n        prompt = build_prompt(source_text, previous_context, summary_so_far)\n        translated = call_llm(prompt)\n        full_translation.append(translated)\n\n        # Update context: keep last 5 sentences of the translated chunk as next context\n        trans_sents = sentence_split(translated)\n        previous_context = trans_sents[-5:]\n\n        # Summarize the source chunk (shown here as a plain blocking call)\n        chunk_summary = call_llm(f\"Summarize this passage in one sentence: {source_text}\")\n        chapter_summary += chunk_summary + \" \"\n    overall_summary += chapter_summary\n```\n\nWe process chunks concurrently using `asyncio` and `httpx` to keep translation times reasonable.\n\nTranslating a 120K‑word Spanish novel (“El Quijote”) into English took about 4 minutes end‑to‑end with Claude 3 Haiku. Total API cost: $0.67. 
The translation was surprisingly fluid: chapters felt connected, and the occasional flashback or pronoun reference (“she” referring to a character introduced three pages earlier) was correctly resolved. Without the context pipeline, the same book would have been riddled with inconsistencies.\n\nWe experimented with other models: DeepSeek‑V3 gave similar quality at half the price but with higher latency, making it better for batch jobs where speed isn’t critical. GPT‑4 Turbo reproduced stylistic flourishes more naturally, but we had to use even smaller chunks with it, which sometimes fragmented dialogue. Claude struck the best balance.\n\nBut it’s not perfect. Humor and idioms still occasionally fall flat because the summary can’t encapsulate a running joke. Code blocks and tables inside technical books need special handling; we’re working on a parser that detects them and wraps them in `[CODE]` markers so the LLM doesn’t try to translate variable names. And poetry, with its line breaks and meter, remains a challenge; we’re considering a dedicated poetry‑aware chunker.\n\nIf you’re building long‑document translation using LLMs, invest in a pipeline that parses the document into logical units, splits at sentence boundaries rather than character counts, and carries context between chunks through overlap and a running summary.\n\nOur code is not open‑source yet, but we plan to release the core chunking library once we’ve battle‑tested it on more formats.\n\n**How do you handle context in LLM translations?** We’re especially curious about handling highly technical books with equations, footnotes, and cross‑references. 
Drop your ideas in the comments — let’s figure this out together.", "url": "https://wpnews.pro/news/how-we-translate-entire-books-with-llms-without-losing-context", "canonical_source": "https://dev.to/jacob_gong/how-we-translate-entire-books-with-llms-without-losing-context-2em5", "published_at": "2026-06-18 23:19:00+00:00", "updated_at": "2026-06-18 23:59:41.577089+00:00", "lang": "en", "topics": ["large-language-models", "natural-language-processing", "developer-tools"], "entities": ["LectuLibre", "Claude 3", "Claude Haiku", "spaCy", "ebooklib", "pdfplumber", "BeautifulSoup", "Hugging Face"], "alternates": {"html": "https://wpnews.pro/news/how-we-translate-entire-books-with-llms-without-losing-context", "markdown": "https://wpnews.pro/news/how-we-translate-entire-books-with-llms-without-losing-context.md", "text": "https://wpnews.pro/news/how-we-translate-entire-books-with-llms-without-losing-context.txt", "jsonld": "https://wpnews.pro/news/how-we-translate-entire-books-with-llms-without-losing-context.jsonld"}}