{"slug": "building-instant-translation-assistance-for-book-translations-with-python-and", "title": "Building Instant Translation Assistance for Book Translations with Python and LLMs", "summary": "LectuLibre, an AI-powered book translation platform, built an instant translation help feature that lets readers highlight any phrase and receive a context-aware, human-quality translation within seconds. The feature uses Server-Sent Events (SSE) with FastAPI and Claude 3 Haiku to stream translations token-by-token, preserving literary context by fetching surrounding paragraphs. The team overcame challenges in prompt engineering to handle idioms and cultural references while maintaining sub-second latency.", "body_md": "*How we integrated real-time phrase translation feedback into our AI-powered book translation workflow, and what we learned about latency, context, and prompt engineering.*\n\nWhen we launched LectuLibre, our AI-powered book translation platform, users loved the quality of full-chapter translations. But they kept asking for something else: while reading a partially translated book, they'd stumble on an untranslated phrase or an awkward auto-translation and want to quickly get a better version without leaving the page. So we built **即时翻译求助** (Instant Translation Help)—a feature that lets readers highlight any phrase and get a context-aware, human-quality translation within seconds, along with a brief explanation of tricky parts.\n\nHere's how we built it, the technical challenges we faced, and the lessons we learned about stitching LLMs into a real-time reading experience.\n\nMost web apps offer generic translation via API calls—send a sentence to Google Translate, get a result. But that doesn't work for literary texts. A phrase like \"She let the cat out of the bag\" needs to be translated idiomatically, and the appropriate rendering depends heavily on the surrounding paragraphs (is the tone formal? sarcastic? part of a metaphor chain?). 
Our existing translation pipeline processes entire chapters in bulk with carefully crafted prompts, but for instant help, we needed sub-second latency while preserving that same depth of context.\n\nWe chose **Server-Sent Events (SSE)** over WebSockets because the communication is one-directional (the server pushes translation tokens) and SSE is simpler to implement with FastAPI. The client (a React app) sends a POST request with the highlighted phrase, the book ID, the paragraph index, and the target language (`phrase`, `bookId`, `paraIndex`, and `targetLang` in the JSON body).\n\nOur backend retrieves the surrounding text from PostgreSQL (we store the original book in chunks), feeds a carefully assembled prompt to the LLM (Claude 3 Haiku for speed), and streams the response back token-by-token.\n\nWe index each paragraph with its position. Given a highlighted phrase, we grab the paragraph containing it, plus one paragraph before and after. This usually provides enough narrative context without blowing up the prompt size.\n\n``` python\nfrom sqlalchemy import select\nfrom sqlalchemy.ext.asyncio import AsyncSession\n\n# BookParagraph is our ORM model for a stored paragraph (definition not shown)\n\nasync def get_context(book_id: str, para_index: int, db: AsyncSession) -> str:\n    # Fetch the highlighted paragraph plus one paragraph on either side\n    stmt = (\n        select(BookParagraph)\n        .where(\n            BookParagraph.book_id == book_id,\n            BookParagraph.index.between(para_index - 1, para_index + 1)\n        )\n        .order_by(BookParagraph.index)\n    )\n    result = await db.execute(stmt)\n    paragraphs = result.scalars().all()\n    # Join the paragraphs in reading order into one context string\n    return \"\\n\".join(p.text for p in paragraphs)\n```\n\nWe needed a prompt that instructs the LLM to translate the phrase in the style of the surrounding text, give a natural equivalent and a short explanation for idioms, metaphors, and cultural references, and follow a fixed output format.\n\nHere's the core prompt template:\n\n``` python\nINSTANT_HELP_PROMPT = \"\"\"\nYou are a literary translator. 
Below is the source text surrounding a highlighted phrase, the phrase itself, and the target language.\nTranslate the highlighted phrase into {target_lang} in a way that fits the style of the surrounding text.\nIf the phrase contains an idiom, metaphor, or cultural reference, provide a natural equivalent and a one-sentence explanation in parentheses.\nOutput format:\n**Translation:** [your translation]\n**Note:** [explanation if needed]\n\nSurrounding text:\n{context}\n\nHighlighted phrase:\n\"{phrase}\"\n\nTranslation:\n\"\"\"\n```\n\nWe found that Claude 3 Haiku almost always respects this format and omits the \"Note\" part when it isn't needed.\n\nWe built an async endpoint that yields SSE chunks. The client can start rendering the translation as tokens arrive, which feels instant.\n\n``` python\nimport json\n\nimport anthropic\nfrom fastapi import APIRouter, Request\nfrom fastapi.responses import StreamingResponse\n\n# async_session, get_context, and INSTANT_HELP_PROMPT are defined elsewhere in our codebase\nrouter = APIRouter()\n\n@router.post(\"/api/instant-help\")\nasync def instant_help(request: Request):\n    data = await request.json()\n    phrase = data[\"phrase\"]\n    book_id = data[\"bookId\"]\n    para_index = data[\"paraIndex\"]\n    target_lang = data[\"targetLang\"]\n\n    async def event_generator():\n        # Release the DB session before the (slower) LLM call starts\n        async with async_session() as db:\n            context = await get_context(book_id, para_index, db)\n        prompt = INSTANT_HELP_PROMPT.format(\n            target_lang=target_lang,\n            context=context,\n            phrase=phrase\n        )\n        # Stream from Claude using the official Anthropic Python SDK\n        async with anthropic.AsyncAnthropic() as client:\n            stream = await client.messages.create(\n                model=\"claude-3-haiku-20240307\",\n                max_tokens=300,\n                temperature=0.3,\n                messages=[{\"role\": \"user\", \"content\": prompt}],\n                stream=True\n            )\n            async for event in stream:\n                if event.type == 
\"content_block_delta\":\n                    # Forward each text delta as one SSE message\n                    payload = json.dumps({\"text\": event.delta.text})\n                    yield f\"data: {payload}\\n\\n\"\n                elif event.type == \"message_stop\":\n                    yield \"data: [DONE]\\n\\n\"\n\n    return StreamingResponse(event_generator(), media_type=\"text/event-stream\")\n```\n\nOn the frontend, we consume these events as they arrive. Note that the browser's native `EventSource` only issues GET requests, so a POST endpoint like this one has to be read with `fetch` and a stream reader (or an SSE client that supports POST) instead. The first token appears about 400–600 ms after the click for typical phrases.\n\nHaiku is fast but not always perfect. We tried DeepSeek-V2 (slower but better with idioms), but its latency crossed 2 seconds, killing the \"instant\" feel. We settled on Haiku for now, with a second, more detailed translation available on demand (which uses Claude 3 Opus in the background).\n\nEach instant help call costs about $0.002 (input + output tokens). With thousands of users, that adds up. We implemented a **local cache** keyed on (book_id, para_index, phrase, target_lang) using Redis. Repeated requests for the same phrase (e.g., multiple users reading the same book) are served from cache instantly, reducing LLM calls by ~30% in our beta.\n\nIn our experiments, adding context (the two neighboring paragraphs) significantly improved quality without adding too many tokens. But including an entire chapter led to slower responses and occasional off-topic interpretations. We keep the context at ~500 tokens on average.\n\nAdding an explicit `Output format: **Translation:** ... **Note:** ...` block to the prompt reduced malformed responses by 90%. Small tweaks matter.\n\nWe're exploring a **context window expansion** that uses the entire chapter, but with aggressive summarization of preceding paragraphs via a cheap model call. Also, fine-tuning a small open-source model on our translation style could bring costs close to zero.\n\nIf you've built similar inline AI features, how did you handle the cost/latency/quality triangle? 
We'd love to hear your approach in the comments.\n\n*Building LectuLibre has taught us that AI-powered tools shine when they fit seamlessly into the user's workflow. Instant translation help is that seam—a small feature that feels like magic because it respects the reader's flow.*", "url": "https://wpnews.pro/news/building-instant-translation-assistance-for-book-translations-with-python-and", "canonical_source": "https://dev.to/jacob_gong/building-instant-translation-assistance-for-book-translations-with-python-and-llms-1o3c", "published_at": "2026-07-04 03:01:48+00:00", "updated_at": "2026-07-04 03:49:05.678057+00:00", "lang": "en", "topics": ["large-language-models", "natural-language-processing", "ai-products", "developer-tools"], "entities": ["LectuLibre", "Claude 3 Haiku", "FastAPI", "PostgreSQL", "React"], "alternates": {"html": "https://wpnews.pro/news/building-instant-translation-assistance-for-book-translations-with-python-and", "markdown": "https://wpnews.pro/news/building-instant-translation-assistance-for-book-translations-with-python-and.md", "text": "https://wpnews.pro/news/building-instant-translation-assistance-for-book-translations-with-python-and.txt", "jsonld": "https://wpnews.pro/news/building-instant-translation-assistance-for-book-translations-with-python-and.jsonld"}}