{"slug": "llm-for-text-summarization-best-practices-and-optimization-techniques", "title": "LLM for Text Summarization: Best Practices and Optimization Techniques", "summary": "A developer built a production-ready document summarizer using Oxlo.ai's API, which ingests long-form text and outputs structured JSON with a TL;DR, key points, and action items. The pipeline leverages models like DeepSeek V3.2, Llama 3.3 70B, Kimi K2.6, and Qwen 3 32B, and takes advantage of Oxlo.ai's flat per-request pricing to handle large documents without chunking. The system includes a refinement step to simplify jargon for non-expert readers.", "body_md": "We are going to build a production-ready document summarizer that ingests long-form text and emits structured JSON with a TL;DR, key points, and action items. If you process research papers, support tickets, or meeting transcripts, this gives you a reusable pipeline you can drop into any backend.\n\n`pip install openai`\n\nI always start by confirming the API contract works. This snippet initializes the Oxlo.ai client and sends a one-sentence summary request to DeepSeek V3.2, which is available on the free tier. If you see a response, your environment is ready.\n\n``` python\nfrom openai import OpenAI\n\nclient = OpenAI(base_url=\"https://api.oxlo.ai/v1\", api_key=\"YOUR_OXLO_API_KEY\")\n\nresponse = client.chat.completions.create(\n    model=\"deepseek-v3.2\",\n    messages=[\n        {\"role\": \"user\", \"content\": \"Summarize this in one sentence: The quick brown fox jumps over the lazy dog.\"},\n    ],\n)\n\nprint(response.choices[0].message.content)\n```\n\nThe system prompt is the main lever for tone and structure, so I keep it in a dedicated constant. I instruct the model to behave like a precise document summarizer and emit only valid JSON.\n\n``` python\nSYSTEM_PROMPT = \"\"\"You are a precise document summarizer. 
Read the user's text and produce a JSON object with exactly these keys:\n- title: a short, descriptive title\n- tldr: a one-sentence summary under 20 words\n- key_points: an array of 3 to 5 bullet strings\n- action_items: an array of specific next steps, or an empty array if none exist\n\nRules:\n- Output only the JSON object, with no markdown fences and no preamble.\n- Base every field strictly on the provided text.\n- Be concise. Avoid filler words.\"\"\"\n```\n\nNext, I wrap the prompt in a reusable function that calls Oxlo.ai. I use Llama 3.3 70B here because it follows system instructions reliably for structured extraction.\n\n``` python\nimport json\nfrom openai import OpenAI\n\nclient = OpenAI(base_url=\"https://api.oxlo.ai/v1\", api_key=\"YOUR_OXLO_API_KEY\")\n\ndef summarize(text: str) -> dict:\n    response = client.chat.completions.create(\n        model=\"llama-3.3-70b\",\n        messages=[\n            {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n            {\"role\": \"user\", \"content\": text},\n        ],\n        temperature=0.2,\n    )\n    \n    raw = response.choices[0].message.content.strip()\n    # Strip a markdown fence if the model added one despite the instructions.\n    if raw.startswith(\"```\"):\n        raw = raw.split(\"\\n\", 1)[1].rsplit(\"```\", 1)[0].strip()\n    return json.loads(raw)\n```\n\nMost token-based providers make long inputs expensive, but Oxlo.ai uses flat per-request pricing regardless of prompt length, so a 50,000-character annual report costs the same as a single sentence. See [https://oxlo.ai/pricing](https://oxlo.ai/pricing) for details. 
For this step I switch to Kimi K2.6, which supports a 131K context window, so I can drop the entire document into one request without chunking logic.\n\n``` python\ndef summarize_long(text: str) -> dict:\n    response = client.chat.completions.create(\n        model=\"kimi-k2.6\",\n        messages=[\n            {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n            {\"role\": \"user\", \"content\": text},\n        ],\n        temperature=0.2,\n    )\n    \n    raw = response.choices[0].message.content.strip()\n    # Strip a markdown fence if the model added one despite the instructions.\n    if raw.startswith(\"```\"):\n        raw = raw.split(\"\\n\", 1)[1].rsplit(\"```\", 1)[0].strip()\n    return json.loads(raw)\n```\n\nWhen the input is dense with jargon, I run a second pass to simplify language while preserving meaning. I chain two calls: the first extracts the raw summary, and the second rewrites the tldr and key_points for non-expert readers. I use Qwen 3 32B for the rewrite because it handles technical rephrasing precisely.\n\n``` python\nREFINE_PROMPT = \"\"\"You are an editor. Take the JSON summary below and rewrite only the 'tldr' and 'key_points' fields so a non-expert can understand them. Keep the 'title' and 'action_items' exactly as they are. Output only valid JSON.\"\"\"\n\ndef summarize_and_refine(text: str) -> dict:\n    first = summarize_long(text)\n    \n    response = client.chat.completions.create(\n        model=\"qwen-3-32b\",\n        messages=[\n            {\"role\": \"system\", \"content\": REFINE_PROMPT},\n            {\"role\": \"user\", \"content\": json.dumps(first, indent=2)},\n        ],\n        temperature=0.3,\n    )\n    \n    raw = response.choices[0].message.content.strip()\n    # Strip a markdown fence if the model added one despite the instructions.\n    if raw.startswith(\"```\"):\n        raw = raw.split(\"\\n\", 1)[1].rsplit(\"```\", 1)[0].strip()\n    return json.loads(raw)\n```\n\nHere is the complete script. 
I feed it a sample quarterly earnings excerpt and print the refined JSON.\n\n``` python\nimport json\nfrom openai import OpenAI\n\nclient = OpenAI(base_url=\"https://api.oxlo.ai/v1\", api_key=\"YOUR_OXLO_API_KEY\")\n\nSYSTEM_PROMPT = \"\"\"You are a precise document summarizer. Read the user's text and produce a JSON object with exactly these keys:\n- title: a short, descriptive title\n- tldr: a one-sentence summary under 20 words\n- key_points: an array of 3 to 5 bullet strings\n- action_items: an array of specific next steps, or an empty array if none exist\n\nRules:\n- Output only the JSON object, with no markdown fences and no preamble.\n- Base every field strictly on the provided text.\n- Be concise. Avoid filler words.\"\"\"\n\nREFINE_PROMPT = \"\"\"You are an editor. Take the JSON summary below and rewrite only the 'tldr' and 'key_points' fields so a non-expert can understand them. Keep the 'title' and 'action_items' exactly as they are. Output only valid JSON.\"\"\"\n\ndef summarize_long(text: str) -> dict:\n    response = client.chat.completions.create(\n        model=\"kimi-k2.6\",\n        messages=[\n            {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n            {\"role\": \"user\", \"content\": text},\n        ],\n        temperature=0.2,\n    )\n    raw = response.choices[0].message.content.strip()\n    if raw.startswith(\"```\"):\n        raw = raw.split(\"\\n\", 1)[1].rsplit(\"```\", 1)[0].strip()\n    return json.loads(raw)\n\ndef summarize_and_refine(text: str) -> dict:\n    first = summarize_long(text)\n    response = client.chat.completions.create(\n        model=\"qwen-3-32b\",\n        messages=[\n            {\"role\": \"system\", \"content\": REFINE_PROMPT},\n            {\"role\": \"user\", \"content\": json.dumps(first, indent=2)},\n        ],\n        temperature=0.3,\n    )\n    raw = response.choices[0].message.content.strip()\n    if raw.startswith(\"```\"):\n        raw = raw.split(\"\\n\", 1)[1].rsplit(\"```\", 1)[0].strip()\n    return json.loads(raw)\n\nif __name__ == \"__main__\":\n    document = \"\"\"\n    Q3 2024 Earnings Highlights\n\n    Revenue grew 12% year-over-year to $840M, driven primarily by cloud infrastructure adoption in APAC and expansion of the enterprise tier. Operating margin compressed to 18% from 22% last quarter due to increased headcount in R&D and a one-time restructuring charge of $14M. The board approved a $200M share buyback program to be executed over the next twelve months. CFO guidance for Q4 projects revenue between $855M and $875M, with margin recovery to 20% as the restructuring costs roll off. The company also announced a strategic partnership with a major semiconductor vendor to co-design AI accelerators for edge deployments, with first silicon expected in late 2025.\n    \"\"\"\n    \n    result = summarize_and_refine(document)\n    print(json.dumps(result, indent=2))\n```\n\nExample output:\n\n```\n{\n  \"title\": \"Q3 2024 Earnings and Q4 Outlook\",\n  \"tldr\": \"Revenue rose 12 percent to 840 million dollars, but profit margins dropped because of hiring and restructuring costs.\",\n  \"key_points\": [\n    \"Cloud infrastructure sales in Asia Pacific pushed revenue up 12 percent year over year\",\n    \"Operating margin fell to 18 percent from 22 percent due to research hiring and a 14 million dollar restructuring charge\",\n    \"The board authorized a 200 million dollar stock buyback over the next year\",\n    \"Fourth quarter revenue is expected to reach 855 to 875 million dollars with margins rebounding to 20 percent\",\n    \"A new chip partnership targets edge AI hardware arriving in late 2025\"\n  ],\n  \"action_items\": [\n    \"Monitor Q4 margin recovery toward the 20 percent target\",\n    \"Track progress on the semiconductor partnership and 2025 silicon timeline\",\n    \"Evaluate impact of the share buyback on capital allocation\"\n  ]\n}\n```\n\nTwo concrete ways to productionize this. 
First, wrap the summarizer in a FastAPI endpoint and accept file uploads so other services can POST PDFs or raw text. Second, enable streaming by passing `stream=True` to `client.chat.completions.create` and yield text chunks as they arrive, which keeps perceived latency low for interactive UIs; parse the JSON only once the stream is complete.", "url": "https://wpnews.pro/news/llm-for-text-summarization-best-practices-and-optimization-techniques", "canonical_source": "https://dev.to/shashank_ms_6a35baa4be138/llm-for-text-summarization-best-practices-and-optimization-techniques-49mo", "published_at": "2026-06-17 15:39:17+00:00", "updated_at": "2026-06-17 15:51:37.702334+00:00", "lang": "en", "topics": ["large-language-models", "natural-language-processing", "ai-tools", "developer-tools"], "entities": ["Oxlo.ai", "DeepSeek V3.2", "Llama 3.3 70B", "Kimi K2.6", "Qwen 3 32B"], "alternates": {"html": "https://wpnews.pro/news/llm-for-text-summarization-best-practices-and-optimization-techniques", "markdown": "https://wpnews.pro/news/llm-for-text-summarization-best-practices-and-optimization-techniques.md", "text": "https://wpnews.pro/news/llm-for-text-summarization-best-practices-and-optimization-techniques.txt", "jsonld": "https://wpnews.pro/news/llm-for-text-summarization-best-practices-and-optimization-techniques.jsonld"}}