{"slug": "fine-tuning-qwen2-5-0-5b-to-write-sre-post-mortem-summaries", "title": "Fine-Tuning Qwen2.5-0.5B to Write SRE Post-Mortem Summaries", "summary": "A developer fine-tuned the Qwen2.5-0.5B model on 700 real incident post-mortem examples, achieving over 60% rubric compliance for structured SRE summaries, outperforming zero-shot baselines from larger models like Qwen3.6-plus (20-35%) and GPT-5.4-nano (35-50%). The LoRA adapter, published on HuggingFace, runs on CPU or consumer GPU, producing fast, cheap, and format-consistent outputs that follow an organization's SRE conventions. The entire pipeline, including data scraping, baseline evaluation, fine-tuning, and rubric-based scoring, was autonomously built by the AI engineering agent NEO without human intervention.", "body_md": "Writing post-mortem root-cause summaries is time-consuming and inconsistent. Junior SREs miss contributing factors. Senior SREs write summaries that vary in depth and structure. Zero-shot LLMs produce verbose, generic output that does not follow SRE conventions.\n\nFine-tuning a small model on real incident data produces structured, concise summaries that follow your organisation's format at a fraction of the cost of a large model.\n\nThe different approaches and what you get:\n\n| Approach | What you get |\n|---|---|\n| Manual SRE writing | Inconsistent, time-consuming, expertise-dependent |\n| Zero-shot large model | Generic format, verbose, high cost per call |\n| Qwen2.5-0.5B fine-tuned | SRE-format outputs, fast, cheap, runs on CPU or consumer GPU |\n\nThe key advantages of this approach:\n\n- Higher rubric compliance than the zero-shot `qwen3.6-plus:free` and `gpt-5.4-nano` baselines\n\nThe fine-tuned adapter is published at: `daksh-neo/postmortem-qwen2.5-0.5b-lora`\n\nAfter training, the LoRA weights are saved to `models/postmortem-lora/hf_export/` and pushed to HuggingFace.\n\n```\npython3 -m venv venv\nsource venv/bin/activate\npip install -r requirements.txt\ncp .env.example .env   # fill in OPENROUTER_API_KEY\nexport $(cat .env | xargs)\n```\n\n**Environment 
Variables**\n\n```\ncp .env.example .env\n# Required for baseline evaluation with OpenRouter\nOPENROUTER_API_KEY=your_openrouter_api_key_here\n```\n\n`OPENROUTER_API_KEY` is required only for running baseline evaluations against zero-shot models via OpenRouter. The fine-tuning and local evaluation steps run without it.\n\nThe full pipeline runs in four steps:\n\n1. Scrape: `scrape_postmortems.py` collects the incident data.\n2. Baseline: `baseline.py` runs the zero-shot baseline via OpenRouter.\n3. Fine-tune: `finetune.py` runs the LoRA fine-tuning.\n4. Evaluate: `eval.py` scores the outputs and produces the comparison.\n\nEach step is independent: you can run the baseline evaluation before fine-tuning to establish the gap the fine-tuned model has to close, then run the evaluation again afterwards to measure the improvement.\n\n**Evaluation Rubric**\n\nEvery generated summary is scored against a four-criterion rubric. Each criterion carries equal weight:\n\n- Timeline reference\n- Contributing factors\n- Specific component identification\n- Concrete prevention actions\n\nPass threshold: 0.60 weighted score or above.\n\n| Model | Rubric compliance |\n|---|---|\n| `qwen/qwen3.6-plus:free` (zero-shot) | 20–35% |\n| `openai/gpt-5.4-nano` (zero-shot) | 35–50% |\n| Qwen2.5-0.5B (fine-tuned, 3 epochs) | > 60% |\n\nThe fine-tuned 0.5B model outperforms both zero-shot baselines on rubric compliance because it has been trained specifically on the output format the rubric measures, not on general-purpose tasks.\n\n```\nml_project_0901/\n├── scrape_postmortems.py    # Data collection\n├── baseline.py              # Zero-shot baseline via OpenRouter\n├── finetune.py              # LoRA fine-tuning\n├── eval.py                  # Evaluation + comparison\n├── requirements.txt\n├── .env.example\n├── .gitignore\n├── LICENSE\n├── CONTRIBUTING.md\n├── architecture.excalidraw\n├── infographic.svg\n├── data/\n│   ├── train.jsonl          # 700 training examples\n│   ├── test_100.jsonl       # 100 held-out test examples\n│   ├── rubric.json          # Scoring rubric\n│   └── baseline_results.jsonl\n└── models/\n    └── postmortem-lora/\n        └── hf_export/       # Push to HuggingFace after training\n```\n\nThis project was built using NEO. 
[NEO](https://heyneo.com/) is a fully autonomous AI engineering agent that can write code and build solutions for AI/ML tasks, including AI model evals, prompt optimization, and end-to-end AI pipeline development.\n\nThe requirement was a complete fine-tuning pipeline for a small model on SRE post-mortem data, with data scraping, zero-shot baseline comparison, 4-bit LoRA fine-tuning, and structured rubric-based evaluation. NEO planned, wrote, tested, and verified every file in the repository without human intervention: the data scraper producing 700 training examples and 100 held-out test examples, the baseline evaluator running zero-shot prompts against OpenRouter models, the LoRA fine-tuning script with the full model configuration, the rubric-based evaluator producing the comparison table, and the HuggingFace export pipeline pushing the trained adapter to `daksh-neo/postmortem-qwen2.5-0.5b-lora`.\n\n**Use it to replace inconsistent manual post-mortem writing in your team.**\n\nTrain on your own organisation's incident data by replacing `data/train.jsonl` with your own pairs of incident timelines and root-cause summaries. The rubric in `data/rubric.json` can be adapted to match your org's specific post-mortem format; the evaluation pipeline measures compliance against whatever criteria you define.\n\n**Use the baseline comparison to justify the fine-tuning investment.**\n\nRun `python baseline.py` before fine-tuning to measure what zero-shot models produce on your data. Run `python eval.py` after fine-tuning to see the improvement. The comparison table gives you a concrete before-and-after that makes the case for domain-specific fine-tuning over general-purpose models.\n\n**Use the published adapter directly without retraining.**\n\nThe fine-tuned LoRA adapter is available at `daksh-neo/postmortem-qwen2.5-0.5b-lora` on HuggingFace. 
You can load it directly without running the training pipeline, which is useful for teams that want to evaluate the output before committing to their own fine-tuning run.\n\n**Extend it to other structured generation tasks.**\n\nThe four-step pipeline (scrape, baseline, fine-tune, evaluate) is domain-agnostic. Any task where structured output format matters more than general knowledge is a candidate: alert triage summaries, change request descriptions, deployment notes. Swap the training data and rubric criteria, and the rest of the pipeline runs unchanged.\n\nZero-shot large models produce verbose, generic post-mortem summaries that do not follow SRE conventions. A fine-tuned 0.5B model trained on 700 domain-specific examples outperforms them on every criterion of the rubric (timeline reference, contributing factors, specific component identification, and concrete prevention actions) while running on a consumer GPU and costing a fraction as much per call.\n\nThe code is at [https://github.com/dakshjain-1616/postmortem-finetune](https://github.com/dakshjain-1616/postmortem-finetune)\n\nYou can also build with NEO in your IDE using the [VS Code extension](https://marketplace.visualstudio.com/items?itemName=NeoResearchInc.heyneo) or [Cursor](https://open-vsx.org/extension/NeoResearchInc/heyneo).\n\nYou can use NEO MCP with Claude Code: [https://heyneo.com/claude-code](https://heyneo.com/claude-code)", "url": "https://wpnews.pro/news/fine-tuning-qwen2-5-0-5b-to-write-sre-post-mortem-summaries", "canonical_source": "https://dev.to/nilofer_tweets/fine-tuning-qwen25-05b-to-write-sre-post-mortem-summaries-2jem", "published_at": "2026-05-30 04:43:37+00:00", "updated_at": "2026-05-30 05:11:43.268765+00:00", "lang": "en", "topics": ["machine-learning", "large-language-models", "artificial-intelligence", "ai-tools", "mlops"], "entities": ["Qwen2.5-0.5B", "SRE", "LoRA", "HuggingFace", "OpenRouter", "daksh-neo"], "alternates": {"html": 
"https://wpnews.pro/news/fine-tuning-qwen2-5-0-5b-to-write-sre-post-mortem-summaries", "markdown": "https://wpnews.pro/news/fine-tuning-qwen2-5-0-5b-to-write-sre-post-mortem-summaries.md", "text": "https://wpnews.pro/news/fine-tuning-qwen2-5-0-5b-to-write-sre-post-mortem-summaries.txt", "jsonld": "https://wpnews.pro/news/fine-tuning-qwen2-5-0-5b-to-write-sre-post-mortem-summaries.jsonld"}}