Fine-Tuning Qwen2.5-0.5B to Write SRE Post-Mortem Summaries

A developer fine-tuned the Qwen2.5-0.5B model on 700 real incident post-mortem examples, achieving over 60% rubric compliance for structured SRE summaries—outperforming zero-shot baselines from larger models like Qwen3.6-plus (20-35%) and GPT-5.4-nano (35-50%). The LoRA adapter, published on HuggingFace, runs on CPU or consumer GPU, producing fast, cheap, and format-consistent outputs that follow an organization's SRE conventions. The entire pipeline, including data scraping, baseline evaluation, fine-tuning, and rubric-based scoring, was autonomously built by the AI engineering agent NEO without human intervention.

Writing post-mortem root-cause summaries is time-consuming and inconsistent. Junior SREs miss contributing factors. Senior SREs write summaries that vary in depth and structure. Zero-shot LLMs produce verbose, generic output that does not follow SRE conventions. Fine-tuning a small model on real incident data produces structured, concise summaries that follow your organisation's format at a fraction of the cost of a large model.

Different approaches and what you get:

Manual SRE writing: inconsistent, time-consuming, expertise-dependent
Zero-shot large model: generic format, verbose, high cost per call
Qwen2.5-0.5B fine-tuned: SRE-format outputs, fast, cheap, runs on CPU or consumer GPU

The key advantages of this approach are measured against the qwen3.6-plus:free and gpt-5.4-nano zero-shot baselines.

The fine-tuned adapter is published at: daksh-neo/postmortem-qwen2.5-0.5b-lora

After training, the LoRA weights are saved to models/postmortem-lora/hf_export/ and pushed to HuggingFace.

Set up the environment:

    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    cp .env.example .env   # fill in OPENROUTER_API_KEY
    export $(cat .env | xargs)

Environment Variables

    cp .env.example .env

    # Required for baseline evaluation with OpenRouter
    OPENROUTER_API_KEY=your_openrouter_api_key_here

OPENROUTER_API_KEY is required only for running baseline evaluations against zero-shot models via OpenRouter. The fine-tuning and local evaluation steps run without it.

The full pipeline runs in four steps: scrape, baseline, fine-tune, and evaluate. Each step is independent: you can run baseline evaluation before fine-tuning to establish the gap the fine-tuned model closes, and run evaluation again afterwards to measure the improvement.

Evaluation Rubric

Every generated summary is scored against a four-criterion rubric: timeline reference, contributing factors, specific component identification, and concrete prevention actions. Each criterion carries equal weight.

Pass threshold: 0.60 weighted score or above.
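To make the scoring rule concrete, here is a minimal sketch of an equal-weight rubric score with a 0.60 pass threshold. It is not the repository's eval.py. The criterion keys, the 0.0 to 1.0 per-criterion scale, and the function names are assumptions made for illustration; the actual schema lives in data/rubric.json.

```python
from typing import Dict

# Criterion names follow the article's rubric; the exact keys are hypothetical.
CRITERIA = (
    "timeline_reference",
    "contributing_factors",
    "specific_component",
    "prevention_actions",
)
PASS_THRESHOLD = 0.60  # pass threshold stated in the article


def weighted_score(scores: Dict[str, float]) -> float:
    """Average the four criterion scores with equal weight (0.25 each).

    Assumes each criterion is scored on a 0.0-1.0 scale.
    """
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise KeyError(f"missing criterion scores: {missing}")
    return sum(scores[c] for c in CRITERIA) / len(CRITERIA)


def passes(scores: Dict[str, float]) -> bool:
    """Return True when the weighted score reaches the pass threshold."""
    # A small tolerance guards against floating-point rounding at the boundary.
    return weighted_score(scores) + 1e-9 >= PASS_THRESHOLD
```

If each criterion is scored as simply met or not met, a summary needs at least three of the four criteria to pass under this rule, since two of four gives 0.50.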
qwen/qwen3.6-plus:free, zero-shot: 20–35%
openai/gpt-5.4-nano, zero-shot: 35–50%
Qwen2.5-0.5B, fine-tuned for 3 epochs: 60%

The fine-tuned 0.5B model outperforms both zero-shot baselines on rubric compliance because it has been trained specifically on the output format the rubric measures, not on general-purpose tasks.

ml_project_0901/
├── scrape_postmortems.py      # Data collection
├── baseline.py                # Zero-shot baseline via OpenRouter
├── finetune.py                # LoRA fine-tuning
├── eval.py                    # Evaluation + comparison
├── requirements.txt
├── .env.example
├── .gitignore
├── LICENSE
├── CONTRIBUTING.md
├── architecture.excalidraw
├── infographic.svg
├── data/
│   ├── train.jsonl            # 700 training examples
│   ├── test_100.jsonl         # 100 held-out test examples
│   ├── rubric.json            # Scoring rubric
│   └── baseline_results.jsonl
└── models/
    └── postmortem-lora/
        └── hf_export/         # Push to HuggingFace after training

This project was built using NEO. NEO (https://heyneo.com/) is a fully autonomous AI engineering agent that can write code and build solutions for AI/ML tasks, including AI model evals, prompt optimization, and end-to-end AI pipeline development.

The requirement was a complete fine-tuning pipeline for a small model on SRE post-mortem data, with data scraping, zero-shot baseline comparison, 4-bit LoRA fine-tuning, and structured rubric-based evaluation. NEO planned, wrote, tested, and verified every file in the repository without human intervention: the data scraper producing 700 training examples and 100 held-out test examples, the baseline evaluator running zero-shot prompts against OpenRouter models, the LoRA fine-tuning script with the full model configuration, the rubric-based evaluator producing the comparison table, and the HuggingFace export pipeline pushing the trained adapter to daksh-neo/postmortem-qwen2.5-0.5b-lora.

Use it to replace inconsistent manual post-mortem writing in your team.
Train on your own organisation's incident data by replacing data/train.jsonl with your own pairs of incident timeline and root-cause summary. The rubric in data/rubric.json can be adapted to match your org's specific post-mortem format; the evaluation pipeline measures compliance against whatever criteria you define.

Use the baseline comparison to justify the fine-tuning investment. Run python baseline.py before fine-tuning to measure what zero-shot models produce on your data. Run python eval.py after fine-tuning to see the improvement. The comparison table gives you a concrete before-and-after that makes the case for domain-specific fine-tuning over general-purpose models.

Use the published adapter directly without retraining. The fine-tuned LoRA adapter is available at daksh-neo/postmortem-qwen2.5-0.5b-lora on HuggingFace. You can load it directly without running the training pipeline, which is useful for teams that want to evaluate the output before committing to their own fine-tuning run.

Extend it to other structured generation tasks. The four-step pipeline (scrape, baseline, fine-tune, evaluate) is domain-agnostic. Any task where structured output format matters more than general knowledge is a candidate: alert triage summaries, change request descriptions, deployment notes. Swap the training data and rubric criteria, and the rest of the pipeline runs unchanged.

Zero-shot large models produce verbose, generic post-mortem summaries that do not follow SRE conventions. A fine-tuned 0.5B model trained on 700 domain-specific examples outperforms them on every criterion of the rubric (timeline reference, contributing factors, specific component identification, and concrete prevention actions) while running on a consumer GPU and costing a fraction as much per call.
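For the option of using the published adapter without retraining, the following is a minimal sketch of loading it with the transformers and peft libraries. The prompt template and the function names are assumptions for illustration; the template actually used in training is defined in the repository. The base model id is read from the adapter's own config, not hard-coded. If that config points at a 4-bit or local checkpoint, the loading step may need adjusting.

```python
ADAPTER_ID = "daksh-neo/postmortem-qwen2.5-0.5b-lora"  # adapter repo named in the article


def build_prompt(timeline: str) -> str:
    """Hypothetical prompt template; match it to the format used during training."""
    return (
        "Summarize the root cause of the following incident timeline.\n\n"
        f"Timeline:\n{timeline.strip()}\n\nSummary:\n"
    )


def summarize(timeline: str, max_new_tokens: int = 256) -> str:
    """Load the base model plus the LoRA adapter and generate one summary."""
    # Heavy imports stay inside the function so this module imports without them.
    from peft import PeftConfig, PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Read the base model id from the adapter config instead of guessing it.
    config = PeftConfig.from_pretrained(ADAPTER_ID)
    base_id = config.base_model_name_or_path

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    model.eval()

    inputs = tokenizer(build_prompt(timeline), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling summarize("02:10 alert fired ...") downloads the base model and adapter on first use. For repeated calls, load the model once outside the function and reuse it.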
The code is at https://github.com/dakshjain-1616/postmortem-finetune

You can also build with NEO in your IDE using the VS Code extension (https://marketplace.visualstudio.com/items?itemName=NeoResearchInc.heyneo) or Cursor (https://open-vsx.org/extension/NeoResearchInc/heyneo). You can use NEO MCP with Claude Code: https://heyneo.com/claude-code