cd /news/machine-learning/fine-tuning-qwen2-5-0-5b-to-write-sr… · home topics machine-learning article
[ARTICLE · art-18354] src=dev.to pub= topic=machine-learning verified=true sentiment=↑ positive

Fine-Tuning Qwen2.5-0.5B to Write SRE Post-Mortem Summaries

A developer fine-tuned the Qwen2.5-0.5B model on 700 real incident post-mortem examples, achieving over 60% rubric compliance for structured SRE summaries—outperforming zero-shot baselines from larger models like Qwen3.6-plus (20-35%) and GPT-5.4-nano (35-50%). The LoRA adapter, published on HuggingFace, runs on CPU or consumer GPU, producing fast, cheap, and format-consistent outputs that follow an organization's SRE conventions. The entire pipeline, including data scraping, baseline evaluation, fine-tuning, and rubric-based scoring, was autonomously built by the AI engineering agent NEO without human intervention.

read4 min publishedMay 30, 2026

Writing post-mortem root-cause summaries is time-consuming and inconsistent. Junior SREs miss contributing factors. Senior SREs write summaries that vary in depth and structure. Zero-shot LLMs produce verbose, generic output that does not follow SRE conventions.

Fine-tuning a small model on real incident data produces structured, concise summaries that follow your organisation's format at a fraction of the cost of a large model.

Diffrent type of approaches and what you get:

Manual SRE writing : Inconsistent, time-consuming, expertise-dependent

Zero-shot large model : Generic format, verbose, high cost per call

Qwen2.5-0.5B fine-tuned : SRE-format outputs, fast, cheap, runs on CPU or consumer GPU

The key advantages of this approach:

qwen3.6-plus:free

and gpt-5.4-nano

baselinesThe fine-tuned adapter is published at: daksh-neo/postmortem-qwen2.5-0.5b-lora

After training, the LoRA weights are saved to models/postmortem-lora/hf_export/

and pushed to HuggingFace.

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env   # fill in OPENROUTER_API_KEY
export $(cat .env | xargs)

Environment Variables

cp .env.example .env
OPENROUTER_API_KEY=your_openrouter_api_key_here

OPENROUTER_API_KEY

is required only for running baseline evaluations against zero-shot models via OpenRouter. The fine-tuning and local evaluation steps run without it.

The full pipeline runs in four steps:

Each step is independent, you can run baseline evaluation before fine-tuning to establish the gap the fine-tuned model closes, and run evaluation again after to measure the improvement.

Evaluation Rubric

Every generated summary is scored against a four-criterion rubric. Each criterion carries equal weight:

Pass threshold: 0.60 weighted score or above.

qwen/qwen3.6-plus:free

(zero-shot) - 20–35%

openai/gpt-5.4-nano

(zero-shot) - 35–50%

Qwen2.5-0.5B (fine-tuned, 3 epochs) - > 60%

The fine-tuned 0.5B model outperforms both zero-shot baselines on rubric compliance because it has been trained specifically on the output format the rubric measures, not on general-purpose tasks.

ml_project_0901/
├── scrape_postmortems.py    # Data collection
├── baseline.py              # Zero-shot baseline via OpenRouter
├── finetune.py              # LoRA fine-tuning
├── eval.py                  # Evaluation + comparison
├── requirements.txt
├── .env.example
├── .gitignore
├── LICENSE
├── CONTRIBUTING.md
├── architecture.excalidraw
├── infographic.svg
├── data/
│   ├── train.jsonl          # 700 training examples
│   ├── test_100.jsonl       # 100 held-out test examples
│   ├── rubric.json          # Scoring rubric
│   └── baseline_results.jsonl
└── models/
    └── postmortem-lora/
        └── hf_export/       # Push to HuggingFace after training

This project was built using NEO. NEO is a fully autonomous AI engineering agent that can write code and build solutions for AI/ML tasks including AI model evals, prompt optimization and end to end AI pipeline development.

The requirement was a complete fine-tuning pipeline for a small model on SRE post-mortem data, with data scraping, zero-shot baseline comparison, 4-bit LoRA fine-tuning, and structured rubric-based evaluation. NEO planned, wrote, tested, and verified every file in the repository without human intervention: the data scraper producing 700 training examples and 100 held-out test examples, the baseline evaluator running zero-shot prompts against OpenRouter models, the LoRA fine-tuning script with the full model configuration, the rubric-based evaluator producing the comparison table, and the HuggingFace export pipeline pushing the trained adapter to daksh-neo/postmortem-qwen2.5-0.5b-lora

.

Use it to replace inconsistent manual post-mortem writing in your team.

Train on your own organisation's incident data by replacing data/train.jsonl

with your own incident timeline to root-cause summary pairs. The rubric in data/rubric.json

can be adapted to match your org's specific post-mortem format the evaluation pipeline measures compliance against whatever criteria you define.

Use the baseline comparison to justify the fine-tuning investment.

Run python baseline.py

before fine-tuning to measure what zero-shot models produce on your data. Run python eval.py

after fine-tuning to see the improvement. The comparison table gives you a concrete before-and-after that makes the case for domain-specific fine-tuning over general-purpose models.

Use the published adapter directly without retraining.

The fine-tuned LoRA adapter is available at daksh-neo/postmortem-qwen2.5-0.5b-lora on HuggingFace. You can load it directly without running the training pipeline - useful for teams that want to evaluate the output before committing to their own fine-tuning run.

Extend it to other structured generation tasks.

The four-step pipeline - scrape, baseline, fine-tune, evaluate is domain-agnostic. Any task where structured output format matters more than general knowledge is a candidate: alert triage summaries, change request descriptions, deployment notes. Swap the training data and rubric criteria, and the rest of the pipeline runs unchanged.

Zero-shot large models produce verbose, generic post-mortem summaries that do not follow SRE conventions. A fine-tuned 0.5B model trained on 700 domain-specific examples outperforms them on every criterion of the rubric - timeline reference, contributing factors, specific component identification, and concrete prevention actions, while running on a consumer GPU and costing a fraction per call.

The code is at https://github.com/dakshjain-1616/postmortem-finetune

You can also build with NEO in your IDE using the VS Code extension or Cursor.

You can use NEO MCP with Claude Code: https://heyneo.com/claude-code

── more in #machine-learning 4 stories · sorted by recency
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/fine-tuning-qwen2-5-…] indexed:0 read:4min 2026-05-30 ·