# Fine-Tuning Qwen2.5-0.5B to Write SRE Post-Mortem Summaries

> Source: <https://dev.to/nilofer_tweets/fine-tuning-qwen25-05b-to-write-sre-post-mortem-summaries-2jem>
> Published: 2026-05-30 04:43:37+00:00

Writing post-mortem root-cause summaries is time-consuming and inconsistent. Junior SREs miss contributing factors. Senior SREs write summaries that vary in depth and structure. Zero-shot LLMs produce verbose, generic output that does not follow SRE conventions.

Fine-tuning a small model on real incident data produces structured, concise summaries that follow your organisation's format at a fraction of the cost of a large model.

Diffrent type of approaches and what you get:

Manual SRE writing : Inconsistent, time-consuming, expertise-dependent

Zero-shot large model : Generic format, verbose, high cost per call

Qwen2.5-0.5B fine-tuned : SRE-format outputs, fast, cheap, runs on CPU or consumer GPU

The key advantages of this approach:

`qwen3.6-plus:free`

and `gpt-5.4-nano`

baselinesThe fine-tuned adapter is published at: `daksh-neo/postmortem-qwen2.5-0.5b-lora`

After training, the LoRA weights are saved to `models/postmortem-lora/hf_export/`

and pushed to HuggingFace.

```
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env   # fill in OPENROUTER_API_KEY
export $(cat .env | xargs)
```

**Environment Variables**

```
cp .env.example .env
# Required for baseline evaluation with OpenRouter
OPENROUTER_API_KEY=your_openrouter_api_key_here
```

`OPENROUTER_API_KEY`

is required only for running baseline evaluations against zero-shot models via OpenRouter. The fine-tuning and local evaluation steps run without it.

The full pipeline runs in four steps:

Each step is independent, you can run baseline evaluation before fine-tuning to establish the gap the fine-tuned model closes, and run evaluation again after to measure the improvement.

**Evaluation Rubric**

Every generated summary is scored against a four-criterion rubric. Each criterion carries equal weight:

Pass threshold: 0.60 weighted score or above.

`qwen/qwen3.6-plus:free`

(zero-shot) - 20–35%

`openai/gpt-5.4-nano`

(zero-shot) - 35–50%

Qwen2.5-0.5B (fine-tuned, 3 epochs) - > 60%

The fine-tuned 0.5B model outperforms both zero-shot baselines on rubric compliance because it has been trained specifically on the output format the rubric measures, not on general-purpose tasks.

```
ml_project_0901/
├── scrape_postmortems.py    # Data collection
├── baseline.py              # Zero-shot baseline via OpenRouter
├── finetune.py              # LoRA fine-tuning
├── eval.py                  # Evaluation + comparison
├── requirements.txt
├── .env.example
├── .gitignore
├── LICENSE
├── CONTRIBUTING.md
├── architecture.excalidraw
├── infographic.svg
├── data/
│   ├── train.jsonl          # 700 training examples
│   ├── test_100.jsonl       # 100 held-out test examples
│   ├── rubric.json          # Scoring rubric
│   └── baseline_results.jsonl
└── models/
    └── postmortem-lora/
        └── hf_export/       # Push to HuggingFace after training
```

This project was built using NEO. [NEO](https://heyneo.com/) is a fully autonomous AI engineering agent that can write code and build solutions for AI/ML tasks including AI model evals, prompt optimization and end to end AI pipeline development.

The requirement was a complete fine-tuning pipeline for a small model on SRE post-mortem data, with data scraping, zero-shot baseline comparison, 4-bit LoRA fine-tuning, and structured rubric-based evaluation. NEO planned, wrote, tested, and verified every file in the repository without human intervention: the data scraper producing 700 training examples and 100 held-out test examples, the baseline evaluator running zero-shot prompts against OpenRouter models, the LoRA fine-tuning script with the full model configuration, the rubric-based evaluator producing the comparison table, and the HuggingFace export pipeline pushing the trained adapter to `daksh-neo/postmortem-qwen2.5-0.5b-lora`

.

**Use it to replace inconsistent manual post-mortem writing in your team.**

Train on your own organisation's incident data by replacing `data/train.jsonl`

with your own incident timeline to root-cause summary pairs. The rubric in `data/rubric.json`

can be adapted to match your org's specific post-mortem format the evaluation pipeline measures compliance against whatever criteria you define.

**Use the baseline comparison to justify the fine-tuning investment.**

Run `python baseline.py`

before fine-tuning to measure what zero-shot models produce on your data. Run `python eval.py`

after fine-tuning to see the improvement. The comparison table gives you a concrete before-and-after that makes the case for domain-specific fine-tuning over general-purpose models.

**Use the published adapter directly without retraining.**

The fine-tuned LoRA adapter is available at daksh-neo/postmortem-qwen2.5-0.5b-lora on HuggingFace. You can load it directly without running the training pipeline - useful for teams that want to evaluate the output before committing to their own fine-tuning run.

**Extend it to other structured generation tasks.**

The four-step pipeline - scrape, baseline, fine-tune, evaluate is domain-agnostic. Any task where structured output format matters more than general knowledge is a candidate: alert triage summaries, change request descriptions, deployment notes. Swap the training data and rubric criteria, and the rest of the pipeline runs unchanged.

Zero-shot large models produce verbose, generic post-mortem summaries that do not follow SRE conventions. A fine-tuned 0.5B model trained on 700 domain-specific examples outperforms them on every criterion of the rubric - timeline reference, contributing factors, specific component identification, and concrete prevention actions, while running on a consumer GPU and costing a fraction per call.

The code is at [https://github.com/dakshjain-1616/postmortem-finetune](https://github.com/dakshjain-1616/postmortem-finetune)

You can also build with NEO in your IDE using the [VS Code extension](https://marketplace.visualstudio.com/items?itemName=NeoResearchInc.heyneo) or [Cursor](https://open-vsx.org/extension/NeoResearchInc/heyneo).

You can use NEO MCP with Claude Code: [https://heyneo.com/claude-code](https://heyneo.com/claude-code)
