# PR descriptions from hell: why I stopped chasing perfect AI automation

> Source: <https://dev.to/__c1b9e06dc90a7e0a676b/pr-descriptions-from-hell-why-i-stopped-chasing-perfect-ai-automation-2979>
> Published: 2026-06-05 01:05:51+00:00

I got tired of writing pull request descriptions. Every single PR needs a summary of what changed, why, how to test it. And no matter how disciplined I tried to be, I'd either rush it or forget details. So I thought: "Let's automate this with AI."

What followed was a rabbit hole of API keys, local models, and false starts. Here's what I learned.

I imagined a hook (a Git hook or a CI step) that runs when I create a PR, feeds the diff to an LLM, and auto-generates a description. Simple, right? I started with OpenAI's API because it's the obvious choice.

``` python
from openai import OpenAI

# Uses the openai>=1.0 client interface; the key is read from OPENAI_API_KEY.
client = OpenAI()

def generate_pr_description(diff_text):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a senior developer. Summarize the following git diff as a PR description. Focus on intent, changes, and testing notes."},
            {"role": "user", "content": diff_text}
        ]
    )
    # The generated description is the text of the first (and only) choice.
    return response.choices[0].message.content
```

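It worked. The descriptions were actually good. But after a week I noticed a few problems: each request took a few seconds, the per-token bill kept growing, and every diff I summarized was sent to a third-party cloud.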

So I started looking for alternatives.

I tried running a smaller model locally with Ollama. The idea was to keep everything on my machine, zero cost per request.

```
ollama run codellama:7b
```

I wrote a wrapper that reads the diff and pipes it to the local model:

``` python
import subprocess

def local_summarize(diff_text):
    prompt = f"Summarize this diff as a PR description:\n\n{diff_text}"
    # Pass the prompt as a single argument; check=True raises if ollama exits with an error.
    result = subprocess.run(
        ['ollama', 'run', 'codellama:7b', prompt],
        capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

This was a dead end for me. My laptop's 8 GB of RAM made the model crawl – each response took up to 30 seconds. The small model also hallucinated facts about the code. "Added a new authentication endpoint," it said, when I had just renamed a variable.

I tried quantized versions, larger models, even Mistral. Same story: either too slow or inaccurate. I don't have a GPU at home. Local is not an option for me until I upgrade hardware.

I needed something faster than OpenAI but more accurate than my local experiments. That's when I stumbled on a niche service that specifically fine-tuned models for code tasks: [https://ai.interwestinfo.com/](https://ai.interwestinfo.com/). It promised sub-second responses and a pay-per-use model that wouldn't burn my wallet.

I was skeptical – another AI wrapper? But the API was refreshingly simple. No chat completions, no system prompt wizardry. They had a `/summarize` endpoint that expected a diff and returned a structured summary.

``` python
import requests

API_URL = "https://ai.interwestinfo.com/api/v1/summarize"
API_KEY = "my-key-here"  # from their dashboard

def summarize_diff(diff_text):
    payload = {
        "diff": diff_text,
        "format": "pr"  # or "changelog", "release_notes"
    }
    headers = {"Authorization": f"Bearer {API_KEY}"}
    # A timeout keeps a stalled request from hanging the workflow.
    response = requests.post(API_URL, json=payload, headers=headers, timeout=10)
    # Raise on HTTP error responses instead of treating an error body as a summary.
    response.raise_for_status()
    return response.json()

# Usage
diff = """
+ new_feature(): adds logging for user actions
- old_debug(): removed deprecated function
"""
result = summarize_diff(diff)
print(result['summary'])  # "Added new feature for user action logging; removed deprecated debug function."
```

The speed was impressive – under 500ms per request. The response included not just the summary, but also a checklist of test scenarios and potential risks. That was smarter than plain text.

Did it solve all my problems? Not quite. The free tier had a 1000-request limit per month, which I hit in two weeks. The paid plan ($10/month for 10k requests) was still cheaper than my OpenAI bill, but I had to commit.

Every approach has its own set of trade-offs. Here's my honest assessment:

| Approach | Speed | Cost | Privacy | Accuracy |
|---|---|---|---|---|
| OpenAI (GPT-4) | Slow (2-5s) | High (pay per token) | Low (data sent to cloud) | Very high |
| Local (7B) | Very slow (15-30s) | Zero (free) | High (local) | Medium |
| Specialized API (Interwest) | Fast (<1s) | Low ($10/mo) | Medium (data sent but claims no logging) | High (for code tasks) |

For me, the specialized service won for now. But I'm keeping an eye on newer small models like Llama 3.2 3B, which might run decently on a laptop one day.

If I had to start over, I'd first ask: *Do I really need AI for this?* Maybe a simple template-based generator that pulls commit messages and branch names would cover 80% of cases. I could have saved myself the integration work.
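Here is a rough sketch of what that template-based generator could look like. I never actually built it, so treat everything here as an assumption: the function names (`render_description`, `git_output`, `describe_current_branch`), the section layout, and the idea that the base branch is called `main` are all hypothetical.

``` python
import subprocess

def render_description(branch: str, commit_subjects: list[str]) -> str:
    """Build a PR description from a branch name and commit subject lines."""
    # "feature/add-user-logging" -> "Add user logging"
    title = branch.split("/")[-1].replace("-", " ").replace("_", " ").strip().capitalize()
    changes = "\n".join(f"- {subject}" for subject in commit_subjects) or "- (no commits found)"
    return (
        f"## {title}\n\n"
        f"### Changes\n{changes}\n\n"
        "### Testing\n- [ ] Describe how to verify this change\n"
    )

def git_output(*args: str) -> str:
    """Run a git command in the current repository and return its trimmed stdout."""
    result = subprocess.run(["git", *args], capture_output=True, text=True, check=True)
    return result.stdout.strip()

def describe_current_branch(base: str = "main") -> str:
    """Collect the branch name and commit subjects since `base`, then render them."""
    branch = git_output("rev-parse", "--abbrev-ref", "HEAD")
    # Oldest commit first, subject line only.
    subjects = git_output("log", "--reverse", "--format=%s", f"{base}..HEAD").splitlines()
    return render_description(branch, subjects)
```

Calling `print(describe_current_branch())` inside a repository would print a draft with a title, a bullet per commit, and an empty testing checkbox. No model, no API key, no bill.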

Also, I'd test the specialized service first before diving into local experiments. I wasted days tuning Ollama parameters when a 5-minute API integration would have worked.

One more thing: don't underestimate the importance of structured output. A plain-text paragraph is fine, but a JSON response with sections like `changes`, `impact`, and `testing` makes the result actually usable in automation.
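To show why the structure matters, here is a minimal sketch that turns such a JSON response into a Markdown PR body. The response shape is my assumption for illustration (each section is either a string or a list of strings), not a documented schema of any particular service, and `render_sections` is a made-up helper name.

``` python
import json

# Section names assumed for illustration; a real response may use different keys.
SECTIONS = ("changes", "impact", "testing")

def render_sections(raw_json: str) -> str:
    """Render the known sections of a JSON summary as Markdown headings with bullets."""
    data = json.loads(raw_json)
    parts = []
    for name in SECTIONS:
        items = data.get(name)
        if not items:
            continue  # skip sections that are missing or empty
        if isinstance(items, str):
            items = [items]  # accept a single string as a one-item list
        bullets = "\n".join(f"- {item}" for item in items)
        parts.append(f"### {name.capitalize()}\n{bullets}")
    return "\n\n".join(parts)
```

With a plain-text paragraph you would be stuck parsing prose. With sections, each one can go where it belongs: the PR body, a test checklist, a release note.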

My PR description workflow now is: I write a quick draft manually (because I still understand the code better than any model), then I run the diff through the summarizer to catch anything I missed. It's a collaboration, not a replacement.
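For the "run the diff through the summarizer" step, something has to collect the diff first. A minimal sketch, assuming the base branch is named `main` and the script runs inside the repository; `diff_command` and `collect_diff` are hypothetical helper names:

``` python
import subprocess

def diff_command(base: str = "main") -> list[str]:
    """Build the git command that diffs the current branch against `base`."""
    # Three-dot syntax compares HEAD with its merge base on `base`,
    # so only changes made on this branch show up.
    return ["git", "diff", f"{base}...HEAD"]

def collect_diff(base: str = "main") -> str:
    """Run the diff command and return the patch text."""
    result = subprocess.run(diff_command(base), capture_output=True, text=True, check=True)
    return result.stdout
```

The output of `collect_diff()` is what I would hand to `summarize_diff` from earlier, then compare the result against my own draft.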

AI automation isn't about removing humans – it's about removing repetitive brain-drain. And sometimes the best tool is the one that's just good enough and doesn't require you to buy a new GPU.

What's your setup for code documentation? Are you using local models, cloud APIs, or just raw willpower?
