I used LLMs to rewrite meta descriptions for 1,600 articles — honest results

A content manager with 1,600+ cybersecurity articles found that about 40% had no meta description and another 30% had poorly written ones, then used an LLM to rewrite or create descriptions for 640 of them. After iterating on a strict prompt with character limits and a validation-and-retry loop, the project measured a 0.8 percentage point increase in click-through rate (CTR) six weeks after the update, though about 4% of outputs required manual review. The author notes that while the improvement is modest, it is a scalable gain that costs nothing once the pipeline is built, and that it depends on having strong source content for the LLM to work with.

Meta descriptions are the most underrated SEO element on content-heavy sites. They don't affect rankings directly, but they determine whether someone clicks your result in Google. A bad meta description on a well-ranked article is traffic you're leaving on the table.

I had 1,600+ cybersecurity articles. About 40% had no meta description at all. Another 30% had descriptions that were either truncated, keyword-stuffed, or copy-pasted from the first paragraph (which almost never makes a good description).

So I automated the rewrite. Here's what actually happened.

The constraint: 140-160 characters, every time

The rule is simple and brutal: meta descriptions must be 140-160 characters. Not words, characters. Including spaces.

Under 140: Google often ignores your description and rewrites it automatically (usually badly).

Over 160: truncated with "…" in search results, which kills CTR.

This is harder than it sounds when you're generating text with an LLM. The model has no natural understanding of character counts; it optimizes for coherence, not length.

My first naive prompt:

```
Write a meta description for this article about {topic}. Keep it under 160 characters.
```

Results: descriptions ranging from 95 to 210 characters. Useless.

The prompt engineering that actually worked

After a lot of iteration, the prompt that consistently landed in the 140-160 range:

```
Write a meta description for this cybersecurity article.

Rules:
- EXACTLY 140 to 160 characters (count carefully, including spaces)
- Start with an action verb or a direct hook
- Include the main topic and one concrete benefit
- No buzzwords (comprehensive, ultimate, complete)
- No "In this article" or "This guide"

Article title: {title}
Article excerpt: {excerpt}
Main keywords: {keywords}

Output only the description, nothing else.
```
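For illustration, here is a minimal sketch of how the template above might be filled in from article fields. The `PROMPT_TEMPLATE` constant, the whitespace cleanup, and the comma-joined keywords are my assumptions for the example, not necessarily how the real pipeline does it.

```python
from typing import List

# The prompt from the post, kept as a format template.
PROMPT_TEMPLATE = """Write a meta description for this cybersecurity article.

Rules:
- EXACTLY 140 to 160 characters (count carefully, including spaces)
- Start with an action verb or a direct hook
- Include the main topic and one concrete benefit
- No buzzwords (comprehensive, ultimate, complete)
- No "In this article" or "This guide"

Article title: {title}
Article excerpt: {excerpt}
Main keywords: {keywords}

Output only the description, nothing else."""


def build_prompt(title: str, excerpt: str, keywords: List[str]) -> str:
    """Fill the template with one article's fields (illustrative helper)."""
    return PROMPT_TEMPLATE.format(
        title=title.strip(),
        # Assumption: collapse newlines and repeated spaces so the excerpt
        # stays on a single line inside the prompt.
        excerpt=" ".join(excerpt.split()),
        keywords=", ".join(keywords),
    )
```

Keeping the template as a single constant means any wording change to the rules applies to every article in the batch.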
The key changes:

- "EXACTLY" instead of "under": models respect hard constraints better than soft ones
- Positive framing of what to include, not just what to avoid
- Strip all meta-commentary: "Output only the description" eliminates the model explaining what it did

Even with this prompt, I got out-of-range results ~15% of the time. So I added a validation + retry loop.

The validation pipeline

```python
import re
from typing import Optional


def validate_meta_description(desc: str) -> dict:
    """Check length, forbidden openers, and buzzwords."""
    length = len(desc)
    issues = []
    if length < 140:
        issues.append(f"Too short: {length} chars (min 140)")
    if length > 160:
        issues.append(f"Too long: {length} chars (max 160)")
    # str.startswith accepts a tuple of prefixes
    if desc.startswith(("In this", "This article", "This guide")):
        issues.append("Starts with forbidden phrase")
    if re.search(r"\b(comprehensive|ultimate|complete)\b", desc, re.I):
        issues.append("Contains buzzword")
    return {
        "valid": len(issues) == 0,
        "length": length,
        "issues": issues,
    }


def generate_meta_description(title: str, excerpt: str, keywords: list,
                              max_retries: int = 3) -> Optional[str]:
    # call_llm and build_prompt are helpers defined elsewhere in the pipeline
    hint = None
    for attempt in range(max_retries):
        prompt = build_prompt(title, excerpt, keywords)
        if hint:
            # Inject the correction hint from the previous failed attempt
            prompt += "\n\n" + hint
        desc = call_llm(prompt)
        result = validate_meta_description(desc)
        if result["valid"]:
            return desc
        # Retry with explicit correction hint
        if attempt < max_retries - 1:
            hint = f"Previous attempt failed: {', '.join(result['issues'])}. Try again."
    return None  # manual review needed
```

After 3 retries, I flagged remaining failures for manual review. About 4% needed human intervention.

The results: honest numbers

I ran this across 640 articles (the ones with missing or clearly bad descriptions first).

Quality assessment (I manually reviewed a random sample of 80):

- 71%: better than what I had before
- 22%: similar quality
- 7%: worse (usually missing context that wasn't in the excerpt)

The 7% worse cases had a common pattern: articles where the excerpt was weak or missing. The model had nothing to work with. This is the content problem again: LLMs can't fix bad source material.
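Since the worse cases traced back to weak or missing excerpts, one option is to screen excerpts before generating anything. This is a rough sketch, not something from the original pipeline: the function names, the article dict shape, and the 25-word threshold are all assumptions picked for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cutoff; tune it against your own content.
MIN_EXCERPT_WORDS = 25


def excerpt_is_usable(excerpt: Optional[str],
                      min_words: int = MIN_EXCERPT_WORDS) -> bool:
    """Return False for missing, blank, or very short excerpts."""
    if not excerpt:
        return False
    return len(excerpt.split()) >= min_words


def split_by_excerpt_quality(articles: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Separate articles that can go to the LLM from those needing a better excerpt."""
    ready, needs_excerpt = [], []
    for article in articles:
        if excerpt_is_usable(article.get("excerpt")):
            ready.append(article)
        else:
            needs_excerpt.append(article)
    return ready, needs_excerpt
```

A word count is a crude proxy for excerpt quality, but it is cheap, and it catches the "nothing to work with" case before any LLM calls are spent on it.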
What I measured in Search Console

I waited 6 weeks after the bulk update before looking at data (Google needs time to recrawl and the signal needs to stabilize).

Results on the articles that were updated vs. a control group that wasn't:

- CTR: +0.8 percentage points average (statistically significant at this scale)
- Impressions: unchanged (as expected, meta descriptions don't affect rankings)
- Position: unchanged (also expected)

0.8pp CTR improvement across 640 articles with meaningful traffic adds up. It's not a dramatic transformation, and anyone promising dramatic results from meta description optimization is lying to you. But it's real and it's free once the pipeline is built.

The unexpected failure: duplicate descriptions

One thing I didn't anticipate: the model started producing structurally similar descriptions across articles in the same category.

When I had 50 guides about Active Directory security, many descriptions ended up following the same pattern: "Learn how to [verb] [AD concept] to protect your environment from [threat]. Step-by-step guide with [tool]."

Technically valid. Practically, if someone searches and sees 5 results from the same site with near-identical descriptions, they'll click none of them.

Fix: I added a deduplication check that compares new descriptions against already-generated ones using simple n-gram similarity. If similarity > 0.7, force a regeneration with an explicit instruction to use a different structure.

Things I'd do differently

1. Fix excerpts before running LLM generation

The quality of the generated description is directly proportional to the quality of the excerpt. I should have audited and fixed all excerpts first. I did it in the wrong order.

2. Category-specific prompts

A prompt for a "news" article should be different from a "guide" or "checklist" article. News descriptions need urgency; guides need the benefit; checklists need the scope. I used one prompt for everything and paid for it in quality.

3. Track CTR per article, not just aggregate

I know the average improved, but I don't know which specific articles drove the improvement. Better instrumentation would let me learn which description styles work for which query intents.

The actual takeaway

LLMs are genuinely useful for this kind of bulk text generation task if you:

- Write a tight prompt with hard constraints
- Build validation + retry logic (don't trust the model to self-validate)
- Have decent source material to work from
- Measure the actual downstream metric (CTR), not a proxy

They're not useful if you expect them to compensate for bad content strategy. Garbage in, slightly better-formatted garbage out.

I run AYI NEDJIMI Consultants, a cybersecurity consulting firm. Content covers pentesting, Active Directory, cloud security and compliance. 17 free hardening checklists available (PDF + Excel): FortiGate, Palo Alto, pfSense, Active Directory and more.
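For reference, the deduplication check from the duplicate-descriptions section can be sketched roughly like this. The post only specifies "simple n-gram similarity" with a 0.7 threshold; the word trigrams, the Jaccard overlap, and the function names below are assumptions for illustration.

```python
import re
from typing import List, Set, Tuple


def word_ngrams(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Lowercase, strip punctuation, and return the set of word n-grams."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < n:
        # Texts shorter than n words become a single short gram (or nothing)
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of the two n-gram sets, from 0.0 to 1.0."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def is_near_duplicate(candidate: str, existing: List[str],
                      threshold: float = 0.7) -> bool:
    """True if the candidate is too close to any already-generated description."""
    return any(ngram_similarity(candidate, prev) > threshold for prev in existing)
```

The right threshold depends on the n-gram choice: character n-grams tend to score higher than word trigrams on the same pair of texts, so 0.7 is worth recalibrating against a handful of known near-duplicates before trusting it.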