{"slug": "post-mortem-why-my-blog-cover-images-silently-failed-to-restore-from-s3", "title": "Post-Mortem: Why My Blog Cover Images Silently Failed to Restore from S3", "summary": "Cover images on six blog posts remained broken despite having a restoration command. The failure was caused by two issues: a fuzzy matching algorithm that couldn't match concatenated S3 filenames (like \"sortinghashnode.webp\") to hyphenated post slugs, and the command running locally without AWS credentials, causing it to silently fail. The fix involved replacing set intersection with substring containment for matching and ensuring the command runs in the production environment with proper credentials.", "body_md": "Six blog posts on my portfolio had broken cover images for longer than I'd like to admit. The images were *in* S3. The management command to restore them had been written and run. And yet, nothing. Blank covers, every time.\n\nHere's the full breakdown of what went wrong, why, and how it got fixed.\n\n## The Setup\n\nMy portfolio backend is a Cookiecutter Django project running on a DigitalOcean droplet with Docker Compose. Blog post cover images are stored on AWS S3. The `cover` field on the `BlogPost` model is an `ImageField` that stores a relative path like `blog_posts/deploy.webp`; Django's S3 storage backend handles prepending the `media/` prefix and building the full URL.\n\nWhen I migrated away from using Hashnode as a headless CMS and imported all posts into my own Django backend, the cover images came along as CUID-based filenames (e.g. `blog_posts/cmoxrumae00ms2em7bje5at07.png`). Those CUID filenames don't exist in S3; the actual files were uploaded separately with descriptive names like `deploy.webp`, `manual.webp`, `sortinghashnode.webp`.\n\nTo fix this, Gemini wrote `restore_covers.py`, a management command with two matching strategies:\n\n- **Exact CUID match**: looks for `blog_posts/{cuid}.webp` etc. in S3\n- **Fuzzy slug match**: tokenizes the post slug and looks for S3 filenames with overlapping keywords\n\nThe command was run. Six posts still had broken covers.\n\n## Root Cause 1: The Fuzzy Matcher Couldn't Tokenize Concatenated Filenames\n\nThe S3 filenames are lowercase concatenated words: `sortinghashnode.webp`, `trackingpage.webp`, `postmortem.webp`. The fuzzy matcher works by calling `get_keywords()` on both the post slug and each S3 filename, then computing the set intersection.\n\nHere's the problem. `get_keywords()` uses `re.findall(r\"[a-zA-Z0-9]+\", text)` to tokenize, which splits only on non-alphanumeric characters, so a concatenated filename stays in one piece. Applied to a filename:\n\n```\nget_keywords(\"sortinghashnode.webp\")\n# → {\"sortinghashnode\"}   ← one token\n```\n\nAnd for the post slug:\n\n```\nget_keywords(\"sorting-hashnode-series-posts-how-to-display-the-latest-post-first\")\n# → {\"sorting\", \"hashnode\", \"series\", \"posts\", \"display\", \"latest\", \"first\"}\n```\n\nThe intersection of `{\"sortinghashnode\"}` and `{\"sorting\", \"hashnode\", ...}` is **empty**. Score = 0. No match.\n\nThe same failure applied to every concatenated filename in the bucket. `trackingpage.webp` couldn't match a slug containing `tracking` and `page`. `postmortem.webp` couldn't match a slug containing `mortem` (since `post` is a stop word). None of them scored above 0.\n\nThe fix was to replace the plain set intersection with substring containment, enforcing a minimum token length of 4 characters to prevent false positives from short words:\n\n```\n# post_keywords and file_keywords are the sets returned by get_keywords()\nmin_token_len = 4  # short tokens must match exactly, never as substrings\noverlap = set()\nfor pk in post_keywords:\n    for fk in file_keywords:\n        # Exact match, or one token contained in the other when both are long enough\n        if pk == fk or (\n            len(pk) >= min_token_len\n            and len(fk) >= min_token_len\n            and (pk in fk or fk in pk)\n        ):\n            overlap.add(pk)\n```\n\nNow `\"sorting\" in \"sortinghashnode\"` → True, score += 1. `\"hashnode\" in \"sortinghashnode\"` → True, score += 1. 
The correct file gets matched.\n\n## Root Cause 2: No AWS Credentials in the Local Environment\n\nThe second reason the command silently failed: it was run via `docker compose -f docker-compose.local.yml`, which loads `.envs/.local/.django`. That file has no AWS credentials.\n\nWhen `storage.listdir(\"blog_posts\")` is called with no S3 credentials, it either raises an error (which is swallowed by a broad `except Exception`) or returns an empty list because the local filesystem storage backend is active instead of S3. The command's output showed:\n\n```\nCould not list storage directory directly: ...\nFuzzy matching won't be available.\n```\n\nBut the overall command still exited 0 with a summary that made it look like it ran fine. With zero files in `storage_files`, the fuzzy loop had nothing to iterate over, so every post hit the \"no existing file found in storage\" branch.\n\nThe management command is designed to run in the production container, where `.envs/.production/.django` already has the correct AWS credentials wired up.\n\n## Root Cause 3: One Image Was Never Uploaded to S3\n\nEven with both fixes above, the post *\"How I Fixed the Hashnode GraphQL API Stale Cache Bug (Stellate CDN)\"* would still have a broken cover, because no matching file exists in S3 at all. 
The DB had `blog_posts/cmlyqj0cc006627lvguola3gg.png` and no file with a descriptive name was ever uploaded for it.\n\nThis one requires a manual upload via Django admin.\n\n## The Fix\n\n**For the five posts with known S3 matches**, I wrote a Django data migration that directly sets the correct cover paths:\n\n```\n# Map each post slug to the descriptive S3 key that actually exists.\nCOVER_FIXES = {\n    \"how-to-manually-backup-wordpress-sites-via-ssh\": \"blog_posts/manual.webp\",\n    \"deploying-cookiecutter-django-on-a-digitalocean-droplet-ubuntu-24-04-lts\": \"blog_posts/deploy.webp\",\n    \"post-mortem-the-march-2026-axios-supply-chain-attack\": \"blog_posts/postmortem.webp\",\n    \"sorting-hashnode-series-posts-how-to-display-the-latest-post-first\": \"blog_posts/sortinghashnode.webp\",\n    \"tracking-page-views-in-a-react-spa-with-google-analytics-4\": \"blog_posts/trackingpage.webp\",\n}\n\ndef fix_covers(apps, schema_editor):\n    # Use the historical model, as data migrations should.\n    BlogPost = apps.get_model(\"blogs\", \"BlogPost\")\n    for slug, cover_path in COVER_FIXES.items():\n        BlogPost.objects.filter(slug=slug).update(cover=cover_path)\n```\n\nThis runs automatically on `python manage.py migrate` during the next production deploy, so no manual SSH step is needed.\n\n**For the fuzzy matcher**, the substring containment fix was patched into `restore_covers.py` so future runs work correctly for any similarly named files.\n\n**For the sixth post**, a manual image upload to Django admin is the remaining action item.\n\n## What I'd Do Differently\n\nThe real issue is that the restore command's failure mode was too quiet. It logged \"fuzzy matching won't be available\" but still printed a clean summary with zeroes in the \"could not restore\" column for cases where the file list was empty. That made it look successful.\n\nA better design: if `storage.listdir()` fails entirely, the command should exit early with a non-zero code rather than continuing with no files to match against. 
Silently succeeding at nothing is worse than loudly failing at something.\n\nThe slug-to-filename mismatch was also a predictable problem from the start. The files were uploaded manually with short descriptive names, but the DB records came from Hashnode with long CUIDs. A mapping file (even a simple JSON dict of `slug → filename`) would have made the restore command deterministic instead of relying on fuzzy heuristics.", "url": "https://wpnews.pro/news/post-mortem-why-my-blog-cover-images-silently-failed-to-restore-from-s3", "canonical_source": "https://dev.to/highcenburg/post-mortem-why-my-blog-cover-images-silently-failed-to-restore-from-s3-2mn4", "published_at": "2026-05-22 09:09:36+00:00", "updated_at": "2026-05-22 09:49:39.787621+00:00", "lang": "en", "topics": ["developer-tools", "cloud-computing", "open-source"], "entities": ["AWS S3", "Django", "DigitalOcean", "Docker Compose", "Hashnode", "Gemini", "Cookiecutter"], "alternates": {"html": "https://wpnews.pro/news/post-mortem-why-my-blog-cover-images-silently-failed-to-restore-from-s3", "markdown": "https://wpnews.pro/news/post-mortem-why-my-blog-cover-images-silently-failed-to-restore-from-s3.md", "text": "https://wpnews.pro/news/post-mortem-why-my-blog-cover-images-silently-failed-to-restore-from-s3.txt", "jsonld": "https://wpnews.pro/news/post-mortem-why-my-blog-cover-images-silently-failed-to-restore-from-s3.jsonld"}}