{"slug": "i-vibe-coded-a-stock-screener-into-production-then-my-2gb-server-oomed-and-de-me", "title": "I Vibe-Coded a Stock Screener Into Production. Then My 2GB Server OOMed and Google De-Indexed Me.", "summary": "A solo developer shipped a production stock screener covering 5,600 tickers across Korean and US markets, built almost entirely through AI-assisted \"vibe coding.\" The FastAPI backend on a 2GB VPS suffered an out-of-memory crash from unbounded in-memory caches, causing Googlebot to encounter 5xx errors during a critical indexing window. Google de-ranked the site, and the developer is still recovering weeks later after adding a scheduled systemd restart, precomputed static JSON for hot pages, and nightly data validation; capping the caches is still pending.", "body_md": "Series intro. I'm a non-CS solo dev who built and shipped a production stock screener almost entirely by \"vibe coding\" with an AI agent. The site works. Users use it. And every shortcut I took has cost me, in real ways. This series documents those costs honestly — what broke, why, what I shipped to fix it, and what I'd do differently. Part 1 is the one that still stings: a server OOM that killed my SEO right as Google was starting to notice me.\n\n[StockDigging](https://stockdigging.com/en) is a free stock screener and ranking site covering Korean (KOSPI + KOSDAQ) and US (NYSE + NASDAQ) markets — about 5,600 active tickers in total. Every valuation metric (PER, PBR, market cap, etc.) is recomputed daily from that day's close × the latest financials. No stale snapshots, no aggregator middlemen.\n\nThe stack is conventional indie-dev fare:\n\nI shipped the first public version after writing maybe 5% of the code by hand. The other 95% was generated, reviewed, and iterated on with an AI agent. That part actually worked — the AI is a relentless and patient pair. The part that didn't work was *operations*. 
Specifically: capacity planning, memory hygiene, and the temperament not to push to production three times a day.\n\nThis post is about the worst single consequence of getting that wrong — and, just as importantly, what I did about it after.\n\nAround mid-May, my Google Search Console graphs did the thing every indie dev fears. Impressions, which had been climbing steadily, fell off a cliff. Average position drifted downward across a wide range of queries. Pages that used to show up on page 1 quietly slid to page 3, 4, never.\n\nI didn't notice immediately because the site itself looked fine when I checked it. It only looked fine *to me*. The crawler had a different experience.\n\nEarlier in the month, the FastAPI backend had run out of memory. Hard. Several in-memory caches I'd written — TTL-keyed dicts for rankings, stats, indices — were unbounded. Every unique query combination added an entry. Entries technically expired, but nothing evicted them between expiries. The dict kept growing. Resident memory climbed past what a 2 GB VPS can hand out to a single Python process, the OOM killer fired, systemd restarted the service.\n\nFrom my dashboard this looked like a brief blip. From Googlebot's perspective, a non-trivial slice of crawls during that window saw 5xx responses or connection failures. Google does not forgive 5xx politely. It does not send you an email saying \"hey, your server flaked, we're going to discount your rankings for a bit.\" It just stops giving you the impressions you were getting before, and waits to see if you've fixed the problem.\n\nThe technical fix took an evening. Earning back the rankings is taking weeks, and is not finished as I write this.\n\nIn hindsight, none of this needed an AI to predict. It's all in the systems-design canon. I just wasn't reading that part of the canon while I was vibing.\n\n**1. Unbounded in-memory caches.** Six of them. 
Each one started as a sensible \"let me memoize this expensive query for 5 minutes\" and grew over months as I added query parameters. The cache key got wider, the entry count got higher, nothing ever capped the size. An LRU with a max size would have been one extra line of code per cache.\n\n**2. A 2 GB VPS for a real workload.** Python + SQLite + Next.js + a fair bit of in-process state is not a 2 GB workload. It barely fits in 2 GB on a quiet day. The moment anything misbehaves — a cache leak, a long batch job, a sudden traffic spike — there's no headroom. I knew this on day one and shipped anyway because $6/month is $6/month.\n\n**3. No memory monitoring.** I had logs. I had request metrics. I did not have a single chart of RSS over time. If I'd been watching that one line, I'd have seen it climbing for weeks before it ever hit the OOM threshold.\n\n**4. Deploying during crawler hours.** My deploy script does atomic swap with rollback, so it's \"zero downtime\" — *for users*. For the crawler, even a few seconds of cache eviction during deploy plus the chunked rebuild of CSR routes is enough to register a degraded experience. I was pushing two or three times a day, often right when Googlebot was active.\n\n**5. Trusting that the AI would flag this.** This is the one I want to be the most honest about. I assumed an agent that good at code would also catch architectural smells like \"this dict has no upper bound.\" It doesn't, by default. It writes code that does what you asked. If you didn't ask \"what's the maximum size this structure can reach in a year of traffic,\" you don't get that answer.\n\nI want to be specific here, because most postmortems stop at the root cause and skip the part that takes most of the calendar — the patient, unglamorous fixing. Here is what landed, in roughly the order it landed.\n\n**1. 
A 4-hour `RuntimeMaxSec` floor in systemd.** A confession before it's a fix: even before I knew where every leak was, I added a systemd directive that hard-restarts the backend every 4 hours. It is not a solution. It is a ceiling on damage from any leak I haven't found yet. It is also $0 and took 20 minutes. If you're running anything stateful on a small VPS without one of these, add it tonight.\n\n**2. A watchdog cron for the daily batch jobs.** My data pipeline pulls prices and financials nightly. After the OOM event, those batches were getting silently skipped because the backend was restarting through their lock window. I added a watchdog cron that detects a missed batch and re-runs it with a leaner code path. Then I had to fix the watchdog because, the very next week, its first scheduled tick was firing five minutes *before* the main batch and hijacking the lock — accidentally turning the safety net into the cause. That story gets its own post.\n\n**3. Static JSON for every hot read path.** This was the biggest single architectural change, and the one I'd recommend to anyone running a similar stack. Instead of the homepage and ranking pages hitting the API on every request, the nightly batch now precomputes those views into `data/rankings/{market}_{sort}.json` files. The Next.js server reads the JSON directly during SSR. 
The API doesn't get touched at all for the hottest pages.\n\n```\nflowchart LR\n    subgraph Before[\"Before — every page = DB hit\"]\n      U1[User / Googlebot] --> CF1[Cloudflare]\n      CF1 --> NX1[Next.js SSR]\n      NX1 --> API1[FastAPI]\n      API1 --> DB1[(SQLite + unbounded caches)]\n    end\n    subgraph After[\"After — hot paths bypass the backend\"]\n      U2[User / Googlebot] --> CF2[Cloudflare]\n      CF2 --> NX2[Next.js SSR]\n      NX2 --> JSON[(precomputed static JSON)]\n      NX2 -.cold paths only.-> API2[FastAPI]\n      API2 --> DB2[(SQLite)]\n      BATCH[Nightly batch] --> JSON\n    end\n```\n\nEven if the backend OOMs in the middle of a Google crawl, the pages Google cares about still serve correct data from the JSON files. The blast radius of \"the backend is unhappy\" shrank from \"the whole site is degraded\" to \"the long tail of less-popular detail pages is degraded.\" That's a real architectural win — not a band-aid.\n\n**4. Post-batch validation that calls the public API.** A separate failure I had to admit: my batch jobs were happily reporting \"success\" on days when the data they produced was wrong (one sector silently lost a metric for ~109 stocks). Now, after every batch finishes, a validation script makes real HTTP calls to the same endpoints users hit, for each market × sort combination. If a combination returns zero rows, or noticeably fewer than its 30-day baseline, the batch is flagged failed and I get an email regardless of what the row counts said. The validator caught two real regressions in its first month.\n\n**5. A \"data health\" check that runs every night and emails me when something looks off.** I have an internal admin page that also exposes the same data, but the more important piece is the cron job behind it: a script that runs after the nightly batch and verifies a dozen specific invariants per market. Failures email me; warnings get logged to inspect later. 
A representative night looks like this:\n\n``` bash\n$ python -m scripts.automation.data_health_check\ndata_health_check — 2026-05-26 22:00 KST\n─────────────────────────────────────────────────────────────\n[PASS]  kr_daily_price_recent           last=2026-05-22\n[PASS]  us_daily_price_recent           last=2026-05-22\n[PASS]  kr_trading_value_filled         99.8%  (2541 / 2544)\n[PASS]  kr_valuations_per_filled        99.5%  (2531 / 2544)\n[PASS]  kr_top50_mcap_match             50 / 50  within 1%\n[PASS]  kr_financial_margin_impossible  0 stocks\n[PASS]  kr_override_staleness           up to FY2025 (current)\n[PASS]  annual_revenue_lost             0 stocks\n[PASS]  batch_failed_24h                0\n[WARN]  us_shares_outstanding_filled    97.2%  (3033 / 3120)\n─────────────────────────────────────────────────────────────\nsummary:  9 pass · 1 warn · 0 fail\nemail sent: no   (only on fail)\n```\n\nEach line corresponds to an actual mistake I've made or seen. `financial_margin_impossible` exists because a sector's revenue line was misclassified and operating margin briefly read 73% for a securities firm. `override_staleness` exists because I have a hand-curated override file for financial-sector revenue that I have to update once a year and would otherwise forget. `top50_mcap_match` exists because I once shipped a deploy that quietly broke market cap for the most-visited page on the site. Each check is a scar.\n\n**6. An emergency repatch script — one command, full recovery.** When something *is* wrong on the public site, the recovery used to be: stop backend, run patch, restart, regenerate JSON, regenerate stats, regenerate stock detail JSON, purge edge cache, validate. Roughly ten steps, easy to forget one, very error-prone at 11pm on a weekend. I rewrote it as a single command (`emergency_repatch --market KR`) that runs every step in order, fails fast, and prints a checklist of what passed and what didn't. 
It's the single biggest reduction in \"how scared am I of operating this site\" I've ever made.\n\n**7. A three-agent independent review for any risky change.** This one isn't infrastructure, it's process. For any change I judge as risky to data integrity or SEO — schema migrations, anything that touches deploy timing, anything that changes a ranking calculation — I run the proposal past three separate AI agents in parallel and read all three reviews before I touch the code. They disagree about a third of the time, and the disagreement is usually the most useful signal. Pairing this with \"deploy less\" has, more than any other single habit, kept me from shooting myself in the foot in the recovery period.\n\n**8. Title/meta micro-tuning on edge-of-page-1 queries.** With the site itself stable, the SEO recovery is now an active project, not just waiting. I pulled Google Search Console data, identified queries where my pages were ranking 5–15 (the \"edge of page 1\" zone where small wording changes can move you up), and rewrote titles and meta descriptions for those specific pages. I track each change in an optimization log and check positions weekly. Not glamorous. Working slowly.\n\nFor anyone about to ship their first vibe-coded thing to a real domain with real SEO ambitions, this is the list I wish I'd had taped to my monitor:\n\n`functools.lru_cache(maxsize=N)`. `cachetools.TTLCache(maxsize=N, ttl=...)`. Anything with `maxsize`. If you can't name a reasonable cap, you can't have the cache.\n\n`ps -o rss` over 24 hours, scraped every minute, would have caught this in week one. You don't need Prometheus and Grafana on day one; a cron job to a CSV is fine.\n\n`RuntimeMaxSec` is a $0 safety net.\n\nHonesty about the things not yet done is the other half of an honest postmortem.\n\n`RuntimeMaxSec` floor masks the leaks. The actual fix — putting a `maxsize` on every TTL dict — is the next merged PR. 
Mechanical work; the audit was the slow part.\n\nSome of these are technical, some are about the relationship with the AI itself.\n\n`Cache-Control`.\n\n`DECISIONS.md` next to your CLAUDE.md / cursor rules turns vague guilt into a reviewable artifact.\n\n**Part 2 — Data quality failures.** How a single misclassified financial line silently corrupted an entire metric for one sector for weeks, the validation harness I built after the fact, and the painful manual data override that's still patching the rest. The most uncomfortable post in this series, because the bug ran in production for far longer than the OOM did and nobody (including me) noticed.\n\nThere's a longer queue behind that — batch jobs failing in interesting ways, a data-licensing problem I'm currently migrating away from — but I'll only commit to posts when the story has a clear ending. More to come as the dust settles.\n\nIf you've shipped a vibe-coded thing into production and have your own story, I'd genuinely like to read it. The thing the AI tooling discourse is missing right now is the boring, post-launch half: not \"look what I built in a weekend\" but \"look what it cost me on day 90.\"\n\nI'll be writing the rest of that half here.\n\n*StockDigging is at stockdigging.com. It's free, ad-supported, no signup required to browse. 
The \"Why I Built It\" post is here if you want the founding context for this series.*", "url": "https://wpnews.pro/news/i-vibe-coded-a-stock-screener-into-production-then-my-2gb-server-oomed-and-de-me", "canonical_source": "https://dev.to/kiwon_song_1a5298f771b9ef/i-vibe-coded-a-stock-screener-into-production-then-my-2gb-server-oomed-and-google-de-indexed-me-11hi", "published_at": "2026-05-27 02:26:40+00:00", "updated_at": "2026-05-27 02:52:50.884937+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "ai-products", "ai-startups", "mlops"], "entities": ["StockDigging", "Google Search Console", "KOSPI", "KOSDAQ", "NYSE", "NASDAQ"], "alternates": {"html": "https://wpnews.pro/news/i-vibe-coded-a-stock-screener-into-production-then-my-2gb-server-oomed-and-de-me", "markdown": "https://wpnews.pro/news/i-vibe-coded-a-stock-screener-into-production-then-my-2gb-server-oomed-and-de-me.md", "text": "https://wpnews.pro/news/i-vibe-coded-a-stock-screener-into-production-then-my-2gb-server-oomed-and-de-me.txt", "jsonld": "https://wpnews.pro/news/i-vibe-coded-a-stock-screener-into-production-then-my-2gb-server-oomed-and-de-me.jsonld"}}