I Vibe-Coded a Stock Screener Into Production. Then My 2GB Server OOMed and Google De-Indexed Me.


Series intro. I'm a non-CS solo dev who built and shipped a production stock screener almost entirely by "vibe coding" with an AI agent. The site works. Users use it. And it has cost me, in real ways, every shortcut I took. This series documents those costs honestly: what broke, why, what I shipped to fix it, and what I'd do differently. Part 1 is the one that still stings: a server OOM that killed my SEO right as Google was starting to notice me.

StockDigging is a free stock screener and ranking site covering Korean (KOSPI + KOSDAQ) and US (NYSE + NASDAQ) markets — about 5,600 active tickers in total. Every valuation metric (PER, PBR, market cap, etc.) is recomputed daily from that day's close × the latest financials. No stale snapshots, no aggregator middlemen.

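The stack is conventional indie-dev fare: a FastAPI backend (Python) on SQLite, a Next.js frontend, and Cloudflare in front, all on a single 2 GB VPS.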

I shipped the first public version after writing maybe 5% of the code by hand. The other 95% was generated, reviewed, and iterated on with an AI agent. That part actually worked — the AI is a relentless and patient pair. The part that didn't work was operations. Specifically: capacity planning, memory hygiene, and the temperament not to push to production three times a day.

This post is about the worst single consequence of getting that wrong — and, just as importantly, what I did about it after.

Around mid-May, my Google Search Console graphs did the thing every indie dev fears. Impressions, which had been climbing steadily, fell off a cliff. Average position drifted downward across a wide range of queries. Pages that used to show up on page 1 quietly slid to page 3, 4, never.

I didn't notice immediately because the site itself looked fine when I checked it. It only looked fine to me. The crawler had a different experience.

Earlier in the month, the FastAPI backend had run out of memory. Hard. Several in-memory caches I'd written — TTL-keyed dicts for rankings, stats, indices — were unbounded. Every unique query combination added an entry. Entries technically expired, but nothing evicted them between expiries. The dict kept growing. Resident memory climbed past what a 2 GB VPS can hand out to a single Python process, the OOM killer fired, systemd restarted the service.
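To make the failure mode concrete, here is a minimal sketch of a TTL cache that also has a hard cap on entry count, using only the standard library. The class name `BoundedTTLCache` and its injectable `clock` are illustrative assumptions, not the site's actual code. The two differences from a plain TTL-keyed dict are the sweep of expired entries on every write and the eviction loop that enforces `maxsize`.

```python
import time
from collections import OrderedDict


class BoundedTTLCache:
    """TTL cache with a hard cap on entry count (least recently used goes first)."""

    def __init__(self, maxsize, ttl, clock=time.monotonic):
        self.maxsize = maxsize
        self.ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._data = OrderedDict()  # key -> (expires_at, value)

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it on read
            return default
        self._data.move_to_end(key)  # mark as recently used
        return value

    def set(self, key, value):
        now = self._clock()
        # Sweep expired entries so keys that are never read again still go away.
        for stale in [k for k, (exp, _) in self._data.items() if exp <= now]:
            del self._data[stale]
        self._data[key] = (now + self.ttl, value)
        self._data.move_to_end(key)
        # Enforce the cap: evict least recently used entries.
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)

    def __len__(self):
        return len(self._data)
```

The sweep is linear in the number of entries, which is acceptable precisely because `maxsize` keeps that number small. A maintained library such as cachetools covers the same ground with less code.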

From my dashboard this looked like a brief blip. From Googlebot's perspective, a non-trivial slice of crawls during that window saw 5xx responses or connection failures. Google does not forgive 5xx politely. It does not send you an email saying "hey, your server flaked, we're going to discount your rankings for a bit." It just stops giving you the impressions you were getting before, and waits to see if you've fixed the problem.

The technical fix took an evening. Earning back the rankings is taking weeks, and is not finished as I write this.

In hindsight, none of this needed an AI to predict. It's all in the systems-design canon. I just wasn't reading that part of the canon while I was vibing.

1. Unbounded in-memory caches. Six of them. Each one started as a sensible "let me memoize this expensive query for 5 minutes" and grew over months as I added query parameters. The cache key got wider, the entry count got higher, nothing ever capped the size. An LRU with a max size would have been one extra line of code per cache.

2. A 2 GB VPS for a real workload. Python + SQLite + Next.js + a fair bit of in-process state is not a 2 GB workload. It barely fits in 2 GB on a quiet day. The moment anything misbehaves — a cache leak, a long batch job, a sudden traffic spike — there's no headroom. I knew this on day one and shipped anyway because $6/month is $6/month.

3. No memory monitoring. I had logs. I had request metrics. I did not have a single chart of RSS over time. If I'd been watching that one line, I'd have seen it climbing for weeks before it ever hit the OOM threshold.

4. Deploying during crawler hours. My deploy script does an atomic swap with rollback, so it's "zero downtime", but only for users. For the crawler, even a few seconds of cache eviction during deploy plus the chunked rebuild of CSR routes is enough to register a degraded experience. I was pushing two or three times a day, often right when Googlebot was active.

5. Trusting that the AI would flag this. This is the one I want to be the most honest about. I assumed an agent that good at code would also catch architectural smells like "this dict has no upper bound." It doesn't, by default. It writes code that does what you asked. If you didn't ask "what's the maximum size this structure can reach in a year of traffic," you don't get that answer.
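The missing memory chart from item 3 can start very small. The sketch below assumes a Linux host, where `/proc/<pid>/status` exposes a `VmRSS` line, and appends one sample per invocation to a CSV so that cron can call it once a minute. The function names and the CSV layout are hypothetical.

```python
import csv
import time
from pathlib import Path


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # line looks like "VmRSS:    123456 kB"
    return None


def read_rss_kb(pid):
    """Read the resident set size of a process (Linux only)."""
    return parse_vmrss_kb(Path(f"/proc/{pid}/status").read_text())


def append_sample(csv_path, pid, read_rss=read_rss_kb, now=time.time):
    """Append one (unix_timestamp, rss_kb) row; intended to run from cron."""
    rss_kb = read_rss(pid)
    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerow([int(now()), rss_kb])
    return rss_kb
```

Plotting the second column against the first is enough to see a slow climb weeks before it reaches the OOM threshold.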

I want to be specific here, because most postmortems stop at the root cause and skip the part that takes most of the calendar — the patient, unglamorous fixing. Here is what landed, in roughly the order it landed.

1. A 4-hour RuntimeMaxSec floor in systemd. A confession before it's a fix: even before I knew where every leak was, I added a systemd directive that hard-restarts the backend every 4 hours. Strictly, RuntimeMaxSec= only terminates the service and marks it failed; the unit's Restart= policy is what brings it back. It is not a solution. It is a ceiling on damage from any leak I haven't found yet. It is also $0 and took 20 minutes. If you're running anything stateful on a small VPS without one of these, add it tonight.

2. A watchdog cron for the daily batch jobs. My data pipeline pulls prices and financials nightly. After the OOM event, those batches were getting silently skipped because the backend was restarting through their lock window. I added a watchdog cron that detects a missed batch and re-runs it with a leaner code path. Then I had to fix the watchdog because, the very next week, its first scheduled tick was firing five minutes before the main batch and hijacking the lock — accidentally turning the safety net into the cause. That story gets its own post.
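One way to avoid that class of bug is to make the watchdog's decision an explicit, testable function. This is a sketch under assumptions, not the fix described above: `should_rerun`, the grace period, and the way the lock state is passed in are all hypothetical.

```python
from datetime import datetime, timedelta


def should_rerun(now, scheduled_start, grace, last_success, lock_held):
    """Decide whether the watchdog should re-run today's batch.

    now, scheduled_start, last_success: datetime (last_success may be None)
    grace: timedelta the main batch is given before the watchdog may act
    lock_held: True if another batch process currently owns the lock
    """
    if now < scheduled_start + grace:
        return False  # the main batch has not had its chance yet
    if lock_held:
        return False  # something is running; do not take over its lock
    if last_success is not None and last_success >= scheduled_start:
        return False  # today's batch already succeeded
    return True
```

With a check like this, a watchdog tick that fires five minutes before the main batch returns `False` instead of grabbing the lock.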

3. Static JSON for every hot read path. This was the biggest single architectural change, and the one I'd recommend to anyone running a similar stack. Instead of the homepage and ranking pages hitting the API on every request, the nightly batch now precomputes those views into data/rankings/{market}_{sort}.json files. The Next.js server reads the JSON directly during SSR. The API doesn't get touched at all for the hottest pages.

```mermaid
flowchart LR
    subgraph Before["Before: every page = DB hit"]
      U1[User / Googlebot] --> CF1[Cloudflare]
      CF1 --> NX1[Next.js SSR]
      NX1 --> API1[FastAPI]
      API1 --> DB1[(SQLite + unbounded caches)]
    end
    subgraph After["After: hot paths bypass the backend"]
      U2[User / Googlebot] --> CF2[Cloudflare]
      CF2 --> NX2[Next.js SSR]
      NX2 --> JSON[(precomputed static JSON)]
      NX2 -.cold paths only.-> API2[FastAPI]
      API2 --> DB2[(SQLite)]
      BATCH[Nightly batch] --> JSON
    end
```

Even if the backend OOMs in the middle of a Google crawl, the pages Google cares about still serve correct data from the JSON files. The blast radius of "the backend is unhappy" shrank from "the whole site is degraded" to "the long tail of less-popular detail pages is degraded." That's a real architectural win — not a band-aid.
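The batch side of this pattern can be sketched as follows. It assumes the precomputed view is a JSON-serializable list of rows, and `write_ranking_json` is a hypothetical helper name. Writing to a temporary file and renaming it into place means an SSR read should never see a half-written file, even if the batch is killed mid-write.

```python
import json
import os
import tempfile
from pathlib import Path


def write_ranking_json(base_dir, market, sort, rows):
    """Write rows to <base_dir>/<market>_<sort>.json via temp file + atomic rename."""
    target = Path(base_dir) / f"{market}_{sort}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(rows, f, ensure_ascii=False)
        os.chmod(tmp_name, 0o644)  # mkstemp creates 0600 files; let the web process read it
        os.replace(tmp_name, target)  # atomic swap: readers see the old or the new file
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # do not leave partial files behind
        raise
    return target
```

If the write fails, the previous day's file stays in place, so the hot pages degrade to stale data rather than to errors.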

4. Post-batch validation that calls the public API. A separate failure I had to admit: my batch jobs were happily reporting "success" on days when the data they produced was wrong (one sector silently lost a metric for ~109 stocks). Now, after every batch finishes, a validation script makes real HTTP calls to the same endpoints users hit, for each market × sort combination. If a combination returns zero rows, or noticeably fewer than its 30-day baseline, the batch is flagged failed and I get an email regardless of what the row counts said. The validator caught two real regressions in its first month.
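The core of such a validator is small. In this sketch the HTTP call is passed in as `fetch_row_count` so the logic can be exercised without a network; that parameter, the function name, and the `min_ratio=0.8` threshold for "noticeably fewer" are assumptions, since the exact cutoff is not stated above.

```python
from itertools import product
from statistics import mean


def validate_batch(fetch_row_count, markets, sorts, baselines, min_ratio=0.8):
    """Return a list of (market, sort, reason) failures; empty means the batch passes.

    fetch_row_count(market, sort) is expected to call the public endpoint and
    return the number of rows. baselines maps (market, sort) to recent daily
    row counts, for example the last 30 days.
    """
    failures = []
    for market, sort in product(markets, sorts):
        count = fetch_row_count(market, sort)
        history = baselines.get((market, sort), [])
        if count == 0:
            failures.append((market, sort, "zero rows"))
        elif history and count < min_ratio * mean(history):
            reason = f"{count} rows vs baseline {mean(history):.0f}"
            failures.append((market, sort, reason))
    return failures
```

A non-empty result would mark the batch as failed and trigger the email, independent of what the batch itself reported.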

5. A "data health" check that runs every night and emails me when something looks off. I have an internal admin page that also exposes the same data, but the more important piece is the cron job behind it: a script that runs after the nightly batch and verifies a dozen specific invariants per market. Failures email me; warnings get logged to inspect later. A representative night looks like this:

```
$ python -m scripts.automation.data_health_check
data_health_check — 2026-05-26 22:00 KST
─────────────────────────────────────────────────────────────
[PASS]  kr_daily_price_recent           last=2026-05-22
[PASS]  us_daily_price_recent           last=2026-05-22
[PASS]  kr_trading_value_filled         99.8%  (2541 / 2544)
[PASS]  kr_valuations_per_filled        99.5%  (2531 / 2544)
[PASS]  kr_top50_mcap_match             50 / 50  within 1%
[PASS]  kr_financial_margin_impossible  0 stocks
[PASS]  kr_override_staleness           up to FY2025 (current)
[PASS]  annual_revenue_lost             0 stocks
[PASS]  batch_failed_24h                0
[WARN]  us_shares_outstanding_filled    97.2%  (3033 / 3120)
─────────────────────────────────────────────────────────────
summary:  9 pass · 1 warn · 0 fail
email sent: no   (only on fail)
```

Each line corresponds to an actual mistake I've made or seen. financial_margin_impossible exists because a sector's revenue line was misclassified and operating margin briefly read 73% for a securities firm. override_staleness exists because I have a hand-curated override file for financial-sector revenue that I have to update once a year and would otherwise forget. top50_mcap_match exists because I once shipped a deploy that quietly broke market cap for the most-visited page on the site. Each check is a scar.
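A fill-rate invariant of the kind shown in that report can be sketched in a few lines. The thresholds (`pass_at=99.0`, `warn_at=95.0`), the function names, and the rounding are assumptions chosen for illustration; the real script's cutoffs and formatting are not given and may differ.

```python
def fill_rate_check(name, filled, total, pass_at=99.0, warn_at=95.0):
    """Classify a coverage ratio as PASS, WARN, or FAIL and format one report line."""
    pct = 100.0 * filled / total if total else 0.0
    if pct >= pass_at:
        status = "PASS"
    elif pct >= warn_at:
        status = "WARN"
    else:
        status = "FAIL"
    line = f"[{status}]  {name:<32}{pct:.1f}%  ({filled} / {total})"
    return status, line


def summarize(statuses):
    """Count outcomes and decide whether to send an email (only when something failed)."""
    counts = {s: statuses.count(s) for s in ("PASS", "WARN", "FAIL")}
    return counts, counts["FAIL"] > 0
```

Keeping WARN separate from FAIL is what lets the job stay quiet on ordinary nights and still leave a trail to inspect later.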

6. An emergency repatch script: one command, full recovery. When something is wrong on the public site, the recovery used to be: stop backend, run patch, restart, regenerate JSON, regenerate stats, regenerate stock detail JSON, purge edge cache, validate. Roughly ten steps, easy to forget one, very error-prone at 11pm on a weekend. I rewrote it as a single command (emergency_repatch --market KR) that runs every step in order, fails fast, and prints a checklist of what passed and what didn't. It's the single biggest reduction in "how scared am I of operating this site" I've ever made.
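The "run in order, fail fast, print a checklist" shape is simple enough to sketch. The step names in the usage note and the helper names `run_steps` and `print_checklist` are hypothetical; each step is assumed to be a callable that raises on failure.

```python
def run_steps(steps):
    """Run (name, callable) steps in order and stop at the first failure.

    Returns a checklist of (name, status) pairs, where status is one of
    "ok", "failed", or "skipped".
    """
    checklist = []
    failed = False
    for name, action in steps:
        if failed:
            checklist.append((name, "skipped"))  # never run steps after a failure
            continue
        try:
            action()
            checklist.append((name, "ok"))
        except Exception as exc:
            print(f"step {name!r} failed: {exc}")
            checklist.append((name, "failed"))
            failed = True
    return checklist


def print_checklist(checklist):
    """Print one line per step so the operator can see where recovery stopped."""
    marks = {"ok": "[x]", "failed": "[!]", "skipped": "[ ]"}
    for name, status in checklist:
        print(f"{marks[status]} {name} ({status})")
```

A caller would pass something like `[("stop backend", stop_backend), ("run patch", run_patch), ...]` and exit non-zero if any entry is not "ok".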

7. A three-agent independent review for any risky change. This one isn't infrastructure, it's process. For any change I judge as risky to data integrity or SEO — schema migrations, anything that touches deploy timing, anything that changes a ranking calculation — I run the proposal past three separate AI agents in parallel and read all three reviews before I touch the code. They disagree about a third of the time, and the disagreement is usually the most useful signal. Pairing this with "deploy less" has, more than any other single habit, kept me from shooting myself in the foot in the recovery period.

8. Title/meta micro-tuning on edge-of-page-1 queries. With the site itself stable, the SEO recovery is now an active project, not just waiting. I pulled Google Search Console data, identified queries where my pages were ranking 5–15 (the "edge of page 1" zone where small wording changes can move you up), and rewrote titles and meta descriptions for those specific pages. I track each change in an optimization log and check positions weekly. Not glamorous. Working slowly.

For anyone about to ship their first vibe-coded thing to a real domain with real SEO ambitions, this is the list I wish I'd had taped to my monitor:

functools.lru_cache(maxsize=N). cachetools.TTLCache(maxsize=N, ttl=...). Anything with maxsize. If you can't name a reasonable cap, you can't have the cache.

ps -o rss over 24 hours, scraped every minute, would have caught this in week one. You don't need Prometheus and Grafana on day one; a cron job to a CSV is fine.

RuntimeMaxSec is a $0 safety net.

Honesty about the things not yet done is the other half of an honest postmortem.

The RuntimeMaxSec floor masks the leaks. The actual fix, putting a maxsize on every TTL dict, is the next merged PR. Mechanical work; the audit was the slow part.

Some of these are technical, some are about the relationship with the AI itself.

Cache-Control.

A DECISIONS.md next to your CLAUDE.md / cursor rules turns vague guilt into a reviewable artifact.

Part 2: Data quality failures. How a single misclassified financial line silently corrupted an entire metric for one sector for weeks, the validation harness I built after the fact, and the painful manual data override that's still patching the rest. The most uncomfortable post in this series, because the bug ran in production for far longer than the OOM did and nobody (including me) noticed.

There's a longer queue behind that — batch jobs failing in interesting ways, a data-licensing problem I'm currently migrating away from — but I'll only commit to posts when the story has a clear ending. More to come as the dust settles.

If you've shipped a vibe-coded thing into production and have your own story, I'd genuinely like to read it. The thing the AI tooling discourse is missing right now is the boring, post-launch half: not "look what I built in a weekend" but "look what it cost me on day 90."

I'll be writing the rest of that half here.

StockDigging is at stockdigging.com. It's free, ad-supported, no signup required to browse. The "Why I Built It" post is here if you want the founding context for this series.

source & further reading

dev.to: original article
