# I Built My Own Analytics + AB Testing Tool with Claude Code.

> Source: <https://pub.towardsai.net/i-built-my-own-analytics-ab-testing-tool-with-claude-code-57eed364756c?source=rss----98111c9905da---4>
> Published: 2026-06-25 21:31:00+00:00

[Part 1](https://medium.com/towards-artificial-intelligence/i-built-my-own-analytics-ab-testing-tool-in-a-weekend-with-claude-code-c67f411602f7) got events from the browser into Postgres. A pipeline that only counts pageviews is a worse Google Analytics, though. The reason to build your own is to do the things the off-the-shelf tools gate behind a sales call: run real experiments, and watch real sessions.

Both turn out to lean on the same humble trick: a hash function.

Most A/B tools ship a library that rewrites your DOM after the page loads. You’ve seen the result: the original headline flashes for 200ms, then snaps to the variant. It looks broken because it is.

I went the other way. A test is two URLs, control and variant, and the tracker redirects a share of traffic before the page paints. You build the variant as a real page. No DOM surgery, no flash. The cost is that you maintain two pages instead of patching one, which for landing pages is a trade I’ll take every time.

The hard requirement: a visitor must always land in the same variant, and I refuse to store a server-side record of who saw what. A hash gives you exactly that: a stable decision out of thin air.

Hash the visitor ID together with the test ID, get a number between 0 and 1, and walk the variant weights:

``` js
function hashToFloat(str) {   // FNV-1a
  var h = 0x811c9dc5;
  for (var i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0) / 0xffffffff;
}

var bucket = hashToFloat(visitorId + test.id);
var cumulative = 0, assigned = null;
for (var i = 0; i < test.variants.length; i++) {
  cumulative += test.variants[i].weight;
  if (bucket <= cumulative) { assigned = test.variants[i]; break; }
}
```
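To check the "same bucket every visit" claim, it helps to wrap both snippets into one function and run it over a few thousand fake visitor IDs. This is a minimal sketch that assumes variants are objects with a weight that is a fraction, and that the weights sum to about 1. The assignVariant name is hypothetical. It also adds a fallback the inline loop above lacks: floating-point weights can sum to a hair under 1, and the loop would otherwise leave assigned as null.

``` javascript
// FNV-1a hash mapped to [0, 1], same as the snippet above.
function hashToFloat(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0) / 0xffffffff;
}

// Hypothetical wrapper: returns the variant object this visitor lands in.
// Assumes each variant has a fractional `weight` and the weights sum to ~1.
function assignVariant(visitorId, test) {
  const bucket = hashToFloat(visitorId + test.id);
  let cumulative = 0;
  for (const variant of test.variants) {
    cumulative += variant.weight;
    if (bucket <= cumulative) return variant;
  }
  // Ten weights of 0.1 sum to 0.9999999999999999, so a bucket at the very
  // top could fall through the loop; give it the last variant instead of null.
  return test.variants[test.variants.length - 1];
}
```

Calling assignVariant("visitor-42", heroTest) twice returns the same variant object, with no lookup table involved, and a 50/50 test splits a large batch of IDs close to evenly.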

No database of assignments. No coordination. The same person hashes to the same bucket every visit, and mixing in the test ID means their bucket in one test tells you nothing about the next.

When you redirect to the variant, carry the query string over. Forget this and you strip the UTM and ad-click parameters off the URL, and your paid traffic suddenly looks like it came from nowhere:

``` js
var variantUrl = new URL(assigned.url, location.origin);
new URLSearchParams(location.search).forEach(function (v, k) {
  if (!variantUrl.searchParams.has(k)) variantUrl.searchParams.set(k, v);
});
location.replace(variantUrl.toString());   // replace(), so "back" skips the redirect
```
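The same carry-over logic is easier to test when it doesn't touch location directly. Here's a sketch as a pure function (buildVariantUrl is a hypothetical name) that takes the variant path and the current URL and returns the redirect target. Parameters already present on the variant URL win, so a variant can still pin its own values.

``` javascript
// Hypothetical helper: builds the variant URL, copying over every query
// parameter (utm_*, gclid, ...) that the variant URL doesn't already set.
function buildVariantUrl(variantPath, currentHref) {
  const current = new URL(currentHref);
  const target = new URL(variantPath, current.origin);
  current.searchParams.forEach((value, key) => {
    if (!target.searchParams.has(key)) target.searchParams.set(key, value);
  });
  return target.toString();
}
```

In the tracker this would be called as location.replace(buildVariantUrl(assigned.url, location.href)).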

And fail fast. The assignment request gets a 2-second timeout, and every failure path does nothing and lets the page load:

``` js
xhr.timeout = 2000;
xhr.ontimeout = function () {};   // show control, move on
xhr.onerror   = function () {};
```
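If you'd rather use fetch than XHR, the same fail-open rule can be sketched with an AbortController. This swaps in a different API from the tracker's; fetchAssignment and its parameters are hypothetical, and the fetch implementation is injectable only so the behavior can be tested. Every failure path resolves to null, which the caller treats as "stay on the control page".

``` javascript
// Hypothetical helper (fetch + AbortController instead of the XHR above).
// Resolves to the parsed assignment, or null on timeout, network error,
// non-2xx status, or unparseable JSON. It never throws.
async function fetchAssignment(url, timeoutMs = 2000, fetchImpl = fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    if (!res.ok) return null;
    return await res.json();
  } catch (err) {
    return null; // AbortError on timeout, offline, CORS, bad JSON...
  } finally {
    clearTimeout(timer);
  }
}
```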

A visitor who sees the control because your API was slow is a non-event. A visitor staring at a blank page because you blocked render on a database query is a refund.

Because Part 1’s tracker stamps ab_variant onto every event, results are one grouped query: visitors and conversions per variant. The honesty lives in what you do with those counts. I wrote the stats with zero dependencies, and it's less code than the npm install would be.
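The real version is a GROUP BY in Postgres. To show what that query counts, here is the same aggregation in JS, assuming each event row has visitorId, abVariant and type fields (my naming, not necessarily Part 1's schema), with the caller deciding what counts as a conversion. The function name summarizeByVariant is hypothetical.

``` javascript
// Hypothetical event shape: { visitorId, abVariant, type }.
// Counts distinct visitors and distinct converting visitors per variant,
// like GROUP BY ab_variant with COUNT(DISTINCT visitor_id).
function summarizeByVariant(events, isConversion) {
  const byVariant = new Map();
  for (const e of events) {
    if (!e.abVariant) continue; // event wasn't part of any test
    if (!byVariant.has(e.abVariant)) {
      byVariant.set(e.abVariant, { visitors: new Set(), converters: new Set() });
    }
    const group = byVariant.get(e.abVariant);
    group.visitors.add(e.visitorId);
    if (isConversion(e)) group.converters.add(e.visitorId);
  }
  const result = {};
  for (const [variant, { visitors, converters }] of byVariant) {
    result[variant] = { visitors: visitors.size, conversions: converters.size };
  }
  return result;
}
```

Counting converting visitors rather than conversion events keeps each rate at or below 1, which is what the proportion test below assumes. A visitor who signs up twice still counts once.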

A two-proportion z-test answers “is this difference real or just noise?”

``` js
const p1 = controlConversions / controlVisitors;
const p2 = variantConversions / variantVisitors;
const pPooled = (controlConversions + variantConversions) / (controlVisitors + variantVisitors);
const se = Math.sqrt(pPooled * (1 - pPooled) * (1 / controlVisitors + 1 / variantVisitors));
const zScore = (p2 - p1) / se;
const pValue = 2 * (1 - normalCDF(Math.abs(zScore)));
```
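The snippet leans on normalCDF, which JavaScript doesn't ship. One dependency-free option, and my choice rather than necessarily the author's, is the Abramowitz & Stegun 7.1.26 approximation of erf, which is accurate to about 1.5e-7 and far finer than any conversion data. The wrapper twoProportionZTest is a hypothetical name. It also guards the case where neither side has converted yet, where the standard error is 0 and the division would produce NaN.

``` javascript
// erf via Abramowitz & Stegun formula 7.1.26 (absolute error below ~1.5e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard normal CDF: P(Z <= z).
function normalCDF(z) {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

// The z-test from the snippet above, wrapped as a function (hypothetical name).
function twoProportionZTest(controlConversions, controlVisitors, variantConversions, variantVisitors) {
  const p1 = controlConversions / controlVisitors;
  const p2 = variantConversions / variantVisitors;
  const pPooled = (controlConversions + variantConversions) / (controlVisitors + variantVisitors);
  const se = Math.sqrt(pPooled * (1 - pPooled) * (1 / controlVisitors + 1 / variantVisitors));
  if (se === 0) return { zScore: 0, pValue: 1 }; // nobody (or everybody) converted
  const zScore = (p2 - p1) / se;
  const pValue = 2 * (1 - normalCDF(Math.abs(zScore))); // two-sided
  return { zScore, pValue };
}
```

For example, 100 conversions from 1,000 control visitors against 150 from 1,000 variant visitors gives a z-score of about 3.38 and a p-value well under 0.001.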

But the number that keeps you honest is the confidence interval. “Variant B converts at 3.2%” invites you to celebrate. A Wilson interval of “3.2%, somewhere between 1.1% and 5.9%” tells you the truth: you don’t know yet. It’s the same headline number, but showing the range is what stops people calling a win off forty visitors on a Tuesday.
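A Wilson score interval is only a few lines. Here's a minimal sketch, assuming z = 1.96 for roughly 95% coverage. The name wilsonInterval is hypothetical, and returning [0, 1] for zero visitors is my choice. Unlike the naive p ± z·SE interval, it never dips below 0 and isn't symmetric around the observed rate, which matters exactly in the low-traffic, low-conversion cases you care about.

``` javascript
// Wilson score interval for a conversion rate. Returns [low, high] as fractions.
// z = 1.96 gives roughly 95% coverage. Hypothetical helper name.
function wilsonInterval(conversions, visitors, z = 1.96) {
  if (visitors === 0) return [0, 1]; // no data yet: anything is possible
  const p = conversions / visitors;
  const z2 = z * z;
  const denom = 1 + z2 / visitors;
  const center = (p + z2 / (2 * visitors)) / denom;
  const half = (z / denom) * Math.sqrt((p * (1 - p)) / visitors + z2 / (4 * visitors * visitors));
  return [Math.max(0, center - half), Math.min(1, center + half)];
}
```

The forty-visitors case shows why the range matters: one conversion from forty visitors (2.5%) gives an interval of roughly 0.4% to 12.9%.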

The same file computes the sample size you need before you start, so “how long do we run this” has a real answer instead of a gut feel. There’s an auto-stop flag too: a scheduled job watches running tests and routes everyone to the winner once it clears the threshold.
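The sample-size question has a standard closed-form answer for two proportions. Here is a sketch that assumes a two-sided test at alpha = 0.05 (z = 1.96) and 80% power (z = 0.8416). Those defaults and the name sampleSizePerVariant are my assumptions, not necessarily what the author's significance file uses.

``` javascript
// Visitors needed per variant to detect a change from baselineRate to
// targetRate with a two-sided two-proportion z-test.
// Assumed defaults: alpha = 0.05 (zAlpha = 1.96), power = 80% (zBeta = 0.8416).
function sampleSizePerVariant(baselineRate, targetRate, zAlpha = 1.96, zBeta = 0.8416) {
  const p1 = baselineRate;
  const p2 = targetRate;
  const pBar = (p1 + p2) / 2;
  const numerator =
    zAlpha * Math.sqrt(2 * pBar * (1 - pBar)) +
    zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2));
  const effect = p2 - p1;
  return Math.ceil((numerator * numerator) / (effect * effect));
}
```

Going from a 10% baseline to a hoped-for 12% needs roughly 3,840 visitors per variant. Divide that by your daily traffic per variant and you have “how long do we run this”.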

Watching someone use your page is worth a hundred funnel charts. It’s also the heaviest thing in the whole system, so the scope is aggressive: replay records **only A/B test sessions, and only a sample of those.** If you’re recording everyone, you’re paying to store screensavers.

The recorder is bigger than the entire tracker, so it never ships in the main snippet. It loads only after a visitor is bucketed into a test:

``` js
function initReplay() {
  if (replayStarted || !abTestId || isPreview) return;   // tests only
  replayStarted = true;                                  // never inject the recorder twice
  var s = document.createElement('script');
  s.src = currentScript.src.replace(/pp\.js/, 'pp-replay.js');
  s.onload = function () { window.__ppReplay.initReplayRecording(/* session context */); };
  document.head.appendChild(s);
}
```

Visitors who aren’t in an experiment never download a byte of it. That’s how you keep Part 1’s 5KB promise.

[rrweb](https://github.com/rrweb-io/rrweb) takes a DOM snapshot and then streams mutations, so replay is just rebuilding the page and replaying changes on a timeline. Reimplementing it is a months-long sinkhole. Configure it for privacy and noise up front:

``` js
record({
  emit: function (e) { buffer.push(e); },
  maskAllInputs: true,                 // never record what people type
  blockSelector: '[data-pp-block]',
  // throttle windows in ms; input: 'last' keeps only the final value
  sampling: { mousemove: 50, scroll: 150, input: 'last' },
});
```

In my tool, maskAllInputs: true ships as the default, not as a setting you have to remember to flip. Record one password field by accident and your analytics tool is now a breach waiting to happen. Mask everything; let sites unmask on purpose.

A half-recorded session is useless, so the record/skip decision is made once, deterministically, from the session ID, the exact same move as A/B bucketing:

``` js
var hash = 0;
for (var j = 0; j < sessionId.length; j++) {
  hash = ((hash << 5) - hash + sessionId.charCodeAt(j)) | 0;
}
if (Math.abs(hash) % 100 >= sampleRate) return;   // default 50%
```

Recordings run minutes and hit megabytes. Buffer the whole thing and send it at the end, and a tab that dies takes everything with it. So events flush as numbered chunks every 5 seconds, with the first chunk going out after just 1 second. That first chunk holds the bulky initial DOM snapshot, and flushing it early means even a two-second bounce leaves something watchable. The final chunk rides sendBeacon, which the browser delivers even as the page unloads but caps in size; the rest use XHR, which has no such cap.
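The flush schedule can be sketched as a small uploader. This is a minimal sketch under assumptions: createChunkedUploader, the chunk shape { index, events, final }, and the two injected senders are hypothetical. In the browser, send would be the XHR POST and sendFinal a navigator.sendBeacon call, with finish wired to pagehide.

``` javascript
// Hypothetical uploader: names and chunk shape are assumptions.
// send(chunk) would be an XHR POST; sendFinal(chunk) would use navigator.sendBeacon.
function createChunkedUploader(send, sendFinal, { firstDelayMs = 1000, intervalMs = 5000 } = {}) {
  let buffer = [];
  let nextIndex = 0;
  let intervalId = null;
  let finished = false;

  // Ship whatever is buffered as the next numbered chunk.
  function flush() {
    if (finished || buffer.length === 0) return;
    send({ index: nextIndex++, events: buffer });
    buffer = [];
  }

  // First flush after ~1 s (it carries the initial DOM snapshot), then every 5 s.
  const firstId = setTimeout(() => {
    flush();
    intervalId = setInterval(flush, intervalMs);
  }, firstDelayMs);

  return {
    push(event) {
      if (!finished) buffer.push(event);
    },
    flush,
    // Call on pagehide: stop the timers and send the last chunk, marked final.
    finish() {
      if (finished) return;
      finished = true;
      clearTimeout(firstId);
      if (intervalId !== null) clearInterval(intervalId);
      sendFinal({ index: nextIndex++, events: buffer, final: true });
      buffer = [];
    },
  };
}
```

Numbering chunks on the client means the server can store them as they arrive and still reassemble them in order, even if two requests land out of sequence.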

Storage is two tables: one row of metadata per recording, many bytea chunk rows ordered by index. To play it back, fetch the chunks in order, concatenate, hand them to the rrweb player. One gotcha: a killed tab never sends its "final" chunk, so a cron job marks any recording with no new chunk in 60 seconds as done. Skip that and your "in progress" list grows forever.
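Both halves of that paragraph, playback and the cleanup job, reduce to small pure functions once the rows are loaded. This sketch assumes chunk rows decode to { index, events } (the events being the bytea payload parsed back into rrweb events) and recording rows look like { id, status, lastChunkAt } with millisecond timestamps. The row shapes, status values, and function names are all hypothetical.

``` javascript
// Hypothetical row shapes: chunk = { index, events } (decoded bytea payload),
// recording = { id, status, lastChunkAt } with lastChunkAt in epoch ms.

// Playback: order chunks by index and concatenate their events for the player.
function assembleEvents(chunks) {
  return [...chunks]
    .sort((a, b) => a.index - b.index)
    .flatMap((chunk) => chunk.events);
}

// Cron: recordings still "in_progress" with no new chunk for staleMs are done.
function findStaleRecordings(recordings, now, staleMs = 60000) {
  return recordings
    .filter((r) => r.status === "in_progress" && now - r.lastChunkAt > staleMs)
    .map((r) => r.id);
}
```

The assembled array is what gets handed to the rrweb player; the cron job would flip the returned IDs to done.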

If your page embeds another origin in an iframe (say a site builder wrapping an embedded scheduler), rrweb can’t see inside it, and the cross-origin recording option in rrweb v2 crashed outright on me. The workaround: a separate script inside the iframe records it independently, the parent broadcasts session context via postMessage (re-broadcasting to late-arriving iframes with a MutationObserver), and the dashboard stitches the two recordings back together by session ID. It's fiddly. It's also the only way to see inside frames you don't own.
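The stitching step on the dashboard side is mostly bookkeeping. Here is a sketch of that part only, assuming each stored recording carries a sessionId, a source of "parent" or "iframe", and rrweb events with their millisecond timestamp field. The shape and the stitchBySession name are hypothetical. It groups recordings by session and computes each iframe recording's start offset from the parent, so the two can be played in sync. How the player renders the iframe recording inside the parent replay is player-specific and not shown.

``` javascript
// Hypothetical recording shape: { sessionId, source: "parent" | "iframe", events },
// where events are rrweb events sorted by their millisecond `timestamp`.
// Groups recordings per session and gives each iframe recording an offset
// (ms after the parent's first event) so a dashboard can play them in sync.
function stitchBySession(recordings) {
  const sessions = new Map();
  for (const rec of recordings) {
    if (!sessions.has(rec.sessionId)) {
      sessions.set(rec.sessionId, { sessionId: rec.sessionId, parent: null, frames: [] });
    }
    const s = sessions.get(rec.sessionId);
    if (rec.source === "parent") s.parent = rec;
    else s.frames.push(rec);
  }
  for (const s of sessions.values()) {
    const start = s.parent && s.parent.events.length ? s.parent.events[0].timestamp : null;
    s.frames = s.frames.map((f) => ({
      ...f,
      offsetMs: start !== null && f.events.length ? f.events[0].timestamp - start : 0,
    }));
  }
  return [...sessions.values()];
}
```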

Two experiment-grade features, both resting on a hash function and a refusal to block render. You can now run honest tests and watch the sessions behind them.

Part 3 (to be published next week) covers the part I find genuinely fun: feeding all of this (the events, the test results, and your actual customer calls) to an LLM that hands back advice specific enough to ship.

The companion docs hold the full version of everything above: the complete redirect logic, the whole stats engine, the rrweb config, and the iframe workaround in full:

**How to use them:** read alongside the post, or hand a doc to Claude Code and have it scaffold the piece. The stats doc in particular is exact enough to generate the whole significance.ts file from. No stats library required.

[I Built My Own Analytics + AB Testing Tool with Claude Code.](https://pub.towardsai.net/i-built-my-own-analytics-ab-testing-tool-with-claude-code-57eed364756c) was originally published in [Towards AI](https://pub.towardsai.net) on Medium, where people are continuing the conversation by highlighting and responding to this story.
