I Spent Years Balancing Ledgers. Now I Balance Redis Connections.


Source: dev.to · Published: 2026-06-03T09:57Z · Topic: ai-infrastructure

A developer who spent years in accounting and finance has applied ledger-based discipline to production incidents, building AlertEngine as an operational governance system rather than a monitoring tool. The system enforces a strict hierarchy where policy determines incident detection and recovery, AI only diagnoses and recommends, and humans must authorize every action—with all decisions recorded in an immutable, actor-attributed audit trail. The source-available project, built to address the challenge of managing infrastructure in Zimbabwe where engineers may not always have laptop access, allows recovery authorization via a single tap on WhatsApp without requiring SSH or runbooks.


I spent my career in accounting and finance before building infrastructure in Zimbabwe.

In accounting, every transaction has three properties:

Authorization — no entry without approval

Immutability — once recorded, never altered

Reconciliation — every debit has a corresponding credit, provable by audit

When I started building FastAPI AlertEngine, I applied the same discipline to production incidents. The result is not a monitoring tool. It's an operational governance system.

Monitoring tools tell you what broke after it broke. Datadog, Grafana, Sentry — they produce beautiful post-mortems.

Governance tools enforce that nothing executes without authorization, and they prove it afterward.

Most teams conflate the two. They buy monitoring, assume governance, and get surprised when auditors ask: "Who approved that deploy?"

AlertEngine separates them explicitly:

```plain
Detection → Policy (deterministic, no AI)
Diagnosis → AI (explains, recommends, does not decide)
Authorization → Human (engineer taps approve)
Execution → Webhook (your infrastructure, your control)
Audit → Ledger (immutable, replayable, actor-attributed)
```

This is not a feature list. It's an architectural hierarchy enforced by code.
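One way to picture a hierarchy "enforced by code" is a rule table that names, for each stage, which actor may trigger it and from where. The sketch below is an illustration under assumptions, not AlertEngine's actual implementation: the `Stage` enum, the `RULES` table, the `advance` function, and the `"webhook"` actor label are all hypothetical. Only the stage names AUTHORIZED and RECOVERED and the actors `"policy"` and `"engineer"` appear in the article.

```python
from enum import Enum


class Stage(Enum):
    DETECTED = "DETECTED"
    AUTHORIZED = "AUTHORIZED"
    EXECUTED = "EXECUTED"
    RECOVERED = "RECOVERED"


# Hypothetical rule table: target stage -> (stages it may follow, allowed actors).
# "ai" appears nowhere, so a diagnosis alone can never move an incident.
RULES = {
    Stage.DETECTED: ({None}, {"policy"}),
    Stage.AUTHORIZED: ({Stage.DETECTED}, {"engineer"}),
    Stage.EXECUTED: ({Stage.AUTHORIZED}, {"webhook"}),
    Stage.RECOVERED: (
        {Stage.DETECTED, Stage.AUTHORIZED, Stage.EXECUTED},
        {"policy"},
    ),
}


def advance(current, target, actor):
    """Return the new stage, or raise if this actor may not make this move."""
    allowed_from, allowed_actors = RULES[target]
    if current not in allowed_from:
        raise PermissionError(f"cannot go from {current} to {target.value}")
    if actor not in allowed_actors:
        raise PermissionError(f"{actor!r} may not trigger {target.value}")
    return target
```

Under these assumed rules, `advance(Stage.DETECTED, Stage.EXECUTED, "webhook")` raises, because execution is only reachable from AUTHORIZED, and AUTHORIZED is only reachable by an engineer.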

Engineers in Zimbabwe aren't always at laptops when things break. WhatsApp is ubiquitous and can be the operational control plane.

That constraint produces something better than a dashboard: alerts that find you, with a single tap to authorize recovery. No SSH. No runbooks. No "log into Grafana and interpret the graph."

Just: "Something broke. Here's why. Tap approve. Nothing runs without you."

The Ledger Philosophy

In finance, a ledger has two sides: what happened, and who authorized it.

AlertEngine's audit trail has the same structure:

```json
{
  "timestamp": 1717344000,
  "incident_id": "inc-abc123-1685000000",
  "stage": "AUTHORIZED",
  "actor": "engineer",
  "decision": "approve",
  "reason": "Database connection pool exhausted — restart recommended",
  "confidence": 0.87,
  "policy_version": "1.0.0",
  "tenant_id": "tenant-xyz789"
}
```

Every entry is append-only. Every entry has an actor. Every entry is replayable.

This is not logging. Logging tells you what the system did. A ledger tells you who authorized it and why.
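The append-only, actor-attributed, replayable properties can be sketched in a few lines. The article says the real trail lives in a Redis LIST; the version below swaps that for an in-memory Python list so it runs without a server. The `Ledger` class and its `append` and `replay` methods are hypothetical names, and the field names are borrowed from the JSON entry shown earlier.

```python
import json
import time


class Ledger:
    """Append-only audit trail; an in-memory stand-in for a Redis LIST."""

    def __init__(self):
        self._entries = []  # serialized JSON strings, oldest first

    def append(self, incident_id, stage, actor, decision, reason, policy_version):
        if not actor:
            raise ValueError("every ledger entry needs an actor")
        entry = {
            "timestamp": int(time.time()),
            "incident_id": incident_id,
            "stage": stage,
            "actor": actor,
            "decision": decision,
            "reason": reason,
            "policy_version": policy_version,
        }
        # Store a serialized copy so callers cannot mutate recorded history.
        self._entries.append(json.dumps(entry, sort_keys=True))

    def replay(self, incident_id):
        """Return fresh copies of one incident's entries in recorded order."""
        decoded = (json.loads(raw) for raw in self._entries)
        return [e for e in decoded if e["incident_id"] == incident_id]


ledger = Ledger()
ledger.append("inc-1", "DETECTED", "policy", "open", "error rate above threshold", "1.0.0")
ledger.append("inc-1", "AUTHORIZED", "engineer", "approve", "restart recommended", "1.0.0")
print([e["actor"] for e in ledger.replay("inc-1")])  # ['policy', 'engineer']
```

The class deliberately exposes no update or delete method. With a Redis LIST, the analogous operations would presumably be `RPUSH` to append and `LRANGE` to replay, though how `orchestrator/audit.py` actually does this is not shown in the article.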

Policy Is the Floor. AI Is the Ceiling.

The most important architectural decision in AlertEngine is this:

Claude cannot trigger a state transition.

Policy decides whether an incident exists. Policy decides when a system has recovered. Claude diagnoses and explains — but the state machine doesn't listen to Claude. It listens to incident_policy.py.

When health metrics recover, the pipeline doesn't ask Claude what to do. It calls should_recover(score, err) and if the threshold is met, it transitions to RECOVERED with actor="policy". Claude's recommendation is irrelevant.

A confident wrong AI diagnosis cannot cause an incident to escalate

A policy recovery override is logged as actor: "policy" — auditors can see exactly when and why

Changing thresholds is a one-line edit in one file, versioned, and logged in every subsequent audit entry

The audit trail never lies about who made the decision
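The recovery rule described above can be sketched as follows. Only the `should_recover(score, err)` signature, the idea of a single versioned, env-configurable `POLICY` dict, and the `actor="policy"` attribution come from the article. The threshold keys, the environment variable names, the default values, the meaning assigned to `score` and `err`, and the `next_transition` helper are all assumptions made for illustration.

```python
import os

# Hypothetical keys, defaults, and environment variable names.
POLICY = {
    "version": "1.0.0",
    "recover_min_score": float(os.environ.get("ALERT_RECOVER_MIN_SCORE", "0.9")),
    "recover_max_error_rate": float(
        os.environ.get("ALERT_RECOVER_MAX_ERROR_RATE", "0.01")
    ),
}


def should_recover(score, err):
    """Assumed semantics: `score` is health in [0, 1], `err` is an error rate."""
    return (
        score >= POLICY["recover_min_score"]
        and err <= POLICY["recover_max_error_rate"]
    )


def next_transition(score, err, ai_recommendation=None):
    """Return the transition to record, or None to stay in the current stage.

    `ai_recommendation` is accepted but deliberately never read: the
    decision depends only on the metrics and the policy thresholds.
    """
    if should_recover(score, err):
        return {
            "stage": "RECOVERED",
            "actor": "policy",
            "policy_version": POLICY["version"],
        }
    return None
```

Because the returned record carries `policy_version`, a later threshold change would show up in every subsequent audit entry, which matches the versioning claim in the list above.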

Three forces are converging:

I'm also building a payment orchestration platform for the African "hustler" context. Getting infrastructure funding in Zimbabwe is genuinely hard.

So I packaged the operational governance layer as a standalone product. It solves a real problem — I needed it myself at 2am. It also funds the bigger build.

That felt worth being honest about.

The Code

The orchestrator is source-available. Every claim in this post is verifiable:

orchestrator/pipeline.py — policy hierarchy, actor="policy" on recovery override

orchestrator/incident_policy.py — single POLICY dict, versioned, env-configurable

orchestrator/audit.py — append-only Redis LIST, full actor attribution, replayable

Read the code. Audit the architecture. Then decide if your infrastructure deserves the same discipline as your accounting.

GitHub: github.com/Tandem-Media/fastapi-alertengine

Install:

```bash
pip install fastapi-alertengine
```

Managed orchestrator: anchorflowalertengine@outlook.com

Built in Harare, Zimbabwe. 🇿🇼


Read original on dev.to → dev.to/tandemmedia/i-spent-years-balancing-ledge…
