{"slug": "i-built-an-ai-land-fraud-detector-for-kenya-here-s-the-full-engineering-story", "title": "I Built an AI Land Fraud Detector for Kenya — Here's the Full Engineering Story", "summary": "Engineering behind TitleTrust, an AI-powered land fraud detection system built for Kenya to combat systematic land theft, including \"air supply\" scams where non-existent plots are sold. The system uses Google Gemini 3 Pro's multimodal capabilities and \"Thinking Mode\" to perform forensic document analysis, cross-referencing dates, transactions, and legal hierarchies, while a GeospatialEngine verifies land locations using GPS and satellite data. The autonomous investigation agent coordinates these processes through a MarathonLoop state machine, providing field investigators with a live, traceable timeline of findings to identify forged documents, temporal impossibilities, and location mismatches.", "body_md": "\"In God We Trust. In Land We Verify.\"\n\n## The Problem Is Personal\n\nIn Kenya, land is not just an asset. It is identity. Families save for decades to buy a small shamba. It is the thing you show your children. The thing that says you made it.\n\nAnd it is being stolen — systematically, at scale, with forged documents and complicit officials — from ordinary people who have no way to fight back.\n\nThe most brutal variant is called **\"air supply\"**: a fraudster markets a plot of land that does not legally exist. They have a polished brochure, a site visit to a real-looking location, and a convincing sale agreement. The buyer pays. The title deed never comes. 
By the time they discover the fraud — sometimes years later — the money is gone and the legal system offers little recourse.\n\nThe [Lesedi Developers scandal](https://politicalandlegalanthro.org/2024/10/04/non-existent-plots-land-fraud-in-nairobis-construction-boom/) is the most notorious recent example: thousands of prospective homeowners lost over **Sh1 billion** to phantom estates in Nairobi's satellite towns. Juja. Ruiru. Thika. Real places. Fake plots.\n\nThis is the problem I set out to solve. The code is on [GitHub](https://github.com/Wolfof420Street/TitleTrust/).\n\n## The Hackathon That Pushed Me to Build It\n\nI submitted TitleTrust to the **2026 Google Gemini 3 Hackathon**. The timing was right: Gemini 3 Pro's native multimodality, two-million-token context window, and \"Thinking Mode\" made it the first model powerful enough to reason about the *chain of title* in the way a forensic auditor would — not just retrieve facts, but detect what is *missing*, what is *temporally impossible*, and what is *legally void*.\n\nThis is not a chatbot wrapper. This is an autonomous investigation agent.\n\n## What TitleTrust Actually Does\n\nLet me describe the product as a user experiences it, then I'll explain the engineering underneath each step.\n\n### 1. The Investigator Opens a Session\n\nA field investigator — a community advocate, a local official, a buyer's agent — opens TitleTrust on their phone and starts a new investigation. They upload the \"deal pack\": a Title Deed, a Green Card (the official registry history), a Mutation Form (subdivision records), and a Sale Agreement.\n\nThe session starts. A live timeline appears on screen. The agents go to work.\n\n### 2. The Forensic Agent Reads the Documents\n\nThe **ForensicEngine** (`backend/forensic_engine.py`) takes the uploaded images and PDFs and runs them through Gemini 3 Pro with structured prompts that force chain-of-thought reasoning.\n\nIt does not just OCR the documents. 
It *reasons* about them.\n\nIt extracts every date, every transaction, every party name — and then checks:\n\n- Can you transfer land before you legally own it? *(No.)*\n- Can a Discharge of Charge predate the Charge itself? *(No — but forged Green Cards try.)*\n- Is the Surveyor on the Mutation Form licensed? *(Cross-referenced against gazette records.)*\n- Does the sum of the subdivided plots exceed the Mother Title's area? *(If yes: oversubscription fraud.)*\n\n**Why Gemini Thinking Mode matters here:** standard LLMs hallucinate compromises when laws conflict. A County Zoning map might say \"Residential\" while the National Land Act says \"Riparian Reserve.\" Gemini 3's Thinking Mode reasons through the hierarchy — *National Law overrides County Law* — and `include_thoughts=True` returns that reasoning as a visible trace alongside a defensible verdict. That trace is not a nice-to-have. In this context, **the reasoning trace is the product.** An investigator, a lawyer, or a judge needs to see *why* the AI flagged something, not just that it did.\n\n### 3. The Location Agent Checks Physical Reality\n\nThe **GeospatialEngine** (`backend/geospatial_engine.py`) answers a different question: *does the land exist where they say it does?*\n\nThe most insidious fraud variant is the bait-and-switch: the buyer is taken to a beautiful, flat, accessible plot. The title deed they receive is for a swamp 5 km away.\n\nThe investigator stands on the land and captures GPS and photos. The GeospatialEngine:\n\n- Validates GPS traces against the parcel geometry from the deed\n- Runs plausibility checks: distance-to-boundary, location confidence heuristics\n- Detects riparian vegetation patterns that suggest the plot is in a protected river reserve\n- Emits a `geospatial_verification` event: either *\"Location verified\"* or *\"Location mismatch — please re-scan beacons\"*\n\nIf the Solar API data shows the claimed plot is in a flood zone. 
If the 30-metre riparian buffer overlay shows 60% of the plot is legally unbuildable. The investigator knows *before* they pay.\n\n### 4. The Orchestrator Keeps the Investigation Moving\n\nThe **MarathonLoop** (`backend/agent/marathon_loop.py`) is the job state machine that coordinates everything. It:\n\n- Starts when a session is created\n- Advances through investigation stages\n- Decides what to check next (without the investigator having to think about it)\n- Retries failed API calls with exponential backoff\n- Escalates to the user when human input is genuinely needed: *\"Please provide a clearer photo of the beacon in the northeast corner\"*\n\nIt is designed to behave like an assistant-led investigation. The field investigator should not need to understand conveyancing law. They follow prompts. The system reasons.\n\n### 5. The Mobile Client Shows Everything, Live\n\nThe Flutter client maintains a **live, deduplicated, sequence-aware timeline** of all agent activity. Every finding. Every evidence registration. Every verification step.\n\nThis is not just UX polish. For an investigation tool, the live timeline *is the audit trail* that an investigator presents to authorities. It needs to be complete, ordered, and reproducible — even if the phone lost signal for ten minutes in the middle of a rural field check.\n\n## The Engineering Architecture — Every Decision Tied to the Product\n\nHere is where most case studies go wrong: they describe the architecture and then separately describe the product. In TitleTrust, **every engineering decision was made because of a product constraint.** Let me walk through them.\n\n### Why SSE Instead of WebSockets\n\nThe field investigator is standing in Juja, on cheap mobile data, with an intermittent signal. WebSockets require a persistent bidirectional connection. 
When it drops, you rebuild state from scratch.\n\n**SSE (Server-Sent Events) with Last-Event-ID** gives you something better: the browser (and Flutter client) automatically reconnects and sends the ID of the last event it received in the `Last-Event-ID` header. The server replays from exactly that point. The investigator's timeline heals itself without them noticing.\n\nFor a progress-update use case — which is all we need for server→client communication — SSE is simpler, more resilient on mobile networks, and natively supports replay semantics.\n\n### Why Redis Streams (Not Just a Message Queue)\n\nA standard message queue delivers messages and forgets them. That is fine for background jobs. It is not fine for an investigation audit trail.\n\n**Redis Streams** is an append-only, ordered log. Every event is stored with a stable offset. When the mobile client reconnects after a signal drop, the server can replay the exact sequence of agent actions from the last confirmed point.\n\nMore importantly: after an investigation closes, a lawyer needs to reconstruct exactly what the agent found, in what order, with what evidence. Redis Streams is that record. It is not just infrastructure — it is the chain of custody.\n\n### Why Two Cursors (`event_id` + `stream_offset`)\n\nEvery event carries two identifiers:\n\n- `event_id`: a stable application-level UUID the client tracks across reconnections\n- `stream_offset`: the Redis Streams position for efficient server-side seeks\n\nThe client knows `event_id`. The server knows `stream_offset`. The resume logic maps between them.\n\nWhy both? Because `event_id` survives Redis restarts and re-ingestions — it is stable application identity. `stream_offset` gives the server an efficient seek into the durable log. 
Without both, you either replay too much (wasteful) or risk replaying from the wrong point (incorrect).\n\nFor an investigation where every event is a piece of evidence, *replaying from the wrong point is a correctness bug, not just a performance bug.*\n\n### Why In-Process Broadcaster + Redis Streams (The Hybrid)\n\nThe `Broadcaster` (`backend/realtime/broadcaster.py`) maintains **both** an in-memory local fanout queue and a durable Redis Streams append.\n\nThe in-memory queue delivers events to the SSE client in milliseconds — even when Redis has a hiccup. The Redis Streams append persists the event for replay and audit.\n\nIn degraded mode (Redis unavailable), the system flips a flag, continues local delivery, and the client can recover authoritative state from Firestore when connectivity returns. The investigator's timeline keeps updating. They never see a spinner.\n\n**The product constraint driving this:** an investigator doing a site visit in a low-connectivity area should not have their session stall because a Redis instance is momentarily unreachable. Availability for the user is non-negotiable. But so is the audit trail. The hybrid gives you both.\n\n### Why Evidence Gets SHA256 Checksums and Trace IDs\n\nEvery piece of evidence the ForensicEngine registers — every photo, every document analysis result — gets:\n\n- A **SHA256 checksum** of the content\n- A `trace_id` linking it across services\n- A `sequence_id` for ordering within the session\n\nThis is not over-engineering. This is chain of custody. If an investigator presents TitleTrust findings to a Land Control Board or a court, the evidence must be verifiable as unmodified and correctly attributed. The checksum proves content integrity. The trace ID proves provenance. 
The sequence ID proves ordering.\n\n### Why Deterministic Chaos Tests\n\nBefore any public deployment of a system that people will rely on to protect their life savings, I needed to be able to *prove* the system behaves correctly under failure — not just hope it does.\n\nThe test harness (`tests/test_realtime_chaos.py`, `tests/support/fake_redis.py`) includes:\n\n- A **fake Redis** that can be programmatically failed, truncated, or made to reject writes\n- A **failure injector** that simulates `XADD` failures, publish delays, and partial persistence\n- Tests that validate: sequence monotonicity, correct replay after Redis restart, fallback to local buffer, client convergence to authoritative state after gap detection\n\nReal bugs surfaced in this harness that would never have appeared in happy-path testing:\n\n- Prometheus collector collisions when test instances reused metric names\n- Replay mismatches when clients passed a non-Redis ID as the resume token\n- Accidental use of non-durable items as the authoritative replay source\n\n**The product argument for this:** stochastic integration tests find bugs sometimes. Deterministic failure injection finds bugs *reproducibly*. 
For a correctness-critical system, reproducible is the only acceptable standard.\n\n## Architecture Diagram\n\n```\n┌─────────────────────────────────────────────────────────┐\n│              Flutter Mobile Client                      │\n│  RealtimeController: dedupe, sequence, gap detection    │\n│  RecoveryCoordinator: authoritative Firestore recovery  │\n└────────────────────┬────────────────────────────────────┘\n                     │ SSE (Last-Event-ID)\n                     ▼\n┌─────────────────────────────────────────────────────────┐\n│                FastAPI Backend                          │\n│  /realtime/sse  ·  /realtime/last-state/{session_id}   │\n└────────────────────┬────────────────────────────────────┘\n                     │\n                     ▼\n┌─────────────────────────────────────────────────────────┐\n│                   Broadcaster                           │\n│  In-process bounded queues (low-latency local fanout)  │\n│  Redis Pub/Sub (cross-instance fanout)                  │\n│  Redis Streams (durable ordered log + replay)           │\n└──────┬──────────────────────┬───────────────────────────┘\n       │                      │\n       ▼                      ▼\n┌─────────────┐     ┌──────────────────────────────────┐\n│  Redis      │     │  Agent Workers                   │\n│  Streams    │     │  MarathonLoop (orchestrator)     │\n│  (durable   │     │  ForensicEngine (vision/docs)    │\n│   log)      │     │  GeospatialEngine (GPS/maps)     │\n└─────────────┘     └──────────────┬───────────────────┘\n                                   │\n                                   ▼\n                         ┌──────────────────┐\n                         │    Firestore     │\n                         │  (canonical      │\n                         │   session state) │\n                         └──────────────────┘\n```\n\n## A Day in the Life: One Investigation\n\nTo make this concrete, here is the complete flow for a single field 
check:\n\n- **Investigator uploads a deal pack** (Title Deed, Green Card, Mutation Form) via the Flutter app.\n- **Session created** in Firestore. MarathonLoop starts. A `session_started` event is emitted, broadcast locally, appended to Redis Streams.\n- **ForensicEngine runs**: Gemini 3 Pro reads the Green Card. It finds `Entry #4: Charge to Equity Bank, 12/01/2018` and `Entry #6: Discharge of Charge, 10/01/2018`. The Discharge predates the Charge. **Temporal anomaly flagged.** Evidence registered with SHA256 checksum and trace ID. `evidence_registered` event emitted.\n- **The timeline on the investigator's phone updates instantly** via SSE. They see: *\"⚠️ Temporal anomaly detected — Discharge of Charge predates Charge. Likely forgery.\"*\n- **Investigator walks the plot boundary** and captures GPS + photos. GeospatialEngine validates location. Finds: plot coordinates overlap 60% with the Athi River 30 m riparian buffer. **Critical risk flagged.** `geospatial_verification` event emitted.\n- **Phone signal drops** for 4 minutes. Client reconnects with `Last-Event-ID`. Server maps it to Redis stream offset. Replays the 3 events the client missed. Timeline is complete.\n- **Investigation complete.** MarathonLoop emits `investigation_complete` with a risk score of 91 (CRITICAL). The full event log — ordered, checksummed, traceable — is the audit trail the investigator presents to the Land Control Board.\n\n## What I Used Gemini 3 For Specifically\n\nThis is important because I did not use Gemini 3 as a chatbot. I used it as a reasoning engine embedded in a structured investigation workflow.\n\n**Thinking Mode (`include_thoughts=True`)** — for legal conflict resolution. When a County Zoning map conflicts with the National Land Act, I cannot use a black-box verdict. The reasoning trace showing *why* one law overrides another is what makes the output usable in a legal context.\n\n**Native multimodality** — for reading handwritten Green Cards. 
Kenya's registry history is often handwritten, sometimes in cursive, on physical cards. Gemini 3 Pro reads these without a separate OCR step, preserving spatial context (stamps over signatures, marginal notes) that traditional OCR loses.\n\n**2M token context window** — for tracing chain of title through decades of subdivisions. A Mother Title in Kiambu might have 50+ years of subdivision history. Fitting the complete legal history in a single context window enables the kind of deep chronological reasoning that was previously only possible for senior conveyancing lawyers charging Ksh 10,000+ per review.\n\n**Structured outputs** — for emitting machine-readable findings. Every ForensicEngine result is a typed JSON finding, not free text. This is what allows the mobile timeline to render findings as UI components rather than just paragraphs.\n\n## The Numbers\n\n| What | Why it matters |\n|---|---|\n| ~Ksh 500 per basic forensic check | vs Ksh 5,000–10,000 for a lawyer |\n| ~3 minutes for a full audit | vs 30 days for traditional due diligence |\n| SHA256 checksums on all evidence | Chain of custody for legal proceedings |\n| Deterministic chaos tests | Correctness provable, not assumed |\n| Degraded mode (Redis down) | Investigators never lose session continuity |\n\n## Key Engineering Takeaways\n\n**1. Tie every infrastructure decision to a user outcome.**\n\nRedis Streams exists because investigators need an audit trail, not because append-only logs are cool. SSE exists because mobile reconnection in rural Kenya needs to be seamless. When you can answer \"why does this exist?\" with a user story, your architecture stays honest.\n\n**2. The reasoning trace is the product.**\n\nFor AI systems used in high-stakes decisions — land fraud, medical diagnosis, legal analysis — the output is not the verdict. The output is the evidence behind the verdict. Design your AI integration accordingly.\n\n**3. 
Deterministic failure injection is the only way to prove correctness.**\n\nStochastic tests find bugs sometimes. Deterministic failure injection finds them reproducibly. For a system that people will rely on to protect their life savings, \"probably works\" is not good enough.\n\n**4. Hybrid realtime (local fanout + durable stream) is the right model for mobile-first AI systems.**\n\nLocal in-memory fanout keeps UX responsive under partial failures. Redis Streams provides the durable, replayable log for recovery and audit. You do not have to choose between fast and correct.\n\n## What's Next\n\n- **Shadow Registry**: a crowdsourced, hashed double-allocation detector. When two independent investigators verify the same plot, the system flags a potential double sale — without exposing either party's identity until a match is confirmed.\n- **Case Law RAG**: grounding forensic reasoning in Kenya Law Reports (eKLR) for land dispute precedents.\n- **Blockchain title anchoring**: publishing investigation hashes to an immutable ledger so findings cannot be retroactively modified.\n
- **Hardware-backed device attestation**: for high-assurance deployments where the physical device must be cryptographically verified.\n\n## The Repo\n\nThe codebase is organized around clear responsibility boundaries:\n\n| File | Purpose |\n|---|---|\n| `backend/realtime/broadcaster.py` | In-process fanout, Redis Pub/Sub, degraded mode |\n| `backend/realtime/store.py` | Redis Streams event store, two-cursor resume logic |\n| `backend/agent/marathon_loop.py` | Orchestration, job lifecycle, event emission |\n| `backend/forensic_engine.py` | Vision analysis, evidence registration, checksumming |\n| `backend/geospatial_engine.py` | GPS/parcel validation, spatial event emission |\n| `frontend/titletrust/lib/realtime/realtime_controller.dart` | Dedupe, sequence tracking, authoritative recovery |\n| `tests/test_realtime_chaos.py` | Deterministic chaos suite |\n| `tests/support/fake_redis.py` | Programmatic Redis failure injection |\n\n## Final Thought\n\nLand fraud in Kenya is not a technology problem. It is a power problem: the people who commit it have access to systems, officials, and legal processes that ordinary buyers do not. TitleTrust does not fix that power imbalance by itself.\n\nBut it gives ordinary people — investigators, buyers, community advocates — the same forensic tools that a senior lawyer and a licensed surveyor would use. For Ksh 500 and three minutes instead of Ksh 10,000 and thirty days.\n\nThat is the point. The engineering exists to serve that point.\n\nIf you are building AI systems for high-stakes decisions in emerging markets, I would love to talk. The problems are real, the constraints are severe, and the standard tooling assumptions often do not hold.\n\n*Built with Gemini 3 Pro, FastAPI, Flutter, Redis Streams, and Firebase. 
Submitted to the Google Gemini 3 Hackathon, 2026.*", "url": "https://wpnews.pro/news/i-built-an-ai-land-fraud-detector-for-kenya-here-s-the-full-engineering-story", "canonical_source": "https://dev.to/wolfof420street/i-built-an-ai-land-fraud-detector-for-kenya-heres-the-full-engineering-story-4n5", "published_at": "2026-05-20 11:55:20+00:00", "updated_at": "2026-05-20 12:04:50.785512+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "startups", "cybersecurity", "data"], "entities": ["TitleTrust", "Google Gemini 3", "Lesedi Developers", "Kenya"], "alternates": {"html": "https://wpnews.pro/news/i-built-an-ai-land-fraud-detector-for-kenya-here-s-the-full-engineering-story", "markdown": "https://wpnews.pro/news/i-built-an-ai-land-fraud-detector-for-kenya-here-s-the-full-engineering-story.md", "text": "https://wpnews.pro/news/i-built-an-ai-land-fraud-detector-for-kenya-here-s-the-full-engineering-story.txt", "jsonld": "https://wpnews.pro/news/i-built-an-ai-land-fraud-detector-for-kenya-here-s-the-full-engineering-story.jsonld"}}