I Built an AI Land Fraud Detector for Kenya — Here's the Full Engineering Story

This post covers the engineering behind TitleTrust, an AI-powered land fraud detection system built for Kenya to combat systematic land theft, including "air supply" scams in which non-existent plots are sold. The system uses Google Gemini 3 Pro's multimodal capabilities and "Thinking Mode" to perform forensic document analysis, cross-referencing dates, transactions, and legal hierarchies, while a GeospatialEngine verifies land locations using GPS and satellite data. The autonomous investigation agent coordinates these processes through a MarathonLoop state machine, giving field investigators a live, traceable timeline of findings that identifies forged documents, temporal impossibilities, and location mismatches.

"In God We Trust. In Land We Verify." In Kenya, land is not just an asset. It is identity. Families save for decades to buy a small shamba. It is the thing you show your children. The thing that says you made it. And it is being stolen — systematically, at scale, with forged documents and complicit officials — from ordinary people who have no way to fight back. The most brutal variant is called "air supply": a fraudster markets a plot of land that does not legally exist. They have a polished brochure, a site visit to a real-looking location, and a convincing sale agreement. The buyer pays. The title deed never comes. By the time they discover the fraud — sometimes years later — the money is gone and the legal system offers little recourse. The Lesedi Developers scandal is the most notorious recent example: thousands of prospective homeowners lost over Sh1 billion to phantom estates in Nairobi's satellite towns. Juja. Ruiru. Thika. Real places. Fake plots. This is the problem I set out to solve here using this Code I submitted TitleTrust to the 2026 Google Gemini 3 Hackathon. The timing was right: Gemini 3 Pro's native multimodality, two-million-token context window, and "Thinking Mode" made it the first model powerful enough to reason about the chain of title in the way a forensic auditor would — not just retrieve facts, but detect what is missing, what is temporally impossible, and what is legally void. This is not a chatbot wrapper. This is an autonomous investigation agent. Let me describe the product as a user experiences it, then I'll explain the engineering underneath each step. A field investigator — a community advocate, a local official, a buyer's agent — opens TitleTrust on their phone and starts a new investigation. They upload the "deal pack": a Title Deed, a Green Card the official registry history , a Mutation Form subdivision records , and a Sale Agreement. The session starts. A live timeline appears on screen. The agents go to work. 

The ForensicEngine (backend/forensic_engine.py) takes the uploaded images and PDFs and runs them through Gemini 3 Pro with structured prompts that force chain-of-thought reasoning. It does not just OCR the documents. It reasons about them. It extracts every date, every transaction, every party name, and then cross-references them.

Why Gemini Thinking Mode matters here: standard LLMs hallucinate compromises when laws conflict. A County Zoning map might say "Residential" while the National Land Act says "Riparian Reserve." With include_thoughts=True, Gemini 3 returns its reasoning as it works through the hierarchy (National Law overrides County Law) and produces a defensible verdict with a visible reasoning trace. That trace is not a nice-to-have. In this context, the reasoning trace is the product. An investigator, a lawyer, or a judge needs to see why the AI flagged something, not just that it did.

The GeospatialEngine (backend/geospatial_engine.py) answers a different question: does the land exist where they say it does? The most insidious fraud variant is the bait-and-switch: the buyer is taken to a beautiful, flat, accessible plot. The title deed they receive is for a swamp 5km away.

The investigator stands on the land and captures GPS and photos. The GeospatialEngine then emits a geospatial_verification event: either "Location verified" or "Location mismatch - please re-scan beacons". If the Solar API data shows the claimed plot is in a flood zone, or the 30-metre riparian buffer overlay shows 60% of the plot is legally unbuildable, the investigator knows before they pay.

The MarathonLoop (backend/agent/marathon_loop.py) is the job state machine that coordinates everything. It is designed to behave like an assistant-led investigation. The field investigator should not need to understand conveyancing law. They follow prompts. The system reasons.

The Flutter client maintains a live, deduplicated, sequence-aware timeline of all agent activity. Every finding. Every evidence registration.
Every verification step. This is not just UX polish. For an investigation tool, the live timeline is the audit trail that an investigator presents to authorities. It needs to be complete, ordered, and reproducible, even if the phone lost signal for ten minutes in the middle of a rural field check.

Here is where most case studies go wrong: they describe the architecture and then separately describe the product. In TitleTrust, every engineering decision was made because of a product constraint. Let me walk through them.

The field investigator is standing in Juja, on cheap mobile data, with an intermittent signal. WebSockets require a persistent bidirectional connection and have no built-in resume. When the connection drops, you rebuild state from scratch. SSE (Server-Sent Events) with Last-Event-ID gives you something better: the client, whether a browser or the Flutter app, reconnects automatically and sends the ID of the last event it received. The server replays from exactly that point. The investigator's timeline heals itself without them noticing. For a progress-update use case, which is all we need for server→client communication, SSE is simpler, more resilient on mobile networks, and natively supports replay semantics.

A standard message queue delivers messages and forgets them. That is fine for background jobs. It is not fine for an investigation audit trail. Redis Streams is an append-only, ordered log. Every event is stored with a stable offset. When the mobile client reconnects after a signal drop, the server can replay the exact sequence of agent actions from the last confirmed point. More importantly, after an investigation closes, a lawyer needs to reconstruct exactly what the agent found, in what order, with what evidence. Redis Streams is that record. It is not just infrastructure; it is the chain of custody.
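
To make the replay idea concrete, here is a minimal sketch of resuming from Last-Event-ID against an append-only log. It is not TitleTrust's code: `EventLog`, `replay_after`, and `format_sse` are hypothetical names, the log is a plain in-memory list standing in for Redis Streams, and the log position is used directly as the SSE `id` for simplicity.

```python
import json
from typing import Optional


class EventLog:
    """In-memory stand-in for an append-only, ordered log (hypothetical; not Redis)."""

    def __init__(self) -> None:
        self._events = []  # list of (offset, event_type, payload) tuples

    def append(self, event_type: str, payload: dict) -> int:
        """Append an event and return its position in the log (1-based)."""
        offset = len(self._events) + 1
        self._events.append((offset, event_type, payload))
        return offset

    def replay_after(self, last_event_id: Optional[str]) -> list:
        """Return every event after the ID the client last saw (all events if None)."""
        start = int(last_event_id) if last_event_id else 0
        return [event for event in self._events if event[0] > start]


def format_sse(offset: int, event_type: str, payload: dict) -> str:
    """Serialize one event as an SSE frame; the id field is what the client
    sends back in the Last-Event-ID header when it reconnects."""
    return f"id: {offset}\nevent: {event_type}\ndata: {json.dumps(payload)}\n\n"


# Demo: three events are logged, but the client only received the first one.
log = EventLog()
for name in ("session_started", "evidence_registered", "geospatial_verification"):
    log.append(name, {"session": "demo"})

missed = log.replay_after("1")  # client reconnects with Last-Event-ID: 1
frames = [format_sse(*event) for event in missed]
```

Because the log never discards entries, the same call serves both a reconnecting client and a later audit: replaying with no ID returns the full ordered history.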

event_id + stream_offset

Every event carries two identifiers:

event_id: a stable application-level UUID the client tracks across reconnections
stream_offset: the Redis Streams position for efficient server-side seeks

The client knows event_id. The server knows stream_offset. The resume logic maps between them.

Why both? Because event_id survives Redis restarts and re-ingestions; it is stable application identity. stream_offset gives the server an efficient seek into the durable log. Without both, you either replay too much (wasteful) or risk replaying from the wrong point (incorrect). For an investigation where every event is a piece of evidence, incorrect replay is a correctness bug, not just a performance bug.

The Broadcaster (backend/realtime/broadcaster.py) maintains both an in-memory local fanout queue and a durable Redis Streams append. The in-memory queue delivers events to the SSE client in milliseconds, even when Redis has a hiccup. The Redis Streams append persists the event for replay and audit. In degraded mode (Redis unavailable), the system flips a flag, continues local delivery, and the client can recover authoritative state from Firestore when connectivity returns. The investigator's timeline keeps updating. They never see a spinner.

The product constraint driving this: an investigator doing a site visit in a low-connectivity area should not have their session stall because a Redis instance is momentarily unreachable. Availability for the user is non-negotiable. But so is the audit trail. The hybrid gives you both.

Every piece of evidence the ForensicEngine registers (every photo, every document analysis result) gets:

a SHA256 checksum of its content
a trace_id linking it across services
a sequence_id for ordering within the session

This is not over-engineering. This is chain of custody. If an investigator presents TitleTrust findings to a Land Control Board or a court, the evidence must be verifiable as unmodified and correctly attributed.
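
As an illustration of those three fields, here is a minimal sketch of an evidence record. The names `EvidenceRecord`, `register_evidence`, and `verify_evidence` are assumptions made for this example, not the project's actual API; only the SHA256 checksum, trace ID, and sequence ID come from the description above.

```python
import hashlib
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvidenceRecord:
    """Immutable metadata attached to one piece of evidence (hypothetical shape)."""
    sequence_id: int  # ordering within the session
    trace_id: str     # links the evidence across services
    checksum: str     # hex SHA-256 digest of the raw content


def register_evidence(
    content: bytes, sequence_id: int, trace_id: Optional[str] = None
) -> EvidenceRecord:
    """Hash the content and stamp it with ordering and provenance identifiers."""
    return EvidenceRecord(
        sequence_id=sequence_id,
        trace_id=trace_id or str(uuid.uuid4()),  # generate one if the caller has none
        checksum=hashlib.sha256(content).hexdigest(),
    )


def verify_evidence(record: EvidenceRecord, content: bytes) -> bool:
    """Recompute the digest; any change to the content changes the checksum."""
    return hashlib.sha256(content).hexdigest() == record.checksum


# Demo with placeholder bytes standing in for a scanned document.
record = register_evidence(b"abc", sequence_id=1, trace_id="trace-demo")
```

Anyone holding the original file can rerun `verify_evidence` later and confirm it still matches the recorded checksum.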
The checksum proves content integrity. The trace ID proves provenance. The sequence ID proves ordering.

Before any public deployment of a system that people will rely on to protect their life savings, I needed to be able to prove the system behaves correctly under failure, not just hope it does. The test harness lives in tests/test_realtime_chaos.py and tests/support/fake_redis.py. Real bugs surfaced in this harness that would never have appeared in happy-path testing.

The product argument for this: stochastic integration tests find bugs sometimes. Deterministic failure injection finds bugs reproducibly. For a correctness-critical system, reproducible is the only acceptable standard.

┌─────────────────────────────────────────────────────────┐
│                  Flutter Mobile Client                  │
│  RealtimeController: dedupe, sequence, gap detection    │
│  RecoveryCoordinator: authoritative Firestore recovery  │
└────────────────────┬────────────────────────────────────┘
                     │ SSE (Last-Event-ID)
                     ▼
┌─────────────────────────────────────────────────────────┐
│                     FastAPI Backend                     │
│  /realtime/sse · /realtime/last-state/{session_id}      │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│                       Broadcaster                       │
│  In-process bounded queues (low-latency local fanout)   │
│  Redis Pub/Sub (cross-instance fanout)                  │
│  Redis Streams (durable ordered log + replay)           │
└──────┬──────────────────────┬───────────────────────────┘
       │                      │
       ▼                      ▼
┌─────────────┐  ┌──────────────────────────────────┐
│ Redis       │  │ Agent Workers                    │
│ Streams     │  │ MarathonLoop (orchestrator)      │
│ (durable    │  │ ForensicEngine (vision/docs)     │
│ log)        │  │ GeospatialEngine (GPS/maps)      │
└─────────────┘  └──────────────┬───────────────────┘
                                │
                                ▼
                       ┌──────────────────┐
                       │ Firestore        │
                       │ (canonical       │
                       │ session state)   │
                       └──────────────────┘

To make this concrete, here is the complete flow for a single field check:

A session_started event is emitted, broadcast locally, and appended to Redis Streams.
Entry 4: Charge to Equity Bank,
12/01/2018 and Entry 6: Discharge of Charge, 10/01/2018. The Discharge predates the Charge. Temporal anomaly flagged. Evidence registered with SHA256 checksum and trace ID. An evidence_registered event is emitted.
A geospatial_verification event is emitted.
The client reconnects and sends Last-Event-ID. The server maps it to the Redis stream offset and replays the 3 events the client missed. The timeline is complete.
An investigation_complete event is emitted with a risk score of 91 (CRITICAL). The full event log (ordered, checksummed, traceable) is the audit trail the investigator presents to the Land Control Board.

This is important because I did not use Gemini 3 as a chatbot. I used it as a reasoning engine embedded in a structured investigation workflow.

Thinking Mode (include_thoughts=True): for legal conflict resolution. When a County Zoning map conflicts with the National Land Act, I cannot use a black-box verdict. The reasoning trace showing why one law overrides another is what makes the output usable in a legal context.

Native multimodality: for reading handwritten Green Cards. Kenya's registry history is often handwritten, sometimes in cursive, on physical cards. Gemini 3 Pro reads these without a separate OCR step, preserving spatial context (stamps over signatures, marginal notes) that traditional OCR loses.

2M token context window: for tracing chain of title through decades of subdivisions. A Mother Title in Kiambu might have 50+ years of subdivision history. Fitting the complete legal history in a single context window enables the kind of deep chronological reasoning that was previously only possible for senior conveyancing lawyers charging Ksh 10,000+ per review.

Structured outputs: for emitting machine-readable findings. Every ForensicEngine result is a typed JSON finding, not free text. This is what allows the mobile timeline to render findings as UI components rather than just paragraphs.

1. Tie every infrastructure decision to a user outcome.
Redis Streams exists because investigators need an audit trail, not because append-only logs are cool. SSE exists because mobile reconnection in rural Kenya needs to be seamless. When you can answer "why does this exist?" with a user story, your architecture stays honest.

2. The reasoning trace is the product. For A