Teaching a Computer to Play 4X: How the Annhexation AI Works

A developer has built a layered AI system for the 4X strategy game Annhexation that separates strategy, planning, and execution into three distinct layers. The AI uses a prioritized goal stack that evaluates and scores potential objectives—from early expansion and military pushes to wonder racing and nuclear first strikes—against personality weights and world factors each turn. This decoupled architecture ensures the computer opponent maintains coherent long-term strategies, such as holding a military campaign goal for twenty-plus turns, rather than making twitchy turn-by-turn decisions.

Building a believable computer opponent for a 4X strategy game is one of those problems that turns out to be bottomless. I'd use the cliché that it looks simple from the outside, but I don't think that's true: I thought this would be a tough nut from the outset. I've built a chess-playing engine before, and getting a strong opponent there was far simpler, though it helps that chess is such a well understood and documented problem.

The player wants an opponent that explores, expands, exploits and exterminates with apparent intent: one that musters an army over several turns, marches it across a continent, lands it on your shore and takes your city, all while you watched it coming and couldn't quite stop it. They do not want an opponent that teleports units, reads your mind, or sits inert in its starting cities until you wander into range.

This post is a tour through the Annhexation AI (https://annhexation.com), explaining how it makes decisions, what it remembers between turns, and how the same core machinery produces eight distinct civilizations and four difficulty levels. Annhexation isn't open source, so rather than quote the implementation I'll describe the design and illustrate the interesting bits with pseudocode. I should note that the AI is still under development, but after a lot of bashing with a hammer it's feeling in a pretty decent place.

The single most important design decision in the Annhexation AI is that strategy, planning and execution are decoupled. These three layers are separated on purpose, and an AI turn flows through them in order: strategy first, then planning, then execution.

The payoff of this separation is, hopefully, coherence over time. A greedy turn-by-turn AI looks twitchy: it builds an army, gets distracted, disbands it, builds another.
By contrast, an Annhexation AI that adopts a `militaryPush` goal will hold that goal for twenty-plus turns, funnelling production, research and unit movement toward a single objective until the city falls, the campaign demonstrably fails, or something seismic interrupts it. Strategy should be sticky while execution is flexible.

A complete turn runs as an ordered sequence of discrete phases, from threat assessment and diplomacy through combat, movement, production and fortification:

    function runTurn(player, world, aiState):
        detectEvents(aiState, world)            // diff against last turn → fire interrupts
        aiState.goals = evaluateStrategy(player, world, aiState)
        plans = buildOperationalPlans(aiState.goals, player, world)
        executeTactics(plans, player, world)    // the phase sequence (see below)
        aiState.snapshot = snapshot(world)      // remember this turn for next time
        return aiState

At the heart of the strategic layer is a prioritized goal stack. Each turn the AI either keeps its current goals or re-evaluates them, and the menu of things it can want is rich:

- `earlyExpand`: plant N cities before consolidating
- `earlyRush`: exploit the opening with an aggressive early attack
- `infrastructureConsolidation`: buildings, population, growth
- `militaryPush`: sustained warfare against a chosen player
- `defensiveWar` / `counterattack`: react to aggression, retake what was lost
- `navalInvasion`: assault a distant landmass
- `wonderRace`, `scienceVictoryPush`, `scoreOptimisation`: the peaceful victory paths
- `raidWar`, `asymmetricWar`: economic harassment instead of conquest
- `warPreparation`, `nuclearFirstStrike`, `recovery`: the situational specials

Goals don't fire on rigid rules; rather, they're scored against each other and the highest-utility ones win. The scoring blends several signals, and every score is then multiplied by a personality weight.
Roughly:

    function scoreGoals(player, world, personality):
        scores = {}
        for goal in CANDIDATE_GOALS:
            base = goal.baseValue(player, world)
            worldFactors = proximity × forceBalance × catchUp × opportunity
            scores[goal] = base × worldFactors × personality.weightFor(goal)
        return sortDescending(scores)

    // e.g. early-expand ≈ base × siteRatio × proximityAdj × catchUp × personality.expansion

Two of those terms are about the world; one is about who this civ is. That's how the same evaluation function produces a cautious turtle and a rampaging horde. The top goal (priority 0) drives the turn. Secondary goals queue behind it, ready to take over the moment an interrupt fires.

A 4X AI that only looks at its own empire plays in a vacuum. Annhexation's AI explicitly models every player it has met before deciding who to fight. The AI profiles each known rival across roughly eleven dimensions, each normalised to [0, 1]:

- `militarisation`, `development`, `expansionism`, `techPace`
- `exposure` and `coastalExposure`: undefended or weakly-garrisoned cities
- `borderTension` and `aggression`: forces massed nearby
- `wonderFocus`, `scienceFocus`, and the all-important `isRunawayLeader` flag

It also tracks trends (rising, flat or falling over the last five turns), so the AI reacts to a rival who is accelerating, not just one who is currently strong. Those snapshots are kept in persistent state so trend detection survives across turns.

A second pass turns those profiles into a war-target ranking. For each rival it weighs affinity, winnability, reachability and distraction:

    function scoreWarTargets(rivals, me, personality):
        for r in rivals:
            affinity   = personality.aggression × r.borderTension
            winnable   = clamp(myStrength / r.militarisation)
            reachable  = 1 / (1 + travelCost(me, r))
            distracted = r.aggression            // elsewhere
            r.score    = affinity × winnable × reachable × (1 + distracted)
        return sortDescending(rivals)

The winner of that scoring becomes the target of a `militaryPush`, and the magnitude feeds back as an opportunity multiplier into goal evaluation.
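
To make the war-target arithmetic concrete, here is a minimal runnable sketch of the same multiplication in Python. It is not Annhexation's code: the `Rival` fields, the `distraction` field (read here as aggression aimed at other players), the precomputed `travel_cost`, the [0, 1] clamp range and the division guard are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Rival:
    name: str
    border_tension: float  # 0..1
    militarisation: float  # 0..1
    distraction: float     # 0..1, assumed: aggression aimed at other players
    travel_cost: float     # hypothetical path cost from our empire to theirs


def clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    return max(lo, min(hi, x))


def score_war_targets(rivals: List[Rival], my_strength: float,
                      aggression: float) -> List[Tuple[float, str]]:
    """Return (score, name) pairs, most tempting target first."""
    scored = []
    for r in rivals:
        affinity = aggression * r.border_tension
        # Guard against dividing by zero for a rival with no army at all.
        winnable = clamp(my_strength / max(r.militarisation, 1e-6))
        reachable = 1.0 / (1.0 + r.travel_cost)
        score = affinity * winnable * reachable * (1.0 + r.distraction)
        scored.append((score, r.name))
    return sorted(scored, reverse=True)


rivals = [
    Rival("near_weak", border_tension=0.8, militarisation=0.3,
          distraction=0.5, travel_cost=1.0),
    Rival("far_strong", border_tension=0.8, militarisation=0.9,
          distraction=0.0, travel_cost=4.0),
]
ranked = score_war_targets(rivals, my_strength=0.6, aggression=1.0)
```

With these made-up numbers the close, weak, distracted rival scores about 0.6 and the distant, strong one about 0.11. Because the terms multiply, a rival that is unreachable or unwinnable is effectively ruled out however tense the border is.
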
An exposed, accessible, distracted neighbour is a temptation the AI is built to notice and exploit.

Personality in Annhexation isn't a single "aggression" slider. It's a vector of about twenty weights (military production, attack appetite, expansion, wonder-building, research, naval production, raid preference, plus early-game tuning like second-city urgency and first-build preference). On top of that sits the doctrine system: eight civ-specific playbooks that override those weights and the AI's unit-composition preferences:

| Civ | Doctrine | Signature |
|---|---|---|
| Mongolia | HORSE RUSH | +50% military production, +50% attack, double raid preference, cavalry-heavy armies |
| Aztecs | WARRIOR RUSH | +40% military & attack, −20% expansion, melee-heavy early aggression |
| Russia | EXPAND WIDE | +40% expansion, +30% garrison commitment |
| Rome | INFRA FIRST | +40% infrastructure, +30% expansion |
| France | WAR FOR SCIENCE | +40% research, +30% science-victory focus |
| Greece | STRATEGIST | balanced militarisation across all domains |
| Egypt | TURTLE WONDERS | +50% wonders & culture, −20% military |
| England | COASTAL ONLY | +40% naval, +50% coastal-site preference, harbour priority |

Because the doctrine only modulates shared machinery, Egypt and Mongolia run the identical goal-evaluation and combat code; they simply weight it toward completely different ends. Mongolia drowns you in cavalry; Egypt hides behind wonders and culture; England fights for the coastline. Combined with unique per-civ units, this gives each civ a distinctive personality.

Once a goal is chosen, the operational layer turns intent into concrete plans.

Unit quotas compute empire-wide demand for each unit class (settlers, workers, garrison, field army, reserve, naval, raiders), each scaled by goals, threat levels, personality and difficulty.
During a `militaryPush` against a walled city, for instance, the garrison quota rises with threat level, melee demand jumps, and siege units become mandatory: you cannot crack walls without them, and the AI knows it.

Unit composition picks the melee/ranged/siege/mounted ratio for an army. Against an unwalled city it loads up on ranged units (free damage); against walls it must bring siege. Doctrine tilts the mix, and resource gating caps it. No horses means no cavalry, no iron means no siege, full stop:

    function targetComposition(target, doctrine, resources):
        if target.walled:
            mix = {melee: 0.4, siege: 0.4, ranged: 0.2}
        else:
            mix = {melee: 0.4, ranged: 0.5, mounted: 0.1}
        mix = applyDoctrineBias(mix, doctrine)   // HORSE RUSH → more mounted, etc.
        if not resources.horses: mix.mounted = 0
        if not resources.iron:   mix.siege = 0
        return normalise(mix)

Attack plans are first-class, multi-turn objects with an explicit lifecycle:

    mustering → gathering → advancing → besieging → assaulting
        ↘ naval: awaitingTransport → embarking → sailing → landing ↗

Target selection scores enemy cities by proximity (−5 per hex of distance), with bonuses for being unwalled (+15), being a capital (+10), and sitting near iron or horses the AI needs (a big multiplier gated on personality and urgency). It goes for the weakest reachable target first, and it commits.

City production is a distributed priority queue: high-output cities feed global military needs first, low-output cities backfill settlers and workers. The priority cascade runs upgrades → settlers → garrison → military → naval → workers/roads → buildings → wonders, gated by the active goal.

Research follows the goal: an expanding AI beelines the wheel and animal husbandry, a science-victory AI walks a hardcoded path toward rocketry, and a warring AI weights military techs. It searches the prerequisite tree but abandons paths longer than three techs, so there are no hundred-turn detours.
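
The research cutoff can be illustrated with a small sketch. This is not the game's implementation: the tech tree is a plain dictionary with placeholder tech names, and it assumes the "three techs" limit counts every unresearched tech on the way to the target, including the target itself.

```python
from typing import Dict, List, Optional, Set


def missing_techs(target: str, prereqs: Dict[str, List[str]],
                  known: Set[str]) -> List[str]:
    """Unresearched techs needed for `target`, prerequisites first, target last."""
    order: List[str] = []
    seen: Set[str] = set()

    def visit(tech: str) -> None:
        if tech in known or tech in seen:
            return
        seen.add(tech)
        for prerequisite in prereqs.get(tech, []):
            visit(prerequisite)
        order.append(tech)  # appended only after all of its prerequisites

    visit(target)
    return order


def plan_research(target: str, prereqs: Dict[str, List[str]], known: Set[str],
                  max_path: int = 3) -> Optional[List[str]]:
    """Return the research order, or None if the detour is too long
    (or the target is already known)."""
    path = missing_techs(target, prereqs, known)
    return path if 0 < len(path) <= max_path else None


# Hypothetical linear tree: a → b → c → d → e (placeholder names).
TREE = {"b": ["a"], "c": ["b"], "d": ["c"], "e": ["d"]}
short_plan = plan_research("d", TREE, known={"a"})  # three techs away: accepted
long_plan = plan_research("e", TREE, known={"a"})   # four techs away: abandoned
```

A hard cap like this is blunt, but it keeps a goal-driven researcher from committing to a target whose payoff arrives long after the goal that wanted it has been replaced.
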
In theory, worker management plans and caches road routes between cities and strategic resources, invalidating them when borders flip.

Bottleneck detection explicitly diagnoses why military modernisation is stalled (waiting on a tech, lacking road access to iron, missing currency for trade) and escalates urgency the longer the bottleneck persists.

When the planning is done, the AI executes the turn as an ordered sequence of phases. Roughly:

1. Event detection & city-loss response (compare against last turn's snapshot)
2. Emergency garrison fill (enemy standing on a city tile)
3. Unit upgrades & recalls
4. Retreats (pull damaged units that aren't committed)
5. Combat (city defence first, then general)
6. Naval invasion lifecycle (drive the beachhead state machines)
7. Settler escorts & transport convergence
8. Army movement (via the movement planner)
9. Build orders (worker tasks, roads)
10. Diplomacy (trade, war declarations)
11. City Defence Commander (per-city garrison assignment)
12. Government & tech completion
13. Fortification & hidden-unit setup

A few pieces deserve a closer look. Combat first: a unit attacks only when its estimated win probability clears its risk tolerance:

    function shouldAttack(attacker, defender, difficulty):
        atk = attacker.strength × difficulty.combatEffectiveness
        def = defender.strength × terrainBonus × fortifyBonus × garrisonBonus
        winProb = clamp(0.5 + (atk - def) × 0.1, 0, 1)
        return winProb ≥ attacker.riskTolerance

Movement shares a context across all units so two units never plan into the same tile (no accidental stacking). It uses strategic pathing with an A* fallback, plus anti-oscillation rules: it won't step back onto a tile it occupied in the last couple of turns unless it's hurt or there's an enemy adjacent. That kills the classic "AI unit jitters back and forth forever" bug.

Retreat pulls units below an HP threshold (50% on Easy, down to 20% on Deity) or when outnumbered 2:1 nearby. But garrisons never retreat, assault-committed units only break below 15%, and loaded transports never run. Commitment is respected.
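
The retreat rules read naturally as a short decision function. The sketch below is one possible ordering of those checks, not the game's code: the `UnitStatus` fields are invented for illustration, the HP threshold is passed in by the caller, and it assumes that an assault commitment also overrides the outnumbered rule.

```python
from dataclasses import dataclass


@dataclass
class UnitStatus:
    hp_fraction: float                 # current HP / max HP, 0..1
    is_garrison: bool = False
    assault_committed: bool = False
    is_loaded_transport: bool = False
    nearby_enemies: int = 0
    nearby_friends: int = 1            # assumed to include the unit itself


ASSAULT_BREAK_HP = 0.15  # committed attackers only break below 15%


def should_retreat(unit: UnitStatus, hp_threshold: float) -> bool:
    # Hard commitments come first: these units never run.
    if unit.is_garrison or unit.is_loaded_transport:
        return False
    # Assumption: an assault commitment overrides the outnumbered rule too.
    if unit.assault_committed:
        return unit.hp_fraction < ASSAULT_BREAK_HP
    outnumbered = unit.nearby_enemies >= 2 * unit.nearby_friends
    return unit.hp_fraction < hp_threshold or outnumbered


wounded = UnitStatus(hp_fraction=0.4)
cautious = should_retreat(wounded, hp_threshold=0.5)  # pulls back
stubborn = should_retreat(wounded, hp_threshold=0.2)  # stays in the fight
```

Putting the commitment checks before the health check is what makes commitment "respected": a damaged garrison falls through to `False` before its HP is ever consulted.
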
The City Defence Commander automates each threatened city's garrison through its own little state machine (reinforcing → defending → critical → secure), tracking the local force balance and issuing movement orders to defenders. Cities defend themselves intelligently without the strategic layer micromanaging every hex.

None of this multi-turn coherence works without persistence. The AI's state object is serialised between turns and carries, among other things, a record of lost cities (so `counterattack` knows what to retake) and a snapshot of the world as it stood last turn.

That last point drives the AI's reactivity. Each turn it diffs the current world against last turn's snapshot to spot captured or lost cities, fresh war declarations, lost wonders, completed techs, detected nukes, and pillaged tiles. Any of these can fire an interrupt that pre-empts the current goal: lose a city and the AI drops what it was doing to respond; lose the capital and `counterattack` jumps the stack.

    function detectEvents(aiState, world):
        prev = aiState.snapshot
        for change in diff(prev, world):
            if change is CITY_LOST:     raise Interrupt(counterattack, change.city)
            if change is WAR_DECLARED:  raise Interrupt(defensiveWar, change.by)
            if change is NUKE_DETECTED: raise Interrupt(recovery, change.where)
            // ... wonders lost, tiles pillaged, techs done

Difficulty in Annhexation is partly competence and partly bonus, and the line between them is deliberate.

| | Easy | Normal | Hard | Deity |
|---|---|---|---|---|
| Production / Research / Gold | 0.8× | 1.0× | 1.15× / 1.1× / 1.1× | 1.3× / 1.25× / 1.2× |
| Combat phasing & focus fire | off | on | on | on |
| Will retreat | no | yes | yes | yes |
| Combat effectiveness | 0.95× | 1.0× | 1.08× | 1.15× |
| Decision accuracy | ~60% | 100% | 100% | 100% |
| Strategy re-evaluation | every 20 turns | 12 | 10 | 8 |

So an Easy AI isn't just weaker; it genuinely plays worse: it makes suboptimal choices more often, doesn't phase its combat, doesn't retreat damaged units, and reconsiders its strategy only sluggishly.
A Deity AI plays the engine to its full ability and gets economic bonuses on top. The higher difficulties also unlock a small, clearly-scoped set of adaptive cheats: a fog-of-war peek at rival posture, conditional production boosts while pursuing a goal, completion boosts on the home stretch of a wonder or spaceship, and an increased chance of coordinating a joint attack with another AI. These are bonuses with a purpose rather than omniscience.

The Annhexation AI deliberately trades short-term tactical perfection for long-term strategic coherence. Its unit movement is somewhat greedy; it will occasionally make a locally-suboptimal step. But it musters real armies, plans amphibious invasions across several turns, reads which neighbour is weak and accessible, holds a campaign together through a dozen turns of grinding siege, and reacts when you take one of its cities. The architecture is what makes that possible: a sticky goal stack on top, multi-turn plans in the middle, flexible greedy execution at the bottom, and a persistent memory threading it all together, with personality and difficulty as multipliers reaching into every layer. The result is eight civilizations that feel different, four difficulty levels that genuinely play differently, and an opponent whose intentions you can usually see coming. Stopping them is the game.

It doesn't take long before you realise that working on the AI means analysing a lot of games and a lot of data. You need to see why it did something. As the AI grew in complexity I found I would end up with units sat idle, units oscillating between two positions, hopeless attacks, and settlers refusing to found cities. All of this is shaped by the possibilities that emerge from the complex set of rules the AI follows and the situations that develop on the map. So you need instrumentation, a way to interrogate it, and a way to play more games than you humanly can.
At least as a solo developer. And so a big chunk of work turned out not to be the AI itself but building tools to let me use it and interrogate it.

Playing the game by hand to test the AI is hopeless: turns are slow, and you need hundreds of them across many games to spot patterns. So there's a command-line testbed that runs all-AI games with no rendering and no human in the loop:

    testbed new --map continent --difficulty deity --players 6   # create an all-AI game
    testbed run <gameId> --turns 250 --snapshot-every 10         # advance it, headless
    testbed inspect <gameId>                                     # one-shot state summary
    testbed list                                                 # all games + winners

`run` advances a game by N turns as fast as the machine will go, printing per-turn progress and bailing early if someone wins. `inspect` dumps a per-player table (civ, city count, unit count, gold, current research, alive or dead), and `list` shows every game in the diagnostics directory with its current turn and winner. This is what turns "I think the Mongolian AI rushes too hard" into "I ran forty games and Mongolia wins by turn 90 in thirty of them": the difference between a hunch and a regression test. Everything is stored in a per-game directory (`state.json`, `ai-states.json`, and a `run.log` of notable events like cities founded and wars declared), ready for inspection.

The CLI is great for volume but blind to space: it can't show you that the army is stuck because a single enemy scout is sitting on the only bridge. For that I run all-AI games inside the actual client. When a game has no human player the normal "End Turn" button is replaced by a testbed panel: buttons to advance 1, 5, 10, 20, 50 or 250 turns, and a "view as" dropdown that swaps the map's fog-of-war filter so you can watch the game unfold from any AI's perspective.
Layered on top of that is an AI inspector: select any AI unit or city and it surfaces the internal state that the JSON logs hold, but anchored to what you're looking at on the map:

- the active goal, such as `militaryPush vs player 2 → city 42` or `scienceVictory: 4/4 parts, 5 techs left`
- the attack plan's phase (`gathering → besieging → assault`), unit fill (`5/8 units, siege needed`) and rally point

Underneath both of those is the thing I lean on most: every AI writes a complete, structured record of its reasoning every single turn. Point an environment variable at a directory and each turn produces a pretty-printed JSON file per AI player (`turn-014-mongolia.json`) and a companion full-state `ai-state-014-mongolia.json`. These aren't log lines; they're a forensic snapshot of the entire decision.

A single turn file captures the goal stack with its scores, the posture and opportunity score it assigned every rival, every city's production and classification, every unit's assignment (role, target, commitment, position, HP), the active attack plans and, crucially, a command trace: an ordered list of every command the AI issued that turn, tagged with the phase that issued it, and `success: true` or a blocked reason straight from the engine. So when a move silently does nothing, the log tells you the engine rejected it and why.

There are dedicated traces for the gnarly subsystems too: a combat trace of every simulated fight, a naval lifecycle narrative for debugging amphibious invasions (the single most fiddly thing in the whole AI), and a `citySiteDecisions` list recording every settle attempt and its outcome: `accepted`, `too-close-to-foreign-city`, `food-tiles-short`, `on-foreign-landmass-blocked`. That last one is the cure for the maddening "why won't this settler settle?" bug: the answer is right there in the file.
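
Because the command trace is structured, pulling out the rejected commands takes only a few lines. The sketch below is an assumption-laden illustration, not part of the game's tooling: it borrows the `commandTrace`, `step`, `command` and `success` keys from the trimmed turn example in this post, but the `blockedReason` key and the sample reason string are invented here, since the post doesn't name the field that holds the engine's reason.

```python
from typing import Any, Dict, List, Tuple


def blocked_commands(turn_record: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (step, command, reason) for every command the engine rejected."""
    blocked = []
    for entry in turn_record.get("commandTrace", []):
        if entry.get("success") is True:
            continue
        # "blockedReason" is a hypothetical key name, used for illustration.
        reason = entry.get("blockedReason", "unknown")
        blocked.append((entry.get("step", "?"), entry.get("command", "?"), reason))
    return blocked


# Made-up sample data in the shape of a turn file's command trace.
sample_turn = {
    "turn": 18,
    "commandTrace": [
        {"step": "10", "command": "moveUnit", "unitId": "unit 14", "success": True},
        {"step": "12", "command": "moveUnit", "unitId": "unit 9",
         "success": False, "blockedReason": "tile-occupied"},
        {"step": "16", "command": "endTurn", "success": True},
    ],
}
rejected = blocked_commands(sample_turn)
```

Run over a whole directory of turn files (each loaded with `json.load`), a filter like this turns "the army never moved" into a list of the exact commands that were refused and the phase that issued them.
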
Here's a heavily trimmed example JSON from a turn:

    {
      "turn": 18,
      "playerId": "player 4",
      "civilisation": "greece",
      "doctrine": "STRATEGIST",
      "difficulty": "hard",
      "goals": [
        {
          "type": "earlyExpand", "priority": 0, "status": "active", "createdOnTurn": 11,
          "targetCityCount": 4, "settlerCount": 0,
          "bestSites": [
            { "q": 23, "r": 20, "totalScore": 111.4, "penalties": 0 },
            { "q": 25, "r": 19, "totalScore": 109.6, "penalties": 0 }
            /* … 277 more, descending … */
          ]
        },
        { "type": "infrastructureConsolidation", "priority": 1, "status": "active" },
        { "type": "warPreparation", "priority": 2, "status": "active",
          "targetPlayerId": "player 1", "targetForceSize": 4, "currentForceSize": 3 }
      ],
      "postures": {
        "player 2": { "militarisation": 0.69, "isRunawayLeader": true, "borderTension": 0.27 }
      },
      "cities": [
        { "name": "Athens", "population": 2, "production": "library", "classification": "border" }
      ],
      "commandTrace": [
        { "step": "10", "command": "moveUnit", "unitId": "unit 14", "role": "worker",
          "from": "25,23", "to": "26,23", "success": true },
        { "step": "10", "command": "buildImprovement", "unitId": "unit 14", "success": true },
        { "step": "16", "command": "endTurn", "success": true }
      ]
    }

The workflow ties together neatly. Run a few hundred turns headless with the CLI; spot a game that went wrong in the `list` output; either replay it in the browser with the F3 inspector or crack open the turn-N JSON and read, in order, exactly what the AI was thinking and what the engine let it do. Most of the "the AI is being dumb" moments turn out to be one specific, fixable thing, and these tools are how you find it instead of guessing.

Creating an AI for a 4X is definitely quite an undertaking. It's pretty easy to get units moving around, but getting the AI to act in ways that are both interesting and credible takes a lot of effort. It's not that the code is complicated but that there is so much interacting that small changes can have hard-to-predict second- and third-order effects.
I spent countless hours on things that seem simple on the face of it ("stop a unit from oscillating between A and B") but turn out to be really rather complex. Yes, you can put in guards ("don't do this"), but the guards themselves can have unforeseen effects and don't fix root problems. You also can't automate all this away. Yes, you can create test cases; yes, you can have the AI play countless games against itself; but an AI isn't a human, and it's the human the AI has to respond interestingly to.

I've released Annhexation into early access now, and the primary reason for that is the AI. I need more people to play it, and then to resolve the things that will inevitably emerge. If you'd like to give it a go you can play it online, for free, now: https://annhexation.com