{"slug": "why-i-built-a-tiny-repeated-game-poker-analysis-tool", "title": "Why I Built a Tiny Repeated-Game Poker Analysis Tool", "summary": "A developer built repeated-poker-analysis, an experimental Python toolkit for small abstract poker games that analyzes the value of committing to a fixed strategy across repeated occurrences of the same spot. The tool runs a candidate-analysis pipeline on a toy river game where showdown always chops but rake still bites, computing opponent best responses and hero EV under conservative tie handling. The project deliberately separates commitment analysis from repeated-game equilibrium claims, respecting that finite repetitions with perfect rationality often collapse to one-shot equilibrium by backward induction.", "body_md": "Most poker solvers answer one question very well: given a single hand and a single decision tree, what is the equilibrium strategy? (Yes, there is subgame solving, node locking, and plenty more — but the default frame is still one hand, one equilibrium.)\n\nI kept getting stuck on a different one. What if the *same kind* of spot shows up over and over, and a player can commit to a fixed strategy across those repetitions? In a few toy games I had a hunch, worked out by hand, that committing to a fixed strategy could change its value relative to the one-shot picture. I wanted a tool that could make that commitment value precise — to actually *analyze* it rather than just believe it. (Whether any of this rises to a repeated-game equilibrium is a much stronger claim, and one I am deliberately not making here.)\n\nI'm still learning software engineering, so until recently I couldn't implement this — I was stuck reasoning about toy games on paper. 
AI tooling made the analysis feasible, so I finally started building it: `repeated-poker-analysis`.\n\nIt's a small research project: write one narrow model down, run small examples, and record what the model does and doesn't justify.\n\n`repeated-poker-analysis` is an experimental Python toolkit for small abstract poker games. The current MVP covers:\n\n- `T_deadline`, an economic adaptation deadline\n- `T_detect`, an observable-distribution sensitivity estimate\n\nIt is small on purpose. It is not a full solver and it is not wired to real solver ranges. It starts from one toy game, a river spot, that is tiny enough to inspect and test by hand.\n\nThat toy spot is one where showdown always chops but rake still bites. In a single-hand view, putting more money into a raked pot can be locally unattractive. Across repeated occurrences the same spot raises a commitment question: if one player refuses to fold in a fixed pattern, how does the other respond, and how fast would that response have to come for the commitment to stop being worth it?\n\nThis is the question I wanted a tool to make precise, not a claim that any new equilibrium exists.\n\nRepeated games sound like a natural home for reputation, punishment, and adaptation, and poker has obvious repeated structure: similar river spots, similar blind-vs-blind situations, similar sizings, similar pools.\n\nHere is the trap I had to respect. If the number of repetitions is known, the game is fully observed, each spot is independent, and both players are perfectly rational, then a finite repeated game often collapses back toward the one-shot equilibrium by backward induction. \"This spot happens five times\" is *not* by itself enough to claim a reputation equilibrium. 
That is the standard game-theory result, and it is the reason the project keeps the layers below separate.\n\nSo the project keeps several ideas apart that are easy to blur:\n\nThe MVP mostly lives in the commitment-analysis layer: if Hero is fixed to a candidate strategy in the supplied tree, what are Villain's exact best responses, and what happens to Hero EV under conservative tie handling?\n\n*(This describes the MVP on main at the time of writing. I'm still changing it, so details may move.)*\n\nIt runs an end-to-end candidate-analysis pipeline on a small abstract game:\n\n- `T_deadline` and local `T_detect`\n\nIn plain terms, the analysis loop is:\n\n- A candidate is marked `robustly_profitable` only when the post-response worst-case Hero EV is strictly higher than Hero's EV in the baseline profile. The point is not \"positive EV\"; it is \"still better than the one-shot baseline even after the opponent best-responds.\"\n- `T_deadline` / `T_detect` then add repeated-game timing on top of the candidates that survive.\n\nThe main entry point is `run_candidate_analysis_pipeline`.\n\n```\npython scripts/check_mvp.py\n```\n\nA simplified workflow:\n\n``` python\nfrom nuts_chop_river import build_nuts_chop_river, default_hero_strategy\nfrom candidate_library import baseline_villain_strategy\n\nfrom repeated_poker import (\n    CandidateFilterConfig,\n    CandidateGenerationConfig,\n    run_candidate_analysis_pipeline,\n)\n\ntree = build_nuts_chop_river()\nbaseline_hero = default_hero_strategy()\nbaseline_villain = baseline_villain_strategy()\n\nresult = run_candidate_analysis_pipeline(\n    tree,\n    baseline_hero,\n    baseline_villain,\n    generation=CandidateGenerationConfig(shift_amounts=[0.1, 0.2]),\n    horizon=5,\n    profit_tolerance=-2.0,\n    max_selection_l1_distance=0.3,\n    detection_log_likelihood_threshold=3.0,\n    detection_occurrence_probability_per_opportunity=0.5,\n    filtering=CandidateFilterConfig(\n        max_l1_distance=0.3,\n        
min_required_observations=5,\n    ),\n)\n\nprint(result.markdown_summary)\n```\n\nThe output is a diagnostic report for the model you supplied, not a poker recommendation. Here is an excerpt from the actual output of `examples/analysis_pipeline.py` on the nuts-chop river toy game (I've trimmed the Configurations block and some columns; 8 candidates generated, 6 dropped by the filter, 2 compared):\n\n```\ngenerated=8 kept=2 excluded=6\ncompared=2\n\n## Candidate Analysis Summary\n\n### Summary Counts\n- total: 2\n- eligible: 2\n- excluded: 0\n- minimum_villain_ev: 1\n- pareto_frontier: 2\n\n### Candidate Rows\n\n| candidate_id            | fixed_hero_ev | post_response_hero_ev_worst | robustly_profitable | t_detect | exclusion_reasons |\n| ----------------------- | ------------- | --------------------------- | ------------------- | -------- | ----------------- |\n| H1\\|check->bet\\|shift=0.1 | 0.625         | -0.850                      | no                  | 278      | -                 |\n| H1\\|bet->check\\|shift=0.1 | 0.275         | -0.750                      | no                  | 294      | -                 |\n```\n\nThe baseline Hero EV in this run is +0.45. The column that matters is `robustly_profitable`: it is `yes` only when `post_response_hero_ev_worst` exceeds that baseline. Here both candidates are `no` (-0.85 and -0.75 are below +0.45). A candidate that clears the baseline is rare, but one can exist in constructed cases; the tool's job is to search the candidates and find it when it does. The next section is a hand-built spot where one does.\n\nI needed at least one example where the machinery clearly does what it is meant to: a known spot where committing to a fixed strategy leaves Hero better off than the one-shot baseline, *even after* the opponent best-responds. 
This nuts-chop steal is that example, and I wrote a dedicated test for it ([tests/test_nuts_chop_steal_commitment.py](https://github.com/guriguri215-lang/repeated-poker-analysis/blob/main/tests/test_nuts_chop_steal_commitment.py)). Treat it as a check that the tool can detect the effect at all, not as the end goal, and not as a claim about real games. Outside this constructed spot I do not know which situations, if any, are profitable to commit in.\n\nThe spot: a river where the board is already the nuts, so every showdown chops. There is no value betting; the only reason to bet (shove) is fold equity. Rake is taken from any pot that reaches showdown, so a *called* pot just bleeds chips to the house. With a small starting pot and a big shove, a single hand looks like this:\n\n```\ninitial commitment = 1, initial pot = 2, bet = 98, rake = 5%, cap = 4\n\n| Line         | Hero/IP EV | Villain/OOP EV |\n|--------------|-----------:|---------------:|\n| check-check  |      -0.05 |          -0.05 |\n| bet-fold     |      -1.00 |          +1.00 |\n| bet-call     |      -2.00 |          -2.00 |\n```\n\n(The called pot is 2 + 98 + 98 = 198; 5% of that would be 9.90, so the rake is capped at 4 and each player nets -2.00.)\n\nIn one hand the caller folds: -1.00 (fold) beats -2.00 (call). So the one-shot subgame answer is **OOP bets / IP folds**, a pure steal, since the board is a chop and there is no value in betting.\n\nNow lock IP to *always call* and ask the tool for OOP's exact best response. The steal's only profit source (fold equity) is gone and a called pot is -2.00 for OOP, so OOP's exact best response flips to **check**, and check-check is -0.05 for both. The test asserts exactly this: `solve_exact_response` returns `{\"OOP_river\": \"check\"}` once Hero is locked to call.\n\nAnd crucially, this clears the baseline: Hero's EV goes from -1.00 (the one-shot steal baseline) to -0.05 after OOP adapts. That is still negative, but strictly better than the baseline, which is exactly the `robustly_profitable` condition. 
That is the whole point of the project stated in one example: **the one-shot subgame answer (bet/fold) is not the answer under the fixed commitment I wanted to test (check/check).** The commitment to call removes the opponent's only incentive to bet. (Whether this constitutes a repeated-game *equilibrium* is the stronger claim I am deliberately not making; this is a commitment-analysis result, not an equilibrium proof.)\n\nThe tool also puts a number on *how long* that commitment stays worth it. With baseline Hero EV = -1.00 (steal), pre-adaptation = -2.00 (locked call while OOP still bets), post-adaptation = -0.05 (OOP has switched to check), `T_deadline` comes out as `floor(1 + 19N/39)`:\n\n```\n| N (horizon) | T_deadline |\n|------------:|-----------:|\n|          10 |          5 |\n|          20 |         10 |\n|          50 |         25 |\n|         100 |         49 |\n```\n\nThe honest caveat: this is a tiny, hand-built tree, and the EVs are ones I can check by hand; that is exactly why I trust *this* result more than anything else in the repo. It is not evidence about real games; it is evidence that the model and the code agree on one constructed example built to validate the effect.\n\nVerification on my machine:\n\n- `python -m pytest tests/test_nuts_chop_steal_commitment.py -v` → 15 passed\n- `python -m pytest -q` → 500 passed\n- `python scripts/check_mvp.py` → passes\n- `git diff --check` → clean\n\nI supplied the algorithm and the poker model. Codex wrote the implementation instructions and reviewed the results; Claude Code wrote the code. 
I checked Codex's prompts and corrected wrong premises, but I did not review the code line by line; I relied on the Codex/Claude review loop and the test suite (currently 500 passing tests).\n\nTwo things from that process are worth recording:\n\nA note on terms the code keeps separate: `T_deadline` is economic (how late Villain can adapt while the locked policy still beats the baseline); `T_detect` is visibility (how many local observations before the candidate's action distribution looks distinguishable from baseline). They are different questions.\n\n**Best-response ties matter.** If Villain has several best responses with identical Villain EV, Hero's EV can still differ across them. Returning one arbitrary response would hide that risk, so the MVP reports both `ev_h_worst` and `ev_h_best` across the tie set. (Verified: `BestResponseResult` exposes both and the action variation across optimal pure strategies.)\n\n**Small examples are not a weakness.** The nuts-chop river benchmark is tiny on purpose: easier to hand-check, harder to mistake for a real-money recommendation.\n\nThe main limitation: **the code has not had an independent human code review.** Tests pass, but I haven't read the implementation line by line and nobody else has either. Rather than rely on reading the code, I plan to validate it from the outside: design the verification to be as exhaustive as I can make it, run simulations across many configurations, and check that the results hold up. Whether static or property-based checking can give that coverage is something I'm still working out.\n\nThe narrower limits: it is not a full solver, does not import real solver ranges yet, does not solve large no-limit games, and does not do STT / ICM / preflop push-fold yet. The exact response engine enumerates Villain pure strategies, so it is meant for small abstract trees only; there is an explicit `max_pure_strategies` ceiling, default 100,000. 
Candidate generation is simple: finite shifts from a baseline, not a continuous strategy space.\n\nMost importantly: positive EV *inside this model* does not guarantee profitable play. The model can be wrong if the abstraction, action tree, rake rule, ranges, or adaptation assumptions are wrong.\n\nThis is not gambling, bankroll, financial, or legal advice.\n\nThe toy game confirmed the effect, so next I want to extend the tool: analyze with hand ranges rather than abstract action probabilities, and model the opponent adapting gradually (e.g. a Bayesian update of their response over repetitions) instead of switching to an exact best response in one step. Alongside that, I want to firm up the outside-in verification described above before trusting results on new spots.\n\n**Disclosure:** I used AI assistance throughout this project and to draft this article. The division of labor was deliberate: I supplied the algorithm and the poker model, Codex handled instructions and review, and Claude Code wrote the code; I checked the prompts and relied on automated review and tests for the implementation. This article was also drafted with AI help and then rewritten to reflect my own decisions, mistakes, and open questions. 
Technical claims are marked where I have verified them against the code myself; where I say something is provisional or unreviewed, that is literally true.", "url": "https://wpnews.pro/news/why-i-built-a-tiny-repeated-game-poker-analysis-tool", "canonical_source": "https://dev.to/ty215/why-i-built-a-tiny-repeated-game-poker-analysis-tool-3joa", "published_at": "2026-06-27 00:23:00+00:00", "updated_at": "2026-06-27 00:33:55.235065+00:00", "lang": "en", "topics": ["developer-tools", "ai-research", "machine-learning"], "entities": ["repeated-poker-analysis", "Python", "T_deadline", "T_detect", "Hero", "Villain", "MVP"], "alternates": {"html": "https://wpnews.pro/news/why-i-built-a-tiny-repeated-game-poker-analysis-tool", "markdown": "https://wpnews.pro/news/why-i-built-a-tiny-repeated-game-poker-analysis-tool.md", "text": "https://wpnews.pro/news/why-i-built-a-tiny-repeated-game-poker-analysis-tool.txt", "jsonld": "https://wpnews.pro/news/why-i-built-a-tiny-repeated-game-poker-analysis-tool.jsonld"}}