# Why I Built a Tiny Repeated-Game Poker Analysis Tool

> Source: <https://dev.to/ty215/why-i-built-a-tiny-repeated-game-poker-analysis-tool-3joa>
> Published: 2026-06-27 00:23:00+00:00

Most poker solvers answer one question very well: given a single hand and a single decision tree, what is the equilibrium strategy? (Yes, there is subgame solving, node locking, and plenty more — but the default frame is still one hand, one equilibrium.)

I kept getting stuck on a different one. What if the *same kind* of spot shows up over and over, and a player can commit to a fixed strategy across those repetitions? In a few toy games I had a hunch, worked out by hand, that committing to a fixed strategy could change its value relative to the one-shot picture. I wanted a tool that could make that commitment value precise — to actually *analyze* it rather than just believe it. (Whether any of this rises to a repeated-game equilibrium is a much stronger claim, and one I am deliberately not making here.)

I'm still learning software engineering, so until recently I couldn't implement this; I was stuck reasoning about toy games on paper. AI tooling made the analysis feasible, so I finally started building it: `repeated-poker-analysis`.

It's a small research project: write one narrow model down, run small examples, and record what the model does and doesn't justify.

`repeated-poker-analysis` is an experimental Python toolkit for small abstract poker games. The current MVP covers:

- `T_deadline`, an economic adaptation deadline,
- `T_detect`, an observable-distribution sensitivity estimate.

It is small on purpose. It is not a full solver and it is not wired to real solver ranges. It starts from one toy game, a river spot, that is tiny enough to inspect and test by hand.

That toy spot is one where showdown always chops but rake still bites. In a single-hand view, putting more money into a raked pot can be locally unattractive. Across repeated occurrences the same spot raises a commitment question: if one player refuses to fold in a fixed pattern, how does the other respond, and how fast would that response have to come for the commitment to stop being worth it?

This is the question I wanted a tool to make precise — not a claim that any new equilibrium exists.

Repeated games sound like a natural home for reputation, punishment, and adaptation, and poker has obvious repeated structure: similar river spots, similar blind-vs-blind situations, similar sizings, similar pools.

Here is the trap I had to respect. If the number of repetitions is known, the game is fully observed, each spot is independent, and both players are perfectly rational, then a finite repeated game often collapses back toward the one-shot equilibrium by backward induction. "This spot happens five times" is *not* by itself enough to claim a reputation equilibrium. That is the standard game-theory result, and it is the reason the project keeps the layers below separate.

So the project keeps several ideas apart that are easy to blur:

The MVP mostly lives in the commitment-analysis layer: if Hero is fixed to a candidate strategy in the supplied tree, what are Villain's exact best responses, and what happens to Hero EV under conservative tie handling?

*(This describes the MVP on main at the time of writing. I'm still changing it, so details may move.)*

It runs an end-to-end candidate-analysis pipeline on a small abstract game, including `T_deadline` and local `T_detect`.

In plain terms, the analysis loop is:

- A candidate is marked `robustly_profitable` only when its post-response worst-case Hero EV is strictly higher than Hero's EV in the baseline profile. The point is not "positive EV"; it is "still better than the one-shot baseline even after the opponent best-responds."
- `T_deadline` / `T_detect` then add repeated-game timing on top of the candidates that survive.

The main entry point is `run_candidate_analysis_pipeline`.

```
python scripts/check_mvp.py
```

A simplified workflow:

``` python
from nuts_chop_river import build_nuts_chop_river, default_hero_strategy
from candidate_library import baseline_villain_strategy

from repeated_poker import (
    CandidateFilterConfig,
    CandidateGenerationConfig,
    run_candidate_analysis_pipeline,
)

tree = build_nuts_chop_river()
baseline_hero = default_hero_strategy()
baseline_villain = baseline_villain_strategy()

result = run_candidate_analysis_pipeline(
    tree,
    baseline_hero,
    baseline_villain,
    generation=CandidateGenerationConfig(shift_amounts=[0.1, 0.2]),
    horizon=5,
    profit_tolerance=-2.0,
    max_selection_l1_distance=0.3,
    detection_log_likelihood_threshold=3.0,
    detection_occurrence_probability_per_opportunity=0.5,
    filtering=CandidateFilterConfig(
        max_l1_distance=0.3,
        min_required_observations=5,
    ),
)

print(result.markdown_summary)
```

The output is a diagnostic report for the model you supplied, not a poker recommendation. Here is an excerpt from the actual output of `examples/analysis_pipeline.py` on the nuts-chop river toy game (I've trimmed the Configurations block and some columns; 8 candidates generated, 6 dropped by the filter, 2 compared):

```
generated=8 kept=2 excluded=6
compared=2

## Candidate Analysis Summary

### Summary Counts
- total: 2
- eligible: 2
- excluded: 0
- minimum_villain_ev: 1
- pareto_frontier: 2

### Candidate Rows

| candidate_id            | fixed_hero_ev | post_response_hero_ev_worst | robustly_profitable | t_detect | exclusion_reasons |
| ----------------------- | ------------- | --------------------------- | ------------------- | -------- | ----------------- |
| H1\|check->bet\|shift=0.1 | 0.625         | -0.850                      | no                  | 278      | -                 |
| H1\|bet->check\|shift=0.1 | 0.275         | -0.750                      | no                  | 294      | -                 |
```

The baseline Hero EV in this run is +0.45. The column that matters is `robustly_profitable`: it is `yes` only when `post_response_hero_ev_worst` exceeds that baseline. Here both candidates are `no` (-0.85 and -0.75 are below +0.45). A candidate that clears the baseline is rare, but one can exist in constructed cases; the tool's job is to search the candidates and find it when it does. The next section is a hand-built spot where one does.
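As a minimal sketch of that rule applied to the two rows above: the `Row` type and the function name are illustrative assumptions, not the tool's own data structures, and the numbers are copied from the excerpt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical stand-in for one line of the candidate table.
    candidate_id: str
    post_response_hero_ev_worst: float


BASELINE_HERO_EV = 0.45  # baseline Hero EV reported for this run

rows = [
    Row("H1|check->bet|shift=0.1", -0.850),
    Row("H1|bet->check|shift=0.1", -0.750),
]


def is_robustly_profitable(row: Row, baseline_ev: float) -> bool:
    # Strict comparison: merely matching the baseline is not enough.
    return row.post_response_hero_ev_worst > baseline_ev


verdicts = {r.candidate_id: is_robustly_profitable(r, BASELINE_HERO_EV) for r in rows}
for candidate_id, ok in verdicts.items():
    print(candidate_id, "yes" if ok else "no")
```

Both candidates have a positive `fixed_hero_ev`, so a plain "is it positive EV?" filter would have kept at least the first one; the comparison against the baseline after Villain's response is what rules them out.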

I needed at least one example where the machinery clearly does what it is meant to: a known spot where committing to a fixed strategy leaves Hero better off than the one-shot baseline, *even after* the opponent best-responds. This nuts-chop steal is that example, and I wrote a dedicated test for it ([tests/test_nuts_chop_steal_commitment.py](https://github.com/guriguri215-lang/repeated-poker-analysis/blob/main/tests/test_nuts_chop_steal_commitment.py)). Treat it as a check that the tool can detect the effect at all, not as the end goal and not as a claim about real games. Outside this constructed spot I do not know which situations, if any, are profitable to commit in.

The spot: a river where the board is already the nuts, so every showdown chops. There is no value betting — the only reason to bet (shove) is fold equity. Rake is below its cap, so a *called* pot just bleeds chips to the house. With a small starting pot and a big shove, a single hand looks like this:

```
initial commitment = 1, initial pot = 2, bet = 98, rake = 5%, cap = 4

| Line         | Hero/IP EV | Villain/OOP EV |
|--------------|-----------:|---------------:|
| check-check  |      -0.05 |          -0.05 |
| bet-fold     |      -1.00 |          +1.00 |
| bet-call     |      -2.00 |          -2.00 |
```

In one hand the caller folds: -1.00 (fold) beats -2.00 (call). So the one-shot subgame answer is **OOP bets / IP folds** — a pure steal, since the board is a chop and there is no value in betting.
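The three EV rows in the table above can be reproduced by hand. Here is a small sketch of that arithmetic, using exact fractions so the -0.05 is not blurred by floating point. It assumes a folded pot is not raked and the uncalled bet is returned, which is what the +1.00 / -1.00 row implies; the function names are mine, not the repo's.

```python
from fractions import Fraction as F

COMMIT = F(1)          # chips each player already has in the pot
BET = F(98)            # river shove size
RAKE_RATE = F(5, 100)  # 5% rake
RAKE_CAP = F(4)        # rake cap


def rake(pot):
    # Percentage rake, capped.
    return min(pot * RAKE_RATE, RAKE_CAP)


def chop_ev(invested_each):
    # Showdown always chops: split the raked pot, subtract what was put in.
    pot = 2 * invested_each
    return (pot - rake(pot)) / 2 - invested_each


# Values are (IP/Hero EV, OOP/Villain EV).
ev = {
    "check-check": (chop_ev(COMMIT), chop_ev(COMMIT)),
    # Assumption: no rake on a folded pot; the folder loses only the commitment.
    "bet-fold": (-COMMIT, COMMIT),
    "bet-call": (chop_ev(COMMIT + BET), chop_ev(COMMIT + BET)),
}

for line, (ip_ev, oop_ev) in ev.items():
    print(f"{line:12s} IP={float(ip_ev):+.2f} OOP={float(oop_ev):+.2f}")
```

With these numbers the 5% rake on the 198-chip called pot would be 9.9, so the cap of 4 binds there, while the 0.1 rake on the 2-chip checked-down pot sits well under it.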

Now lock IP to *always call* and ask the tool for OOP's exact best response. The steal's only profit source (fold equity) is gone and a called pot is -2.00 for OOP, so OOP's exact best response flips to **check**, and check-check is -0.05 for both. The test asserts exactly this: `solve_exact_response` returns `{"OOP_river": "check"}` once Hero is locked to call.

And crucially, this clears the baseline: Hero's EV goes from -1.00 (the one-shot steal baseline) to -0.05 after OOP adapts. That is still negative, but strictly better than the baseline, which is exactly the `robustly_profitable` condition. That is the whole point of the project stated in one example: **the one-shot subgame answer (bet/fold) is not the answer under the fixed commitment I wanted to test (check/check).** The commitment to call removes the opponent's only incentive to bet. (Whether this constitutes a repeated-game *equilibrium* is the stronger claim I am deliberately not making; this is a commitment-analysis result, not an equilibrium proof.)
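The flip from bet to check is easy to replay outside the tool. The sketch below is not `solve_exact_response`; it is a hand-rolled enumeration over OOP's two pure actions, with payoffs copied from the single-hand table and a helper name I made up.

```python
# (IP/Hero EV, OOP/Villain EV) for each line of the toy river spot.
PAYOFFS = {
    ("check", None): (-0.05, -0.05),
    ("bet", "fold"): (-1.00, +1.00),
    ("bet", "call"): (-2.00, -2.00),
}


def oop_best_response(ip_reply_to_bet):
    """Enumerate OOP's pure actions against a fixed IP reply to a bet."""
    ev_by_action = {
        "check": PAYOFFS[("check", None)][1],
        "bet": PAYOFFS[("bet", ip_reply_to_bet)][1],
    }
    best_action = max(ev_by_action, key=ev_by_action.get)
    return best_action, ev_by_action


one_shot_action, _ = oop_best_response("fold")  # IP plays its one-hand reply
locked_action, _ = oop_best_response("call")    # IP is locked to always call

hero_ev_baseline = PAYOFFS[("bet", "fold")][0]           # -1.00
hero_ev_after_adaptation = PAYOFFS[("check", None)][0]   # -0.05

print(one_shot_action, locked_action, hero_ev_after_adaptation > hero_ev_baseline)
```

There is no tie in OOP's EVs in either case, so the worst-case and best-case Hero EVs coincide here; the tie handling only matters in trees where Villain is indifferent.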

The tool also puts a number on *how long* that commitment stays worth it. With baseline Hero EV = -1.00 (steal), pre-adaptation = -2.00 (locked call while OOP still bets), post-adaptation = -0.05 (OOP has switched to check), `T_deadline` comes out as `floor(1 + 19N/39)`:

```
| N (horizon) | T_deadline |
|------------:|-----------:|
|          10 |          5 |
|          20 |         10 |
|          50 |         25 |
|         100 |         49 |
```
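One way to sanity-check the table is to brute-force the latest adaptation time and compare it with the closed form. The sketch assumes a specific timing convention that I am supplying for illustration: if OOP adapts at occurrence `t`, Hero is paid the pre-adaptation EV for the first `t - 1` occurrences and the post-adaptation EV for the rest. Under that convention the break-even condition is `1.95 (t - 1) < 0.95 N`, which is where `19N/39` comes from.

```python
from fractions import Fraction as F

BASELINE = F(-1)   # Hero EV per hand in the one-shot steal baseline
PRE = F(-2)        # locked call while OOP still bets
POST = F(-1, 20)   # -0.05, after OOP has switched to check


def t_deadline_brute_force(n):
    # Latest adaptation time t at which the commitment still strictly
    # beats playing the baseline for all n occurrences.
    latest = 0
    for t in range(1, n + 2):
        committed_total = (t - 1) * PRE + (n - t + 1) * POST
        if committed_total > n * BASELINE:
            latest = t
    return latest


def t_deadline_closed_form(n):
    # floor(1 + 19N/39), computed exactly with integer arithmetic.
    return (39 + 19 * n) // 39


for n in (10, 20, 50, 100):
    print(n, t_deadline_brute_force(n), t_deadline_closed_form(n))
```

The two agree for the horizons in the table. At horizons that are multiples of 39 the break-even is exact, and the answer then depends on whether a tie counts as still worth it, which this sketch does not try to settle.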

The honest caveat: this is a tiny, hand-built tree, and the EVs are ones I can check by hand — that is exactly why I trust *this* result more than anything else in the repo. It is not evidence about real games; it is evidence that the model and the code agree on one constructed example built to validate the effect.

Verification on my machine:

- `python -m pytest tests/test_nuts_chop_steal_commitment.py -v` → 15 passed
- `python -m pytest -q` → 500 passed
- `python scripts/check_mvp.py` → passes
- `git diff --check` → clean

I supplied the algorithm and the poker model. Codex wrote the implementation instructions and reviewed the results; Claude Code wrote the code. I checked Codex's prompts and corrected wrong premises, but I did not review the code line by line; I relied on the Codex/Claude review loop and the test suite (currently 500 passing tests).

Two things from that process are worth recording:

A note on terms the code keeps separate: `T_deadline` is economic (how late Villain can adapt while the locked policy still beats the baseline); `T_detect` is visibility (how many local observations before the candidate's action distribution looks distinguishable from baseline). They are different questions.
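To make the visibility question concrete, here is one textbook way to estimate it. This is an assumption for illustration and not necessarily how the tool computes `T_detect`: treat each observed action as a sample, accumulate a log-likelihood ratio between the candidate and baseline distributions, and ask how many samples are needed on average to cross a threshold. The function names, the 50/50 baseline, and the example inputs are all hypothetical.

```python
import math


def kl_divergence(p, q):
    # KL(p || q) in nats; assumes q is positive wherever p is.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def observations_to_detect(candidate, baseline, llr_threshold):
    # The expected log-likelihood ratio gained per observation is
    # KL(candidate || baseline), so the threshold takes about threshold / KL.
    per_observation = kl_divergence(candidate, baseline)
    if per_observation == 0:
        return math.inf  # identical distributions never become distinguishable
    return math.ceil(llr_threshold / per_observation)


def opportunities_to_detect(candidate, baseline, llr_threshold, occurrence_prob):
    # Only a fraction of opportunities actually reach the observed spot.
    n_obs = observations_to_detect(candidate, baseline, llr_threshold)
    if n_obs == math.inf:
        return math.inf
    return math.ceil(n_obs / occurrence_prob)


baseline = [0.5, 0.5]  # hypothetical check/bet frequencies
small_shift = opportunities_to_detect([0.6, 0.4], baseline, 3.0, 0.5)
large_shift = opportunities_to_detect([0.7, 0.3], baseline, 3.0, 0.5)
print(small_shift, large_shift)
```

The useful property is qualitative: a larger shift from baseline is detected sooner, which is independent of whether the shift is still economically worth it by `T_deadline`.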

**Best-response ties matter.** If Villain has several best responses with identical Villain EV, Hero's EV can still differ across them. Returning one arbitrary response would hide that risk, so the MVP reports both `ev_h_worst` and `ev_h_best` across the tie set. (Verified: `BestResponseResult` exposes both and the action variation across optimal pure strategies.)
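A toy illustration of why the tie set matters, with made-up payoffs: only the `ev_h_worst` / `ev_h_best` names come from the project, and the function and data here are hypothetical, not the `BestResponseResult` API.

```python
# Hypothetical (villain_ev, hero_ev) for each Villain pure strategy against a
# fixed Hero strategy. The numbers are invented to produce a tie.
RESPONSES = {
    "check": (0.50, -0.10),
    "bet_small": (0.50, -0.40),
    "bet_big": (0.20, 0.30),
}


def tie_aware_best_response(responses, tol=1e-9):
    best_villain_ev = max(villain_ev for villain_ev, _ in responses.values())
    # Every response within tol of the maximum counts as a best response.
    tie_set = [
        name
        for name, (villain_ev, _) in responses.items()
        if abs(villain_ev - best_villain_ev) <= tol
    ]
    hero_evs = [responses[name][1] for name in tie_set]
    return {
        "tie_set": tie_set,
        "ev_h_worst": min(hero_evs),
        "ev_h_best": max(hero_evs),
    }


summary = tie_aware_best_response(RESPONSES)
print(summary)
```

Villain is indifferent between `check` and `bet_small`, but Hero is not: picking whichever one a solver happens to return first could report -0.10 when the conservative figure is -0.40.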

**Small examples are not a weakness.** The nuts-chop river benchmark is tiny on purpose: easier to hand-check, harder to mistake for a real-money recommendation.

The main limitation: **the code has not had an independent human code review.** Tests pass, but I haven't read the implementation line by line and nobody else has either. Rather than rely on reading the code, I plan to validate it from the outside: design the verification to be as exhaustive as I can make it, run simulations across many configurations, and check that the results hold up. Whether static or property-based checking can give that coverage is something I'm still working out.

The narrower limits: it is not a full solver, does not import real solver ranges yet, does not solve large no-limit games, and does not do STT / ICM / preflop push-fold yet. The exact response engine enumerates Villain pure strategies, so it is meant for small abstract trees only; there is an explicit `max_pure_strategies` ceiling, default 100,000. Candidate generation is simple: finite shifts from a baseline, not a continuous strategy space.
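The reason a ceiling is needed is that the number of pure strategies is the product of the action counts at every information set, which grows exponentially. A rough sketch of the guard, assuming a plain dict from information-set name to available actions (the function name, the `OOP_vs_raise` label, and the error type are illustrative, not the repo's):

```python
from itertools import product
from math import prod

MAX_PURE_STRATEGIES = 100_000  # default ceiling quoted above


def enumerate_pure_strategies(actions_by_infoset, limit=MAX_PURE_STRATEGIES):
    # A pure strategy picks one action at every information set, so the
    # count is the product of the per-infoset action counts.
    count = prod(len(actions) for actions in actions_by_infoset.values())
    if count > limit:
        raise ValueError(f"{count} pure strategies exceeds the limit of {limit}")
    infosets = list(actions_by_infoset)
    return [
        dict(zip(infosets, choice))
        for choice in product(*(actions_by_infoset[name] for name in infosets))
    ]


small_tree = {"OOP_river": ["check", "bet"], "OOP_vs_raise": ["fold", "call"]}
strategies = enumerate_pure_strategies(small_tree)
print(len(strategies))

# Seventeen binary decisions already give 2**17 = 131,072 pure strategies.
too_big = {f"infoset_{i}": ["a", "b"] for i in range(17)}
```

Counting before enumerating means an oversized tree fails fast instead of exhausting memory partway through.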

Most importantly: positive EV *inside this model* does not guarantee profitable play. The model can be wrong if the abstraction, action tree, rake rule, ranges, or adaptation assumptions are wrong.

This is not gambling, bankroll, financial, or legal advice.

The toy game confirmed the effect, so next I want to extend the tool: analyze with hand ranges rather than abstract action probabilities, and model the opponent adapting gradually (e.g. a Bayesian update of their response over repetitions) instead of switching to an exact best response in one step. Alongside that, I want to firm up the outside-in verification described above before trusting results on new spots.

**Disclosure:** I used AI assistance throughout this project and to draft this article. The division of labor was deliberate: I supplied the algorithm and the poker model, Codex handled instructions and review, and Claude Code wrote the code; I checked the prompts and relied on automated review and tests for the implementation. This article was also drafted with AI help and then rewritten to reflect my own decisions, mistakes, and open questions. Technical claims are marked where I have verified them against the code myself; where I say something is provisional or unreviewed, that is literally true.
