
Self-Play Reinforcement Learning under Imperfect Information in Big 2

Researchers developed a self-play reinforcement learning framework for the four-player imperfect-information card game Big 2, enabling controlled comparisons of different RL agents. Under standardized conditions, Proximal Policy Optimization (PPO) outperformed Monte Carlo Q approximation, SARSA, and Q-learning against random, greedy, and heuristic opponents. The study found that moderate entropy regularization and current-policy self-play improved PPO's performance, establishing Big 2 as a useful benchmark for studying deep RL under hidden information, multiplayer dynamics, and delayed rewards.

Published May 29, 2026 · 1 min read

arXiv:2605.28863v1

Abstract: Imperfect-information multiplayer games test whether agents can act under hidden information, sparse rewards, and non-stationary opponents. We study these challenges in Big 2, a four-player imperfect-information card game. We develop a self-play RL framework for Big 2 that enables controlled comparisons between policy-gradient and value-approximating agents. Under a common environment, input representation, training budget, and evaluation protocol, PPO outperforms Monte Carlo Q approximation, SARSA, and Q-learning against random, greedy, and heuristic Big 2 opponents. We further find that moderate entropy regularization improves PPO by preventing the policy from becoming overly deterministic, and that current-policy self-play provides a stronger finite-budget curriculum than checkpoint self-play or fixed-opponent training. Together, these results show that Big 2 is a useful controlled setting for studying deep RL under imperfect information, multiplayer interaction, delayed rewards, and variable action sets.
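To make the entropy-regularization and variable-action-set points concrete, here is a minimal NumPy sketch of the standard PPO clipped objective with an entropy bonus, computed over a policy that is masked to the legal moves. It is not the paper's implementation, which the abstract does not describe. The function names, the 4-action toy setup, and the coefficient values (`clip_eps=0.2`, `ent_coef=0.01`) are assumptions chosen for illustration.

```python
import numpy as np

def masked_softmax(logits, legal_mask):
    """Softmax restricted to legal actions; illegal actions get probability 0.

    Assumes every row has at least one legal action.
    """
    masked = np.where(legal_mask, logits, -np.inf)
    masked = masked - masked.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(masked)
    return exp / exp.sum(axis=-1, keepdims=True)

def entropy(probs):
    """Shannon entropy per row, treating 0 * log(0) as 0."""
    safe = np.where(probs > 0, probs, 1.0)  # log(1) = 0 for masked entries
    return -(probs * np.log(safe)).sum(axis=-1)

def ppo_loss(new_probs, old_probs, actions, advantages,
             clip_eps=0.2, ent_coef=0.01):
    """Clipped PPO surrogate plus entropy bonus, returned as a loss to minimize.

    clip_eps and ent_coef are illustrative defaults, not values from the paper.
    """
    idx = np.arange(len(actions))
    ratio = new_probs[idx, actions] / old_probs[idx, actions]
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_term = np.minimum(unclipped, clipped).mean()
    entropy_term = entropy(new_probs).mean()
    # A larger ent_coef rewards spread-out policies, discouraging early
    # collapse onto a single deterministic move.
    return -(policy_term + ent_coef * entropy_term)

# Toy usage: one state, 4 candidate moves, only moves 0 and 2 are legal.
logits = np.array([[1.0, 2.0, 1.0, 4.0]])
legal = np.array([[True, False, True, False]])
probs = masked_softmax(logits, legal)
loss = ppo_loss(probs, probs, actions=np.array([0]), advantages=np.array([1.0]))
```

In the toy call the new and old policies are identical, so the probability ratio is 1 and the loss reduces to the negative of the advantage plus the weighted entropy. Raising `ent_coef` shifts the optimum toward more stochastic play, which is the lever the abstract refers to as moderate entropy regularization.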
