Show HN: Rogue-Bench – LLMs play the game Rogue

wpnews.pro

Source: iwhalen.github.io · Published: 2026-05-26T12:18Z · Topic: large-language-models

A new benchmark called Rogue-Bench tests how well large language models can play the classic dungeon crawler game Rogue. The tool runs a modified headless version of Unix Rogue 5.4.2, communicating with the game over pipes to parse terminal output and send keystrokes. Rogue-Bench accumulates statistics and logs for post-hoc analysis, enabling researchers to evaluate LLM gameplay performance.

2 min read · Published May 26, 2026

Rogue-Bench is a benchmark in which agents play Rogue. Specifically, it measures how well LLMs can play the classic dungeon crawler.

This work would not be possible without Rogue Collection. If you just want to play Rogue, head over there.

Once set up, you should be able to produce a result like this:

[Figure: GPT-5.4-mini playing Rogue.]

Get started

Note

Rogue-Bench compilation and runs have been tested on (WSL2) Ubuntu 24.04. If you are struggling to get something working locally, try the Docker setup.

Local

To run locally, execute:

git clone --recursive https://github.com/iwhalen/rogue-bench.git 
cd rogue-bench
make install  # Install system level dependencies
make build    # Compile the custom headless Rogue executable
uv run rogue-bench --player human

This will start a "human" session where you can control Rogue with keyboard inputs. This is a good sanity check before setting up a real agent.

For all command line options, see:

uv run rogue-bench --help

For more on the Rogue-Bench CLI, see here.

Docker

To run Rogue in Docker, execute:

git clone --recursive https://github.com/iwhalen/rogue-bench.git
cd rogue-bench
make build-docker
uv run rogue-bench --docker-image rogue-bench --player human

Again, this will start in "human" mode.

How it works

Rogue-Bench runs a slightly modified, headless Rogue executable and communicates with it over pipes. The Python library reads Rogue's terminal output, (optionally) parses it into a screen state, and sends keystrokes back to the game.
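The read, parse, and send loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Rogue-Bench's actual code: the child process is a tiny stand-in script rather than the headless Rogue executable, the newline-delimited protocol is a simplification (a real game reads raw keystrokes), and `play` and `find_player` are hypothetical names. The only Rogue-specific fact used is that the player is drawn as `@`.

```python
import subprocess
import sys

# Stand-in for the headless Rogue binary (an assumption for illustration):
# a child process that reads one key per line and prints one reply per key.
CHILD = r"""
import sys
for line in sys.stdin:
    key = line.strip()
    if key == "Q":
        break
    print(f"moved:{key}", flush=True)
"""


def play(keys):
    """Send each key to the child over a pipe and collect its replies."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    replies = []
    for key in keys:
        proc.stdin.write(key + "\n")   # keystroke goes down the stdin pipe
        proc.stdin.flush()
        replies.append(proc.stdout.readline().strip())  # output comes back up
    proc.stdin.write("Q\n")            # ask the stand-in child to exit
    proc.stdin.flush()
    proc.stdin.close()
    proc.wait()
    proc.stdout.close()
    return replies


def find_player(screen):
    """Parse a plain-text screen dump and return (row, col) of '@', or None.

    Assumes the screen is already plain text; real terminal output would
    first need its escape sequences interpreted.
    """
    for row, line in enumerate(screen.splitlines()):
        col = line.find("@")
        if col != -1:
            return row, col
    return None
```

An agent would sit between the two functions: parse the latest screen, choose a key, and send it back through the pipe.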

No Rogue gameplay elements have been changed. Specifically, the version of Rogue is fixed to Unix Rogue 5.4.2.

Each run accumulates statistics and metadata and logs every keystroke. This allows post-hoc analysis and makes it possible to replay an entire run.
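To make the logging and replay idea concrete, here is a minimal sketch of a keystroke log that can be saved and fed back in order. The `RunLog` class, its JSON layout, and the `replay` helper are hypothetical and do not reflect Rogue-Bench's actual log format. Reproducing a run this way also assumes the game behaves deterministically for the same inputs (for example, with a fixed random seed), which the source does not state.

```python
import json
from dataclasses import dataclass, field


@dataclass
class RunLog:
    """Hypothetical run record: metadata plus every keystroke sent."""

    metadata: dict
    keystrokes: list = field(default_factory=list)

    def record(self, key):
        """Append one keystroke in the order it was sent."""
        self.keystrokes.append(key)

    def to_json(self):
        """Serialize the run so it can be analyzed or replayed later."""
        return json.dumps(
            {"metadata": self.metadata, "keystrokes": self.keystrokes}
        )

    @classmethod
    def from_json(cls, text):
        """Rebuild a RunLog from its serialized form."""
        data = json.loads(text)
        return cls(metadata=data["metadata"], keystrokes=data["keystrokes"])


def replay(log, send_key):
    """Feed the logged keys, in order, to any callable that accepts a key."""
    for key in log.keystrokes:
        send_key(key)
    return len(log.keystrokes)
```

For example, `replay(RunLog.from_json(saved_text), sent.append)` pushes the recorded keys into a list `sent`. In a real setup `send_key` would write to the game's input pipe instead.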

For more specifics on the implementation, see the GitHub repository.

License

Note that the Python code for running Rogue-Bench is offered under the GPL-3.0 license.

The modified Rogue executables are under the same license(s) as the Rogue Collection. At the time of writing, this is a mix of GPL-3.0 and other licenses.

Rogue is a trademark of Epyx, Inc. Rogue-Bench is not associated with Epyx in any way.

Read original on iwhalen.github.io → iwhalen.github.io/rogue-bench/
