{"slug": "show-hn-rogue-bench-llms-play-the-game-rogue", "title": "Show HN: Rogue-Bench – LLMs play the game Rogue", "summary": "A new benchmark called Rogue-Bench tests how well large language models can play the classic dungeon crawler game Rogue. The tool runs a modified headless version of Unix Rogue 5.4.2, communicating with the game over pipes to parse terminal output and send keystrokes. Rogue-Bench accumulates statistics and logs for post-hoc analysis, enabling researchers to evaluate LLM gameplay performance.", "body_md": "# Rogue-Bench\n\nRogue-Bench is a benchmark in which agents play Rogue; specifically, it measures how well LLMs can play the classic dungeon crawler.\n\nThis work would not be possible without [Rogue Collection](https://github.com/mikeyk730/Rogue-Collection). If you just want to play Rogue, head over there.\n\nOnce set up, you should be able to produce a result like this:\n\n*GPT-5.4-mini playing Rogue.*\n\n## Get started\n\n**Note:** Rogue-Bench compilation and runs have been tested on Ubuntu 24.04 (WSL2). If you have trouble getting it working locally, try the Docker setup.\n\n### Local\n\nTo run locally, execute:\n\n```\ngit clone --recursive https://github.com/iwhalen/rogue-bench.git\ncd rogue-bench\nmake install  # Install system-level dependencies\nmake build    # Compile the custom headless Rogue executable\nuv run rogue-bench --player human\n```\n\nThis will start a \"human\" session in which you control Rogue with keyboard input. 
This is a good sanity check before setting up a real agent.\n\nFor all command-line options, run:\n\n```\nuv run rogue-bench --help\n```\n\nFor more on the Rogue-Bench CLI, see the [CLI documentation](cli/).\n\n### Docker\n\nTo run Rogue in Docker, execute:\n\n```\ngit clone --recursive https://github.com/iwhalen/rogue-bench.git\ncd rogue-bench\nmake build-docker\nuv run rogue-bench --docker-image rogue-bench --player human\n```\n\nAgain, this starts a \"human\" session.\n\n## How it works\n\nRogue-Bench runs a slightly modified, headless Rogue executable and communicates with it over pipes. The Python library reads Rogue's terminal output, optionally parses it into a screen state, and sends keystrokes back to the game.\n\nNo Rogue gameplay elements have been changed, and the game version is pinned to Unix Rogue 5.4.2.\n\nEach run records statistics, metadata, and a log of keystrokes. This enables post-hoc analysis and makes it possible to replay an entire run.\n\nFor more implementation details, see the [GitHub repository](https://github.com/iwhalen/rogue-bench).\n\n## License\n\nThe Python code for running Rogue-Bench is licensed under GPL-3.0.\n\nThe modified Rogue executables are under the same license(s) as the [Rogue Collection](https://github.com/mikeyk730/Rogue-Collection). At the time of writing, this is a mix of GPL-3.0 and other licenses.\n\nRogue is a trademark of Epyx, Inc. 
Rogue-Bench is not associated with Epyx in any way.", "url": "https://wpnews.pro/news/show-hn-rogue-bench-llms-play-the-game-rogue", "canonical_source": "https://iwhalen.github.io/rogue-bench/", "published_at": "2026-05-26 12:18:58+00:00", "updated_at": "2026-05-26 12:38:11.793643+00:00", "lang": "en", "topics": ["large-language-models", "artificial-intelligence", "ai-agents", "ai-research"], "entities": ["Rogue-Bench", "Rogue", "Rogue Collection", "GPT-5.4-mini", "Docker", "WSL2", "Ubuntu"], "alternates": {"html": "https://wpnews.pro/news/show-hn-rogue-bench-llms-play-the-game-rogue", "markdown": "https://wpnews.pro/news/show-hn-rogue-bench-llms-play-the-game-rogue.md", "text": "https://wpnews.pro/news/show-hn-rogue-bench-llms-play-the-game-rogue.txt", "jsonld": "https://wpnews.pro/news/show-hn-rogue-bench-llms-play-the-game-rogue.jsonld"}}