GPU Survivors: Can You Survive a 1T Parameter Inference Run?

A developer built an interactive 2D retro action-roguelike game called GPU Survivors that simulates the architectural limits, failure modes, and optimization hyperparameters of running a Large Language Model under load. Players control a GPU core surviving waves of training loads while scaling to 1 trillion parameters, with in-game mechanics mapping to real-world LLM concepts like context windows, activations, and adversarial attacks.

Ever wondered what a GPU goes through during a massive language model inference run? While you type a query and wait for tokens, the silicon under the hood is holding together a fragile house of cards: balancing context window limits, scheduling activations, managing weights, and evading adversarial attacks.

To teach you how LLMs behave and fall apart under load, I built an interactive game. Play in fullscreen mode if the embed sizing is tight: https://llms-are-demented-166926259124.us-central1.run.app/gpu-survivors/

Before initiating your run, choose your difficulty configuration, each represented by a unique retro pixel chip sprite and custom parameters:

2.8, boosted damage, and a wide collection window. You get +25% XP gains and start with both the Attention Beam and the Softmax Aura active.
2.5, standard damage, and standard 100% XP gains. Starts with the Attention Beam active.
2.1, reduced damage, and a -20% XP penalty. Starts with a single Attention head active.

This isn't just a homage to Vampire Survivors: every upgrade, weapon, and enemy represents a real-world concept in modern machine learning. Here is how the in-game mechanics map directly to how Large Language Models operate, fail, and optimize in production:

At exactly 15:00, all standard enemies are swept away, and the unkillable red boss, Hardware Degradation, arrives. You cannot harm it.

Can you survive a 1T parameter inference run?
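For readers curious how the difficulty presets and the 15:00 boss hand-off described above might look as data, here is a minimal TypeScript sketch. It is not the game's actual code. Every identifier (`DifficultyPreset`, `baseStat`, `xpGained`, `phaseAt`, and so on) is hypothetical, and flooring XP to a whole number is an assumption. The post lists 2.8, 2.5, and 2.1 without naming the stat they describe, so the sketch keeps that field generic.

```typescript
// Illustrative sketch only: every identifier here is hypothetical,
// and none of it is taken from the game's actual source.

type Damage = "boosted" | "standard" | "reduced";

interface DifficultyPreset {
  // The post lists 2.8 / 2.5 / 2.1 without naming the stat, so it stays generic.
  baseStat: number;
  damage: Damage;
  xpPercent: number; // 100 = standard XP gains
  // Only the first preset is described as "wide"; false elsewhere is an assumption.
  wideCollectionWindow: boolean;
  startingWeapons: string[];
}

const PRESETS: DifficultyPreset[] = [
  {
    baseStat: 2.8,
    damage: "boosted",
    xpPercent: 125, // +25% XP gains
    wideCollectionWindow: true,
    startingWeapons: ["Attention Beam", "Softmax Aura"],
  },
  {
    baseStat: 2.5,
    damage: "standard",
    xpPercent: 100,
    wideCollectionWindow: false,
    startingWeapons: ["Attention Beam"],
  },
  {
    baseStat: 2.1,
    damage: "reduced",
    xpPercent: 80, // -20% XP penalty
    wideCollectionWindow: false,
    startingWeapons: ["Attention head"],
  },
];

// Scale raw XP by the preset's percentage, rounding down (assumed behavior).
function xpGained(preset: DifficultyPreset, rawXp: number): number {
  return Math.floor((rawXp * preset.xpPercent) / 100);
}

// The boss arrives at exactly 15:00 of run time.
const BOSS_ARRIVAL_SECONDS = 15 * 60;

type Phase = "waves" | "boss";

// Before 15:00 the run is in the wave phase; from 15:00 on, the boss phase.
function phaseAt(elapsedSeconds: number): Phase {
  return elapsedSeconds >= BOSS_ARRIVAL_SECONDS ? "boss" : "waves";
}
```

Storing XP as a whole-number percentage rather than a multiplier such as 1.25 keeps the arithmetic exact for whole-number inputs, which is a design choice of this sketch and not something the post specifies.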
Welcome to GPU Survivors, an interactive 2D retro action-roguelike built to simulate the architectural limits, failure modes, and optimization hyperparameters of running a Large Language Model under load.

In the digital deep, bad data and chaotic vectors threaten inference stability. You are a GPU Core initializing a new language model. Survive the endless incoming waves of training loads (OOD outliers, prompt injections, and data biases), gather FLOPs (XP), and scale your architecture to 1T parameters.

Controls: WASD or Arrow Keys. Press Escape or P to pause the run, resume, or exit.

Select your inference endpoint difficulty at startup.

Disclaimer: AI was used throughout this project; it is only fitting that it would co-author with me, so special thanks to the Foundry for its tireless hours toiling away and to Gemini for producing the cover image.