Source: runtimewire.com

Reka turns Counter-Strike 2 demos into a world-model training dataset

Reka released CS2-10k, a large-scale dataset and rendering pipeline built from Counter-Strike 2 professional match demos, providing over 10,000 hours of egocentric video with per-frame action annotations to advance world-model research. The dataset, still being uploaded, includes synchronized keyboard, mouse, and 3D position data, targeting applications in action-conditioned video generation and embodied AI.

Published Jun 26, 2026 · 4 min read

Reka (@RekaAILabs) has released CS2-10k, a Counter-Strike 2 dataset and rendering pipeline aimed at one of the harder bottlenecks in world-model research: pairing egocentric video with the exact actions that produced each frame.

https://x.com/RekaAILabs/status/2070245465937822007

The release is a concrete signal of where Dani Yogatama's team is steering Reka. The Sunnyvale AI lab emerged from stealth in 2023, founded by researchers from DeepMind, Google, Baidu and Meta. Yogatama, Cyprien de Masson d'Autume, Qi Liu and Yi Tay built the company around multimodal models for enterprise use cases, according to TechCrunch's 2023 profile. Reka's own homepage now frames its work more broadly as models and infrastructure for the physical AI era, including data infrastructure for egocentric video, robotics trajectories, world-model footage and expert judgment through Claru.

In a two-post thread on X, Reka said training world models requires synchronized egocentric video and dense action signals, and that such data is hard to find. CS2-10k is Reka's answer: a dataset built from Counter-Strike 2 professional match demos, with the company claiming more than 600,000 player-round videos, more than 10,000 hours of first-person footage and per-frame annotations covering keyboard state, mouse delta and 3D position.
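A quick back-of-envelope check puts those headline figures in perspective. The sketch below uses only the two numbers Reka quotes; both are "more than" lower bounds, so the result is a rough estimate, and the variable names are ours rather than anything from the release.

```python
# Rough scale check from the figures quoted in Reka's thread.
# Both inputs are lower bounds ("more than"), so the result is approximate.
TOTAL_HOURS = 10_000           # claimed first-person footage, in hours
PLAYER_ROUND_VIDEOS = 600_000  # claimed number of player-round videos

total_seconds = TOTAL_HOURS * 3600
avg_clip_seconds = total_seconds / PLAYER_ROUND_VIDEOS

print(f"average clip length: about {avg_clip_seconds:.0f} s")
```

On those figures, the average player-round video runs roughly a minute.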

The important qualifier is availability. The Hugging Face dataset card says the full dataset is still being uploaded and will appear over the coming days. As of June 25, 2026, the public Hugging Face page showed a browsable sample subset of three full matches, 748 rows and a 25.9 GB total file size. That makes CS2-10k an announced large-scale dataset with a live sample and open tooling, not yet a complete 10,000-hour corpus on Hugging Face.

What Reka has made available matters beyond the headline size. The CS2-10k blog post says each clip is rendered at 720p and 48 fps from a single player's first-person perspective, with a matching parquet file aligned to the video timeline. The annotations include map, round number, team, frame count, field of view, active movement and action keys, mouse movement proxies, world position and camera yaw and pitch.
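To make that alignment concrete, here is a minimal sketch of one annotation row per video frame and of how a frame index maps onto the video timeline at 48 fps. The field names are hypothetical; the blog post lists the kinds of annotation, not the parquet column names, and a constant frame rate is assumed.

```python
from dataclasses import dataclass

FPS = 48  # frame rate stated in the CS2-10k blog post


@dataclass
class FrameRow:
    """One per-frame annotation row. Field names are hypothetical."""
    frame: int            # index into the clip's video frames
    keys: frozenset[str]  # movement and action keys held on this frame
    mouse_dx: float       # mouse movement proxy, horizontal
    mouse_dy: float       # mouse movement proxy, vertical
    position: tuple[float, float, float]  # world position (x, y, z)
    yaw: float            # camera yaw
    pitch: float          # camera pitch


def frame_to_seconds(frame: int, fps: int = FPS) -> float:
    """Timestamp of a frame on the video timeline, assuming constant fps."""
    return frame / fps


row = FrameRow(frame=96, keys=frozenset({"W", "SHIFT"}), mouse_dx=1.5,
               mouse_dy=-0.25, position=(0.0, 0.0, 64.0), yaw=90.0, pitch=0.0)
print(frame_to_seconds(row.frame))  # 2.0
```

Because each row corresponds to exactly one frame, looking up the action for any moment in the video reduces to an index calculation.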

That alignment is the point. A model trained only on video can learn visual dynamics. A model trained on video plus action signals can be asked a more useful question: given what the agent sees and the action it takes next, what should happen visually and spatially? Reka positions CS2-10k for action-conditioned video generation, egocentric navigation, long-horizon planning and multi-agent world modeling, where the same round can be viewed from multiple players' perspectives with shared map and round identifiers.
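In training terms, that question is usually posed over (observation, action, next observation) transitions. The sketch below shows one way such triples could be cut from an aligned clip. It is an illustration under stated assumptions, not Reka's training code: strings stand in for decoded frames, and the action dictionaries are a made-up shape.

```python
from typing import Iterator, Sequence, TypeVar

Frame = TypeVar("Frame")
Action = TypeVar("Action")


def transitions(
    frames: Sequence[Frame], actions: Sequence[Action]
) -> Iterator[tuple[Frame, Action, Frame]]:
    """Yield (frame_t, action_t, frame_t+1) triples from one aligned clip.

    Assumes actions[t] is the input recorded on frame t, so the last
    action has no successor frame and is dropped.
    """
    if len(frames) != len(actions):
        raise ValueError("frames and actions must be aligned one-to-one")
    for t in range(len(frames) - 1):
        yield frames[t], actions[t], frames[t + 1]


# Placeholder data: strings stand in for decoded video frames.
frames = ["f0", "f1", "f2"]
actions = [{"keys": ["W"]}, {"keys": ["W", "A"]}, {"keys": []}]
triples = list(transitions(frames, actions))
```

A video-only dataset can supply the first and last element of each triple; the per-frame annotations are what supply the middle one.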

The Counter-Strike choice is pragmatic. Real-world embodied data is expensive to collect, especially when the researcher needs synchronized camera, controls and state. Pure synthetic data is easier to label but can lack behavioral variety. Counter-Strike 2 demos sit between those options: public professional match replays preserve human behavior, while the game's deterministic replay tooling lets Reka reconstruct clean first-person footage and recover the controls and player state behind it.

Reka is also releasing the cs2-dem-renderer GitHub repository, the pipeline it says it used to create the dataset. The repo describes a Linux-oriented renderer that converts Counter-Strike 2 .dem demo files into per-player-round videos with synchronized frame-level metadata. It parses player spawn and death intervals plus per-frame button inputs, launches Counter-Strike 2 through Steam with the demo loaded, streams raw frames to ffmpeg and writes .mp4 clips alongside parquet metadata.
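The "streams raw frames to ffmpeg" step follows a common pattern: the encoder reads headerless frames from standard input and needs to be told their size, pixel format and rate up front. The sketch below builds such a command. It is not the repository's actual invocation; the 1280x720 resolution (720p at 16:9), the rgb24 input format, the libx264 codec and the output filename are all assumptions for illustration.

```python
WIDTH, HEIGHT, FPS = 1280, 720, 48  # 720p at 48 fps; the 16:9 width is assumed
BYTES_PER_PIXEL = 3                 # rgb24, an assumed capture format


def ffmpeg_command(out_path: str) -> list[str]:
    """Build an ffmpeg invocation that encodes raw frames read from stdin."""
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo",           # input is headerless raw frames
        "-pix_fmt", "rgb24",
        "-s", f"{WIDTH}x{HEIGHT}",
        "-r", str(FPS),
        "-i", "-",                  # read frames from stdin
        "-c:v", "libx264",          # assumed codec choice
        "-pix_fmt", "yuv420p",
        out_path,
    ]


# Number of bytes the encoder expects for each frame written to its stdin.
frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
cmd = ffmpeg_command("clip.mp4")
```

A renderer following this pattern would start the command with `subprocess.Popen(cmd, stdin=subprocess.PIPE)` and write exactly `frame_bytes` bytes per captured frame.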

That open pipeline is strategically useful for Reka. If researchers accept Counter-Strike as a useful substrate for embodied AI, Reka does not need to own every match, clip or annotation variant itself. By releasing the renderer, Reka gives labs a way to expand beyond the sample and proposed full corpus, while still anchoring the workflow around Reka's schema and tooling.

The licensing also narrows the use case. The Hugging Face card lists CS2-10k under CC BY-NC 4.0, with attribution and non-commercial restrictions, and notes that the underlying match demos remain the property of their respective rights holders. That is appropriate for research distribution, but it limits straightforward commercial use by companies training production systems.

CS2-10k also lands at a moment when AI labs are looking for cheaper routes into embodied and interactive data. Large language model training rewarded internet-scale text and code. World models need time, action and consequence. Reka's bet is that a tactical shooter replay can supply enough visual richness, human behavior and recoverable control signals to become useful infrastructure for that next training regime.

For a company that began by selling enterprise-grade multimodal assistants and custom models, the release reads less like a one-off dataset drop than a product breadcrumb. Reka is no longer presenting itself only as a model builder. It is trying to own part of the data and tooling layer beneath physical AI, where the scarce asset is not another benchmark score but synchronized experience at scale.
