CS2-10k: A Large-Scale Egocentric Counter-Strike 2 Dataset


Source: reka.ai · Published: 2026-06-26


Reka AI released CS2-10k, a large-scale egocentric dataset built from professional Counter-Strike 2 matches, containing over 600,000 player-round videos totaling 10,000+ hours of first-person footage with per-frame annotations of keyboard state, mouse movement, and 3D player trajectory. The dataset aims to support training interactive world models for embodied AI research, and the open-source rendering pipeline used to create it is also being released.


Training interactive world models requires data that is notoriously hard to find: egocentric video sequences with densely aligned action signals (keyboard inputs, camera motion, and ego state), all synchronized to the visual stream.

Real-world embodied data is costly to collect, while synthetic data often lacks the visual richness or behavioral diversity needed for generalization. Counter-Strike 2 demos offer a compelling middle ground: because matches are recorded as deterministic replays, we can reconstruct clean first-person video at any point in a match, extracting the precise control inputs that drove each visual change. For these reasons, Counter-Strike is fast becoming a popular substrate for embodied AI and world-model research, with recent efforts such as EgoCS-400k reflecting a growing community interest in it as a rich source of egocentric training data.

Today we release CS2-10k, a large-scale egocentric gameplay dataset built from professional CS2 matches. It contains 600,000+ player-round videos spanning 10,000+ hours of first-person footage, paired with per-frame annotations covering keyboard state, mouse movement, and 3D player trajectory. Alongside this ready-to-use dataset, we are also releasing the ready-to-extend cs2-dem-renderer, the open-source pipeline used to produce it. All of this, so we can build better world models, together.

Dataset Overview

CS2-10k is built from public professional match demos sourced from HLTV. For each demo, we render clean first-person video at 720p, 48fps using the demo replay tool inside CS2, producing one video per player per round. Alongside each video, we store a parquet file containing per-frame annotations synchronized to the video timeline.
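As a rough sanity check on the headline figures, the short sketch below derives the average clip length and frame counts implied by the stated totals. The article gives these totals as lower bounds (600,000+ clips, 10,000+ hours), so the results are approximate averages, not statistics reported by the authors.

```python
# Headline figures from the article, treated as round lower bounds.
HOURS = 10_000
CLIPS = 600_000
FPS = 48

total_seconds = HOURS * 3600
avg_clip_seconds = total_seconds / CLIPS
avg_frames_per_clip = avg_clip_seconds * FPS
total_frames = total_seconds * FPS

print(avg_clip_seconds)     # 60.0
print(avg_frames_per_clip)  # 2880.0
print(total_frames)         # 1728000000
```

In other words, a typical player-round clip is on the order of one minute, and the dataset holds roughly 1.7 billion annotated frames.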

Annotation Schema

Every video clip has its corresponding annotations stored in a .parquet file:

| Field | Type | Description |
|---|---|---|
| | string | Map name (e.g. "mirage", "dust2") |
| | int | Round within the match |
| | int | 0 = Counter-Terrorist, 1 = Terrorist |
| | int | Total frames in the clip |
| | float | Video frame rate (48.0) |
| | float | Clip duration in seconds |
| | float | Camera field of view (90.0°) |
| frame_data | list[dict] | Per-frame annotation array (see below) |

Per-Frame Annotations

Each entry in frame_data contains:

| Field | Description |
|---|---|
| | Concatenated active keys: |
| | Horizontal camera delta (proxy for mouse X movement) |
| | Vertical camera delta (proxy for mouse Y movement) |
| | Player world position in game units |
| | Camera yaw angle (−180° to 180°) |
| | Camera pitch angle (−90° to 90°) |
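Because yaw is reported on a −180° to 180° range, a naive difference between consecutive frames jumps by nearly 360° whenever the camera crosses the seam. The sketch below shows one common way to compute a wrapped per-frame yaw delta from a sequence of yaw angles. It assumes the delta is simply the difference between consecutive yaw values; whether the dataset's stored horizontal delta uses this exact definition or sign convention is not stated in the article.

```python
def wrap_degrees(angle: float) -> float:
    """Map any angle in degrees to the half-open range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def yaw_deltas(yaws: list[float]) -> list[float]:
    """Shortest signed rotation between consecutive yaw samples."""
    return [wrap_degrees(b - a) for a, b in zip(yaws, yaws[1:])]


print(yaw_deltas([170.0, -170.0]))    # [20.0]
print(yaw_deltas([10.0, 15.0, 5.0]))  # [5.0, -10.0]
```

The first call crosses the ±180° seam and still yields a 20° turn, not −340°.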

The combination of video and per-frame control signals creates a tight action-observation loop.

No Abrupt Visual Changes

Each clip is a contiguous segment of a single round from a single player's perspective. There are no mid-round cuts, no editing transitions, and no HUD overlay. The camera moves through the world in a physically plausible way, and we hide the player's weapon to eliminate the sudden visual changes caused by weapon recoil, reloads, and weapon switching.

Many Use Cases

CS2-10k is designed for training interactive world models that learn how first-person visual observations change in response to player actions. The same aligned video, control, and state signals also support a range of related research workflows.

Rendering Pipeline

If CS2-10k does not cover the scale, matches, or annotations you need, you can use our open-source pipeline at github.com/reka-ai/cs2-dem-renderer to render your own CS2 datasets. Given a .dem file, it performs a two-pass parse to extract per-player spawn/death intervals and per-frame button inputs, then drives CS2's built-in demo replay system to render first-person video for each player each round. Frames are streamed in real time from CS2's movie output to ffmpeg (VAAPI HEVC), producing .mp4 clips alongside synchronized .parquet annotation files. A worker mode processes entire directories of demos with automatic deduplication, making it straightforward to run at the scale of CS2-10k.
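To make the spawn/death interval step more concrete, here is a minimal sketch of how such intervals could be derived from a list of events within one round. It is not the cs2-dem-renderer implementation: the event tuple layout `(tick, player, kind)`, the function name `alive_intervals`, and the `round_end_tick` parameter are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def alive_intervals(events, round_end_tick):
    """Pair spawn and death events into (start_tick, end_tick) intervals.

    events: iterable of (tick, player, kind), where kind is "spawn" or "death".
    This layout is an assumption, not the format the real parser emits.
    """
    open_since = {}                # player -> tick of the unmatched spawn
    intervals = defaultdict(list)  # player -> list of (start, end)
    for tick, player, kind in sorted(events):
        if kind == "spawn":
            open_since[player] = tick
        elif kind == "death" and player in open_since:
            intervals[player].append((open_since.pop(player), tick))
    # Players still alive when the round ends get an interval closed there.
    for player, start in open_since.items():
        intervals[player].append((start, round_end_tick))
    return dict(intervals)


events = [(0, "a", "spawn"), (0, "b", "spawn"), (500, "a", "death")]
print(alive_intervals(events, round_end_tick=1000))
# {'a': [(0, 500)], 'b': [(0, 1000)]}
```

Each resulting interval marks a span in which a player is alive, which is the kind of range a renderer would need in order to cut one clip per player per round.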

Citation

If you use CS2-10k in your work, please cite:


