{"slug": "romo-a-large-scale-richly-organized-dataset-and-semantic-taxonomy-for-human", "title": "RoMo: A Large-Scale, Richly Organized Dataset and Semantic Taxonomy for Human Motion Generation", "summary": "Researchers have released RoMo, a large-scale dataset of in-the-wild human motions curated to resolve the tradeoff between small, high-fidelity motion capture datasets and large in-the-wild collections dominated by static or low-quality sequences. The dataset employs a taxonomy-aware filtering pipeline to remove static and artifact-prone sequences, and every motion is captioned and organized by a three-level semantic taxonomy for fine-grained, per-category evaluation. Models trained on RoMo achieve state-of-the-art fidelity and diversity and show a stronger understanding of complex text prompts, while the accompanying Motion Toolbox standardizes metrics, data conversion, and visualization for reproducible motion generation research.", "body_md": "arXiv:2605.26241v1 Announce Type: new\nAbstract: Success in generative modeling across language, image, and video demonstrates that large, well-curated datasets are the key driver for building capable models. 3D human motion, however, has lagged behind, constrained by an unsatisfying choice between small, high-fidelity motion capture datasets and large-scale in-the-wild collections dominated by static or low-quality sequences. We introduce RoMo, a rich, large-scale, carefully curated dataset of in-the-wild human motions that resolves these tradeoffs. To ensure quality, we introduce a taxonomy-aware filtering pipeline that aggressively removes static and artifact-prone sequences. Every sequence is annotated with detailed captions and organized by a novel three-level semantic taxonomy. This hierarchical structure enables fine-grained, per-category evaluation that reveals model strengths and weaknesses obscured by global metrics. We demonstrate that models trained on RoMo achieve state-of-the-art fidelity and diversity while gaining a superior understanding of complex, subtle text prompts. Finally, we release the Motion Toolbox to standardize metrics, data conversion, and visualization, establishing a foundation for reproducible and interpretable motion generation research.", "url": "https://wpnews.pro/news/romo-a-large-scale-richly-organized-dataset-and-semantic-taxonomy-for-human", "canonical_source": "https://arxiv.org/abs/2605.26241", "published_at": "2026-05-27 04:00:00+00:00", "updated_at": "2026-05-27 04:26:49.453049+00:00", "lang": "en", "topics": ["generative-ai", "computer-vision", "machine-learning", "artificial-intelligence", "ai-research"], "entities": ["RoMo", "Motion Toolbox", "arXiv"], "alternates": {"html": "https://wpnews.pro/news/romo-a-large-scale-richly-organized-dataset-and-semantic-taxonomy-for-human", "markdown": "https://wpnews.pro/news/romo-a-large-scale-richly-organized-dataset-and-semantic-taxonomy-for-human.md", "text": "https://wpnews.pro/news/romo-a-large-scale-richly-organized-dataset-and-semantic-taxonomy-for-human.txt", "jsonld": "https://wpnews.pro/news/romo-a-large-scale-richly-organized-dataset-and-semantic-taxonomy-for-human.jsonld"}}