Intend to Move: A Multimodal Dataset for
Intention-Aware Human Motion Understanding

🥳 NeurIPS 2025 Datasets and Benchmarks Track 🥳

1The University of Tokyo, 2Tokyo Denki University, 3Nanyang Technological University, 4RIKEN AIP

Intend to Move (I2M) is a new multimodal dataset for embodied AI, designed for intention-aware human motion understanding in real-world environments.

Abstract

Human motion is inherently intentional, yet most motion modeling paradigms remain confined to kinematic prediction, neglecting the semantic and causal structure that drives behavior. Existing datasets capture short, decontextualized actions in static settings, providing little grounding for embodied reasoning. We introduce Intend to Move (I2M), a large-scale, multimodal dataset for intention-aware embodied motion modeling. I2M contains 10.1 hours of two-person 3D motion sequences recorded in dynamic, realistic home environments, accompanied by synchronized multi-view RGB-D video, detailed 3D scene geometry, and timestamped language annotations of each participant's evolving intentions. Benchmark experiments reveal a fundamental gap in current motion models: they fail to translate high-level goals into physically and socially coherent motion. I2M thus serves not only as a dataset but also as a benchmark for embodied intelligence, fostering models that can reason about, predict, and act upon the "why" behind human motion.
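To make the modality layout described above concrete, here is a minimal, hypothetical sketch of how a single I2M sequence could be represented in Python. All class names, field names, shapes, and types below are illustrative assumptions, not the released I2M format or API.

```python
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class IntentionSpan:
    """One timestamped language annotation of a participant's intention (hypothetical schema)."""
    person_id: int     # 0 or 1, since sequences are two-person
    start_time: float  # seconds from sequence start
    end_time: float    # seconds from sequence start
    text: str          # free-form description of the evolving intention


@dataclass
class I2MSample:
    """Hypothetical per-sequence container mirroring the modalities listed in the abstract."""
    motion: np.ndarray               # (2, T, J, 3): two persons, T frames, J joints, xyz positions
    rgbd_frames: List[np.ndarray]    # one (T, H, W, 4) synchronized RGB-D stream per camera view
    scene_vertices: np.ndarray       # (V, 3): vertices of the 3D scene geometry
    scene_faces: np.ndarray          # (F, 3): triangle indices into scene_vertices
    intentions: List[IntentionSpan]  # timestamped intention annotations for both participants
```

A dataclass-per-sequence layout like this keeps the motion, multi-view RGB-D, scene geometry, and language streams aligned on a shared timeline, which is the property intention-aware models would need to exploit.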

[Teaser figure: overview of the Intend to Move (I2M) dataset]