Abstract
The rapid advancement of autonomous driving systems has been driven in large part by the availability of large-scale datasets. Early datasets focused primarily on perception tasks, while motion prediction and planning remained limited by insufficient data and the complexity of behavioral modeling. In this paper, we revisit the paradigm of large-scale motion prediction datasets and extend it with developments from 2021–2025. We present an updated perspective on dataset design that incorporates advances in transformer-based architectures, diffusion models, and closed-loop evaluation frameworks. We further analyze how modern datasets and learning strategies improve trajectory forecasting and planning performance. Our findings indicate that scaling both data and model complexity remains critical for achieving robust, real-world autonomous driving systems.