| 9:00 - 9:15 |
Organizers |
Welcome |
- |
| 9:15 - 9:45 |
Katherine Driggs-Campbell |
On the Limitations of Prediction for Interaction and Collaborations (and some solutions) |
Robots are becoming prevalent in our everyday lives and are changing the foundations of our way of life. However, the desirable impacts of robots are only achievable if the underlying algorithms can understand and predict human behaviors and trajectories. This research area has been growing in popularity, but how much of it is actually useful for robot planning and collaboration? In this talk, we will discuss some of the limiting (or unnecessary) aspects of prediction algorithms (e.g., partial observations, distribution assumptions). We'll present some solutions using graph representations for long-term motion prediction, and present work on multi-level prediction that is useful for long-term interaction between humans and robots. We'll look at the impact in crowd navigation, collaborative assembly, and repeated interactions in driving. Through our experiments, we found that tailoring human models to the task results in effective human-robot interaction that is safe, interactive, and efficient.
|
| 9:45 - 10:15 |
Zhi Yan |
From Perception to Navigation: The Sandbox of Long-Term Human Motion Prediction |
Long-term human motion prediction (LHMP) is one of the key technologies for robots to operate in dynamic, human-centered environments. However, its success largely depends on the robustness of upstream perception and the efficiency of downstream navigation. This talk aims to provide a perspective on LHMP by connecting the different stages of the mobile robot autonomy process. We will first explore the challenges of upstream perception, which uses 3D LiDAR data for human detection and tracking. Then, we will discuss downstream deployment, namely how to enable robots to navigate in a way that both aligns with human habits and respects social norms. By anchoring LHMP between perception and navigation, we will elaborate on the system-level and integration requirements for long-term robot autonomy in shared spaces with humans.
|
| 10:15 - 10:45 |
Jean Oh |
Predicting under the curse of rarity: self-driving to aviation |
TBD
|
| 10:45 - 11:00 |
Coffee break |
- |
- |
| 11:00 - 12:00 |
Poster Session |
- |
- |
| 12:00 - 13:30 |
Lunch break |
- |
- |
| 13:30 - 14:00 |
Lukas Schmid |
Understanding past, current, and future human motion |
Understanding human motion is essential for the next generation of embodied AI and human-centric robotics. However, in robotics, this understanding is characterized through several different phenomena. A robot needs to be able to capture both short-term dynamics, such as currently moving people, and long-term dynamics, such as changes in the scene imparted by human actions outside the view of the robot, in order to reconstruct past and predict future human actions and motion. This talk will highlight recent advances in the detection of both short-term and long-term dynamics, their unification for holistic 4D robot perception, and the prediction of future actions and motions based on these observations. The presented methods are evaluated on board mobile robots and are available as open-source software.
|
| 14:00 - 14:30 |
Kashyap Chitta |
World Models: The Next Frontier of Motion Prediction |
Predicting how the multi-agent physical world evolves in response to our actions is a fundamental bottleneck in modern Physical AI, particularly in the context of simulation for training autonomous driving agents. While current systems have made strides by using advanced reconstruction techniques like 3D Gaussian Splatting, they remain constrained by limited spatial and temporal prediction horizons. This talk explores the paradigm shift toward generative world modeling for motion prediction, simulation, and control. We will trace the evolution from abstract behavioral simulations to today's state-of-the-art, pixel-level generative simulators, highlighting recent architectures that use diffusion transformers to synthesize long-horizon, action-conditioned sensor data. We then outline the next frontier: moving beyond using world models as "frozen" simulators. We discuss the emerging potential of latent world models that allow reinforcement learning to actively shape the world model during training, enabling applications beyond simulation such as predictive control.
|
| 14:30 - 15:00 |
Johannes Betz |
Overcoming Blind Spots: Occlusion Consideration for Improved Autonomous Driving Safety |
TBD
|
| 15:00 - 15:15 |
Coffee break |
- |
- |
| 15:15 - 15:45 |
Dave Woollard |
From Prediction to Priors: Stable Structure in Human Motion for Robot Navigation |
Much of robot navigation in shared human environments is framed through trajectory prediction: forecasting where people will move and planning accordingly. In this talk, I present a complementary perspective: while individual human trajectories are noisy and difficult to predict exactly, large-scale motion data reveals stable population-level structure that can also guide robot behavior.
Drawing on millions of real-world trajectories from operational retail environments, we show how local motion patterns can serve as coordination priors for planning, biasing robots toward behaviors that are more compatible with surrounding human flow. We also discuss implications for evaluation through EvenFlow, a benchmark built from real human navigation scenarios.
More broadly, this suggests that human motion data has value not only for prediction, but also for planning and evaluation through the stable behavioral structure it reveals.
|
| 15:45 - 16:15 |
Alessandro Corbetta |
Understanding Pedestrian Physics: Large-Scale Measurements, Physics-Based Modeling, and Generative AI |
Achieving a predictive understanding of pedestrian crowd dynamics is a major scientific challenge, with close connections to a wide range of fields, from statistical physics to social robotics. Over the past fifteen years, advances in automated vision have enabled increasingly precise measurements of crowd behavior across progressively larger spatial scales. Real-world data acquisition in public spaces, operating on a 24/7/365 basis, has led to datasets comprising millions of individual trajectories, capturing both typical patterns and rare events.
These developments have opened the door to large-scale, data-driven modeling approaches aimed at quantitatively predicting the statistical features of pedestrian dynamics, well beyond average behaviors.
In this talk, I will first review our work on large-scale crowd measurements. I will then address the challenge of developing models that achieve quantitative statistical accuracy across regimes, from dilute to dense crowds. In particular, I will discuss recent and ongoing approaches based on Langevin equations and variational principles, which successfully reproduce experimental ensemble statistics. I will conclude with an application of autoregressive AI methods to predict complex n-body dynamics and infer effective n-body interactions directly from large-scale trajectory data.
|
| 16:15 - 16:45 |
Organizers |
Closing and Best Poster Award |
- |