Speaker
Jiafei Duan is a second-year PhD student in the Robotics and State Estimation Lab at the University of Washington, advised by Professors Dieter Fox and Ranjay Krishna. His research lies at the intersection of robot learning, embodied AI, and computer vision. Mr Duan has served as lead author on several embodied AI and robotics papers published in top-tier AI conferences and journals, including NeurIPS, IJCAI, EMNLP, ICLR, ICCV, ECCV, CoRL, and IEEE TETCI. He received Singapore’s prestigious National Science (PhD) Scholarship for his PhD studies.
Abstract
Training a generalist robotic foundation model relies on two key factors: the scalability of robot data collection and the generalization capabilities of modern behaviour cloning models (e.g., PerAct, ACT, RVT). To address these challenges, I first introduce AR2-D2, a novel system for collecting demonstrations that eliminates the need for specialized training or real robots during data collection, enabling demonstrations on diverse objects. Implemented as an iOS application, AR2-D2 allows users to record videos of themselves manipulating objects while simultaneously gathering the data needed to train real robots. Our system proves effective for training behaviour cloning agents on real object manipulation, showing that training with our augmented reality (AR) data is as effective as training with real-world robot demonstrations. I will then present Colosseum, a new simulation benchmark featuring 20 diverse manipulation tasks that enables systematic evaluation of behaviour cloning models’ robustness across 12 axes of environmental perturbation. Our results show a strong correlation between perturbation effects observed in simulation and those observed in real-world experiments, affirming Colosseum’s ecological validity. Together, these works address two critical challenges in scaling robotic manipulation models.