Talks
- 08-12-2025 Scaling Beyond Autoregression
- 28-10-2025 Video models are zero-shot learners and reasoners
- 23-10-2025 The Tolman-Sherrington Metamorphosis of Intelligence
- 21-05-2025 Towards Packing the Intelligence of Large Foundation Models on Low Resource Devices
- 15-07-2024 Visual Human Motion Analysis
- 04-04-2024 Guardian of Trust in Language Models: Automatic Jailbreak and Systematic Defense
- 28-03-2024 Physics-informed Modeling of Dynamic Humans and their Interactions
- 28-03-2024 Towards democratising robot learning for all
- 21-03-2024 Learning Humanoid Robots
- 18-03-2024 LLaVA: A Vision-and-Language Approach to Computer Vision in the Wild
- 29-02-2024 Understanding and Mitigating the Pre-training Noise on Downstream Tasks
- 22-02-2024 Distilling Vision-Language Models on Millions of Videos
- 22-02-2024 Video Creation with Diffusion Models
- 19-02-2024 Towards Learning a Driving Simulator from the Real World
- 15-02-2024 InstantID: Zero-shot Identity-Preserving Generation in Seconds
- 21-12-2023 Long video understanding with minimal supervision
- 07-12-2023 3D Human Modelling from Image and Text Guidance
- 30-11-2023 Inductive Biases for Learning Long-Horizon Manipulation Skills
- 23-11-2023 Generalist Embodied AI in an Open World
- 16-11-2023 Learning visual language models for video understanding