Long video understanding with minimal supervision

Speaker

Tengda Han is a postdoctoral research fellow at the Visual Geometry Group at the University of Oxford. He obtained his PhD from the same group in 2022, supervised by Andrew Zisserman. His current research focuses on self-supervised learning, efficient learning, and video understanding.

Abstract

Understanding long videos is one of the pinnacles of computer vision. The long temporal axis introduces additional challenges compared with images or short videos, and exhaustive manual annotation of long videos is infeasible. In this talk, I will introduce several works that use minimal human annotation for two types of long videos: instructional videos, such as those on YouTube, and movies that run longer than one hour.
