Bio: Hugo Touvron is a research scientist at Meta AI Research. During his PhD he was advised by Hervé Jégou and Matthieu Cord. His current research interests include image classification, transfer learning & fine-grained recognition, with an emphasis on the interplay between architectures and training procedures.
Abstract: The success of deep learning is often associated with emblematic architectures: almost everyone has heard of AlexNet, ResNet or GPT. These successes were also powered by well-designed optimisation procedures, which are rarely central to the discussion. In image classification, the ImageNet challenge accelerated the development not only of new architectures but also of novel optimisation recipes. For example, the GoogLeNet paper proposes a new architecture along with substantial improvements to the training procedure, as did the AlexNet paper before it. In this presentation, I will discuss the interactions between architectures and training procedures. I will focus in particular on Transformers, for which training procedures are much less mature, yet key to overcoming their limited architectural priors. Building on this, we present training procedures that achieve strong performance with Transformers and even with simple Multi-Layer Perceptrons.