Multimodal Representation Learning with Deep Generative Models

Speaker

Shweta is currently a postdoctoral researcher in the Vision Group at the University of British Columbia. She obtained her Ph.D. at Technische Universität Darmstadt under the supervision of Prof. Stefan Roth, Ph.D. Her Ph.D. thesis was nominated for the Bertha Benz Best Thesis Award and the GI Dissertation Award 2023. During her Ph.D., she conducted research on deep generative algorithms for multimodal representation learning and on the efficiency of exact-inference deep generative models. She received her M.Sc. from Saarland University, where she was part of the Machine Learning Group and the Max Planck Institute for Informatics.

Her homepage is https://s-mahajan.github.io/

Abstract

In this talk, I will present my research on deep generative models that address key aspects of multimodal learning from cross-domain data, namely images and text. We build on the success of deep generative models such as regularized autoencoders, generative adversarial networks, normalizing flows, and, most recently, diffusion models, which have been successfully applied to image generation, to develop weakly supervised approaches for joint generative modeling of the image and text domains.

Video