Video Creation with Diffusion Models

Speaker

Zhangjie Wu is a third-year Ph.D. student at Show Lab, National University of Singapore, working with Prof. Mike Zheng Shou. Prior to this, he obtained his Bachelor’s degree in Computer Science from Shen Yuan Honors College at Beihang University. His research focuses on AI for video understanding and generation. His representative works include Tune-A-Video, Show-1, and MotionDirector.

Abstract

Diffusion models have ushered in a new era of video content creation. In this talk, I will discuss our latest efforts in leveraging diffusion models for video generation and editing. I will first present Show-1, a hybrid model that combines pixel-based and latent-based diffusion models to produce videos with strong text-video alignment and high visual fidelity. I will then introduce MotionDirector, which adapts text-to-video diffusion models to generate videos with customized motions. For video editing, our pioneering project, Tune-A-Video, builds on Stable Diffusion to edit short video clips. In our more recent effort, DynVideo-E, we employ a dynamic Neural Radiance Field (NeRF) as the video representation, extending editing to longer videos of up to several minutes. Our contributions benefit both the research and open-source communities.
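
For readers who want to experiment before the talk, the sketch below shows how a pretrained text-to-video diffusion pipeline can be sampled with the Hugging Face diffusers library. This is only a minimal illustration under that assumption: the checkpoint name "damo-vilab/text-to-video-ms-1.7b" is a publicly available example model, not the Show-1, MotionDirector, Tune-A-Video, or DynVideo-E implementation discussed in the abstract.

# Minimal sketch: sampling a short clip from a pretrained text-to-video
# diffusion model with Hugging Face diffusers. The checkpoint below is a
# public example model, not the systems described in the abstract.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Load the pipeline in half precision and move it to the GPU.
pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16"
)
pipe = pipe.to("cuda")

prompt = "a panda surfing on a wave, cinematic lighting"
# Recent diffusers versions return a batch of videos; take the first one.
# (Older versions return the frame list directly as `.frames`.)
frames = pipe(prompt, num_frames=16).frames[0]

# Write the generated frames to an mp4 file.
export_to_video(frames, output_video_path="generated_clip.mp4")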

Video