Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models


Sheng Shen is a fourth-year Ph.D. student at UC Berkeley, advised by Prof. Kurt Keutzer and Prof. Trevor Darrell. His research focuses on compute-optimal (multimodal) language modeling, including efficient training and tuning methods, model compression techniques, and vision-language models. He received the Lotfi A. Zadeh Prize in 2023. Prior to UC Berkeley, he obtained a B.S. degree in Electrical Engineering and Computer Science from Peking University.

Sheng’s homepage: https://sincerass.github.io/


Sparse Mixture-of-Experts (MoE) is a neural architecture design that adds learnable parameters to Large Language Models (LLMs) without increasing inference cost. Instruction tuning is a technique for training LLMs to follow instructions. In this presentation, I will delve into our latest work, Flan-MoE, which combines these two techniques. We found that, compared to their dense counterparts, MoE models benefit more from instruction tuning, both when fine-tuned directly on individual downstream tasks and in few-shot and zero-shot generalization.
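To make the core idea concrete, here is a minimal, self-contained sketch of sparse top-k routing, the mechanism that lets MoE models grow their parameter count without growing per-token compute. This is an illustrative toy, not the Flan-MoE implementation: the experts, router scores, and function names below are all invented for the example.

```python
# Toy sketch of sparse top-k Mixture-of-Experts routing (illustrative only,
# not the actual Flan-MoE code). Each "expert" is a simple function; the
# router selects the top-k experts per token, so only k experts run no
# matter how many experts (and thus parameters) the layer contains.
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(token, experts, router_scores, k=2):
    """Route a token to its top-k experts and mix their outputs."""
    # Indices of the k highest router scores.
    topk = sorted(range(len(router_scores)),
                  key=lambda i: router_scores[i], reverse=True)[:k]
    # Renormalize gate weights over the selected experts only.
    gates = softmax([router_scores[i] for i in topk])
    # Only the selected experts are evaluated; the rest are skipped,
    # which is why inference cost stays fixed as experts are added.
    return sum(g * experts[i](token) for g, i in zip(gates, topk))

# Toy usage: 4 "experts", each a different scalar transform.
experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x, lambda x: x * x]
out = moe_layer(3.0, experts, router_scores=[0.1, 2.0, 0.5, 1.5], k=2)
```

With k=2 the router picks experts 1 and 3 here, and the output is a gate-weighted blend of their results; adding a fifth or fiftieth expert would not change the per-token cost, only the capacity the router can choose from.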


Coming soon. Stay tuned. :-)