Speaker
Hao Chen is a third-year Ph.D. candidate at Carnegie Mellon University, advised by Prof. Bhiksha Raj and collaborating with Dr. Jindong Wang. His current research broadly concerns learning with weak supervision and understanding the robustness and generalization of large foundation models. Previously, he worked mainly on semi-supervised learning and parameter-efficient transfer learning.
Abstract
Recent advances in large foundation models have showcased their remarkable ability to generalize across diverse tasks. These models are typically pre-trained on extremely large-scale datasets before being fine-tuned for specific downstream applications. A critical yet often underexplored aspect is how noise in the pre-training data affects model generalization. We aim to understand the nature of noise in the context of large-scale pre-training. Interestingly, we find that while slight noise can benefit in-domain (ID) performance, it consistently hampers out-of-domain (OOD) performance. To mitigate these detrimental effects, we introduce a regularization method and demonstrate its effectiveness on both vision and language models, including API-based models.