Advancing Semi-Supervised Learning: Methods and Benchmarks

Speaker

Yidong Wang is a doctoral student at the National Engineering Research Center for Software Engineering of Peking University, advised by Prof. Wei Ye and Prof. Shikun Zhang. He received his master's degree from the Department of Information and Communications Engineering at Tokyo Institute of Technology and his bachelor's degree from the Department of Computer Science and Technology at Nanjing University. His research focuses primarily on semi-supervised learning, transfer learning, and imbalanced learning. He has contributed to the machine learning community by open-sourcing popular GitHub repositories, including TorchSSL and USB.

His personal website: https://qianlanwyd.github.io/

Abstract

Semi-supervised learning (SSL) has gained significant attention in machine learning because it can leverage unlabeled data to improve model performance. In this talk, I will discuss recent advances in semi-supervised learning from both the methodological and the benchmarking perspective. First, I will introduce several state-of-the-art methods that have significantly improved the performance of semi-supervised learning. Specifically, I will present FlexMatch, a curriculum-learning-based method that improves the quality of pseudo-labels for unlabeled data by adopting class-specific thresholds based on the model's learning status. Next, I will introduce FreeMatch, a self-adaptive thresholding method that dynamically adjusts the threshold for accepting pseudo-labels at both the global and the local (class-specific) level. FreeMatch eliminates the need to hand-tune threshold values, making the semi-supervised learning process more efficient and robust. Finally, I will discuss SoftMatch, a framework that addresses the quantity-quality trade-off of pseudo-labels by deriving a truncated Gaussian function that weights unlabeled samples by their confidence. Beyond methods, I will introduce USB, a unified semi-supervised learning benchmark for classification that evaluates SSL algorithms on various datasets from different domains. Unlike traditional benchmarks, USB uses pretrained transformers, which makes it possible to evaluate SSL algorithms on multiple tasks from multiple domains at lower cost.
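To make the three pseudo-labeling rules in the abstract concrete, here is a minimal pure-Python sketch based on a reading of the published FlexMatch, FreeMatch, and SoftMatch formulations. It is not the TorchSSL or USB code: the helper names (accept, flexmatch_learning_status, flexmatch_thresholds, FreeMatchThreshold, SoftMatchWeighter), the default EMA momentum of 0.999, the fixed FlexMatch threshold tau = 0.95, and FlexMatch's convex mapping M(x) = x / (2 - x) are assumptions chosen for illustration. Only the thresholding or weighting rule of each method is shown, not the full training loop.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative sketch only; all names and defaults are hypothetical, not the TorchSSL/USB API.
Probs = Sequence[float]  # softmax probabilities for one unlabeled sample


def accept(probs: Probs, thresholds: Sequence[float]) -> bool:
    """Keep a pseudo-label if its confidence exceeds the threshold of its predicted class."""
    pred = max(range(len(probs)), key=lambda c: probs[c])
    return probs[pred] > thresholds[pred]


# --- FlexMatch-style class-wise thresholds (curriculum pseudo-labeling) ---
def flexmatch_learning_status(latest_preds: Sequence[int], num_classes: int) -> Tuple[List[int], int]:
    """latest_preds[n] is the class last predicted for unlabeled sample n with confidence
    above the fixed threshold tau, or -1 if that has not happened yet.
    Returns per-class counts sigma(c) and the number of still-unused samples."""
    counts = [0] * num_classes
    unused = 0
    for c in latest_preds:
        if c < 0:
            unused += 1
        else:
            counts[c] += 1
    return counts, unused


def flexmatch_thresholds(counts: Sequence[int], unused: int, tau: float = 0.95) -> List[float]:
    """T(c) = M(beta(c)) * tau, where beta(c) normalizes sigma(c) by the largest class count
    (or by the unused count during warm-up) and M(x) = x / (2 - x) is a convex mapping."""
    denom = max(max(counts), unused)
    betas = [s / denom if denom > 0 else 0.0 for s in counts]
    return [b / (2.0 - b) * tau for b in betas]


# --- FreeMatch-style self-adaptive thresholds ---
class FreeMatchThreshold:
    def __init__(self, num_classes: int, momentum: float = 0.999):
        self.m = momentum
        self.global_tau = 1.0 / num_classes              # EMA of mean max-confidence
        self.local_p = [1.0 / num_classes] * num_classes  # EMA of mean class probability

    def update(self, batch: Sequence[Probs]) -> None:
        n = len(batch)
        mean_max = sum(max(p) for p in batch) / n
        self.global_tau = self.m * self.global_tau + (1 - self.m) * mean_max
        for c in range(len(self.local_p)):
            mean_c = sum(p[c] for p in batch) / n
            self.local_p[c] = self.m * self.local_p[c] + (1 - self.m) * mean_c

    def thresholds(self) -> List[float]:
        # Local estimates are max-normalized, then scaled by the global threshold.
        top = max(self.local_p)
        return [pc / top * self.global_tau for pc in self.local_p]


# --- SoftMatch-style truncated Gaussian weighting ---
class SoftMatchWeighter:
    def __init__(self, num_classes: int, momentum: float = 0.999, lambda_max: float = 1.0):
        self.m = momentum
        self.lambda_max = lambda_max
        self.mu = 1.0 / num_classes  # EMA of mean confidence
        self.var = 1.0               # EMA of confidence variance

    def update(self, batch: Sequence[Probs]) -> None:
        conf = [max(p) for p in batch]  # batch needs at least two samples
        n = len(conf)
        mean = sum(conf) / n
        var = sum((x - mean) ** 2 for x in conf) / (n - 1)  # unbiased batch variance
        self.mu = self.m * self.mu + (1 - self.m) * mean
        self.var = self.m * self.var + (1 - self.m) * var

    def weight(self, probs: Probs) -> float:
        conf = max(probs)
        if conf >= self.mu:  # samples above the mean confidence get full weight
            return self.lambda_max
        return self.lambda_max * math.exp(-((conf - self.mu) ** 2) / (2 * self.var))


if __name__ == "__main__":
    batch = [[0.9, 0.1], [0.7, 0.3]]
    counts, unused = flexmatch_learning_status([0, 0, 1, -1], num_classes=3)
    print("FlexMatch thresholds:", flexmatch_thresholds(counts, unused))
    fm = FreeMatchThreshold(num_classes=2, momentum=0.5)
    fm.update(batch)
    print("FreeMatch thresholds:", fm.thresholds(), accept([0.6, 0.4], fm.thresholds()))
    sm = SoftMatchWeighter(num_classes=2, momentum=0.5)
    sm.update(batch)
    print("SoftMatch weights:", sm.weight([0.9, 0.1]), sm.weight([0.6, 0.4]))
```

The contrast between the methods shows up in how the outputs are used: FlexMatch and FreeMatch produce hard per-class thresholds that decide, through accept, whether a pseudo-label is kept at all, while SoftMatch never discards a sample outright and instead smoothly down-weights pseudo-labels whose confidence falls below the running mean.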
