Unsolved ML Safety Problems


Dan Hendrycks is the director of the Center for AI Safety. He received his PhD from UC Berkeley, where he was advised by Jacob Steinhardt and Dawn Song. His research is supported by the NSF GRFP and the Open Philanthropy AI Fellowship. Dan contributed the GELU activation function, the default activation in many state-of-the-art ML models, including BERT, Vision Transformers, and GPT-3. He also contributed the main baseline for out-of-distribution (OOD) detection, as well as benchmarks for robustness (ImageNet-C) and for large language models (MMLU, MATH). For more information, visit his website: https://danhendrycks.com


Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In response to emerging safety challenges in ML, such as those introduced by recent large-scale models, I outline a roadmap for ML Safety and refine the technical problems that the field needs to address. I present three pillars of ML safety, namely withstanding hazards (“Robustness”), identifying hazards (“Monitoring”), and steering ML systems (“Alignment”).

Related reading: https://arxiv.org/abs/2109.13916
Related course: https://course.mlsafety.org

