Calibrated Ensembles Can Mitigate Accuracy Tradeoffs Under Distribution Shift

Ananya Kumar; Tengyu Ma; Percy Liang; Aditi Raghunathan

Venue: Conference on Uncertainty in Artificial Intelligence (UAI) 2022
Recognition: Most Influential UAI 2022 Paper (Rank No. 9)
Edition: 2026-03
Impact factor: 3

Abstract

Robust machine learning often exhibits an undesirable tradeoff in which out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy. A robust classifier obtained via specialized techniques, such as removing spurious features, often has better OOD but worse ID accuracy than a standard classifier trained via vanilla empirical risk minimization (ERM). In this paper, we find that a simple approach of ensembling the standard and robust models, after calibrating on only ID data, outperforms the prior state of the art both ID and OOD. On ten natural distribution shift datasets, ID-calibrated ensembles get the best of both worlds: the strong ID accuracy of the standard model and the strong OOD accuracy of the robust model. We analyze this method in stylized settings and identify two important conditions for ensembles to perform well both ID and OOD: (1) the standard and robust models should be calibrated (on ID data, because OOD data is unavailable), and (2) the OOD data should have no anticorrelated spurious features.
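The calibrate-then-ensemble idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the calibration procedure or the combination rule, so the choices here are assumptions: temperature scaling fit by grid search on held-out ID validation logits, and an ensemble that sums the two models' calibrated log-probabilities. The function names (`fit_temperature`, `calibrated_ensemble_predict`) and the grid range are hypothetical and chosen only for illustration.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the class axis (axis 1)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def nll(logits, labels, temperature):
    """Mean negative log-likelihood of the labels after temperature scaling."""
    logp = log_softmax(logits / temperature)
    return -logp[np.arange(len(labels)), labels].mean()


def fit_temperature(id_val_logits, id_val_labels, grid=None):
    """Pick the temperature minimizing NLL on ID validation data.

    Assumption: a simple grid search stands in for whatever optimizer
    one would use in practice. Only ID data is needed here.
    """
    if grid is None:
        grid = np.logspace(-2, 2, 200)
    losses = [nll(id_val_logits, id_val_labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])


def calibrated_ensemble_predict(std_logits, rob_logits, t_std, t_rob):
    """Predict by summing the two models' calibrated log-probabilities.

    Assumption: summing log-probabilities is one plausible combination
    rule; it lets the more confident (calibrated) model dominate.
    """
    combined = log_softmax(std_logits / t_std) + log_softmax(rob_logits / t_rob)
    return combined.argmax(axis=1)
```

In use, each temperature would be fit once on ID validation logits from the standard and robust models respectively, and the same two temperatures would then be applied unchanged to both ID and OOD test inputs, since OOD data is assumed unavailable at calibration time.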
