PAPER DIGEST
Most Influential ICML 2021 Paper · 2026-03 edition

Zero-Shot Text-to-Image Generation

Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever

Venue: International Conference on Machine Learning (ICML) 2021
Recognition: Most Influential ICML 2021 Paper (Rank No. 3)
Edition: 2026-03
Impact factor: 9
Certificate ID: f04eeb37996b5e35

Abstract

Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset. These assumptions might involve complex architectures, auxiliary losses, or side information such as object part labels or segmentation masks supplied during training. We describe a simple approach for this task based on a transformer that autoregressively models the text and image tokens as a single stream of data. With sufficient data and scale, our approach is competitive with previous domain-specific models when evaluated in a zero-shot fashion.
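
As a rough illustration of what "a single stream of data" can mean, the sketch below concatenates text tokens and image tokens into one sequence and derives next-token prediction pairs from it. It is a minimal toy example, not the authors' implementation: the vocabulary sizes, the offsetting of image ids, and the helper names `build_stream` and `next_token_pairs` are all assumptions made for illustration, and no tokenizer or transformer is included.

```python
from typing import List, Tuple

# Hypothetical toy vocabulary sizes, chosen only to keep the example readable.
TEXT_VOCAB_SIZE = 16
IMAGE_VOCAB_SIZE = 8


def build_stream(text_tokens: List[int], image_tokens: List[int]) -> List[int]:
    """Join text and image tokens into one sequence over a shared vocabulary.

    Image ids are shifted past the text vocabulary so the two kinds of
    token never collide (an assumed convention, not taken from the paper).
    """
    if any(not 0 <= t < TEXT_VOCAB_SIZE for t in text_tokens):
        raise ValueError("text token id out of range")
    if any(not 0 <= t < IMAGE_VOCAB_SIZE for t in image_tokens):
        raise ValueError("image token id out of range")
    return list(text_tokens) + [TEXT_VOCAB_SIZE + t for t in image_tokens]


def next_token_pairs(stream: List[int]) -> Tuple[List[int], List[int]]:
    """Return (inputs, targets) for autoregressive training.

    The target at position i is the token that follows input position i.
    """
    return stream[:-1], stream[1:]


# Toy caption (3 text tokens) followed by a toy image (4 image tokens).
stream = build_stream([3, 7, 1], [0, 5, 2, 2])
inputs, targets = next_token_pairs(stream)
print(stream)   # [3, 7, 1, 16, 21, 18, 18]
print(targets)  # [7, 1, 16, 21, 18, 18]
```

Because the text comes first in the stream, every image token is predicted with the full caption in its context. Under this framing, generating an image from a new caption amounts to feeding in the text tokens and sampling the image tokens one at a time.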