PAPER DIGEST
Most Influential ICML 2012 Paper · 2026-03 edition

A Fast And Simple Algorithm For Training Neural Probabilistic Language Models

Andriy Mnih; Yee Whye Teh

Venue: International Conference on Machine Learning (ICML) 2012
Recognition: Most Influential ICML 2012 Paper (Rank No. 7)
Edition: 2026-03
Impact factor: 7
Certificate ID: 5e1e9789b12a1e28

Abstract

Neural probabilistic language models (NPLMs) have recently superseded smoothed n-gram models as the best-performing model class for language modelling. Unfortunately, the adoption of NPLMs is held back by their notoriously long training times, which can be measured in weeks even for moderately-sized datasets. These are a consequence of the models being explicitly normalized, which leads to having to consider all words in the vocabulary when computing the log-likelihood gradients. We propose a fast and simple algorithm for training NPLMs based on noise-contrastive estimation, a newly introduced procedure for estimating unnormalized continuous distributions. We investigate the behaviour of the algorithm on the Penn Treebank corpus and show that it reduces the training times by more than an order of magnitude without affecting the quality of the resulting models. The algorithm is also more efficient and much more stable than importance sampling because it requires far fewer noise samples to perform well. We demonstrate the scalability of the proposed approach by training several neural language models on a 47M-word corpus with an 80K-word vocabulary, obtaining state-of-the-art results in the Microsoft Research Sentence Completion Challenge.
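
To make the idea of noise-contrastive estimation more concrete, here is a minimal sketch of the per-example NCE loss in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (log_sigmoid, nce_loss), the toy vocabulary, the scores, and the uniform noise distribution are all hypothetical. The sketch assumes the model's unnormalized score for a word is used directly as a log-probability, and that k noise words are drawn from a known noise distribution q. Each word is then classified as "data" or "noise" with logit score - log(k * q(word)), so the cost depends on k rather than on the vocabulary size.

```python
import math
import random

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def nce_loss(score_data, log_q_data, scores_noise, log_q_noise):
    """NCE loss for one observed word and k noise words (hypothetical helper).

    score_data   : unnormalized model score of the observed word
    log_q_data   : log noise probability of the observed word
    scores_noise : model scores of the k sampled noise words
    log_q_noise  : log noise probabilities of those k words
    """
    k = len(scores_noise)
    log_k = math.log(k)
    # The observed word should be classified as coming from the data.
    loss = -log_sigmoid(score_data - log_k - log_q_data)
    # Each noise word should be classified as coming from the noise.
    for s, lq in zip(scores_noise, log_q_noise):
        loss -= log_sigmoid(-(s - log_k - lq))
    return loss

# Toy setup (all values are made up for illustration).
vocab_size = 5
q = [1.0 / vocab_size] * vocab_size        # assumed uniform noise distribution
scores = [2.0, -1.0, 0.5, -0.3, 0.1]       # assumed model scores for one context
observed = 0                               # index of the observed next word
k = 3                                      # number of noise samples

random.seed(0)
noise = random.choices(range(vocab_size), weights=q, k=k)
loss = nce_loss(
    scores[observed],
    math.log(q[observed]),
    [scores[i] for i in noise],
    [math.log(q[i]) for i in noise],
)
print(loss)
```

Only the observed word and the k sampled words are scored here, which is the property the abstract relies on when it contrasts this approach with explicitly normalized training over the whole vocabulary.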
