PAPER DIGEST
Most Influential SIGMOD 2017 Paper · 2026-03 edition

Heterogeneity-aware Distributed Parameter Servers

Jiawei Jiang; Bin Cui; Ce Zhang; Lele Yu

Venue
ACM SIGMOD Conference (SIGMOD) 2017
Recognition
Most Influential SIGMOD 2017 Paper (Rank No. 7)
Edition
2026-03
Impact factor
5
Certificate ID
c360973f02e3244a

Abstract

We study distributed machine learning in heterogeneous environments in this work. We first conduct a systematic study of existing systems running distributed stochastic gradient descent; we find that, although these systems work well in homogeneous environments, they can suffer performance degradation, sometimes up to 10x, in heterogeneous environments where stragglers are common because their synchronization protocols cannot fit a heterogeneous setting. Our first contribution is a heterogeneity-aware algorithm that uses a constant learning rate schedule for updates before adding them to the global parameter. This allows us to suppress stragglers' harm on robust convergence. As a further improvement, our second contribution is a more sophisticated learning rate schedule that takes into consideration the delayed information of each update. We theoretically prove the valid convergence of both approaches and implement a prototype system in the production cluster of our industrial partner Tencent Inc. We validate the performance of this prototype using a range of machine-learning workloads. Our prototype is 2-12x faster than other state-of-the-art systems, such as Spark, Petuum, and TensorFlow; and our proposed algorithm takes up to 6x fewer iterations to converge.
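
Illustrative sketch (not from the paper): the abstract describes two server-side ideas, scaling every worker update by a constant rate before adding it to the global parameter, and scaling each update according to how delayed it is. The toy Python below shows one way such a server could look. The class `ToyParameterServer`, its method names, and in particular the `1 / (1 + staleness)` rule are assumptions made for illustration; the abstract does not state the exact schedules the authors use.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyParameterServer:
    """Hypothetical single-process stand-in for a parameter server."""

    params: List[float]
    version: int = 0  # number of updates applied so far

    def pull(self) -> Tuple[List[float], int]:
        """Return a copy of the parameters and the version they belong to."""
        return list(self.params), self.version

    def staleness(self, pulled_version: int) -> int:
        """Count the updates applied since the worker pulled its copy."""
        return self.version - pulled_version

    def push_constant(self, update: List[float], rate: float) -> float:
        """Scale the update by one constant rate, whatever its delay."""
        self._apply(update, rate)
        return rate

    def push_staleness_aware(self, update: List[float], pulled_version: int) -> float:
        """Scale the update by 1 / (1 + staleness).

        This rule is an assumption for illustration, not the paper's formula.
        """
        rate = 1.0 / (1 + self.staleness(pulled_version))
        self._apply(update, rate)
        return rate

    def _apply(self, update: List[float], rate: float) -> None:
        # Add the scaled update to the global parameter and bump the version.
        self.params = [p + rate * u for p, u in zip(self.params, update)]
        self.version += 1


def demo() -> Tuple[float, float, List[float]]:
    """A fast worker and a straggler both pull at version 0, then push in turn."""
    server = ToyParameterServer(params=[0.0, 0.0])
    _, fast_version = server.pull()
    _, slow_version = server.pull()
    fast_rate = server.push_staleness_aware([1.0, 1.0], fast_version)
    slow_rate = server.push_staleness_aware([1.0, 1.0], slow_version)
    return fast_rate, slow_rate, server.params
```

In `demo()`, the straggler's update arrives one version late, so under the assumed rule it is applied at half the weight of the fresh update. That is the intuition behind suppressing stragglers' harm on convergence: delayed information still contributes, but less.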
