PAPER DIGEST
Most Influential SIGMOD 2017 Paper · 2026-03 edition

Data Management Challenges In Production Machine Learning

Neoklis Polyzotis; Sudip Roy; Steven Euijong Whang; Martin Zinkevich

Venue: ACM SIGMOD Conference (SIGMOD) 2017
Recognition: Most Influential SIGMOD 2017 Paper (Rank No. 6)
Edition: 2026-03
Impact factor: 5
Certificate ID: 2305989724a9021a

Abstract

The tutorial discusses data-management issues that arise in the context of machine learning pipelines deployed in production. Informed by our own experience with such large-scale pipelines, we focus on issues related to understanding, validating, cleaning, and enriching training data. The goal of the tutorial is to bring forth these issues, draw connections to prior work in the database literature, and outline the open research questions that are not addressed by prior art.
