Paper Digest: SIGMOD 2020 Highlights

Posted June 16, 2020 by admin; updated August 18, 2020

Readers can also read this highlight article on our console, which allows users to filter papers by keyword and find related papers.

The annual conference of the ACM Special Interest Group on Management of Data (SIGMOD) is one of the top conferences on database management systems and data management technology. In 2020, it is to be held virtually due to the COVID-19 pandemic.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights and summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic papers, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team
team@paperdigest.org

TABLE 1: SIGMOD 2020 Papers, Keynotes, Tutorials, Demos, and Student Abstracts

#	Title	Authors	Highlight
1	Systems and ML: When the Sum is Greater than Its Parts	Ion Stoica	Systems and ML: When the Sum is Greater than Its Parts
2	Recommending Deployment Strategies for Collaborative Tasks	Dong Wei, Senjuti Basu Roy, Sihem Amer-Yahia	We propose StratRec, an optimization-driven middle layer that recommends deployment strategies and alternative deployment parameters to requesters by accounting for worker availability.
3	Human-in-the-loop Outlier Detection	Chengliang Chai, Lei Cao, Guoliang Li, Jian Li, Yuyu Luo, Samuel Madden	In this work, we propose a human-in-the-loop outlier detection approach HOD that effectively leverages human intelligence to discover the true outliers.
4	QUAD: Quadratic-Bound-based Kernel Density Visualization	Tsz Nam Chan, Reynold Cheng, Man Lung Yiu	Our goal is to improve the performance of KDV, in order to support large datasets (e.g., one million points) and high screen resolutions (e.g., 1280 x 960 pixels).
5	ShapeSearch: A Flexible and Efficient System for Shape-based Exploration of Trendlines	Tarique Siddiqui, Paul Luh, Zesheng Wang, Karrie Karahalios, Aditya Parameswaran	We propose ShapeSearch, an efficient and flexible pattern-searching tool, that enables the search for desired patterns via multiple mechanisms: sketch, natural-language, and visual regular expressions.
6	Marviq: Quality-Aware Geospatial Visualization of Range-Selection Queries Using Materialization	Liming Dong, Qiushi Bai, Taewoo Kim, Taiji Chen, Weidong Liu, Chen Li	We present a novel middleware-based technique called Marviq.
7	Transactional Causal Consistency for Serverless Computing	Chenggang Wu, Vikram Sreekanti, Joseph M. Hellerstein	We present protocols for MTCC implemented in a system called HYDROCACHE.
8	Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings	Tarique Siddiqui, Alekh Jindal, Shi Qiao, Hiren Patel, Wangchao Le	In this work, we investigate two key questions: (i) can we learn accurate cost models for big data systems, and (ii) can we integrate the learned models within the query optimizer.
9	Lambada: Interactive Data Analytics on Cold Data Using Serverless Cloud Infrastructure	Ingo Müller, Renato Marroquín, Gustavo Alonso	In this paper, we present Lambada, a serverless distributed data processing framework designed to explore how to perform data analytics on serverless computing.
10	Starling: A Scalable Query Engine on Cloud Functions	Matthew Perron, Raul Castro Fernandez, David DeWitt, Samuel Madden	In this paper we present Starling, a query execution engine built on cloud function services that employs a number of techniques to mitigate these challenges, providing interactive query latency at a lower total cost than provisioned systems with low-to-moderate utilization.
11	Learning a Partitioning Advisor for Cloud Databases	Benjamin Hilprecht, Carsten Binnig, Uwe Röhm	In this paper, we introduce a new learned partitioning advisor based on Deep Reinforcement Learning (DRL) for OLAP-style workloads.
12	DB4ML – An In-Memory Database Kernel with Machine Learning Support	Matthias Jasny, Tobias Ziegler, Tim Kraska, Uwe Roehm, Carsten Binnig	In this paper, we revisit the question of how ML algorithms can be best integrated into existing DBMSs to not only avoid expensive data copies to external ML tools but also to comply with regulatory reasons.
13	Active Learning for ML Enhanced Database Systems	Lin Ma, Bailu Ding, Sudipto Das, Adith Swaminathan	In this paper, we address this performance degradation by using B-instances to collect additional data during deployment.
14	Qd-tree: Learning Data Layouts for Big Data Analytics	Zongheng Yang, Badrish Chandramouli, Chi Wang, Johannes Gehrke, Yinan Li, Umar Farooq Minhas, Per-Åke Larson, Donald Kossmann, Rajeev Acharya	In this paper, we propose a new framework called a query-data routing tree, or qd-tree, to address this problem, and propose two algorithms for their construction based on greedy and deep reinforcement learning techniques.
15	Facilitating SQL Query Composition and Analysis	Zainab Zolaktaf, Mostafa Milani, Rachel Pottinger	We examine methods that can accelerate and improve this interaction by providing insights about SQL queries prior to execution.
16	MONSOON: Multi-Step Optimization and Execution of Queries with Partially Obscured Predicates	Sourav Sikdar, Chris Jermaine	In this paper, we describe a query optimizer called the Monsoon optimizer.
17	Causal Relational Learning	Babak Salimi, Harsh Parikh, Moe Kayali, Lise Getoor, Sudeepa Roy, Dan Suciu	In this paper, we present a formal framework for causal inference from such relational data.
18	Sample Debiasing in the Themis Open World Database System	Laurel Orr, Magdalena Balazinska, Dan Suciu	We present Themis, the first open world database that automatically rebalances arbitrarily biased samples to approximately answer queries as if they were issued over the entire population.
19	Stochastic Package Queries in Probabilistic Databases	Matteo Brucato, Nishant Yadav, Azza Abouzied, Peter J. Haas, Alexandra Meliou	We provide methods for in-database support of decision making under uncertainty.
20	Fast and Reliable Missing Data Contingency Analysis with Predicate-Constraints	Xi Liang, Zechao Shang, Sanjay Krishnan, Aaron J. Elmore, Michael J. Franklin	We propose a framework that can produce automatic contingency analysis, i.e., the range of values an aggregate SQL query could take, under formal constraints describing the variation and frequency of missing data tuples.
21	Mining Approximate Acyclic Schemes from Relations	Batya Kenig, Pranay Mundra, Guna Prasaad, Babak Salimi, Dan Suciu	In this paper we present Maimon, a system for discovering approximate acyclic schemes and MVDs from data.
22	AliCoCo: Alibaba E-commerce Cognitive Concept Net	Xusheng Luo, Luxin Liu, Yonghua Yang, Le Bo, Yuanpeng Cao, Jinghang Wu, Qiang Li, Keping Yang, Kenny Q. Zhu	In this paper, we propose to construct a large-scale E-commerce Cognitive Concept Net named "AliCoCo", which is practiced in Alibaba, the largest Chinese e-commerce platform in the world.
23	A1: A Distributed In-Memory Graph Database	Chiranjeeb Buragohain, Knut Magne Risvik, Paul Brett, Miguel Castro, Wonhee Cho, Joshua Cowhig, Nikolas Gloy, Karthik Kalyanaraman, Richendra Khanna, John Pao, Matthew Renzelmann, Alex Shamis, Timothy Tan, Shuheng Zheng	In this paper we describe the A1 data model, RDMA optimized data structures and query execution.
24	IBM Db2 Graph: Supporting Synergistic and Retrofittable Graph Queries Inside IBM Db2	Yuanyuan Tian, En Liang Xu, Wei Zhao, Mir Hamid Pirahesh, Sui Jun Tong, Wen Sun, Thomas Kolanko, Md. Shahidul Haque Apu, Huijuan Peng	In this paper, we propose an in-DBMS graph query approach, IBM Db2 Graph, to support synergistic and retrofittable graph queries inside the IBM Db2 relational database.
25	An Ontology-Based Conversation System for Knowledge Bases	Abdul Quamar, Chuan Lei, Dorian Miller, Fatma Ozcan, Jeffrey Kreulen, Robert J. Moore, Vasilis Efthymiou	In this paper, we propose an ontology-based conversation system for domain-specific KBs.
26	Aggregation Support for Modern Graph Analytics in TigerGraph	Alin Deutsch, Yu Xu, Mingxi Wu, Victor E. Lee	We describe how GSQL, TigerGraph’s graph query language, supports the specification of aggregation in graph analytics.
27	GIANT: Scalable Creation of a Web-scale Ontology	Bang Liu, Weidong Guo, Di Niu, Jinwen Luo, Chaoyue Wang, Zhen Wen, Yu Xu	In this paper, we present GIANT, a mechanism to construct a user-centered, web-scale, structured ontology, containing a large number of natural language phrases conforming to user attentions at various granularities, mined from the vast volume of web documents and search click logs.
28	The Next 5 Years: What Opportunities Should the Database Community Seize to Maximize its Impact?	Magda Balazinska, Surajit Chaudhuri, Anastasia Ailamaki, Juliana Freire, Sailesh Krishnamurthy, Michael Stonebraker	The Next 5 Years: What Opportunities Should the Database Community Seize to Maximize its Impact?
29	Equivalence-Invariant Algebraic Provenance for Hyperplane Update Queries	Pierre Bourhis, Daniel Deutch, Yuval Moskovitch	In this paper we present the first (to our knowledge) algebraic provenance model, for a fragment of update queries, that is invariant under set equivalence.
30	Causality-Guided Adaptive Interventional Debugging	Anna Fariha, Suman Nath, Alexandra Meliou	We propose Adaptive Interventional Debugging (AID) for debugging such intermittent failures.
31	PrIU: A Provenance-Based Approach for Incrementally Updating Regression Models	Yinjun Wu, Val Tannen, Susan B. Davidson	This paper presents an efficient provenance-based approach, PrIU, and its optimized version, PrIU-opt, for incrementally updating model parameters without sacrificing prediction accuracy.
32	BugDoc: Algorithms to Debug Computational Processes	Raoni Lourenço, Juliana Freire, Dennis Shasha	We propose a new approach that makes use of iteration and provenance to automatically infer the root causes and derive succinct explanations of failures.
33	Computing Local Sensitivities of Counting Queries with Joins	Yuchao Tao, Xi He, Ashwin Machanavajjhala, Sudeepa Roy	In this paper, we present a novel approach to compute local sensitivity of counting queries involving join operations by tracking and summarizing tuple sensitivities.
34	Long-lived Transactions Made Less Harmful	Jongbin Kim, Hyunsoo Cho, Kihwang Kim, Jaeseon Yu, Sooyong Kang, Hyungsoo Jung	In this paper, we formalize such rules into our version pruning theorem and version classification, of which all form theoretical foundations for our new version management system, vDriver, that bases its record versioning on a new principle: Single In-row Remaining Off-row (SIRO) versioning.
35	Chiller: Contention-centric Transaction Execution and Data Partitioning for Modern Networks	Erfan Zamanian, Julian Shun, Carsten Binnig, Tim Kraska	In this paper, we first make the case that the new bottleneck which hinders truly scalable transaction processing in modern RDMA-enabled databases is data contention, and that optimizing for data contention leads to different partitioning layouts than optimizing for the number of distributed transactions.
36	Handling Highly Contended OLTP Workloads Using Fast Dynamic Partitioning	Guna Prasaad, Alvin Cheung, Dan Suciu	Towards addressing this, we propose Strife—a novel transaction processing scheme that clusters transactions together dynamically and executes most of them without any concurrency control.
37	A Transactional Perspective on Execute-order-validate Blockchains	Pingcheng Ruan, Dumitrel Loghin, Quang-Trung Ta, Meihui Zhang, Gang Chen, Beng Chin Ooi	Inspired by optimistic concurrency control in modern databases, we propose a novel method to enhance the execute-order-validate architecture, by reordering transactions to reduce the abort rate.
38	Aggify: Lifting the Curse of Cursor Loops using Custom Aggregates	Surabhi Gupta, Sanket Purandare, Karthik Ramachandra	We present Aggify, a technique for optimizing loops over query results that overcomes these overheads.
39	Querying Shared Data with Security Heterogeneity	Yang Cao, Wenfei Fan, Yanghao Wang, Ke Yi	We formalize query answering as a bi-criteria optimization problem, to minimize both data sharing toll and parallel query evaluation cost. Despite the hardness, we develop a set of approximate algorithms to generate distributed query plans that minimize data sharing toll and reduce parallel evaluation cost.
40	SAGMA: Secure Aggregation Grouped by Multiple Attributes	Timon Hackenjos, Florian Hahn, Florian Kerschbaum	In this work we present SAGMA — an encryption scheme for performing secure aggregation grouped by multiple attributes.
41	Cryptε: Crypto-Assisted Differential Privacy on Untrusted Servers	Amrita Roy Chowdhury, Chenghong Wang, Xi He, Ashwin Machanavajjhala, Somesh Jha	In this work, we propose Cryptε, a system and programming framework that (1) achieves the accuracy guarantees and algorithmic expressibility of the central model (2) without any trusted data collector like in the local model.
42	Estimating Numerical Distributions under Local Differential Privacy	Zitao Li, Tianhao Wang, Milan Lopuhaä-Zwakenberg, Ninghui Li, Boris Škorić	We introduce a new reporting mechanism, called the square wave (SW) mechanism, which exploits the numerical nature in reporting.
43	FalconDB: Blockchain-based Collaborative Database	Yanqing Peng, Min Du, Feifei Li, Raymond Cheng, Dawn Song	In this paper, we present FalconDB, which enables different parties with limited hardware resources to efficiently and securely collaborate on a database.
44	Exact Single-Source SimRank Computation on Large Graphs	Hanzhi Wang, Zhewei Wei, Ye Yuan, Xiaoyong Du, Ji-Rong Wen	In this paper, we present ExSim, the first algorithm that computes the exact single-source and top-k SimRank results on large graphs.
45	Distributed Processing of k Shortest Path Queries over Dynamic Road Networks	Ziqiang Yu, Xiaohui Yu, Nick Koudas, Yang Liu, Yifan Li, Yueting Chen, Dingyu Yang	We therefore propose KSP-DG, a distributed algorithm for identifying k-shortest paths in a dynamic graph.
46	On the Optimization of Recursive Relational Queries: Application to Graph Queries	Louis Jachiet, Pierre Genevès, Nils Gesbert, Nabil Layaida	We propose mu-RA, a variation of the Relational Algebra equipped with a fixpoint operator for expressing recursive relational queries.
47	Pensieve: Skewness-Aware Version Switching for Efficient Graph Processing	Tangwei Ying, Hanhua Chen, Hai Jin	In this work, we observe: 1) high degree vertices incur much more significant storage overheads during graph version evolving compared to low degree vertices; 2) the skewed access frequency among graph versions greatly influences the system performance for version reproducing.
48	Extending Graph Patterns with Conditions	Grace Fan, Wenfei Fan, Yuanhao Li, Ping Lu, Chao Tian, Jingren Zhou	We propose an extension of graph patterns, referred to as conditional graph patterns and denoted as CGPs.
49	Elastic Machine Learning Algorithms in Amazon SageMaker	Edo Liberty, Zohar Karnin, Bing Xiang, Laurence Rouesnel, Baris Coskun, Ramesh Nallapati, Julio Delgado, Amir Sadoughi, Yury Astashonok, Piali Das, Can Balioglu, Saswata Chakravarty, Madhav Jha, Philip Gautier, David Arpin, Tim Januschowski, Valentin Flunkert, Yuyang Wang, Jan Gasthaus, Lorenzo Stella, Syama Rangapuram, David Salinas, Sebastian Schelter, Alex Smola	We discuss such challenges and derive requirements for an industrial-scale ML platform. Next, we describe the computational model behind Amazon SageMaker, which is designed to meet such challenges.
50	Timon: A Timestamped Event Database for Efficient Telemetry Data Processing and Analytics	Wei Cao, Yusong Gao, Feifei Li, Sheng Wang, Bingchen Lin, Ke Xu, Xiaojie Feng, Yucong Wang, Zhenjun Liu, Gejin Zhang	Timon is a timestamped event database that aims to support aggregations and handle late arrivals both correctly (i.e., upholding the exactly-once semantics) and efficiently.
51	Vertica-ML: Distributed Machine Learning in Vertica Database	Arash Fard, Anh Le, George Larionov, Waqas Dhillon, Chuck Bear	In this paper, we present our distributed machine learning subsystem within the Vertica database. We explain the architecture of the subsystem, and present a set of experiments to evaluate the performance of the machine learning algorithms implemented on top of it.
52	Database Workload Capacity Planning using Time Series Analysis and Machine Learning	Antony S. Higginson, Mihaela Dediu, Octavian Arsene, Norman W. Paton, Suzanne M. Embury	In this paper we look at the forecasting techniques in use today and evaluate if those techniques are applicable to the deeper layers of the technological stack such as clustered database instances, applications and groups of transactions that make up the database workload.
53	The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development	Micah J. Smith, Carles Sala, James Max Kanter, Kalyan Veeramachaneni	To address these problems, we introduce the Machine Learning Bazaar, a new framework for developing machine learning and automated machine learning software systems.
54	When the Web is your Data Lake: Creating a Search Engine for Datasets on the Web	Natasha Noy	In this talk, I will discuss our work on Dataset Search, which provides search capabilities over potentially all dataset repositories on the Web.
55	The Challenge of Building Effective, Enterprise-scale Data Lakes	Awez Syed	In this talk, we describe the real-world implementation patterns of data lakes and give an overview of the many open challenges in deploying successful, enterprise-scale data lakes.
56	Cleaning Denial Constraint Violations through Relaxation	Stella Giannakopoulou, Manos Karpathiotakis, Anastasia Ailamaki	We propose an approach that performs probabilistic repair of denial constraint violations on-demand, driven by the exploratory analysis that users perform.
57	On Multiple Semantics for Declarative Database Repairs	Amir Gilad, Daniel Deutch, Sudeepa Roy	We show that there are no one-size-fits-all semantics for repairs in this inclusive setting, and we consequently introduce multiple alternative semantics, presenting the case for using each of them.
58	Discovery Algorithms for Embedded Functional Dependencies	Ziheng Wei, Sven Hartmann, Sebastian Link	We show that the discovery problem of eFDs is NP-complete, W[2]-complete in the output, and has a minimum solution space that is larger than the maximum solution space for functional dependencies.
59	SCODED: Statistical Constraint Oriented Data Error Detection	Jing Nathan Yan, Oliver Schulte, MoHan Zhang, Jiannan Wang, Reynold Cheng	We develop SCODED, an SC-Oriented Data Error Detection system, comprising two key components: (1) SC Violation Detection : checks whether an SC is violated on a given dataset, and (2) Error Drill Down : identifies the top-k records that contribute most to the violation of an SC.
60	A Statistical Perspective on Discovering Functional Dependencies in Noisy Data	Yunjia Zhang, Zhihan Guo, Theodoros Rekatsinas	We study the problem of discovering functional dependencies (FD) from a noisy data set.
61	Rethinking Logging, Checkpoints, and Recovery for High-Performance Storage Engines	Michael Haubenschild, Caetano Sauer, Thomas Neumann, Viktor Leis	In this work, we propose a new logging and recovery design that supports incremental and fuzzy checkpointing, index recovery, out-of-memory workloads, and low-latency transaction commits.
62	Lethe: A Tunable Delete-Aware LSM Engine	Subhadeep Sarkar, Tarikul Islam Papon, Dimitris Staratzis, Manos Athanassoulis	To address these challenges, in this paper, we build a new key-value storage engine, Lethe, that uses a very small amount of additional metadata, a set of new delete-aware compaction policies, and a new physical data layout that weaves the sort and the delete key order.
63	BinDex: A Two-Layered Index for Fast and Robust Scans	Linwei Li, Kai Zhang, Jiading Guo, Wen He, Zhenying He, Yinan Jing, Weili Han, X. Sean Wang	In order to obtain fast and robust scans under all selectivities, this paper proposes BinDex, a two-layered index structure based on binned bitmaps that can be used to significantly accelerate the scan operations for in-memory column stores.
64	Analysis of Indexing Structures for Immutable Data	Cong Yue, Zhongle Xie, Meihui Zhang, Gang Chen, Beng Chin Ooi, Sheng Wang, Xiaokui Xiao	To alleviate the above problem, we present a comprehensive analysis of the existing index structures for immutable data, and evaluate both their asymptotic and empirical performance.
65	Tree-Encoded Bitmaps	Harald Lang, Alexander Beischl, Viktor Leis, Peter Boncz, Thomas Neumann, Alfons Kemper	We propose a novel method to represent compressed bitmaps.
66	ALEX: An Updatable Adaptive Learned Index	Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David Lomet, Tim Kraska	In this paper, we present a new learned index called ALEX which addresses practical issues that arise when implementing learned indexes for workloads that contain a mix of point lookups, short range queries, inserts, updates, and deletes.
67	Learning Multi-Dimensional Indexes	Vikram Nathan, Jialin Ding, Mohammad Alizadeh, Tim Kraska	In this paper, we introduce Flood, a multi-dimensional in-memory read-optimized index that automatically adapts itself to a particular dataset and workload by jointly optimizing the index structure and data storage layout.
68	The Case for a Learned Sorting Algorithm	Ani Kristo, Kapil Vaidya, Ugur Çetintemel, Sanchit Misra, Tim Kraska	In this work, we introduce a new type of distribution sort that leverages a learned model of the empirical CDF of the data.
69	QuickSel: Quick Selectivity Learning with Mixture Models	Yongjoo Park, Shucheng Zhong, Barzan Mozafari	In this paper, we propose a selectivity learning framework, called QuickSel, which falls into the query-driven paradigm but does not use histograms.
70	Deep Learning Models for Selectivity Estimation of Multi-Attribute Queries	Shohedul Hasan, Saravanan Thirumuruganathan, Jees Augustine, Nick Koudas, Gautam Das	In this paper, we propose two complementary approaches that are effective for this scenario.
71	Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs	Chenhao Ma, Yixiang Fang, Reynold Cheng, Laks V.S. Lakshmanan, Wenjie Zhang, Xuemin Lin	In this paper, we develop an efficient and scalable DDS solution.
72	GPU-Accelerated Subgraph Enumeration on Partitioned Graphs	Wentian Guo, Yuchen Li, Mo Sha, Bingsheng He, Xiaokui Xiao, Kian-Lee Tan	In this paper, we propose a new approach for GPU-accelerated subgraph enumeration that can efficiently scale to large graphs beyond the GPU memory.
73	In-Memory Subgraph Matching: An In-depth Study	Shixuan Sun, Qiong Luo	We study the performance of eight representative in-memory subgraph matching algorithms.
74	G-CARE: A Framework for Performance Benchmarking of Cardinality Estimation Techniques for Subgraph Matching	Yeonsu Park, Seongyun Ko, Sourav S. Bhowmick, Kyoungmin Kim, Kijae Hong, Wook-Shin Han	In this paper, for the first time, we present a comprehensive study of the existing cardinality estimation techniques for subgraph matching queries, scaling far beyond the original experiments.
75	Approximate Pattern Matching in Massive Graphs with Precision and Recall Guarantees	Tashin Reza, Matei Ripeanu, Geoffrey Sanders, Roger Pearce	We present a new algorithmic pipeline for approximate matching that combines edit-distance based matching with systematic graph pruning.
76	A Comprehensive Benchmark Framework for Active Learning Methods in Entity Matching	Venkata Vamsikrishna Meduri, Lucian Popa, Prithviraj Sen, Mohamed Sarwat	In this paper, we build a unified active learning benchmark framework for EM that allows users to easily combine different learning algorithms with applicable example selection algorithms.
77	ZeroER: Entity Resolution using Zero Labeled Examples	Renzhi Wu, Sanya Chaba, Saurabh Sawlani, Xu Chu, Saravanan Thirumuruganathan	We investigate an important problem that vexes practitioners: is it possible to design an effective algorithm for ER that requires Zero labeled examples, yet can achieve performance comparable to supervised approaches?
78	Towards Interpretable and Learnable Risk Analysis for Entity Resolution	Zhaoqiang Chen, Qun Chen, Boyi Hou, Zhanhuai Li, Guoliang Li	In this paper, we propose an interpretable and learnable framework for risk analysis, which aims to rank the labeled pairs based on their risks of being mislabeled.
79	SLIM: Scalable Linkage of Mobility Data	Fuat Basık, Hakan Ferhatosmanoğlu, Buğra Gedik	We present a scalable solution to link entities across mobility datasets using their spatio-temporal information.
80	Monotonic Cardinality Estimation of Similarity Selection: A Deep Learning Approach	Yaoshu Wang, Chuan Xiao, Jianbin Qin, Xin Cao, Yifang Sun, Wei Wang, Makoto Onizuka	In this paper, we investigate the possibilities of utilizing deep learning for cardinality estimation of similarity selection.
81	Fast Join Project Query Evaluation using Matrix Multiplication	Shaleen Deep, Xiao Hu, Paraschos Koutris	In this paper, we study how a class of join queries with projections can be evaluated faster using worst-case optimal algorithms together with matrix multiplication.
82	Maintaining Acyclic Foreign-Key Joins under Updates	Qichen Wang, Ke Yi	In this paper, we study the problem of incrementally maintaining the query results of these joins under updates, i.e., insertion and deletion of tuples to any of the relations.
83	Thrifty Query Execution via Incrementability	Dixin Tang, Zechao Shang, Aaron J. Elmore, Sanjay Krishnan, Michael J. Franklin	In this paper, we propose a new metric incrementability to quantify the cost-effectiveness of IVM to decide how eagerly or lazily databases should incrementally execute a query.
84	A Method for Optimizing Opaque Filter Queries	Wenjia He, Michael R. Anderson, Maxwell Strome, Michael Cafarella	We propose voodoo indexing, a two-phase method for optimizing opaque filter queries.
85	Functional-Style SQL UDFs With a Capital ‘F’	Christian Duta, Torsten Grust	This paper describes how to compile such functional-style UDFs into SQL:1999 recursive common table expressions.
86	Learning to Validate the Predictions of Black Box Classifiers on Unseen Data	Sebastian Schelter, Tammo Rukat, Felix Biessmann	We propose a simple approach to automate the validation of deployed ML models by estimating the model’s predictive performance on unseen, unlabeled serving data.
87	Learning Over Dirty Data Without Cleaning	Jose Picado, John Davis, Arash Termehchy, Ga Young Lee	We propose Dirty Learn, DLearn, a novel learning system that learns directly over dirty databases effectively and efficiently without any preprocessing.
88	Complaint-driven Training Data Debugging for Query 2.0	Weiyuan Wu, Lampros Flokas, Eugene Wu, Jiannan Wang	We propose two novel heuristic approaches based on influence functions which both require linear retraining steps.
89	Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks	Riccardo Cappuzzo, Paolo Papotti, Saravanan Thirumuruganathan	We propose algorithms for obtaining local embeddings that are effective for data integration tasks on relational databases.
90	Minimization of Classifier Construction Cost for Search Queries	Shay Gershtein, Tova Milo, Gefen Morami, Slava Novgorodov	The goal of our research is to devise effective algorithms to choose which classifiers one should train to address a given query load while minimizing the cost.
91	Scaling Up Distance Labeling on Graphs with Core-Periphery Properties	Wentao Li, Miao Qiao, Lu Qin, Ying Zhang, Lijun Chang, Xuemin Lin	To scale up distance labeling, this paper proposes Core-Tree (CT) Index to facilitate a critical and effective trade-off between the index size and query time.
92	Factorized Graph Representations for Semi-Supervised Learning from Sparse Data	Krishna Kumar P., Paul Langton, Wolfgang Gatterbauer	We instead suggest a principled and scalable method for directly estimating the compatibilities from a sparsely labeled graph.
93	Reliable Data Distillation on Graph Convolutional Network	Wentao Zhang, Xupeng Miao, Yingxia Shao, Jiawei Jiang, Lei Chen, Olivier Ruas, Bin Cui	Therefore, we propose Reliable Data Distillation, a reliable data driven semi-supervised GCN training method.
94	Regular Path Query Evaluation on Streaming Graphs	Anil Pacaci, Angela Bonifati, M. Tamer Özsu	We propose deterministic algorithms to efficiently evaluate persistent RPQs under both arbitrary and simple path semantics in a uniform manner.
95	Timely Reporting of Heavy Hitters using External Memory	Prashant Pandey, Shikha Singh, Michael A. Bender, Jonathan W. Berry, Martín Farach-Colton, Rob Johnson, Thomas M. Kroeger, Cynthia A. Phillips	We study a real-time heavy-hitters variant in which an element must be reported shortly after we see its T = φN-th occurrence (and hence becomes a heavy hitter).
96	A Framework for Emulating Database Operations in Cloud Data Warehouses	Mohamed A. Soliman, Lyublena Antova, Marc Sugiyama, Michael Duller, Amirhossein Aleyasen, Gourab Mitra, Ehab Abdelhamid, Mark Morcos, Michele Gage, Dmitri Korablev, Florian M. Waas	In this paper we build on our earlier work in adaptive data virtualization and present novel techniques that allow running applications utilizing sophisticated database features within foreign query engines lacking the native support of such features.
97	Taurus Database: How to be Fast, Available, and Frugal in the Cloud	Alex Depoutovitch, Chong Chen, Jin Chen, Paul Larson, Shu Lin, Jack Ng, Wenlin Cui, Qiang Liu, Wei Huang, Yong Xiao, Yongjun He	In this paper, we describe the design of Taurus, a new multi-tenant cloud database system.
98	Reliability Analytics for Cloud Based Distributed Databases	Mathieu B. Demarne, Jim Gramling, Tomer Verona, Miso Cilimdzic	We present RADD, an innovative analytic pipeline used to measure reliability and availability for cloud-based distributed databases by leveraging the vast amount of telemetry present in the cloud.
99	CockroachDB: The Resilient Geo-Distributed SQL Database	Rebecca Taft, Irfan Sharif, Andrei Matei, Nathan VanBenschoten, Jordan Lewis, Tobias Grieger, Kai Niemi, Andy Woods, Anne Birzin, Raphael Poss, Paul Bardea, Amruta Ranade, Ben Darnell, Bram Gruneir, Justin Jaffray, Lucy Zhang, Peter Mattis	This paper presents the design of CockroachDB and its novel transaction model that supports consistent geo-distributed transactions on commodity hardware.
100	Azure SQL Database Always Encrypted	Panagiotis Antonopoulos, Arvind Arasu, Kunal D. Singh, Ken Eguro, Nitish Gupta, Rajat Jain, Raghav Kaushik, Hanuma Kodavalla, Donald Kossmann, Nikolas Ogg, Ravi Ramamurthy, Jakub Szymaszek, Jeffrey Trimmer, Kapil Vaswani, Ramarathnam Venkatesan, Mike Zwilling	This paper presents Always Encrypted, a recently released feature of Microsoft SQL Server that uses column granularity encryption to provide cryptographic data protection guarantees.
101	Automatically Generating Data Exploration Sessions Using Deep Reinforcement Learning	Ori Bar El, Tova Milo, Amit Somech	To address this, we present ATENA, a system that takes an input dataset and auto-generates a compelling exploratory session, presented in an EDA notebook.
102	Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks	Cong Yan, Yeye He	We propose a novel approach to "auto-suggest" contextualized data preparation steps, by "learning" from how data scientists would manipulate data, which are documented by data science notebooks widely available today.
103	IDEBench: A Benchmark for Interactive Data Exploration	Philipp Eichmann, Emanuel Zgraggen, Carsten Binnig, Tim Kraska	In this paper we argue that this is due to the fact that the workloads and metrics of popular analytical benchmarks such as TPC-H or TPC-DS were designed for traditional performance reporting scenarios, and do not capture distinctive IDE characteristics.
104	Database Benchmarking for Supporting Real-Time Interactive Querying of Large Data	Leilani Battle, Philipp Eichmann, Marco Angelini, Tiziana Catarci, Giuseppe Santucci, Yukun Zheng, Carsten Binnig, Jean-Daniel Fekete, Dominik Moritz	In this paper, we present a new benchmark to validate the suitability of database systems for interactive visualization workloads.
105	Benchmarking Spreadsheet Systems	Sajjadur Rahman, Kelly Mack, Mangesh Bendre, Ruilin Zhang, Karrie Karahalios, Aditya Parameswaran	We present a benchmarking study that evaluates and compares the performance of three popular systems, Microsoft Excel, LibreOffice Calc, and Google Sheets, on a range of canonical spreadsheet computation operations.
106	Order-Preserving Key Compression for In-Memory Search Trees	Huanchen Zhang, Xiaoxuan Liu, David G. Andersen, Michael Kaminsky, Kimberly Keeton, Andrew Pavlo	We present the High-speed Order-Preserving Encoder (HOPE) for in-memory search trees.
107	A Study of the Fundamental Performance Characteristics of GPUs and CPUs for Database Analytics	Anil Shanbhag, Samuel Madden, Xiangyao Yu	In this paper, we adopt a model-based approach to understand when and why the performance gains of running queries on GPUs vs on CPUs vary from the bandwidth ratio (which is roughly 16× on modern hardware).
108	Pump Up the Volume: Processing Large Data on GPUs with Fast Interconnects	Clemens Lutz, Sebastian Breß, Steffen Zeuch, Tilmann Rabl, Volker Markl	In this paper, we investigate how a fast interconnect can resolve these scalability limitations using the example of NVLink 2.0.
109	Robust Performance of Main Memory Data Structures by Configuration	Tiemo Bang, Ismail Oukid, Norman May, Ilia Petrov, Carsten Binnig	In this paper, we present a new approach for achieving robust performance of data structures making it easier to reuse the same design for different hardware generations but also for different workloads.
110	Black or White? How to Develop an AutoTuner for Memory-based Analytics	Mayuresh Kunjir, Shivnath Babu	We study the problem of autotuning the memory allocation for applications running on modern distributed data processing systems.
111	Vista: Optimized System for Declarative Feature Transfer from Deep CNNs at Scale	Supun Nakandala, Arun Kumar	We present Vista, a new data system that resolves these issues by elevating this workload to a declarative level on top of dataflow and deep learning systems.
112	Optimizing Machine Learning Workloads in Collaborative Environments	Behrouz Derakhshan, Alireza Rezaei Mahdiraji, Ziawasch Abedjan, Tilmann Rabl, Volker Markl	To address this issue, we propose two algorithms for materializing artifacts based on their likelihood of future reuse.
113	GOGGLES: Automatic Image Labeling with Affinity Coding	Nilaksh Das, Sanya Chaba, Renzhi Wu, Sakshi Gandhi, Duen Horng Chau, Xu Chu	We build the GOGGLES system that implements affinity coding for labeling image datasets by designing a novel set of reusable affinity functions for images, and propose a novel hierarchical generative model for class inference using a small development set.
114	DeepSqueeze: Deep Semantic Compression for Tabular Data	Amir Ilkhechi, Andrew Crotty, Alex Galakatos, Yicong Mao, Grace Fan, Xiran Shi, Ugur Cetintemel	We propose DeepSqueeze, a novel semantic compression framework that can efficiently capture these complex relationships within tabular data by using autoencoders to map tuples to a lower-dimensional representation.
115	TRACER: A Framework for Facilitating Accurate and Interpretable Analytics for High Stakes Applications	Kaiping Zheng, Shaofeng Cai, Horng Ruey Chua, Wei Wang, Kee Yuan Ngiam, Beng Chin Ooi	In this paper, we propose a general framework TRACER to facilitate accurate and interpretable predictions, with a novel model TITV devised for healthcare analytics and other high stakes applications such as financial investment and risk management.
116	Application Driven Graph Partitioning	Wenfei Fan, Ruochun Jin, Muyang Liu, Ping Lu, Xiaojian Luo, Ruiqi Xu, Qiang Yin, Wenyuan Yu, Jingren Zhou	For an algorithm of our interest, what partitioning strategy fits it the best and improves its parallel execution? Is it possible to develop graph algorithms with partition transparency, such that the algorithms work under different partitions without changes? This paper aims to answer these questions.
117	Progressive Top-K Nearest Neighbors Search in Large Road Networks	Dian Ouyang, Dong Wen, Lu Qin, Lijun Chang, Ying Zhang, Xuemin Lin	In this paper, we propose a novel parameter-free index-based solution for the kNN query based on the concept of tree decomposition in large road networks.
118	Memory-Aware Framework for Efficient Second-Order Random Walk on Large Graphs	Yingxia Shao, Shiyue Huang, Xupeng Miao, Bin Cui, Lei Chen	In this paper, to clearly study the efficiency of various node sampling methods in the context of second-order random walk, we design a cost model, and then propose a new node sampling method following the acceptance-rejection paradigm to achieve a better balance between memory and time cost.
119	Hub Labeling for Shortest Path Counting	Yikai Zhang, Jeffrey Xu Yu	While many works have been devoted to devising efficient distance oracles to compute the shortest distance between any vertices s and t, we study the problem of efficiently counting the number of shortest paths between s and t in light of its applications in tasks such as betweenness-related analysis.
120	CHASSIS: Conformity Meets Online Information Diffusion	Hui Li, Hui Li, Sourav S. Bhowmick	In this paper, we present a novel framework called CHASSIS to characterize online information diffusion by bridging classical information diffusion model with conformity from social psychology.
121	Architecture-Intact Oracle for Fastest Path and Time Queries on Dynamic Spatial Networks	Victor Junqiu Wei, Raymond Chi-Wing Wong, Cheng Long	In this paper, we propose an efficient distance and path oracle on dynamic road networks using the randomization technique.
122	Data Series Progressive Similarity Search with Probabilistic Quality Guarantees	Anna Gogolou, Theophanis Tsandilas, Karima Echihabi, Anastasia Bezerianos, Themis Palpanas	We present and experimentally evaluate a new probabilistic learning-based method that provides quality guarantees for progressive Nearest Neighbor (NN) query answering.
123	A GPU-friendly Geometric Data Model and Algebra for Spatial Queries	Harish Doraiswamy, Juliana Freire	As a first step towards making GPU spatial query processing mainstream, we propose a new model that represents spatial data as geometric objects and define an algebra consisting of GPU-friendly composable operators that operate over these objects.
124	Debunking Four Long-Standing Misconceptions of Time-Series Distance Measures	John Paparrizos, Chunwei Liu, Aaron J. Elmore, Michael J. Franklin	Importantly, this study (i) omitted multiple distance measures, including a classic measure in the time-series literature; (ii) considered only a single time-series normalization method; and (iii) reported only raw classification error rates without statistically validating the findings, resulting in or fueling four misconceptions in the time-series literature.
125	MIRIS: Fast Object Track Queries in Video	Favyen Bastani, Songtao He, Arjun Balasingam, Karthik Gopalakrishnan, Mohammad Alizadeh, Hari Balakrishnan, Michael Cafarella, Tim Kraska, Sam Madden	We propose a novel query-driven tracking approach that integrates query processing with object tracking to efficiently process object track queries and address the computational complexity of object detection methods.
126	ACM SIGMOD Jim Gray Dissertation Award W Talk	Jose M. Faleiro	This dissertation proposes and explores the use of deterministic execution to address these concerns.
127	Effective Data Versioning for Collaborative Data Analytics	Silu Huang	In my PhD thesis, we develop solutions for versioned data management for collaborative data analytics.
128	Organizing Data Lakes for Navigation	Fatemeh Nargesian, Ken Q. Pu, Erkang Zhu, Bahar Ghadiri Bashardoost, Renée J. Miller	We present a new probabilistic model of how users interact with an organization and propose an approximate algorithm for the data lake organization problem.
129	Finding Related Tables in Data Lakes for Interactive Data Science	Yi Zhang, Zachary G. Ives	We develop search and management solutions for the Jupyter Notebook data science platform, to enable scientists to augment training data, find potential features to extract, clean data, and find joinable or linkable tables.
130	Web Data Extraction using Hybrid Program Synthesis: A Combination of Top-down and Bottom-up Inference	Mohammad Raza, Sumit Gulwani	In this work we present a novel program synthesis approach which combines the benefits of deductive and enumerative synthesis strategies, yielding a semi-supervised technique with which concise programs expressible in standard languages can be synthesized from very few examples.
131	SPARQL Rewriting: Towards Desired Results	Xun Jian, Yue Wang, Xiayu Lei, Libin Zheng, Lei Chen	Despite their hardness, we propose a (1-1/e)-approximation method for query-restricting and 2 heuristics for query-relaxing.
132	Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study	Farahnaz Akrami, Mohammed Samiul Saeef, Qingheng Zhang, Wei Hu, Chengkai Li	This paper is the first systematic study with the main objective of assessing the true effectiveness of embedding models when the unrealistic triples are removed.
133	Bitvector-aware Query Optimization for Decision Support Queries	Bailu Ding, Surajit Chaudhuri, Vivek Narasayya	In this work, we study how bitvector filters impact query optimization.
134	Efficient Join Synopsis Maintenance for Data Warehouse	Zhuoyue Zhao, Feifei Li, Yuxi Liu	Towards that end, we propose a novel algorithm SJoin that can maintain a join synopsis over a pre-specified general θ-join query in a dynamic database with continuous inflows of updates.
135	Adaptive HTAP through Elastic Resource Scheduling	Aunn Raza, Periklis Chrysogelos, Angelos Christos Anadiotis, Anastasia Ailamaki	We propose an in-memory system design which is non-intrusive to the current state-of-art OLTP and OLAP engines, and we use it to evaluate the performance of our approach.
136	SPRINTER: A Fast n-ary Join Query Processing Method for Complex OLAP Queries	Yoon-Min Nam, Donghyoung Han, Min-Soo Kim	In this paper, we propose an effective query planning method for complex OLAP queries.
137	Rosetta: A Robust Space-Time Optimized Range Filter for Key-Value Stores	Siqiang Luo, Subarna Chatterjee, Rafael Ketsetsidis, Niv Dayan, Wilson Qin, Stratos Idreos	We introduce Rosetta, a probabilistic range filter designed specifically for LSM-tree based key-value stores.
138	RID: Deduplicating Snapshot Computations	Nikos Tsikoudis, Liuba Shrira	This paper describes the design, implementation, and performance of RID, the first language-independent optimization framework that eliminates duplicate computations in SQL programs running over low-level snapshots by exploiting snapshot metadata efficiently.
139	Architecting a Query Compiler for Spatial Workloads	Ruby Y. Tahboub, Tiark Rompf	In this paper, we discuss the underlying reasons why standard query compilation techniques are not fully effective when applied to spatial workloads, and we demonstrate how a particular style of query compilation based on techniques borrowed from partial evaluation and generative programming manages to avoid most of these difficulties by extending the scope of custom code generation into the data structures layer.
140	LISA: A Learned Index Structure for Spatial Data	Pengfei Li, Hua Lu, Qian Zheng, Long Yang, Gang Pan	We propose a novel Learned Index structure for Spatial dAta (LISA for short).
141	Effective Travel Time Estimation: When Historical Trajectories over Road Networks Matter	Haitao Yuan, Guoliang Li, Zhifeng Bao, Ling Feng	In this paper, we study the problem of origin-destination (OD) travel time estimation where the OD input consists of an OD pair and a departure time.
142	The Solution Distribution of Influence Maximization: A High-level Experimental Study on Three Algorithmic Approaches	Naoto Ohsaka	In this paper, we report a high-level experimental study on three well-established algorithmic approaches for influence maximization, referred to as Oneshot, Snapshot, and Reverse Influence Sampling (RIS).
143	Influence Maximization Revisited: Efficient Reverse Reachable Set Generation with Bound Tightened	Qintian Guo, Sibo Wang, Zhewei Wei, Ming Chen	In this paper, we present a study on this key phase and propose an efficient random RR set generation algorithm under IC model.
144	Truss-based Community Search over Large Directed Graphs	Qing Liu, Minjun Zhao, Xin Huang, Jianliang Xu, Yunjun Gao	In view of its hardness, we propose two efficient 2-approximation algorithms, named Global and Local, that run in polynomial time yet with quality guarantee.
145	Densely Connected User Community and Location Cluster Search in Location-Based Social Networks	Junghoon Kim, Tao Guo, Kaiyu Feng, Gao Cong, Arijit Khan, Farhana M. Choudhury	In this paper we propose the GeoSocial Community Search problem (GCS) which aims to find a social community and a cluster of spatial locations that are densely connected in a location-based social network simultaneously.
146	Global Reinforcement of Social Networks: The Anchored Coreness Problem	Qingyuan Linghu, Fan Zhang, Xuemin Lin, Wenjie Zhang, Ying Zhang	Since the coreness of a user has been validated as the "best practice" for capturing user engagement, we propose and study the anchored coreness problem in this paper: anchoring a small number of vertices to maximize the coreness gain (the total increment of coreness) of all the vertices in the network.
147	Confidentiality Support over Financial Grade Consortium Blockchain	Ying Yan, Changzheng Wei, Xuepeng Guo, Xuming Lu, Xiaofu Zheng, Qi Liu, Chenhui Zhou, Xuyang Song, Boran Zhao, Hui Zhang, Guofei Jiang	In this paper, we present a system design called CONFIDE to support on-chain confidentiality by leveraging Trusted Execution Environment (TEE).
148	PASE: PostgreSQL Ultra-High-Dimensional Approximate Nearest Neighbor Search Extension	Wen Yang, Tao Li, Gai Fang, Hong Wei	To address these issues, we designed a novel scheme for extending the index-type of PostgreSQL (PG), which enables a similar vector search and achieves a high-performance level and strong reliability of PG.
149	Making Search Engines Faster by Lowering the Cost of Querying Business Rules Through FPGAs	Fabio Maschi, Muhsen Owaida, Gustavo Alonso, Matteo Casalino, Anthony Hock-Koon	In this paper, we focus on a real-world use case from the airline industry: determining the minimum connection time (MCT) between flights.
150	Spur: Mitigating Slow Instances in Large-Scale Streaming Pipelines	Ke Wang, Avrilia Floratou, Ashvin Agrawal, Daniel Musgrave	In this paper, we highlight some of the unique challenges imposed by this large scale of operation: other concurrent workloads sharing the cluster may cause random performance deterioration; unavailability of external dependencies may cause temporary stalls in the pipeline; scarcity in the underlying resource manager may cause arbitrarily long delays or rejection of container allocation requests.
151	Entity Matching in the Wild: A Consistent and Versatile Framework to Unify Data in Industrial Applications	Yan Yan, Stephen Meyles, Aria Haghighi, Dan Suciu	In this work, we describe Amperity’s entity matching framework, Fusion, and how its design provides solutions to these challenges.
152	QueryVis: Logic-based Diagrams help Users Understand Complicated SQL Queries Faster	Aristotelis Leventidis, Jiahui Zhang, Cody Dunne, Wolfgang Gatterbauer, H.V. Jagadish, Mirek Riedewald	We present initial steps in that direction with visual diagrams that are based on the first-order logic foundation of SQL and can capture the meaning of deeply nested queries.
153	Duoquest: A Dual-Specification System for Expressive SQL Queries	Christopher Baik, Zhongjun Jin, Michael Cafarella, H. V. Jagadish	Consequently, we propose dual-specification query synthesis, which consumes both a NLQ and an optional PBE-like table sketch query that enables users to express varied levels of domain knowledge.
154	SQLCheck: Automated Detection and Diagnosis of SQL Anti-Patterns	Prashanth Dintyala, Arpit Narechania, Joy Arulraj	In this paper, we present SQLCheck, a holistic toolchain for automatically finding and fixing anti-patterns in database applications.
155	DBPal: A Fully Pluggable NL2SQL Training Pipeline	Nathaniel Weir, Prasetya Utama, Alex Galakatos, Andrew Crotty, Amir Ilkhechi, Shekar Ramaswamy, Rohin Bhushan, Nadja Geisler, Benjamin Hättasch, Steffen Eger, Ugur Cetintemel, Carsten Binnig	Based on these observations, we propose DBPal, a new approach that augments existing deep learning techniques in order to improve the performance of models for natural language to SQL translation.
156	SpeakQL: Towards Speech-driven Multimodal Querying of Structured Data	Vraj Shah, Side Li, Arun Kumar, Lawrence Saul	In this work, we propose to bridge this gap by designing a speech-driven querying system and interface for structured data we call SpeakQL. We present the first dataset of spoken SQL queries and a generic approach to generate them for any arbitrary schema.
157	Near-Optimal Distributed Band-Joins through Recursive Partitioning	Rundong Li, Wolfgang Gatterbauer, Mirek Riedewald	Our main insight is that recursive partitioning of the join-attribute space with the appropriate split scoring measure can achieve both low optimization cost and low join cost.
158	ChronoCache: Predictive and Adaptive Mid-Tier Query Result Caching	Brad Glasbergen, Kyle Langendoen, Michael Abebe, Khuzaima Daudjee	In this paper we present ChronoCache, a mid-tier caching system that exploits the presence of geo-distributed edge nodes to cache database query results closer to users.
159	Cheetah: Accelerating Database Queries with Switch Pruning	Muhammad Tirmazi, Ran Ben Basat, Jiaqi Gao, Minlan Yu	In this paper, we leverage programmable switches in the network to partially offload query computation to the switch.
160	External Merge Sort for Top-K Queries: Eager input filtering guided by histograms	Yannis Chronis, Thanh Do, Goetz Graefe, Keith Peters	To address these challenges, we introduce a new top-k algorithm that is able to eliminate parts of the input before sorting or writing them to secondary storage, regardless of whether the requested output fits in the available memory.
161	Automating Incremental and Asynchronous Evaluation for Recursive Aggregate Data Processing	Qiange Wang, Yanfeng Zhang, Hao Wang, Liang Geng, Rubao Lee, Xiaodong Zhang, Ge Yu	In this paper, we lay an analytical foundation for conditions to check if a recursive aggregate program that is monotonic or even non-monotonic can be executed incrementally and asynchronously with its correct result.
162	Prompt: Dynamic Data-Partitioning for Distributed Micro-batch Stream Processing Systems	Ahmed S. Abdelhamid, Ahmed R. Mahmood, Anas Daghistani, Walid G. Aref	Because achieving optimal data partitioning is NP-Hard in this context, a workload-aware greedy algorithm is introduced that partitions the buffered data tuples efficiently for the Map stage.
163	Rhino: Efficient Management of Very Large Distributed State for Stream Processing Engines	Bonaventura Del Monte, Steffen Zeuch, Tilmann Rabl, Volker Markl	In this paper, we propose Rhino, a library for efficient reconfigurations of running queries in the presence of very large distributed state.
164	Grizzly: Efficient Stream Processing Through Adaptive Query Compilation	Philipp M. Grulich, Sebastian Breß, Steffen Zeuch, Jonas Traub, Janis von Bleichert, Zongxiong Chen, Tilmann Rabl, Volker Markl	In this paper, we present Grizzly, a novel adaptive query compilation-based SPE, to enable highly efficient query execution.
165	LightSaber: Efficient Window Aggregation on Multi-core Processors	Georgios Theodorakis, Alexandros Koliousis, Peter Pietzuch, Holger Pirk	Based on this, we introduce LightSaber, a new stream processing engine that balances parallelism and incremental processing when executing window aggregation queries on multi-core CPUs.
166	Parallel Index-based Stream Join on a Multicore CPU	Amirhesam Shahvarani, Hans-Arno Jacobsen	In this paper, we introduce an index data structure, called the partitioned in-memory merge tree, to address the challenges that arise when indexing highly dynamic data, which are common in streaming settings.
167	Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination	Conglong Li, Minjia Zhang, David G. Andersen, Yuxiong He	To achieve a better tradeoff between latency and accuracy, we propose a novel approach that adaptively determines search termination conditions for individual queries.
168	Theoretically-Efficient and Practical Parallel DBSCAN	Yiqiu Wang, Yan Gu, Julian Shun	This paper bridges the gap between theory and practice of parallel DBSCAN by presenting new parallel algorithms for Euclidean exact DBSCAN and approximate DBSCAN that match the work bounds of their sequential counterparts, and are highly parallel (polylogarithmic depth).
169	A Relational Matrix Algebra and its Implementation in a Column Store	Oksana Dolmatova, Nikolaus Augsten, Michael H. Böhlen	This paper proposes a principled solution at the logical level.
170	Locality-Sensitive Hashing Scheme based on Longest Circular Co-Substring	Yifan Lei, Qiang Huang, Mohan Kankanhalli, Anthony K. H. Tung	In this paper, we propose a novel LSH scheme based on the Longest Circular Co-Substring (LCCS) search framework (LCCS-LSH) with a theoretical guarantee.
171	Continuously Adaptive Similarity Search	Huayi Zhang, Lei Cao, Yizhou Yan, Samuel Madden, Elke A. Rundensteiner	In this paper, we propose the first solution, called OASIS, to instantaneously adapt the index to conform to a changing distance metric without this prohibitive re-indexing process.
172	Automating Exploratory Data Analysis via Machine Learning: An Overview	Tova Milo, Amit Somech	In this tutorial, we review recent lines of work for automating EDA.
173	Crowdsourcing Practice for Efficient Data Labeling: Aggregation, Incremental Relabeling, and Pricing	Alexey Drutsa, Valentina Fedorova, Dmitry Ustalov, Olga Megorskaya, Evfrosiniya Zerminova, Daria Baidakova	In this tutorial, we present a portion of unique industry experience in efficient data labeling via crowdsourcing shared by both leading researchers and engineers from Yandex.
174	State of the Art and Open Challenges in Natural Language Interfaces to Data	Fatma Özcan, Abdul Quamar, Jaydeep Sen, Chuan Lei, Vasilis Efthymiou	In this tutorial, we will review these natural language interface solutions in terms of their interpretation approach, as well as the complexity of the queries they can generate.
175	SIGMOD 2020 Tutorial on Fairness and Bias in Peer Review and Other Sociotechnical Intelligent Systems	Nihar B. Shah, Zachary Lipton	Our presentation will cover a wide range of disciplinary perspectives, with the first part focusing on the social impacts of technology and the formulations of fairness and bias defined via protected characteristics, and the second part taking a deep dive into peer review and distributed human evaluations to explore other forms of bias, such as that due to subjectivity, miscalibration, and dishonest behavior.
176	Le Taureau: Deconstructing the Serverless Landscape & A Look Forward	Anurag Khandelwal, Arun Kejariwal, Karthikeyan Ramasamy	Inspired by Picasso’s Le Taureau, in the tutorial proposed herein, we shall deconstruct evolution of serverless — the overarching intent being to facilitate better understanding of the serverless landscape.
177	Beyond Analytics: The Evolution of Stream Processing Systems	Paris Carbone, Marios Fragkoulis, Vasiliki Kalavri, Asterios Katsifodimos	The goal of this tutorial is threefold. First, we aim to review and highlight noteworthy past research findings, which were largely ignored until very recently. Second, we intend to underline the differences between early (’00-’10) and modern (’11-’18) streaming systems, and how those systems have evolved through the years. Most importantly, we wish to turn the attention of the database community to recent trends: streaming systems are no longer used only for classic stream processing workloads, namely window aggregates and joins.
178	Optimal Join Algorithms Meet Top-k	Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald	This tutorial has two main objectives. First, we will explore and contrast the main assumptions, concepts, and algorithmic achievements of the two research areas. Second, we will cover recent, as well as some older, approaches that emerged at the intersection to support efficient ranked enumeration of join-query results.
179	Key-Value Storage Engines	Stratos Idreos, Mark Callaghan	In this tutorial, we survey the state-of-the-art approaches on how the core storage engine of a key-value store system is designed.
180	RASQL: A Powerful Language and its System for Big Data Applications	Jin Wang, Guorui Xiao, Jiaqi Gu, Jiacheng Wu, Carlo Zaniolo	To this end, we propose the Recursive-aggregate-SQL (RASQL) language and its system on top of Apache Spark to express and execute complex queries and declarative algorithms in many applications, such as graph search and machine learning.
181	PL/SQL Without the PL	Denis Hirn, Torsten Grust	We demonstrate a source-to-source compilation technique that can translate user-defined PL/SQL functions into plain SQL queries.
182	Analysis of Database Search Systems with THOR	Theofilos Belmpas, Orest Gkini, Georgia Koutrika	To help towards this direction, we present THOR that makes 4 important contributions: a query benchmark, a framework for comparing different systems, several search system implementations, and a highly interactive tool for comparing different search systems.
183	BOOMER: A Tool for Blending Visual P-Homomorphic Queries on Large Networks	Yinglong Song, Huey Eng Chua, Sourav S. Bhowmick, Byron Choi, Shuigeng Zhou	In this demonstration, we present a novel system called BOOMER to realize this paradigm on more generic but complex bounded 1-1 p-homomorphic (BPH) queries on large networks.
184	AURORA: Data-driven Construction of Visual Graph Query Interfaces for Graph Databases	Sourav S. Bhowmick, Kai Huang, Huey Eng Chua, Zifeng Yuan, Byron Choi, Shuigeng Zhou	In this demonstration, we present a novel data-driven visual subgraph query interface construction engine called AURORA.
185	vChain: A Blockchain System Ensuring Query Integrity	Haixin Wang, Cheng Xu, Ce Zhang, Jianliang Xu	We demonstrate its verifiable query operations, usability, and performance with visualization for better insights.
186	AUDITOR: A System Designed for Automatic Discovery of Complex Integrity Constraints in Relational Databases	Wentao Hu, Dongxiang Zhang, Dawei Jiang, Sai Wu, Ke Chen, Kian-Lee Tan, Gang Chen	In this demonstration, we present a new definition of integrity constraint that is more powerful for anomalous data discovery.
187	SHARQL: Shape Analysis of Recursive SPARQL Queries	Angela Bonifati, Wim Martens, Thomas Timm	In SHARQL, we show how the analysis and exploration of several hundred million queries is possible.
188	High Performance Distributed OLAP on Property Graphs with Grasper	Hongzhi Chen, Bowen Wu, Shiyuan Deng, Chenghuan Huang, Changji Li, Yichao Li, James Cheng	This Demo presents Grasper, an RDMA-enabled distributed graph OLAP system, which adopts a series of new system designs to overcome the challenges of OLAP on graphs.
189	ProcAnalyzer: Effective Code Analyzer for Tuning Imperative Programs in SAP HANA	Kisung Park, Taeyoung Jeong, Chanho Jeong, Jaeha Lee, Dong-Hun Lee, Young-Koo Lee	In this demonstration, we present ProcAnalyzer, an expressive and intuitive tool for troubleshooting issues related to performance, code quality, and security.
190	LATTE: Visual Construction of Smart Contracts	Sean Tan, Sourav S. Bhowmick, Huey Eng Chua, Xiaokui Xiao	In this demonstration, we present a novel visual smart contract construction system on Ethereum called LATTE to make smart contract development accessible to non-programmers.
191	PROUD: PaRallel OUtlier Detection for Streams	Theodoros Toliopoulos, Christos Bellas, Anastasios Gounaris, Apostolos Papadopoulos	We introduce PROUD, standing for PaRallel OUtlier Detection for streams, which is an extensible engine for continuous multi-parameter parallel distance-based outlier (or anomaly) detection tailored to big data streams.
192	MithraCoverage: A System for Investigating Population Bias for Intersectional Fairness	Zhongjun Jin, Mengjing Xu, Chenkai Sun, Abolfazl Asudeh, H. V. Jagadish	We demonstrate MithraCoverage, a system for investigating population bias over the intersection of multiple attributes.
193	MC3: A System for Minimization of Classifier Construction Cost	Shay Gershtein, Tova Milo, Gefen Morami, Slava Novgorodov	In this demo, we introduce MC3, a real-time system that helps data analysts decide which classifiers to construct to minimize the costs of answering a set of search queries.
194	Sentinel: Understanding Data Systems	Brad Glasbergen, Michael Abebe, Khuzaima Daudjee, Daniel Vogel, Jian Zhao	We demonstrate the Sentinel system, which enables administrators to analyze systems and applications by building models of system execution and comparing them to derive key differences in behaviour.
195	BugDoc: A System for Debugging Computational Pipelines	Raoni Lourenço, Juliana Freire, Dennis Shasha	We recently proposed a new approach that makes use of provenance to automatically and iteratively infer root causes and derive succinct explanations of failures; such an approach was implemented in our prototype, BugDoc.
196	TQVS: Temporal Queries over Video Streams in Action	Yueting Chen, Xiaohui Yu, Nick Koudas	We present TQVS, a system capable of conducting efficient evaluation of declarative temporal queries over real-time video streams.
197	ExTuNe: Explaining Tuple Non-conformance	Anna Fariha, Ashish Tiwari, Arjun Radhakrishna, Sumit Gulwani	We present ExTuNe, a system for Explaining causes of Tuple Non-conformance.
198	Interactively Discovering and Ranking Desired Tuples without Writing SQL Queries	Xuedi Qin, Chengliang Chai, Yuyu Luo, Nan Tang, Guoliang Li	We propose to demonstrate such a system, namely DExPlorer.
199	Synner: Generating Realistic Synthetic Data	Miro Mannino, Azza Abouzied	Synner provides instant feedback on every user interaction by visualizing a preview of the generated data.
200	InCognitoMatch: Cognitive-aware Matching via Crowdsourcing	Roee Shraga, Coral Scharf, Rakefet Ackerman, Avigdor Gal	We present InCognitoMatch, the first cognitive-aware crowdsourcing application for matching tasks.
201	CoClean: Collaborative Data Cleaning	Mashaal Musleh, Mourad Ouzzani, Nan Tang, AnHai Doan	We propose a crowd-in-the-loop cleaning system, called CoClean, built on top of Python Pandas dataframe, a widely used library for data scientists.
202	STAR: A Distributed Stream Warehouse System for Spatial Data	Zhida Chen, Gao Cong, Walid G. Aref	In this demonstration, we present the STAR (Spatial Data Stream Warehouse) system.
203	T-REx: Table Repair Explanations	Daniel Deutch, Nave Frost, Amir Gilad, Oren Sheffer	To assist users in understanding the output of such data repair algorithms, we propose T-REx, a system for providing data repair explanations through Shapley values.
204	SVQ++: Querying for Object Interactions in Video Streams	Daren Chao, Nick Koudas, Ioannis Xarchakos	We demonstrate that this system can efficiently identify frames in a streaming video in which an object is interacting with another in a specific way, increasing the frame processing rate dramatically and speeding up query processing by at least two orders of magnitude depending on the query.
205	F-IVM: Learning over Fast-Evolving Relational Data	Milos Nikolic, Haozhe Zhang, Ahmet Kara, Dan Olteanu	We will demonstrate F-IVM for three such applications: model selection, Chow-Liu trees, and ridge linear regression.
206	CoMing: A Real-time Co-Movement Mining System for Streaming Trajectories	Ziquan Fang, Yunjun Gao, Lu Pan, Lu Chen, Xiaoye Miao, Christian S. Jensen	To this end, we develop CoMing, a real-time co-movement pattern mining system, to handle streaming trajectories.
207	Unified Spatial Analytics from Heterogeneous Sources with Amazon Redshift	Nemanja Borić, Hinnerk Gildhoff, Menelaos Karavelas, Ippokratis Pandis, Ioanna Tsalouchidou	In this demonstration we present the spatial functionality of Amazon Redshift and its integration with other Amazon services, such as Amazon Aurora PostgreSQL and Amazon S3.
208	Big Data Series Analytics Using TARDIS and its Exploitation in Geospatial Applications	Liang Zhang, Noura Alghamdi, Mohamed Y. Eltabakh, Elke A. Rundensteiner	In this demonstration, we present GENET, a new interactive exploration demonstration that allows users to support Big Data Series Approximate Retrieval and Recursive Interactive Clustering in large-scale geospatial datasets using TARDIS index techniques.
209	CDFShop: Exploring and Optimizing Learned Index Structures	Ryan Marcus, Emily Zhang, Tim Kraska	This demonstration allows audience members to (1) gain an intuition about various tuning parameters of RMIs and why learned index structures can greatly accelerate search, and (2) understand how automatic optimization techniques can be used to explore space/time tradeoffs within the space of RMIs.
210	TensorFlow Data Validation: Data Analysis and Validation in Continuous ML Pipelines	Emily Caveness, Paul Suganthan G. C., Zhuo Peng, Neoklis Polyzotis, Sudip Roy, Martin Zinkevich	In this demonstration we showcase TensorFlow Data Validation (TFDV), a scalable data analysis and validation system for ML that we have developed at Google and recently open-sourced.
211	Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental Computing	Zuozhi Wang, Kai Zeng, Botong Huang, Wei Chen, Xiaozong Cui, Bo Wang, Ji Liu, Liya Fan, Dachuan Qu, Zhenyu Hou, Tao Guan, Chen Li, Jingren Zhou	In this paper, we present Grosbeak, a novel data warehouse that supports resource-aware incremental computing to process recurring routine jobs, smooths the resource skew, and optimizes the resource usage.
212	Demonstration of BitGourmet: Data Analysis via Deterministic Approximation	Saehan Jo, Immanuel Trummer	We demonstrate BitGourmet, a novel data analysis system that supports deterministic approximate query processing (DAQ).
213	Bring Your Own Data to X-PLAIN	Eliana Pastor, Elena Baralis	X-PLAIN is an interactive tool that allows human-in-the-loop inspection of the reasons behind model predictions.
214	Physical Visualization Design	Lana Ramjit, Zhaoning Kong, Ravi Netravali, Eugene Wu	We demonstrate PVD, a system that visualization designers can use to co-design the interface and system architecture of scalable and expressive visualization.
215	Demonstration of Chestnut: An In-memory Data Layout Designer for Database Applications	Mingwei Samuel, Cong Yan, Alvin Cheung	Demonstration of Chestnut: An In-memory Data Layout Designer for Database Applications
216	Breaking Down Memory Walls in LSM-based Storage Systems	Chen Luo	Breaking Down Memory Walls in LSM-based Storage Systems
217	Context-Free Path Querying via Matrix Equations	Yuliya Susanina	We show how to reduce CFPQ evaluation to solving systems of matrix equations over R, a problem for which there exist high-performance solutions.
218	Simulation-based Approximate Graph Pattern Matching	Xiaoshuang Chen	In this paper, we propose a simulation-based approximate pattern matching algorithm that is not only efficient to compute, but also able to capture reasonable matches that existing algorithms miss.
219	High-Dimensional Vector Similarity Search: From Time Series to Deep Network Embeddings	Karima Echihabi	High-Dimensional Vector Similarity Search: From Time Series to Deep Network Embeddings
220	Rethinking Message Brokers on RDMA and NVM	Hendrik Makait	Rethinking Message Brokers on RDMA and NVM
221	Monte Carlo Tree Search for Generating Interactive Data Analysis Interfaces	Yiru Chen	We propose to adopt Monte Carlo Tree Search (MCTS) to search for the optimal interface that accounts for hierarchical layout as well as usability, in terms of how easy it is to express the query log.
222	Continuous Prefetch for Interactive Data Applications	Haneen Mohammed	Continuous Prefetch for Interactive Data Applications
223	Re-evaluating the Performance Trade-offs for Hash-Based Multi-Join Queries	Shiva Jahangiri	Re-evaluating the Performance Trade-offs for Hash-Based Multi-Join Queries
224	Interactive View Recommendation	Xiaozhong Zhang	This paper presents an attempt towards interactive view recommendation that automatically discovers the utility function composition during an exploration that best matches the user’s intentions and exploration task.
225	From Worst-Case to Average-Case Analysis: Accurate Latency Predictions for Key-Value Storage Engines	Meena Jagadeesan, Garrett Tanzer	In this work, we start to develop an average-case analysis of the performance of storage engines that can achieve significantly more accurate predictions than existing worst-case models.
226	Towards the Scheduling of Vertex-constrained Multi Subgraph Matching Query	Kongzhang Hao, Longbin Lai	In this paper, we study the problem of vertex-constrained multi subgraph matching query (vMSQ), where we propose a novel scheduling algorithm for processing multiple queries in parallel, while taking into consideration load balance and the maximum possible sharing of computation.
227	Serverless Query Processing on a Budget	William Ma	We propose a model that will allow service providers to dynamically provision clusters to achieve their users’ desired time-cost tradeoffs.
228	Workload-Aware Column Imprints	Noah Slavitch	We propose efficient algorithms to construct our data structures.
229	Towards Scalable UDTFs in Noria	Justus Adam	In this work we design single-tuple UDF and User Defined Aggregates (UDA) interfaces for Noria, a state-of-the-art dataflow system with incremental materialized views.
230	Column Partition and Permutation for Run Length Encoding in Columnar Databases	Jia Shi	In this paper, we consider compressing columns using the Run Length Encoding (RLE).
231	Supporting Database Constraints in Synthetic Data Generation based on Generative Adversarial Networks	Wanxin Li	In our research, we focus on data synthesis for relational databases where the database constraints of the original data must be imposed on the generated data.
232	An Evaluation of Methods of Compressing Doubles	Jacob Spiegel	In this paper, we perform such a comparison of methods and evaluate their performance in terms of compression ratio and throughput achieved across two dataset repositories of time series and featurized machine-learning problems, as well as on a dataset of machine logs.
233	MemFlow: Memory-Aware Distributed Deep Learning	Neil Band	Towards this we introduce MemFlow, an optimization framework for distributed deep learning that performs joint optimization over memory usage and computation time when searching for a parallelization strategy.
234	JSON Schema Matching: Empirical Observations	Kunal Waghray	JSON Schema Matching: Empirical Observations