Paper Digest: SIGMOD 2025 Papers & Highlights
Interested users can choose to read all SIGMOD-2025 papers in our digest console, which supports more features.
To search for papers presented at SIGMOD-2025 on a specific topic, use the search by venue (SIGMOD-2025) service. To summarize the latest research published at SIGMOD-2025 on a specific topic, use the review by venue (SIGMOD-2025) service. To synthesize the findings from SIGMOD 2025 into comprehensive reports, try SIGMOD-2025 Research. If you are interested in browsing papers by author, we have a comprehensive list of all SIGMOD-2025 authors and their papers.
This curated list was created by the Paper Digest Team. Paper Digest is an AI-powered research platform that delivers personalized, comprehensive updates on the latest research in your field. It also helps you read and write articles, get answers, conduct literature reviews, and generate research reports.
TABLE 1: Paper Digest: SIGMOD 2025 Papers & Highlights
| # | Paper | Author(s) |
|---|---|---|
| 1 | A Lovász-Simonovits Theorem for Hypergraphs with Application to Local Clustering. Highlight: We present the first analysis of diffusion on hypergraphs based on the Lovász-Simonovits theory. | Raj Kamal; Amitabha Bagchi |
| 2 | A Profit-Maximizing Data Marketplace with Differentially Private Federated Learning Under Price Competition. Highlight: In this work, we propose a novel DPFL-based data marketplace that accommodates both price-taking and price-setting data owners. | Peng Sun; Liantao Wu; Zhibo Wang; Jinfei Liu; Juan Luo; Wenqiang Jin |
| 3 | Adaptive Quotient Filters. Highlight: In this paper, we design and implement the adaptive quotient filter (AQF), the first practical adaptive filter with minimal adaptivity overhead and strong adaptivity guarantees, which means that the performance and false-positive guarantees continue to hold even for adversarial workloads. | Richard Wen; Hunter McCoy; David Tench; Guido Tagliavini; Michael A. Bender; Alex Conway; Martin Farach-Colton; Rob Johnson; Prashant Pandey |
| 4 | Atom: An Efficient Query Serving System for Embedding-based Knowledge Graph Reasoning with Operator-level Batching. Abstract: Knowledge graph reasoning (KGR) answers logical queries over a knowledge graph (KG), and embedding-based KGR (EKGR) has recently become popular; it embeds both queries and KG … | Qihui Zhou; Peiqi Yin; Xiao Yan; Changji Li; Guanxian Jiang; James Cheng |
| 5 | BT-Tree: A Reinforcement Learning Based Index for Big Trajectory Data. Highlight: In this work, we propose BT-Tree, which is built through a recursive bi-partitioning approach, for the processing of range and KNN queries for past trajectory data. | Tu Gu; Kaiyu Feng; Jingyi Yang; Gao Cong; Cheng Long; Rui Zhang |
| 6 | Discovering Top-k Relevant and Diversified Rules. Highlight: We train a relevance model to learn users’ prior knowledge, rank rules based on users’ needs, and propose four diversity measures to assess the diversity between rules. Based on these measures, we formulate a new discovery problem. | Wenfei Fan; Ziyan Han; Min Xie; Guangyi Zhang |
| 7 | Efficient and Accurate PageRank Approximation on Large Graphs. Highlight: Experiment results on three large-scale graphs show that both the CUR-Trans algorithm and the T²-Approx algorithm achieve the lowest response time for computing PageRank values, with the best accuracy (for the CUR-Trans algorithm) or competitive accuracy (for the T²-Approx algorithm). | Siyue Wu; Dingming Wu; Junyi Quan; Tsz Nam Chan; Kezhong Lu |
| 8 | Efficient Approximation Algorithms for Minimum Cost Seed Selection with Probabilistic Coverage Guarantee. Highlight: With MRR in hand, we propose our algorithm SCORE for MCSS-PCG, whose performance guarantee is derived by measuring the gap between MCSS-ECG and MCSS-PCG, and applying the theoretical results in MCSS-ECG. | Chen Feng; Xingguang Chen; Qintian Guo; Fangyuan Zhang; Sibo Wang |
| 9 | Enabling Adaptive Sampling for Intra-Window Join: Simultaneously Optimizing Quantity and Quality. Highlight: However, when dealing with stream data, due to the need for real-time processing and high-quality analysis, methods developed for processing static data become inapplicable. Consequently, a fundamental question arises: Is it possible to achieve adaptive sampling in stream data without relying on offline techniques? To address this problem, we propose FreeSam, which couples hybrid sampling with intra-window join, a key stream join operator. | Xilin Tang; Feng Zhang; Shuhao Zhang; Yani Liu; Bingsheng He; Xiaoyong Du |
| 10 | GABoost: Graph Alignment Boosting Via Local Optimum Escape. Highlight: In this paper, we propose GABoost, a graph alignment boosting algorithm that takes as input an initial alignment between two heterogeneous graphs and outputs a boosted alignment via an iterative local-optimum-escape process. | Wei Liu; Wei Zhang; Haiyan Zhao; Zhi Jin |
| 11 | Near-Duplicate Text Alignment with One Permutation Hashing. Highlight: However, the space cost O(nk) is still too high for long texts, especially when the sketch size k is large. To address this issue, we propose to use One Permutation Hashing (OPH) to generate the min-hash sketch and introduce the concept of OPH compact windows. | Zhencan Peng; Yuheng Zhang; Dong Deng |
| 12 | On The Feasibility and Benefits of Extensive Evaluation. Highlight: 3) We have not found a method that can consistently outperform random sampling + ANOVA. | Yujie Hui; Miao Yu; Hao Qi; Yifan Gan; Tianxi Li; Yuke Li; Xueyuan Ren; Sixiang Ma; Xiaoyi Lu; Yang Wang |
| 13 | CAMAL: Optimizing LSM-trees Via Active Learning. Highlight: We introduce a new approach CAMAL, which boasts the following features: (1) ML-Aided: CAMAL is the first attempt to apply active learning to tune LSM-tree based key-value stores. | Weiping Yu; Siqiang Luo; Zihao Yu; Gao Cong |
| 14 | Pluto: Sample Selection for Robust Anomaly Detection on Polluted Log Data. Highlight: In this paper, we thus propose a robust log anomaly detection framework, Pluto, that automatically selects a clean representative sample subset of the polluted log sequence data to train a Transformer-based anomaly detection model. | Lei Ma; Lei Cao; Peter M. VanNostrand; Dennis M. Hofmann; Yao Su; Elke A. Rundensteiner |
| 15 | SketchQL: Video Moment Querying with A Visual Query Interface. Highlight: We present a learned similarity search algorithm for retrieving video moments closely matching the user’s visual query based on object trajectories. | Renzhi Wu; Pramod Chunduri; Ali Payani; Xu Chu; Joy Arulraj; Kexin Rong |
| 16 | Tao: Improving Resource Utilization While Guaranteeing SLO in Multi-tenant Relational Database-as-a-Service. Highlight: In this work, we propose a novel system Tao to overcome it. | Haotian Liu; Runzhong Li; Ziyang Zhang; Bo Tang |
| 17 | Theoretically and Practically Efficient Maximum Defective Clique Search. Highlight: However, determining the maximum k-defective clique in graphs has been proven to be an NP-hard problem, presenting significant challenges in finding an efficient solution. To address this problem, we develop a theoretically and practically efficient algorithm that leverages newly-designed branch reduction rules and a pivot-based branching technique. | Qiangqiang Dai; Ronghua Li; Donghang Cui; Guoren Wang |
| 18 | A Universal Sketch for Estimating Heavy Hitters and Per-Element Frequency Moments in Data Streams with Bounded Deletions. Highlight: In this paper, we make a bounded deletion assumption, putting a constraint on the number of deletions allowed. | Liang Zheng; Qingjun Xiao; Xuyuan Cai |
| 19 | An Efficient and Exact Algorithm for Locally H-Clique Densest Subgraph Discovery. Highlight: This paper investigates the LhCDS detection problem and proposes an efficient and exact algorithm to list the top-k non-overlapping, locally h-clique dense, and compact subgraphs. | Xiaojia Xu; Haoyu Liu; Xiaowei Lv; Yongcai Wang; Deying Li |
| 20 | Buffered Persistence in B+ Trees. Highlight: The overhead is particularly pronounced in workloads that benefit from cache reuse due to good temporal locality or small working sets, traits commonly observed in real-world applications. In this paper, we propose a buffered durable B+ tree (BD+Tree) that improves performance and reduces NVM traffic via relaxed persistence. | Mingzhe Du; Michael L. Scott |
| 21 | Camel: Efficient Compression of Floating-Point Time Series. Highlight: We propose Camel, a new compression method for floating-point time series with the goal of advancing the compression ratios and efficiency achievable. | Yuanyuan Yao; Lu Chen; Ziquan Fang; Yunjun Gao; Christian S. Jensen; Tianyi Li |
| 22 | Common Neighborhood Estimation Over Bipartite Graphs Under Local Differential Privacy. Highlight: To obtain efficient and accurate estimates, we propose a multiple-round framework that significantly reduces the candidate pool of common neighbors and enables the query vertices to construct unbiased estimators locally. | Yizhang He; Kai Wang; Wenjie Zhang; Xuemin Lin; Ying Zhang |
| 23 | Connectivity-Oriented Property Graph Partitioning for Distributed Graph Pattern Query Processing. Highlight: Identifying these matches requires much inter-partition communication, which is the primary performance bottleneck in distributed query processing. To address this issue, this paper introduces a novel connectivity-oriented relationship-disjoint partitioning method, namely RCP (Relationship Connectivity Partitioning), aimed at enhancing the efficiency of graph pattern query processing by reducing crossing matches. | Min Shi; Peng Peng; Xu Zhou; Jiayu Liu; Guoqing Xiao; Kenli Li |
| 24 | Constant-time Connectivity Querying in Dynamic Graphs. Highlight: To improve the efficiency, we propose a new spanning-tree-based solution by maintaining a disjoint-set tree simultaneously. | Lantian Xu; Dong Wen; Lu Qin; Ronghua Li; Ying Zhang; Xuemin Lin |
| 25 | CtxPipe: Context-aware Data Preparation Pipeline Construction for Machine Learning. Highlight: We present CtxPipe, a novel framework that addresses the limitations of previous works by leveraging contextual information to improve the pipeline construction process. | Haotian Gao; Shaofeng Cai; Tien Tuan Anh Dinh; Zhiyong Huang; Beng Chin Ooi |
| 26 | Directional Queries: Making Top-k Queries More Effective in Discovering Relevant Results. Highlight: Their major advantage over alternative approaches, such as skyline queries (which return all the undominated objects in a dataset), is that the cardinality of the output can be easily controlled through the k parameter and user preferences can be accommodated by appropriately weighting the involved attributes. In this paper we concentrate on two so-far neglected aspects of top-k queries: first, their general ability to return all the potentially interesting results, i.e., the tuples in the skyline; second, the difficulty that linear top-k queries might encounter in returning tuples with balanced attribute values that match user preferences more closely than tuples that are extremely good in one dimension but (very) poor in others. In order to quantify these undesirable effects we introduce four novel indicators for skyline tuples, which measure their robustness as well as the difficulty incurred by top-k queries to retrieve them. After observing that real datasets usually contain many relevant results that are hardly retrievable by linear top-k queries, and with the aim of favoring balanced results, we extend the queries with a term that accounts for the distance of a tuple from the preference direction established by the attributes’ weights. | Paolo Ciaccia; Davide Martinenghi |
| 27 | Disclosure-Compliant Query Answering. Highlight: We introduce data masks to specify disclosure policies flexibly and intuitively and propose a query modification approach to rewrite user queries into disclosure-compliant ones. | Rudi Poepsel-Lemaitre; Kaustubh Beedkar; Volker Markl |
| 28 | DPconv: Super-Polynomially Faster Join Ordering. Highlight: We develop a new algorithmic framework based on subset convolution. | Mihail Stoian; Andreas Kipf |
| 29 | Finding Logic Bugs in Spatial Database Engines Via Affine Equivalent Inputs. Highlight: In this paper, we propose an automated geometry-aware generator to generate high-quality SQL statements for SDBMSs and a novel concept named Affine Equivalent Inputs (AEI) to validate the results of SDBMSs. | Wenjing Deng; Qiuyang Mang; Chengyu Zhang; Manuel Rigger |
| 30 | GIDCL: A Graph-Enhanced Interpretable Data Cleaning Framework with Large Language Models. Highlight: This paper introduces GIDCL (Graph-enhanced Interpretable Data Cleaning with Large language models), a pioneering framework that harnesses the capabilities of Large Language Models (LLMs) alongside Graph Neural Networks (GNNs) to address the challenges of traditional and machine learning-based data cleaning methods. | Mengyi Yan; Yaoshu Wang; Yue Wang; Xiaoye Miao; Jianxin Li |
| 31 | GOLAP: A GPU-in-Data-Path Architecture for High-Speed OLAP. Highlight: In this paper, we suggest a novel GPU-in-data-path architecture that leverages a GPU to accelerate the I/O path and thus can achieve almost in-memory bandwidth using SSDs. | Nils Boeschen; Tobias Ziegler; Carsten Binnig |
| 32 | High-Performance Query Processing with NVMe Arrays: Spilling Without Killing Performance. Highlight: This paper aims to bridge the gap between fast in-memory query engines and slow but robust engines that can utilize external storage. | Maximilian Kuschewski; Jana Giceva; Thomas Neumann; Viktor Leis |
| 33 | IRangeGraph: Improvising Range-dedicated Graphs for Range-filtering Nearest Neighbor Search. Highlight: In this study, instead of materializing a compressed index for every possible query range in preparation for querying, we materialize graph-based indexes, called elemental graphs, for a moderate number of ranges. | Yuexuan Xu; Jianyang Gao; Yutong Gou; Cheng Long; Christian S. Jensen |
| 34 | Live Patching for Distributed In-Memory Key-Value Stores. Highlight: In this article, we propose applying software updates directly in memory without restarting any nodes. | Michael Fruth; Stefanie Scherzinger |
| 35 | Transforming RDF Graphs to Property Graphs Using Standardized Schemas. Highlight: To enhance the interoperability of the two models, we present a novel technique, S3PG, to convert RDF knowledge graphs into property graphs exploiting two popular standards to express schema constraints, i.e., SHACL for RDF and PG-Schema for property graphs. | Kashif Rabbani; Matteo Lissandrini; Angela Bonifati; Katja Hose |
| 36 | LSMGraph: A High-Performance Dynamic Graph Storage System with Multi-Level CSR. Highlight: However, existing dynamic graph storage systems suffer from read or write amplification and face the challenge of optimizing both read and write performance simultaneously. To address this challenge, we propose LSMGraph, a novel dynamic graph storage system that combines the write-friendly LSM-tree and the read-friendly CSR. | Song Yu; Shufeng Gong; Qian Tao; Sijie Shen; Yanfeng Zhang; Wenyuan Yu; Pengxi Liu; Zhixin Zhang; Hongfu Li; Xiaojian Luo; Ge Yu; Jingren Zhou |
| 37 | Memento Filter: A Fast, Dynamic, and Robust Range Filter. Abstract: Range filters are probabilistic data structures that answer approximate range emptiness queries. They aid in avoiding processing empty range queries and have use cases in many … | Navid Eslami; Niv Dayan |
| 38 | Multivariate Time Series Cleaning Under Speed Constraints. Highlight: In this paper, we propose MTCSC, a constraint-based method for cleaning multivariate time series. | Aoqian Zhang; Zexue Wu; Yifeng Gong; Ye Yuan; Guoren Wang |
| 39 | Navigating Labels and Vectors: A Unified Approach to Filtered Approximate Nearest Neighbor Search. Highlight: Effectively incorporating filtering conditions with vector similarity presents significant challenges, including indexing a dynamically filtered search space, agnostic query labels, computational overhead for label-irrelevant vectors, and potential inadequacy in returning results. To tackle these challenges, we introduce a novel approach called the Label Navigating Graph, which encodes the containment relationships of label sets for all vectors. | Yuzheng Cai; Jiayang Shi; Yizhuo Chen; Weiguo Zheng |
| 40 | Online Detection of Anomalies in Temporal Knowledge Graphs with Interpretability. Highlight: Moreover, these methods falter in adapting to pattern changes and semantic drifts resulting from knowledge updates. To tackle these challenges, we introduce AnoT, an efficient TKG summarization method tailored for interpretable online anomaly detection in TKGs. | Jiasheng Zhang; Rex Ying; Jie Shao |
| 41 | Pasta: A Cost-Based Optimizer for Generating Pipelining Schedules for Dataflow DAGs. Highlight: In this paper, we systematically study the problem of scheduling a workflow DAG for pipelined execution, and develop a novel cost-based optimizer called Pasta for generating a high-quality schedule. | Xiaozhen Liu; Yicong Huang; Xinyuan Lin; Avinash Kumar; Sadeem Alsudais; Chen Li |
| 42 | Personalized Truncation for Personalized Privacy. Highlight: However, existing techniques for PDP cannot provide good utility for many fundamental problems such as basic counting and sum estimation. In this paper, we present the personalized truncation mechanism for these problems under PDP. | Dajun Sun; Wei Dong; Yuan Qiu; Ke Yi |
| 43 | Provenance-Enabled Explainable AI. Highlight: We introduce provenance-enabled explainable AI (PXAI). | Jiachi Zhang; Wenchao Zhou; Benjamin E. Ujcich |
| 44 | SPID-Join: A Skew-resistant Processing-in-DIMM Join Algorithm Exploiting The Bank- and Rank-level Parallelisms of DIMMs. Highlight: In this paper, we present SPID-Join, a skew-resistant PID join algorithm which exploits two parallelisms inherent in DIMM architectures, namely bank- and rank-level parallelisms. | Suhyun Lee; Chaemin Lim; Jinwoo Choi; Heelim Choi; Chan Lee; Yongjun Park; Kwanghyun Park; Hanjun Kim; Youngsok Kim |
| 45 | Towards A Converged Relational-Graph Optimization Framework. Highlight: Although SPJM queries can be converted to SPJ queries and optimized using existing relational query optimizers, our analysis shows that such a graph-agnostic method fails to benefit from graph-specific optimization techniques found in the literature. To address this issue, we develop a converged relational-graph optimization framework called RelGo for optimizing SPJM queries, leveraging joint efforts from both relational and graph query optimizations. | Yunkai Lou; Longbin Lai; Bingqing Lyu; Yufan Yang; XiaoLi Zhou; Wenyuan Yu; Ying Zhang; Jingren Zhou |
| 46 | Understanding and Reusing Test Suites Across Database Systems. Highlight: We present a unified test suite, SQuaLity, in which we integrated test cases from three widely-used DBMSs, SQLite, PostgreSQL, and DuckDB. | Suyang Zhong; Manuel Rigger |
| 47 | Automating Vectorized Distributed Graph Computation. Highlight: Using 6 real-life graphs, we show that AutoMI-converted multi-instance algorithms are 9.6 to 29.5 times faster than serial evaluation, 7.1 to 26.4 times faster than batch evaluation, and are even 2.6 to 4.6 times faster than existing highly optimized handcrafted multi-instance algorithms without vectorization. | Wenyue Zhao; Yang Cao; Peter Buneman; Jia Li; Nikos Ntarmos |
| 48 | λ-Tune: Harnessing Large Language Models for Automated Database System Tuning. Highlight: We introduce λ-Tune, a framework that leverages Large Language Models (LLMs) for automated database system tuning. | Victor Giannakouris; Immanuel Trummer |
| 49 | Online Marketplace: A Benchmark for Data Management in Microservices. Highlight: Consequently, it has been difficult to advance data system technologies that effectively support microservice applications. To fill this gap, we present Online Marketplace, a microservice benchmark that highlights core data management challenges that existing benchmarks fail to address. | Rodrigo Laigner; Zhexiang Zhang; Yijian Liu; Leonardo Freitas Gomes; Yongluan Zhou |
| 50 | A Local Search Approach to Efficient (k,p)-Core Maintenance. Highlight: In this paper, we study the problem of maintaining all (k,p)-cores (essentially, maintaining the p-numbers for all vertices) for dynamic graphs. | Chenghan Zhang; Yuanyuan Zhu; Lijun Chang |
| 51 | A Rank-Based Approach to Recommender System’s Top-K Queries with Uncertain Scores. Highlight: In this work, we address top-K queries based on uncertain scores. | Coral Scharf; Carmel Domshlak; Avigdor Gal; Haggai Roitman |
| 52 | Accelerating Core Decomposition in Billion-Scale Hypergraphs. Highlight: In this paper, we propose an efficient approach for hypergraph k-core decomposition. | Wenqian Zhang; Zhengyi Yang; Dong Wen; Wentao Li; Wenjie Zhang; Xuemin Lin |
| 53 | Agree to Disagree: Robust Anomaly Detection with Noisy Labels. Highlight: Sample selection often fails to separate sufficiently many clean anomaly samples from noisy ones, while label refurbishment erroneously refurbishes marginal clean samples. To overcome these limitations, we design Unity, the first learning from noisy labels (LNL) approach for anomaly detection that elegantly leverages the merits of both sample selection and label refurbishment to iteratively prepare a diverse clean sample set for network training. | Dennis M. Hofmann; Peter M. VanNostrand; Lei Ma; Huayi Zhang; Joshua C. DeOliveira; Lei Cao; Elke A. Rundensteiner |
| 54 | An Adaptive Benchmark for Modeling User Exploration of Large Datasets. Highlight: In this paper, we present a new DBMS performance benchmark that can simulate user exploration with any specified dashboard design made of standard visualization and interaction components. | Joanna Purich; Anthony Wise; Leilani Battle |
| 55 | An Elephant Under The Microscope: Analyzing The Interaction of Optimizer Components in PostgreSQL. Highlight: Therefore, we argue that making improvements to a single optimization component requires a thorough understanding of how these changes might affect the other components. To achieve this understanding, we present results of a comprehensive experimental analysis of the interplay in the traditional optimizer architecture, using the widely-used PostgreSQL system as the prime representative. | Rico Bergmann; Claudio Hartmann; Dirk Habich; Wolfgang Lehner |
| 56 | An Experimental Comparison of Tree-data Structures for Connectivity Queries on Fully-dynamic Undirected Graphs. Highlight: During the past decades, significant efforts have been made to propose data structures for answering connectivity queries on fully dynamic graphs, i.e., graphs with frequent insertions and deletions of edges. | Qing Chen; Michael H. Böhlen; Sven Helmer |
| 57 | AquaPipe: A Quality-Aware Pipeline for Knowledge Retrieval and Large Language Models. Highlight: This paper presents AquaPipe, which pipelines the execution of disk-based ANNS and the LLM prefill phase in an RAG system, effectively overlapping the latency of knowledge retrieval and model inference to enhance the overall performance, while guaranteeing data quality. | Runjie Yu; Weizhou Huang; Shuhan Bai; Jian Zhou; Fei Wu |
| 58 | Aster: Enhancing LSM-structures for Scalable Graph Database. Highlight: There is a proliferation of applications requiring the management of large-scale, evolving graphs under workloads with intensive graph updates and lookups. Driven by this challenge, we introduce Poly-LSM, a high-performance key-value storage engine for graphs with the following novel techniques: (1) Poly-LSM is embedded with a new design of graph-oriented LSM-tree structure that features a hybrid storage model for concisely and effectively storing graph data. | Dingheng Mo; Junfeng Liu; Fan Wang; Siqiang Luo |
| 59 | Automatic Database Configuration Debugging Using Retrieval-Augmented Language Models. Highlight: However, the configuration debugging process is tedious and sometimes challenging, even for seasoned database administrators (DBAs) with sufficient experience in DBMS configurations and a good understanding of DBMS internals (e.g., MySQL or Oracle). To address this difficulty, we propose Andromeda, a framework that utilizes large language models (LLMs) to enable automatic DBMS configuration debugging. | Sibei Chen; Ju Fan; Bin Wu; Nan Tang; Chao Deng; Pengyi Wang; Ye Li; Jian Tan; Feifei Li; Jingren Zhou; Xiaoyong Du |
| 60 | B-Trees Are Back: Engineering Fast and Pageable Node Layouts. Highlight: In this paper, we describe an efficient B-Tree implementation supporting variable-sized records containing six known node layout optimizations. | Marcus Müller; Lawrence Benson; Viktor Leis |
| 61 | BⓈX: Subgraph Matching with Batch Backtracking Search. Highlight: For each search box, we introduce a refinement method to filter out unpromising candidate mappings. | Yujie Lu; Zhijie Zhang; Weiguo Zheng |
| 62 | BCviz: A Linear-Space Index for Mining and Visualizing Cohesive Bipartite Subgraphs. Highlight: Based on BCviz, we propose an exact maximum biclique search algorithm that searches for results on much smaller subgraphs than any existing method does. In addition, we improve the efficiency of index construction by two techniques. | Jianxiong Ye; Zhaonian Zou; Dandan Liu; Bin Yang; Xudong Liu |
| 63 | Boosting OLTP Performance with Per-Page Logging on NVDIMM. Highlight: When running OLTP workloads on flash SSDs, relational DBMSs still face the write durability overhead, severely limiting their performance. To address this challenge, we propose NV-PPL, a novel database architecture that leverages NVDIMM as a durable log cache. | Bohyun Lee; Seongjae Moon; Jonghyeok Park; Sang-Won Lee |
| 64 | Bursting Flow Query on Large Temporal Flow Networks. Highlight: In this paper, we study a novel query of finding a flow pattern of burstiness in a temporal flow network. | Lyu Xu; Jiaxin Jiang; Byron Choi; Jianliang Xu; Bingsheng He |
| 65 | Capsule: An Out-of-Core Training Mechanism for Colossal GNNs. Highlight: In response, this work introduces Capsule, a new out-of-core mechanism for large-scale GNN training. | Yongan Xiang; Zezhong Ding; Rui Guo; Shangyou Wang; Xike Xie; S. Kevin Zhou |
| 66 | Cardinality Estimation of LIKE Predicate Queries Using Deep Learning. Highlight: To provide more accurate cardinality estimates and reduce the maximum estimation errors, we propose a deep learning model that utilizes the extended N-gram table and the conditional regression header. | Suyong Kwon; Kyuseok Shim; Woohwan Jung |
| 67 | Centrum: Model-based Database Auto-tuning with Minimal Distributional Assumptions. Highlight: Second, recent advances in (ensemble) gradient boosting, which can further enhance surrogate modeling against vanilla GP and random forest counterparts, have rarely been applied in optimizing DBMS auto-tuners. To address these issues, we propose a novel model-based DBMS auto-tuner, Centrum. | Yuanhao Lai; Pengfei Zheng; Chenpeng Ji; Yan Li; Songhan Zhang; Rutao Zhang; Zhengang Wang; Yunfei Du; |
| 68 | Cohesiveness-aware Hierarchical Compressed Index for Community Search on Attributed Graphs. Highlight: Moreover, pruning strategies are typically tailored to specific algorithms and their cohesiveness metrics, making them difficult to generalize. To address this, we study a general approach to accelerate various CSAG methods. | Yuxiang Wang; Zhangyang Peng; Xiangyu Ke; Xiaoliang Xu; Tianxing Wu; Yuan Gao; |
| 69 | Computing Approximate Graph Edit Distance Via Optimal Transport. Highlight: In this paper, we propose an ensemble approach that integrates a supervised learning-based method and an unsupervised method, both based on optimal transport. | Qihao Cheng; Da Yan; Tianhao Wu; Zhongyi Huang; Qin Zhang; |
| 70 | Constant Optimization Driven Database System Testing. Highlight: In this work, we propose Constant-Optimization-Driven Database Testing (CODDTest) as a novel approach for detecting logic bugs in DBMSs. | Chi Zhang; Manuel Rigger; |
| 71 | CRDV: Conflict-free Replicated Data Views. Abstract: There are now multiple proposals for Conflict-free Replicated Data Types (CRDTs) in SQL databases aimed at distributed systems. Some, such as ElectricSQL, provide only relational … | Nuno Faria; José Pereira; |
| 72 | Data Chunk Compaction in Vectorized Execution. Highlight: To answer the “how” question, we propose a compaction method for the hash join operator, called logical compaction, that minimizes data movements when compacting data chunks. | Yiming Qiao; Huanchen Zhang; |
| 73 | DataVinci: Learning Syntactic and Semantic String Repairs. Highlight: We introduce DataVinci, a fully unsupervised string data error detection and repair system. | Mukul Singh; José Cambronero; Sumit Gulwani; Vu Le; Carina Negreanu; Arjun Radhakrishna; Gust Verbruggen; |
| 74 | Deep Overlapping Community Search Via Subspace Embedding. Highlight: In this paper, we formally redefine the problem of OCS. | Qing Sima; Jianke Yu; Xiaoyang Wang; Wenjie Zhang; Ying Zhang; Xuemin Lin; |
| 75 | DEG: Efficient Hybrid Vector Search Using The Dynamic Edge Navigation Graph. Highlight: Existing methods for HVQ typically construct Approximate Nearest Neighbors Search (ANNS) indexes with a fixed α value. This leads to significant performance degradation when the query’s α changes dynamically with different scenarios and needs. In this study, we introduce the Dynamic Edge Navigation Graph (DEG), a graph-based ANNS index that maintains efficiency and accuracy with changing α values. | Ziqi Yin; Jianyang Gao; Pasquale Balsebre; Gao Cong; Cheng Long; |
| 76 | Density Decomposition of Bipartite Graphs. Highlight: Existing dense subgraph models, such as biclique, k-biplex, k-bitruss, and (α,β)-core, often face challenges due to their high computational complexity or limitations in effectively capturing the density of the graph. To overcome these issues, in this paper, we propose a new dense subgraph model for bipartite graphs, namely (α,β)-dense subgraph, designed to capture the density structure inherent in bipartite graphs. | Yalong Zhang; Rong-Hua Li; Qi Zhang; Hongchao Qin; Lu Qin; Guoren Wang; |
| 77 | Dialogue Benchmark Generation from Knowledge Graphs with Cost-Effective Retrieval-Augmented LLMs. Highlight: This paper introduces Chatty-Gen, a novel multi-stage retrieval-augmented generation platform for automatically generating high-quality dialogue benchmarks tailored to a specific domain using a KG. | Reham Omar; Omij Mangukiya; Essam Mansour; |
| 78 | DISCES: Systematic Discovery of Event Stream Queries. Highlight: However, given a database of finite, historic (sub-)streams that have been gathered whenever a situation of interest was observed, one may aim at automatic discovery of the respective queries. Existing algorithms for event query discovery incorporate ad-hoc design choices, though, and it is unclear how their suitability for a database should be assessed. In this paper, we address this gap with DISCES, an algorithmic framework for event query discovery. | Rebecca Sattler; Sarah Kleest-Meißner; Steven Lange; Markus L. Schmid; Nicole Schweikardt; Matthias Weidlich; |
| 79 | Disco: A Compact Index for LSM-trees. Highlight: In this paper, we propose Disco: a compact index for LSM-trees. | Wenshao Zhong; Chen Chen; Xingbo Wu; Jakob Eriksson; |
| 80 | DiskGNN: Bridging I/O Efficiency and Model Accuracy for Out-of-Core GNN Training. Highlight: However, these systems suffer from either read amplification when conducting random reads for node features that are smaller than a disk page, or degraded model accuracy by treating the graph as disconnected partitions. To close this gap, we build DiskGNN for high I/O efficiency and fast training without model accuracy degradation. | Renjie Liu; Yichuan Wang; Xiao Yan; Haitian Jiang; Zhenkun Cai; Minjie Wang; Bo Tang; Jinyang Li; |
| 81 | Dual-Hierarchy Labelling: Scaling Up Distance Queries on Dynamic Road Networks. Highlight: In this work, we propose an efficient solution Dual-Hierarchy Labelling (DHL) for distance querying on dynamic road networks from a novel perspective, which incorporates two hierarchies with different but complementary data structures to support efficient query and update processing. | Muhammad Farhan; Henning Koehler; Qing Wang; |
| 82 | Efficient Index Maintenance for Effective Resistance Computation on Evolving Graphs. Highlight: In this paper, we study a problem of index maintenance on evolving graphs for effective resistance computation. | Meihao Liao; Cheng Li; Rong-Hua Li; Guoren Wang; |
| 83 | Efficient Maximum S-Bundle Search Via Local Vertex Connectivity. Highlight: In this paper, we propose a new branch-and-bound algorithm, called SymBD, which achieves improved theoretical guarantees and practical performance. | Yang Liu; Hejiao Huang; Kaiqiang Yu; Shengxin Liu; Cheng Long; |
| 84 | Efficiently Counting Triangles in Large Temporal Graphs. Highlight: In this paper, we study fast algorithms for counting δ-temporal triangles in a given query time window. | Yuyang Xia; Yixiang Fang; Wensheng Luo; |
| 85 | Efficiently Processing Joins and Grouped Aggregations on GPUs. Highlight: We introduce GFTR, a novel technique to reduce random accesses, leading to speedups of up to 2.3x. | Bowen Wu; Dimitrios Koutsoukos; Gustavo Alonso; |
| 86 | Entity/Relationship Graphs: Principled Design, Modeling, and Data Integrity Management of Graph Databases. Highlight: We define E/R graphs as property graphs that are instances of E/R diagrams. | Philipp Skavantzos; Sebastian Link; |
| 87 | FastPDB: Towards Bag-Probabilistic Queries at Interactive Speeds. Highlight: In this work, we study computing expected multiplicities of query results over probabilistic databases under bag semantics, which has PTIME data complexity. | Aaron Huber; Oliver Kennedy; Atri Rudra; Zhuoyue Zhao; Su Feng; Boris Glavic; |
| 88 | Federated Heavy Hitter Analytics with Local Differential Privacy. Highlight: However, in federated settings, applying LDP complicates the other two challenges, due to the utility loss caused by the injected LDP noise or the increased communication/computation costs of the perturbation mechanism. To tackle these problems, we propose a novel target-aligning prefix tree mechanism satisfying ε-LDP, for federated heavy hitter analytics. | Yuemin Zhang; Qingqing Ye; Haibo Hu; |
| 89 | Graph-Based Vector Search: An Experimental Evaluation of The State-of-the-Art. Highlight: Vector search is the backbone of many critical analytical tasks, and graph-based methods have become the best choice for analytical tasks that do not require guarantees on the quality of the answers. We briefly survey in-memory graph-based vector search, outline the chronology of the different methods and classify them according to five main design paradigms: seed selection, incremental insertion, neighborhood propagation, neighborhood diversification, and divide-and-conquer. | Ilias Azizi; Karima Echihabi; Themis Palpanas; |
| 90 | H-Rocks: CPU-GPU Accelerated Heterogeneous RocksDB on Persistent Memory. Abstract: Persistent key-value stores (pKVS) such as RocksDB are critical to many internet-scale services. Recent works leveraged persistent memory (PM) to improve pKVS throughput. However, … | Shweta Pandey; Arkaprava Basu; |
| 91 | HyperMR: Efficient Hypergraph-enhanced Matrix Storage on Compute-in-Memory Architecture. Highlight: However, current storage schemes are still inefficient on CIM due to limited optimization objectives and inflexible support for various access patterns and matrix structures. To address this, we propose HyperMR, a hypergraph-enhanced matrix storage scheme for CIM architectures. | Yifan Wu; Ke Chen; Gang Chen; Dawei Jiang; Huan Li; Lidan Shou; |
| 92 | In-Database Time Series Clustering. Highlight: In this work, we propose an in-database adaptation of the state-of-the-art (SOTA) time series clustering method K-Shape. | Yunxiang Su; Kenny Ye Liang; Shaoxu Song; |
| 93 | InTime: Towards Performance Predictability In Byzantine Fault Tolerant Proof-of-Stake Consensus. Highlight: To make ARI robust against malicious behaviors, we establish a committee time witness (CTW) workflow to accurately gather and verify transaction arrival times. | Weijie Sun; Zihuan Xu; Wangze Ni; Lei Chen; |
| 94 | ISSD: Indicator Selection for Time Series State Detection. Highlight: To this end, we propose ISSD (Indicator Selection for State Detection), an indicator selection method for time series state detection. | Chengyu Wang; Tongqing Zhou; Lin Chen; Shan Zhao; Zhiping Cai; |
| 95 | Largest Triangle Sampling for Visualizing Time Series in Database. Highlight: We address the shortcomings by contributing a novel Iterative Largest Triangle Sampling (ILTS) algorithm with convex hull acceleration. | Lei Rui; Xiangdong Huang; Shaoxu Song; Chen Wang; Jianmin Wang; Zhao Cao; |
| 96 | LCP: Enhancing Scientific Data Management with Lossy Compression for Particles. Highlight: In this paper, we introduce LCP, an innovative lossy compressor designed for particle datasets, offering superior compression quality and higher speed than existing compression solutions. | Longtao Zhang; Ruoyu Li; Congrong Ren; Sheng Di; Jinyang Liu; Jiajun Huang; Robert Underwood; Pascal Grosset; Dingwen Tao; Xin Liang; Hanqi Guo; Franck Cappello; Kai Zhao; |
| 97 | LeaFi: Data Series Indexes on Steroids with Learned Filters. Highlight: However, we observe a significant waste of effort during search, due to suboptimal pruning. To address this issue, we introduce LeaFi, a novel framework that uses machine learning models to boost pruning effectiveness of tree-based data series indexes. | Qitong Wang; Ioana Ileana; Themis Palpanas; |
| 98 | MAST: Towards Efficient Analytical Query Processing on Point Cloud Data. Highlight: Consequently, there is a notable gap in research regarding the efficiency of invoking deep models for each PC data query, especially when dealing with large-scale models and datasets. To address this issue, this work aims to design an efficient approximate approach for supporting PC analysis queries, including PC retrieval and aggregate queries. | Jiangneng Li; Haitao Yuan; Gao Cong; Han Mao Kiah; Shuhao Zhang; |
| 99 | MEMO: Fine-grained Tensor Management For Ultra-long Context LLM Training. Highlight: In this paper, we propose MEMO, a novel LLM training framework designed for fine-grained activation memory management. | Pinxue Zhao; Hailin Zhang; Fangcheng Fu; Xiaonan Nie; Qibin Liu; Fang Yang; Yuanbo Peng; Dian Jiao; Shuaipeng Li; Jinbao Xue; Yangyu Tao; Bin Cui; |
| 100 | Minimum Spanning Tree Maintenance in Dynamic Graphs. Highlight: In this paper, we propose a novel algorithm to maintain MST in dynamic graphs, which achieves high practical efficiency. | Lantian Xu; Dong Wen; Lu Qin; Ronghua Li; Ying Zhang; Yang Lu; Xuemin Lin; |
| 101 | Modyn: Data-Centric Machine Learning Pipeline Orchestration. Highlight: We present Modyn, a data-centric end-to-end machine learning platform. | Maximilian Böther; Ties Robroek; Viktor Gsteiger; Robin Holzinger; Xianzhe Ma; Pınar Tözün; Ana Klimovic; |
| 102 | Multi-Level Graph Representation Learning Through Predictive Community-based Partitioning. Highlight: This study proposes a novel GRL model, Multi-Level GRL (simply, ML-GRL), that recursively partitions input graphs by selecting the most appropriate community detection algorithm at each graph or partitioned subgraph. | Bo-Young Lim; Jeong-Ha Park; Kisung Lee; Hyuk-Yoon Kwon; |
| 103 | Nezha: An Efficient Distributed Graph Processing System on Heterogeneous Hardware. Highlight: Despite the widespread use of the Scatter-Gather model for large-scale graph processing across distributed machines, the performance still can be significantly improved as the computation ability of each machine is not fully utilized and the communication costs during graph processing are expensive in the distributed environment. In this work, we propose a novel and efficient distributed graph processing system Nezha on heterogeneous hardware, where each machine is equipped with both CPU and GPU processors and all these machines in the distributed cluster are interconnected via Remote Direct Memory Access (RDMA). | Pengjie Cui; Haotian Liu; Dong Jiang; Bo Tang; Ye Yuan; |
| 104 | OBIR-tree: An Efficient Oblivious Index for Spatial Keyword Queries on Secure Enclaves. Highlight: Existing schemes struggle to enable top-k spatial keyword queries on encrypted data while hiding search, access, and volume patterns, which raises concerns about availability and security. To address the above issue, this paper proposes OBIR-tree, a novel index structure for oblivious (provably hides search, access, and volume patterns) top-k spatial keyword queries on encrypted data. | Zikai Ye; Xiangyu Wang; Zesen Liu; Dan Zhu; Jianfeng Ma; |
| 105 | On Graph Representation for Attributed Hypergraph Clustering. Highlight: In this paper, we propose Attributed Hypergraph Representation for Clustering (AHRC), a cluster-number-free hypergraph clustering consisting of an effective integration of the hypergraph topology and node attributes for hypergraph representation, a multi-hop modularity function for optimization, and a hypergraph sparsification for scalable computation. | Zijin Feng; Miao Qiao; Chengzhi Piao; Hong Cheng; |
| 106 | Optimizing Block Skipping for High-Dimensional Data with Learned Adaptive Curve. Highlight: In this paper, we propose AdaCurve, a novel approach aimed at enhancing block skipping in high-dimensional datasets through adaptive optimization of data layout. | Xu Chen; Shuncheng Liu; Tong Yuan; Tao Ye; Kai Zeng; Han Su; Kai Zheng; |
| 107 | Pandora: An Efficient and Rapid Solution for Persistence-Based Tasks in High-Speed Data Streams. Highlight: Existing methods often struggle with accuracy, especially given highly skewed data distributions and tight fastest memory budgets, where hash collisions are severe. In this paper, we introduce Pandora, a novel approximate data structure designed to tackle these challenges efficiently. | Weihe Li; |
| 108 | Parallel Kd-tree with Batch Updates. Highlight: We propose the Pkd-tree (Parallel kd-tree), a parallel kd-tree that is efficient both in theory and in practice. | Ziyang Men; Zheqi Shen; Yan Gu; Yihan Sun; |
| 109 | PoneglyphDB: Efficient Non-interactive Zero-Knowledge Proofs for Arbitrary SQL-Query Verification. Highlight: This paper introduces PoneglyphDB, a database system that leverages non-interactive zero-knowledge proofs (ZKP) to support both confidentiality and provability. | Binbin Gu; Juncheng Fang; Faisal Nawab; |
| 110 | Practical DB-OS Co-Design with Privileged Kernel Bypass. Highlight: Existing approaches to this DB-OS co-design struggle with limited design space, security risks, and compatibility issues. To overcome these hurdles, we propose a new co-design approach leveraging virtualization to elevate the privilege level of DB processes. | Xinjing Zhou; Viktor Leis; Jinming Hu; Xiangyao Yu; Michael Stonebraker; |
| 111 | Progressive Entity Matching: A Design Space Exploration. Highlight: In this work, we propose a novel framework for Progressive Entity Matching that organizes relevant techniques into four consecutive steps: (i) filtering, which reduces the search space to the most likely candidate matches, (ii) weighting, which associates every pair of candidate matches with a similarity score, (iii) scheduling, which prioritizes the execution of the candidate matches so that the real duplicates precede the non-matching pairs, and (iv) matching, which applies a complex matching function to the pairs in the order defined by the previous step. | Jakub Maciejewski; Konstantinos Nikoletos; George Papadakis; Yannis Velegrakis; |
| 112 | QURE: AI-Assisted and Automatically Verified UDF Inlining. Highlight: This limits coverage and makes the translation approaches less extensible to previously unseen procedural constructs. In this work, we present QURE, a framework that (1) leverages large language models (LLMs) to translate UDFs to native SQL, and (2) introduces a novel formal verification method to establish equivalence between the UDF and its translation. | Tarique Siddiqui; Arnd Christian König; Jiashen Cao; Cong Yan; Shuvendu K. Lahiri; |
| 113 | Randomized Sketches for Quantile in LSM-tree Based Store. Highlight: In this study, we propose pre-computing randomized sketches which provide randomized additive error guarantees. | Ziling Chen; Shaoxu Song; |
| 114 | Rapid Data Ingestion Through DB-OS Co-design. Highlight: Thus, a new design for data access control is necessary to enhance rapid data ingestion in databases. To address this concern, we propose a novel DB-OS co-design that efficiently supports sequential data access at full device speed. | Kyungmin Lim; Minseok Yoon; Kihwan Kim; Alan David Fekete; Hyungsoo Jung; |
| 115 | Reliable Text-to-SQL with Adaptive Abstention. Highlight: For the BIRD benchmark, our approach achieves near-perfect schema linking accuracy, autonomously involving a human when needed. | Kaiwen Chen; Yueting Chen; Nick Koudas; Xiaohui Yu; |
| 116 | Revisiting The Design of In-Memory Dynamic Graph Storage. Highlight: However, there has been no systematic study to explore the trade-offs among these dimensions. In this paper, we evaluate the effectiveness of individual techniques and identify the performance factors affecting these storage methods by proposing a common abstraction for DGS design and implementing a generic test framework based on this abstraction. | Jixian Su; Chiyu Hao; Shixuan Sun; Hao Zhang; Sen Gao; Jiaxin Jiang; Yao Chen; Chenyi Zhang; Bingsheng He; Minyi Guo; |
| 117 | RLER-TTE: An Efficient and Effective Framework for En Route Travel Time Estimation with Reinforcement Learning. Highlight: This paper proposes a novel framework that redefines the implementation path of ER-TTE to achieve highly efficient and effective predictions. | Zhihan Zheng; Haitao Yuan; Minxiao Chen; Shangguang Wang; |
| 118 | Schema-Based Query Optimisation for Graph Databases. Highlight: We propose a type inference mechanism that enriches recursive graph queries with relevant structural information contained in a graph schema. | Chandan Sharma; Pierre Genevès; Nils Gesbert; Nabil Layaïda; |
| 119 | SecureXGB: A Secure and Efficient Multi-party Protocol for Vertical Federated XGBoost. Highlight: To this end, we present a secure and efficient multi-party protocol for vertical federated XGBoost, called SecureXGB, which can perform the collaborative training of an XGBoost model in an SS-friendly manner. | Zongda Han; Xiang Cheng; Wenhong Zhao; Jiaxin Fu; Zhaofeng He; Sen Su; |
| 120 | Sequoia: An Accessible and Extensible Framework for Privacy-Preserving Machine Learning Over Distributed Data. Highlight: Privacy-preserving machine learning (PPML) algorithms use secure computation protocols to allow multiple data parties to collaboratively train machine learning (ML) models while maintaining their data confidentiality. | Kaiqiang Xu; Di Chai; Junxue Zhang; Fan Lai; Kai Chen; |
| 121 | Shapley Value Estimation Based on Differential Matrix. Highlight: The existing methods estimate the Shapley values directly. In this paper, we explore a novel idea: inferring the Shapley values by estimating the differences between them. | Junyuan Pang; Jian Pei; Haocheng Xia; Xiang Li; Jinfei Liu; |
| 122 | SHARQ: Explainability Framework for Association Rules on Relational Data. Highlight: To that end, we present an efficient framework for computing the exact SHARQ value of a single element whose running time is practically linear in the number of rules. | Hadar Ben-Efraim; Susan B. Davidson; Amit Somech; |
| 123 | SNAILS: Schema Naming Assessments for Improved LLM-Based SQL Inference. Highlight: Naturally, a lot of work in the ML+DB intersection aims to mitigate such LLM limitations. In this work, we shine the light on a complementary data-centric question: How should DB schemas evolve in this era of LLMs to boost NL-to-SQL? | Kyle Luoma; Arun Kumar; |
| 124 | SPAS: Continuous Release of Data Streams Under W-Event Differential Privacy. Highlight: Nevertheless, a recent benchmark reveals that none of the existing works offer a universally effective solution across all types of data streams, making it challenging to select an appropriate scheme for unknown data streams in practical scenarios. We identify that all existing methods are heuristic-based and make data-independent decisions. In this paper, we change this landscape by introducing SPAS, which is built on data-dependent strategies. | Xiaochen Li; Tianyu Li; Yitian Cheng; Chen Gong; Kui Ren; Zhan Qin; Tianhao Wang; |
| 125 | Subspace Collision: An Efficient and Accurate Framework for High-dimensional Approximate Nearest Neighbor Search. Highlight: In this paper, we first design SC-score, a metric that we show follows the Pareto principle and can act as a proxy for the Euclidean distance between data points. Inspired by this, we propose a novel ANN search framework called Subspace Collision (SC), which can provide theoretical guarantees on the quality of its results. | Jiuqi Wei; Xiaodong Lee; Zhenyu Liao; Themis Palpanas; Botao Peng; |
| 126 | SymphonyQG: Towards Symphonious Integration of Quantization and Graph for Approximate Nearest Neighbor Search. Highlight: In this work, following NGT-QG, we present a new method named SymphonyQG, which achieves more symphonious integration of quantization and graph (e.g., it avoids the explicit re-ranking step and refines the graph structure to be more aligned with FastScan). | Yutong Gou; Jianyang Gao; Yuexuan Xu; Cheng Long; |
| 127 | TGraph: A Tensor-centric Graph Processing Framework. Highlight: In this paper, we propose the first tensor-based graph processing framework, TGraph, which can be smoothly deployed and run on any powerful hardware accelerators (uniformly called XPU) that support Tensor Computation Runtimes (TCRs). | Yongliang Zhang; Yuanyuan Zhu; Hao Zhang; Congli Gao; Yuyang Wang; Guojing Li; Tianyang Xu; Ming Zhong; Jiawei Jiang; Tieyun Qian; Chenyi Zhang; Jeffrey Xu Yu; |
| 128 | Tribase: A Vector Data Query Engine for Reliable and Lossless Pruning Compression Using Triangle Inequalities. Highlight: In this paper, we present an efficient vector data query engine to enhance the granularity of cluster-based index by carefully subdividing clusters using diverse distance metrics. | Qian Xu; Juan Yang; Feng Zhang; Junda Pan; Kang Chen; Youren Shen; Amelie Chi Zhou; Xiaoyong Du; |
| 129 | U-DPAP: Utility-aware Efficient Range Counting on Privacy-preserving Spatial Data Federation. Highlight: Most existing data federation schemes employ Secure Multiparty Computation (SMC) to protect privacy, but this approach is computationally expensive and leads to high latency. Consequently, private data federations are often impractical for typical database workloads. This challenge highlights the need for a private data federation scheme capable of providing fast and accurate query responses while maintaining strong privacy. To address this issue, we propose U-DPAP, a utility-aware efficient privacy-preserving method. | Yahong Chen; Xiaoyi Pang; Xiaoguang Li; Hanyi Wang; Ben Niu; Shengnan Hu; |
| 130 | Ultraverse: An Efficient What-if Analysis Framework for Software Applications Interacting with Database Systems. Highlight: This isolated approach limits their effectiveness in scenarios where intensive interaction between applications and database systems occurs. To address this gap, we introduce Ultraverse, a what-if analysis framework that seamlessly integrates both application and database layers. | Ronny Ko; Chuan Xiao; Makoto Onizuka; Zhiqiang Lin; Yihe Huang; |
| 131 | User-Centric Property Graph Repairs. Highlight: In this paper, we propose an interactive and user-centric approach to repair property graphs under denial constraints. | Amedeo Pachera; Angela Bonifati; Andrea Mauri; |
| 132 | VEGA: An Active-tuning Learned Index with Group-Wise Learning Granularity. Highlight: This gives rise to an interesting open question: does there exist a learned index that simultaneously achieves state-of-the-art empirical performance and matching complexity? In this paper, we give a positive answer to this standing problem. We propose two new online model-building policies: (1) simplifying distribution by the adoption of a proper granularity (i.e., grouping multiple keys together for model-building) and (2) actively tuning distribution through key repositioning. | Meng Li; Huayi Chai; Siqiang Luo; Haipeng Dai; Rong Gu; Jiaqi Zheng; Guihai Chen; |
| 133 | A Cost-Effective LLM-based Approach to Identify Wildlife Trafficking in Online Marketplaces. Highlight: While large language models (LLMs) can directly label advertisements, doing so at scale is prohibitively expensive. We propose a cost-effective strategy that leverages LLMs to generate pseudo labels for a small sample of the data and uses these labels to create specialized classification models. | Juliana Silva Barbosa; Ulhas Gondhali; Gohar Petrossian; Kinshuk Sharma; Sunandan Chakraborty; Jennifer Jacquet; Juliana Freire; |
| 134 | A New Paradigm in Tuning Learned Indexes: A Reinforcement Learning Enhanced Approach. Highlight: This paper introduces LITune, a novel framework for end-to-end automatic tuning of Learned Index Structures. |
Taiyi Wang; Liang Liang; Guang Yang; Thomas Heinis; Eiko Yoneki; |
| 135 | A Structured Study of Multivariate Time-Series Distance Measures. Highlight: Furthermore, the existing experimental studies on multivariate distances have critical limitations: (a) focusing only on lock-step and elastic measures while ignoring categories such as sliding and kernel measures; (b) considering only one normalization technique; and (c) placing limited focus on statistical analysis of findings. Motivated by these shortcomings, we present the most complete evaluation of multivariate distance measures to date. |
Jens E. d’Hondt; Haojun Li; Fan Yang; Odysseas Papapetrou; John Paparrizos; |
| 136 | Accelerate Distributed Joins with Predicate Transfer. Highlight: In this paper, we aim to address both limitations. |
Yifei Yang; Xiangyao Yu; |
| 137 | Accelerating Graph Indexing for ANNS on Modern CPUs. Highlight: Drawing from insights gained through integrating existing compact coding methods in the graph indexing process, we propose a novel compact coding strategy, named Flash, designed explicitly for graph indexing and optimized for modern CPU architectures. |
Mengzhao Wang; Haotian Wu; Xiangyu Ke; Yunjun Gao; Yifan Zhu; Wenchao Zhou; |
| 138 | Accelerating Skyline Path Enumeration with A Core Attribute Index on Multi-attribute Graphs. Highlight: Hence, in this paper, we study the problem of skyline path enumeration, which aims to identify paths that balance multiple attributes, ensuring that no skyline result is dominated by another, thus meeting diverse user needs. |
Yuanyuan Zeng; Yixiang Fang; Wensheng Luo; Chenhao Ma; |
| 139 | Adda: Towards Efficient In-Database Feature Generation Via LLM-based Agents. Highlight: In this paper, we introduce Adda, an agent-driven in-database feature generation tool designed to automatically create high-quality features for ML analytics directly within the database. |
Kuan Lu; Zhihui Yang; Sai Wu; Ruichen Xia; Dongxiang Zhang; Gang Chen; |
| 140 | AJOSC: Adaptive Join Order Selection for Continuous Queries. Highlight: In this paper, we propose AJOSC, a new Adaptive Join Order Selection algorithm for Continuous multi-way join queries. |
Xinyi Ye; Xiangyang Gou; Lei Zou; Wenjie Zhang; |
| 141 | Alsatian: Optimizing Model Search for Deep Transfer Learning. Highlight: Based on the observation that many candidate models overlap to a significant extent and following a careful bottleneck analysis, we propose optimization techniques that are applicable to many model search frameworks. |
Nils Strassenburg; Boris Glavic; Tilmann Rabl; |
| 142 | Approximate DBSCAN Under Differential Privacy. Highlight: However, we show both empirically and theoretically that this approach cannot offer any utility in the published results. We therefore propose an alternative definition of DP-DBSCAN based on the notion of spans. |
Yuan Qiu; Ke Yi; |
| 143 | Approximating Opaque Top-k Queries. Highlight: Hence, we propose an approximation algorithm for opaque top-k query answering. |
Jiwon Chang; Fatemeh Nargesian; |
| 144 | Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving. Highlight: In this paper, we introduce Apt-Serve, a scalable framework designed to enhance effective throughput in LLM inference serving. |
Shihong Gao; Xin Zhang; Yanyan Shen; Lei Chen; |
| 145 | Are Database System Researchers Making Correct Assumptions About Transaction Workloads? Highlight: However, the impact of this progress depends directly on the accuracy of these assumptions, for both current and future applications. In this paper, we conduct an extensive study of 111 open-source applications, analyzing over 30,000 transactions to evaluate how accurate these assumptions are in the current codebases and how extensive the code changes would have to be for these assumptions to hold going forward. |
Cuong D. T. Nguyen; Kevin Chen; Christopher DeCarolis; Daniel J. Abadi; |
| 146 | Athena: An Effective Learning-based Framework for Query Optimizer Performance Improvement. Highlight: In this work, we propose Athena, an effective learning-based framework for enhancing query optimizers. |
Runzhong Li; Qilong Li; Haotian Liu; Rui Mao; Qing Li; Bo Tang; |
| 147 | Auto-Test: Learning Semantic-Domain Constraints for Unsupervised Error Detection in Tables. Highlight: While powerful logic and statistical algorithms have been developed to detect and repair data errors in tables, existing algorithms predominantly rely on domain experts to first manually specify data-quality constraints specific to a given table before data cleaning algorithms can be applied. In this work, we observe that there is an important class of data-quality constraints, which we call Semantic-Domain Constraints, that can be reliably inferred and automatically applied to any table, without requiring domain experts to specify them manually on a per-table basis. We develop a principled framework to systematically learn such constraints from table corpora using large-scale statistical tests, which can further be distilled into a core set of constraints using our optimization framework, with provable quality guarantees. |
Qixu Chen; Yeye He; Raymond Chi-Wing Wong; Weiwei Cui; Song Ge; Haidong Zhang; Dongmei Zhang; Surajit Chaudhuri; |
| 148 | Automated Validating and Fixing of Text-to-SQL Translation with Execution Consistency. Highlight: To do so, we propose a necessary correctness condition called execution consistency. |
Yicun Yang; Zhaoguo Wang; Yu Xia; Zhuoran Wei; Haoran Ding; Ruzica Piskac; Haibo Chen; Jinyang Li; |
| 149 | BPF-DB: A Kernel-Embedded Transactional Database Management System For eBPF Applications. Highlight: Inspired by embedded DBMSs for user-space applications, this paper presents BPF-DB, an OS-embedded DBMS that offers transactional data management for eBPF applications. |
Matthew Butrovich; Samuel Arch; Wan Shen Lim; William Zhang; Jignesh M Patel; Andrew Pavlo; |
| 150 | Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation. Highlight: In this work, we propose Cache-Craft, a system for managing and reusing precomputed KVs corresponding to the text chunks (which we call chunk-caches) in RAG-based systems. |
Shubham Agarwal; Sai Sundaresan; Subrata Mitra; Debabrata Mahapatra; Archit Gupta; Rounak Sharma; Nirmal Joshua Kapu; Tong Yu; Shiv Saini; |
| 151 | CARINA: An Efficient CXL-Oriented Embedding Serving System for Recommendation Models. Highlight: The non-uniform memory access (NUMA) architecture in modern CXL servers further degrades system performance. In this paper, we design Carina for ERM serving on heterogeneous memory with CXL by considering such bandwidth asymmetry. |
Peiqi Yin; Qihui Zhou; Xiao Yan; Chao Wang; Eric Lo; Changji Li; Lan Lu; Hua Fan; Wenchao Zhou; Ming-Chang Yang; James Cheng; |
| 152 | Clementi: Efficient Load Balancing and Communication Overlap for Multi-FPGA Graph Processing. Highlight: These inefficiencies hinder scalability and performance, highlighting a critical research gap. To address these issues, we introduce Clementi, an efficient multi-FPGA graph processing framework that features customized fine-grained pipelines for computation and cross-FPGA communication. |
Feng Yu; Hongshi Tan; Xinyu Chen; Yao Chen; Bingsheng He; Weng-Fai Wong; |
| 153 | Community Detection in Heterogeneous Information Networks Without Materialization. Highlight: While traditional algorithms may perform adequately in some scenarios, many struggle with the high memory usage and computational demands of large-scale HINs. To address these challenges, we introduce a novel framework, SCAR, which efficiently uncovers community structures in HINs without requiring network materialization. |
Jiaxin Jiang; Siyuan Yao; Yuhang Chen; Bingsheng He; Yudong Niu; Yuchen Li; Shixuan Sun; Yongchao Liu; |
| 154 | Computing Inconsistency Measures Under Differential Privacy. Highlight: Specifically, given a collection of integrity constraints, various ways have been proposed to quantify the inconsistency of a database. |
Shubhankar Mohapatra; Amir Gilad; Xi He; Benny Kimelfeld; |
| 155 | Cracking SQL Barriers: An LLM-based Dialect Translation System. Highlight: In this paper, we investigate the problem of automating dialect translation with large language models (LLMs). |
Wei Zhou; Yuyang Gao; Xuanhe Zhou; Guoliang Li; |
| 156 | Credible Intervals for Knowledge Graph Accuracy Estimation. Highlight: In this paper, we propose to overcome the limitations of confidence intervals (CIs) by using Credible Intervals (CrIs), which are grounded in Bayesian statistics. |
Stefano Marchesin; Gianmaria Silvello; |
| 157 | CuMatch: A GPU-based Memory-Efficient Worst-case Optimal Join Processing Method for Subgraph Queries with Complex Patterns. Highlight: In this paper, we propose cuMatch, a GPU-based unified worst-case optimal join processing method for subgraph queries. |
Sungwoo Park; Seyeon Oh; Min-Soo Kim; |
| 158 | Dangers of List Processing in Querying Property Graphs. Highlight: To increase the expressiveness of post-processing of pattern matching results, languages such as Cypher introduce the capability of creating lists of nodes and edges from matched paths, and provide users with standard list processing tools such as reduce. We show that, on the one hand, this makes it possible to capture useful classes of queries that pattern matching alone cannot express. |
Amélie Gheerbrant; Leonid Libkin; Alexandra Rogova; |
| 159 | Data Enhancement for Binary Classification of Relational Data. Highlight: We formulate two data enhancing problems accordingly, and show that both problems are intractable. Despite the hardness, we propose a framework that integrates model training and data enhancing. |
Wenfei Fan; Xiaoyu Han; Weilong Ren; Zihuan Xu; |
| 160 | Debunking The Myth of Join Ordering: Toward Robust SQL Analytics. Highlight: In this paper, we rediscover the recent Predicate Transfer technique from a robustness point of view. |
Junyi Zhao; Kai Su; Yifei Yang; Xiangyao Yu; Paraschos Koutris; Huanchen Zhang; |
| 161 | DFlush: DPU-Offloaded Flush for Disaggregated LSM-based Key-Value Stores. Highlight: While extensive research has been conducted to reduce the CPU overhead of background compaction, less attention has been paid to background flushing, which can also consume a significant amount of valuable CPU cycles and disrupt CPU caches, ultimately impacting overall performance. In this paper, we propose DFlush, a novel solution that uses DPUs to offload background flush operations and reduce their CPU cost. |
Chen Ding; Kai Lu; Quanyi Zhang; Zekun Ye; Ting Yao; Daohui Wang; Huatao Wu; Jiguang Wan; |
| 162 | DIGRA: A Dynamic Graph Indexing for Approximate Nearest Neighbor Search with Range Filter. Highlight: Our approach introduces a dynamic multi-way tree structure combined with carefully integrated ANNS indices to handle range filtered ANNS efficiently. |
Mengxu Jiang; Zhi Yang; Fangyuan Zhang; Guanhao Hou; Jieming Shi; Wenchao Zhou; Feifei Li; Sibo Wang; |
| 163 | Divide-and-Conquer: Scalable Shortest Path Counting on Large Road Networks. Highlight: Existing methods, such as 2-hop labeling schemes, precompute shortest-path distances and counts for efficient queries but struggle to scale in large networks. In this work, we propose a novel divide-and-conquer approach based on recursive vertex bipartitioning to address this limitation. |
Muhammad Farhan; Henning Koehler; Qing Wang; |
| 164 | Dupin: A Parallel Framework for Densest Subgraph Discovery in Fraud Detection on Massive Graphs. Highlight: However, deploying DSD methods in production systems faces substantial scalability challenges due to the predominantly sequential nature of existing methods, which impedes their ability to handle large-scale transaction networks and results in significant detection delays. To address these challenges, we introduce Dupin, a novel parallel processing framework designed for efficient DSD processing in billion-scale graphs. |
Jiaxin Jiang; Siyuan Yao; Yuchen Li; Qiange Wang; Bingsheng He; Min Chen; |
| 165 | Efficient and Accurate Differentially Private Cardinality Continual Releases. Highlight: In this paper, we present a novel cardinality estimation framework, FC, which ensures differential privacy under continual releases while simultaneously achieving low memory usage, high accuracy, and efficient computation. |
Dongdong Xie; Pinghui Wang; Quanqing Xu; Chuanhui Yang; Rundong Li; |
| 166 | Efficient Dynamic Indexing for Range Filtered Approximate Nearest Neighbor Search. Highlight: Existing solutions for range filtered ANN search often face trade-offs among excessive storage, poor query performance, and limited support for updates. To address this challenge, we propose RangePQ, a novel indexing scheme that supports efficient range filtered ANN searches and updates, requiring only linear space. |
Fangyuan Zhang; Mengxu Jiang; Guanhao Hou; Jieming Shi; Hua Fan; Wenchao Zhou; Feifei Li; Sibo Wang; |
| 167 | Efficient Indexing for Flexible Label-Constrained Shortest Path Queries in Road Networks. Highlight: In this paper, we propose an efficient index-based solution called Border-based State Move (BSM), which can answer LCSP queries quickly with flexible use of the language constraint. |
Libin Wang; Raymond Chi-Wing Wong; |
| 168 | Extending SQL to Return A Subdatabase. Highlight: In this paper, we argue for eliminating the single-table limitation of SQL. |
Joris Nix; Jens Dittrich; |
| 169 | FAAQP: Fast and Accurate Approximate Query Processing Based on Bitmap-augmented Sum-Product Network. Highlight: In this paper, we propose FAAQP, a fast and accurate AQP method. |
Hanbing Zhang; Yinan Jing; Zhenying He; Kai Zhang; X. Sean Wang; |
| 170 | Fair and Actionable Causal Prescription Ruleset. Highlight: On the other hand, in decision making for tasks with significant societal or economic impact, it is crucial to provide recommendations that are interpretable, justifiable, and equitable in terms of the outcome for both the protected and non-protected groups. Motivated by these two goals, this paper introduces a fairness-aware framework that leverages causal reasoning to generate a set of interpretable and actionable prescription rules (ruleset) that improve an outcome without exacerbating inequalities for protected groups. |
Benton Li; Nativ Levy; Brit Youngmann; Sainyam Galhotra; Sudeepa Roy; |
| 171 | Fast and Scalable Data Transfer Across Data Systems. Highlight: Hence, in this paper, we introduce a holistic data transfer framework. |
Haralampos Gavriilidis; Kaustubh Beedkar; Matthias Boehm; Volker Markl; |
| 172 | Fast Approximate Similarity Join in Vector Databases. Highlight: In this paper, we propose a new join algorithm, SimJoin. |
Jiadong Xie; Jeffrey Xu Yu; Yingfan Liu; |
| 173 | Fast Hypertree Decompositions Via Linear Programming: Fractional and Generalized. Highlight: This remains a significant challenge despite research from both the database and theory communities. In this work, we present Ralph (Randomized Approximation using Linear Programming for Hypertree-Decompositions), a fast algorithm to compute low-width fractional and generalized hypertree decompositions for input hypergraphs, as well as lower bounds for these widths. |
Vaishali Surianarayanan; Anikait Mundhra; Ajaykrishnan E S; Daniel Lokshtanov; |
| 174 | Fast Maximum Common Subgraph Search: A Redundancy-Reduced Backtracking Approach. Highlight: In this paper, we propose a new backtracking algorithm called RRSplit, which at once achieves better practical efficiency and provides a non-trivial theoretical guarantee on the worst-case running time. |
Kaiqiang Yu; Kaixin Wang; Cheng Long; Laks Lakshmanan; Reynold Cheng; |
| 175 | Faster and Efficient Density Decomposition Via Proportional Response with Exponential Momentum. Highlight: This work explores density decomposition through market dynamics, where edges represent buyers and nodes represent sellers in a Fisher market model. |
Quan Xue; T-H. Hubert Chan; |
| 176 | Femur: A Flexible Framework for Fast and Secure Querying from Public Key-Value Store. Highlight: To provide formal, provable guarantees, we introduce a novel concept of distance-based indistinguishability, which lets users comfortably relax their security requirements. |
Jiaoyi Zhang; Liqiang Peng; Mo Sha; Weiran Liu; Xiang Li; Sheng Wang; Feifei Li; Mingyu Gao; Huanchen Zhang; |
| 177 | Finding Logic Bugs in Graph-processing Systems Via Graph-cutting. Highlight: This paper proposes Graph-cutting, a universal approach for detecting logic bugs in both GDBMSes and various algorithms in graph libraries. |
Qiuyang Mang; Jinsheng Ba; Pinjia He; Manuel Rigger; |
| 178 | Galley: Modern Query Optimization for Sparse Tensor Programs. Highlight: We show that Galley produces programs that are 1-300x faster than competing methods for machine learning over joins and 5-20x faster than a state-of-the-art relational database for subgraph counting workloads, with minimal optimization overhead. |
Kyle Deeds; Willow Ahrens; Magdalena Balazinska; Dan Suciu; |
| 179 | GPH: An Efficient and Effective Perfect Hashing Scheme for GPU Architectures. Highlight: In this work, we develop a micro-benchmark and devise an effective and general performance analysis model, which enables uniform and accurate lookup performance evaluation of GPU-based hash tables. |
Jiaping Cao; Le Xu; Man Lung Yiu; Jianbin Qin; Bo Tang; |
| 180 | Rule-Based Graph Cleaning with GPUs on A Single Machine. Highlight: We adopt a rule-based method that may embed machine learning models as predicates in the rules. |
Wenchao Bai; Wenfei Fan; Shuhao Liu; Kehan Pang; Xiaoke Zhu; Jiahui Jin; |
| 181 | Graph Edit Distance Estimation: A New Heuristic and A Holistic Evaluation of Learning-based Methods. Highlight: (2) More importantly, all these advancements have been evaluated against a simple combinatorial heuristic baseline, with their models shown to outperform it. In this paper, we aim to bridge this knowledge gap. |
Mouyi Xu; Lijun Chang; |
| 182 | GTX: A Write-Optimized Latch-free Graph Data System with Transactional Support. Highlight: This paper introduces GTX, a standalone main-memory write-optimized graph data system that specializes in structural and graph property updates while enabling concurrent reads and graph analytics through ACID transactions. |
Libin Zhou; Lu Xing; Yeasir Rayhan; Walid G. Aref; |
| 183 | High-Throughput Ingestion for Video Warehouse: Comprehensive Configuration and Effective Exploration. Highlight: In this study, we aim to enable real-time, high-throughput ingestion of hundreds of video streams while maximizing the overall accuracy, by constructing a proper ingestion plan for each video stream. |
Baiyan Zhang; Zepeng Li; Dongxiang Zhang; Huan Li; Kian-Lee Tan; Gang Chen; |
| 184 | HoneyComb: A Parallel Worst-Case Optimal Join on Multicores. Abstract: To achieve true scalability on massive datasets, a modern query engine needs to be able to take advantage of large, shared-memory, multicore systems. Binary joins are conceptually … |
Jiacheng Wu; Dan Suciu; |
| 185 | HotStuff-1: Linear Consensus with One-Phase Speculation. Highlight: This paper introduces HotStuff-1, a BFT consensus protocol that improves the latency of HotStuff-2 by two network hops while maintaining linear communication complexity against faults. |
Dakai Kang; Suyash Gupta; Dahlia Malkhi; Mohammad Sadoghi; |
| 186 | How Good Are Learned Cost Models, Really? Insights from Query Optimization Tasks. Highlight: While these models have been shown to provide better prediction accuracy, only limited efforts have been made to investigate how well Learned Cost Models (LCMs) actually perform in query optimization and how they affect overall query performance. In this paper, we address this through a systematic study evaluating LCMs on three of the core query optimization tasks: join ordering, access path selection, and physical operator selection. |
Roman Heinrich; Manisha Luthra; Johannes Wehrstein; Harald Kornmayer; Carsten Binnig; |
| 187 | How to Grow An LSM-tree? Towards Bridging The Gap Between Theory and Practice. Highlight: Building on the analysis, we present a novel approach, Vertiorizon, which combines the strengths of both the vertical and horizontal schemes to achieve a superior balance between lookup, update, and space costs. |
Dingheng Mo; Siqiang Luo; Stratos Idreos; |
| 188 | Aero: Adaptive Query Processing of ML Queries. Abstract: Query optimization is critical in relational database management systems (DBMSs) for ensuring efficient query processing. The query optimizer relies on precise selectivity and … |
Gaurav Tarlok Kakkar; Jiashen Cao; Aubhro Sengupta; Joy Arulraj; Hyesoon Kim; |
| 189 | Incremental Rule Discovery in Response to Parameter Updates. Highlight: We formulate incremental problems in response to updates Δσ and/or Δδ, to compute the rules added and/or removed with respect to σ + Δσ and δ + Δδ. |
Haoxian Chen; Wenfei Fan; Jiaye Zheng; |
| 190 | Integral Densest Subgraph Search on Directed Graphs. Highlight: The state-of-the-art DS algorithms have prohibitively high costs or poor approximation ratios, making them unsuitable for practical applications. To address these dilemmas, in this paper, we propose a novel model called integral densest subgraph (IDS). |
Yalong Zhang; Rong-Hua Li; Longlong Lin; Qi Zhang; Lu Qin; Guoren Wang; |
| 191 | Interactive Graph Search Made Simple. Highlight: Utilizing novel findings on the problem characteristics, we develop an algorithmic framework for IGS that requires a designer to fill in the details for only two “black-box” operations. |
Shangqi Lu; Ru Wang; Yufei Tao; |
| 192 | Intra-Query Runtime Elasticity for Cloud-Native Data Analysis. Highlight: We propose the concept of Intra-Query Runtime Elasticity (IQRE) for cloud-native data analysis. |
Xukang Zhang; Huanchen Zhang; Xiaofeng Meng; |
| 193 | Learned Offline Query Planning Via Bayesian Optimization. Highlight: In contrast to traditional online query optimizers, we propose an offline query optimizer that searches a wide variety of plans and incorporates query execution as a primitive. |
Jeffrey Tao; Natalie Maus; Haydn Jones; Yimeng Zeng; Jacob R. Gardner; Ryan Marcus; |
| 194 | LICS: Towards Theory-Informed Effective Visual Abstraction of Property Graph Schemas. Highlight: Current visual abstractions, such as the labeled schema graph (LSG), simplify representation but suffer from visual clutter and limited feature support. To address these challenges, we propose a novel, generic, and extensible visual abstraction, labeled iconized composite schema (LICS), whose design is informed by theories and principles from HCI, cognitive psychology, and visualization. |
Kasidis Chantharojwong; Sourav S Bhowmick; Byron Choi; |
| 195 | Logical and Physical Optimizations for SQL Query Execution Over Large Language Models. Highlight: Our results highlight that adhering strictly to conventional query optimization principles fails to generate the best plans in terms of result quality. To tackle this challenge, we present a novel approach to enhance SQL results by applying query optimization techniques specifically adapted for LLMs. |
Dario Satriani; Enzo Veltri; Donatello Santoro; Sara Rosato; Simone Varriale; Paolo Papotti; |
| 196 | Low-Latency Transaction Scheduling Via Userspace Interrupts: Why Wait or Yield When You Can Preempt? Highlight: We present an efficient transaction context switching mechanism purely in userspace and scheduling policies that prioritize short, high-priority transactions without significantly affecting long-running queries. |
Kaisong Huang; Jiatang Zhou; Zhuoyue Zhao; Dong Xie; Tianzheng Wang; |
| 197 | Low Rank Learning for Offline Query Optimization. Highlight: Recent deployments of learned query optimizers use expensive neural networks and ad-hoc search policies. To address these issues, we introduce LimeQO, a framework for offline query optimization leveraging low-rank learning to efficiently explore alternative query plans with minimal resource usage. |
Zixuan Yi; Yao Tian; Zachary G. Ives; Ryan Marcus; |
| 198 | LpBound: Pessimistic Cardinality Estimation Using ℓp-Norms of Degree Sequences. Highlight: The cardinality estimator is a critical piece of a query optimizer, and is often the main culprit when the optimizer chooses a poor plan. This paper introduces LpBound, a pessimistic cardinality estimator for multi-join queries (acyclic or cyclic) with selection predicates and group-by clauses. |
Haozhe Zhang; Christoph Mayer; Mahmoud Abo Khamis; Dan Olteanu; Dan Suciu; |
| 199 | Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models Via Malleable Data and Model Parallelization. Highlight: As the scale of models and training data continues to grow, training large-scale models relies on ever more GPUs, which inevitably increases the likelihood of encountering dynamic stragglers, i.e., devices that occasionally lag behind in performance. |
Haoyang Li; Fangcheng Fu; Hao Ge; Sheng Lin; Xuanyu Wang; Jiawen Niu; Yujie Wang; Hailin Zhang; Xiaonan Nie; Bin Cui; |
| 200 | MatCo: Computing Match Cover of Subgraph Query Over Graph Data. Highlight: In this paper, we propose a new problem to compute the match cover of a subgraph query. |
Zhichao Shi; Youhuan Li; Ziming Li; Yuequn Dou; Xionghu Zhong; Lei Zou; |
| 201 | Maximus: A Modular Accelerated Query Engine for Data Analytics on Heterogeneous Systems. Highlight: Finally, on the infrastructure side, different storage types, disaggregated storage, disaggregated memory, networking, and interconnects are all rapidly evolving, which demands a degree of customization to optimize data movement well beyond established techniques. To tackle these challenges, in this paper, we present Maximus, a modular data processing engine that embraces heterogeneity from the ground up. |
Marko Kabić; Shriram Chandran; Gustavo Alonso; |
| 202 | MIRAGE-ANNS: Mixed Approach Graph-based Indexing for Approximate Nearest Neighbor Search. Highlight: This work presents MIRAGE-ANNS (Mixed Incremental Refinement Approach Graph-based Exploration for Approximate Nearest Neighbor Search), which constructs the index as fast as refinement-based approaches while retaining search performance comparable to or better than increment-based ones. | Sairaj Voruganti; M. Tamer Özsu; |
| 203 | Mitigating The Impedance Mismatch Between Prediction Query Execution and Database Engine. Highlight: To mitigate the mismatch, we propose to employ a prediction-aware operator in database engines, which leverages an inference context reuse cache to achieve an automatic one-off inference context setup, and batch-aware function invocation to ensure efficient batched inference. | Chenyang Zhang; Junxiong Peng; Chen Xu; Quanqing Xu; Chuanhui Yang; |
| 204 | Mnemosyne: Dynamic Workload-Aware BF Tuning Via Accurate Statistics in LSM Trees. Highlight: In this paper, we design Mnemosyne, a Bloom filter (BF) reallocation framework for evolving LSM trees that does not require prior workload knowledge. | Zichen Zhu; Yanpeng Wei; Ju Hyoung Mun; Manos Athanassoulis; |
| 205 | Moving on From Group Commit: Autonomous Commit Enables High Throughput and Low Latency on NVMe SSDs. Highlight: As we show in this paper, existing commit processing protocols fail to fully leverage modern NVMe SSDs to deliver both high throughput and low-latency durable commits. | Lam-Duy Nguyen; Adnan Alhomssi; Tobias Ziegler; Viktor Leis; |
| 206 | Nested Parquet Is Flat, Why Not Use It? How To Scan Nested Data With On-the-Fly Key Generation and Joins. Highlight: However, many query engines either do not support nested data or process it with substantially worse performance than relational data. In this work, we close this gap and present a new way to leverage relational query engines for nested data stored in flat columnar file formats such as Parquet. | Alice Rey; Maximilian Rieger; Thomas Neumann; |
| 207 | NEXT: A New Secondary Index Framework for LSM-based Data Storage. Highlight: To overcome the limitations of existing auxiliary structures for queries on non-key attributes, this paper proposes NEXT, a novel secondary index framework for LSM-based key-value storage systems. | Jiachen Shi; Jingyi Yang; Gao Cong; Xiaoli Li; |
| 208 | OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment. Highlight: These factors include the incompleteness of the framework, failure to follow instructions, and model hallucinations. To address these problems, we propose OpenSearch-SQL, which divides the Text-to-SQL task into four main modules: Preprocessing, Extraction, Generation, and Refinement, along with an Alignment module based on a consistency alignment mechanism. | Xiangjin Xie; Guangwei Xu; Lingyan Zhao; Ruijie Guo; |
| 209 | Parallel K-Core Decomposition: Theory and Practice. Highlight: This paper proposes efficient solutions for k-core decomposition with high parallelism. | Youzhe Liu; Xiaojun Dong; Yan Gu; Yihan Sun; |
| 210 | PDX: A Data Layout for Vector Similarity Search. Highlight: We propose Partition Dimensions Across (PDX), a data layout for vectors (e.g., embeddings) that, similar to PAX [6], stores multiple vectors in one block, using a vertical layout for the dimensions (Figure 1). | Leonardo Kuffo; Elena Krippner; Peter Boncz; |
| 211 | Physical Visualization Design: Decoupling Interface and System Design. Highlight: Given an interface's underlying data flow, interactions with latency expectations, and resource constraints, PVD checks if the interface is feasible and, if so, proposes and instantiates a middleware architecture spanning the client, server, and cloud DBMS that meets the expectations. To this end, this paper presents Jade, the first prototype PVD tool that enables design independence. | Yiru Chen; Xupeng Li; Jeffrey Tao; Lana Ramjit; Subrata Mitra; Javad Ghaderi; Ravi Netravali; Aditya Parameswaran; Dan Rubenstein; Eugene Wu; |
| 212 | PilotDB: Database-Agnostic Online Approximate Query Processing with A Priori Error Guarantees. Highlight: Existing methods struggle to simultaneously provide user-specified error guarantees, eliminate maintenance overheads, and avoid modifications to database management systems. To address these challenges, we introduce two novel techniques, TAQA and BSAP. | Yuxuan Zhu; Tengjun Jin; Stefanos Baziotis; Chengsong Zhang; Charith Mendis; Daniel Kang; |
| 213 | PLM4NDV: Minimizing Data Access for Number of Distinct Values Estimation with Pre-trained Language Models. Highlight: Despite decades of research, most existing methods require either a significant amount of samples through uniform random sampling or access to the entire column to produce estimates, leading to substantial data access costs and potentially ineffective estimations in scenarios with limited data access. In this paper, we propose leveraging semantic information, i.e., schema, to address these challenges. | Xianghong Xu; Xiao He; Tieying Zhang; Lei Zhang; Rui Shi; Jianjun Chen; |
| 214 | Pneuma: Leveraging LLMs for Tabular Data Representation and Retrieval in An End-to-End System. Highlight: The advent of large language models (LLMs) offers a unique opportunity for users to ask questions directly in natural language, making dataset discovery more intuitive, accessible, and efficient. In this paper, we introduce Pneuma, a retrieval-augmented generation (RAG) system designed to efficiently and effectively discover tabular data. | Muhammad Imam Luthfi Balaka; David Alexander; Qiming Wang; Yue Gong; Adila Krisnadhi; Raul Castro Fernandez; |
| 215 | PQCache: Product Quantization-based KVCache for Long Context LLM Inference. Highlight: We propose PQCache, which employs Product Quantization (PQ) to manage KVCache, maintaining model quality while ensuring low serving latency. | Hailin Zhang; Xiaodong Ji; Yilin Chen; Fangcheng Fu; Xupeng Miao; Xiaonan Nie; Weipeng Chen; Bin Cui; |
| 216 | Practical and Asymptotically Optimal Quantization of High-Dimensional Vectors in Euclidean Space for Approximate Nearest Neighbor Search. Highlight: RaBitQ uses 1 bit per dimension for quantization and compresses vectors with a large compression rate. In this paper, we extend RaBitQ to compress vectors with flexible compression rates – it achieves this by using B bits per dimension for quantization with B = 1, 2, … It inherits the theoretical guarantees of RaBitQ and achieves asymptotic optimality in the trade-off between space and error bounds, as proven in this study. | Jianyang Gao; Yutong Gou; Yuexuan Xu; Yongyi Yang; Cheng Long; Raymond Chi-Wing Wong; |
| 217 | Privacy and Accuracy-Aware AI/ML Model Deduplication. Highlight: When deduplicating a target model, we dynamically schedule accuracy validations and apply the Sparse Vector Technique to reduce the privacy costs associated with private validation data. | Hong Guan; Lei Yu; Lixi Zhou; Li Xiong; Kanchan Chowdhury; Lulu Xie; Xusheng Xiao; Jia Zou; |
| 218 | PrivPetal: Relational Data Synthesis Via Permutation Relations. Highlight: In this paper, we explore a different direction: synthesizing a flattened relation and subsequently decomposing it into base relations, which eliminates the need to generate join keys. | Kuntai Cai; Xiaokui Xiao; Yin Yang; |
| 219 | PrivRM: A Framework for Range Mean Estimation Under Local Differential Privacy. Highlight: In this paper, we propose a novel framework for Private Range Mean (PrivRM) estimation under LDP. | Liantong Yu; Qingqing Ye; Rong Du; |
| 220 | Relevance Queries for Interval Data. Highlight: We present experiments on real datasets that demonstrate the efficiency of our framework over baseline approaches. | Panagiotis Bouros; Nikos Mamoulis; |
| 221 | Rethinking The Compaction Policies in LSM-trees. Highlight: In this paper, we propose to treat the compaction operation in LSM-trees as a computational and I/O-bandwidth investment for improving the system’s future query throughput, and thus rethink the compaction policy designs. | Hengrui Wang; Jiansheng Qiu; Fangzhou Yuan; Huanchen Zhang; |
| 222 | Revisiting Graph Analytics Benchmark. Highlight: However, existing benchmarks often fall short of fully assessing performance due to limitations in core algorithm selection, data generation processes (and the corresponding synthetic datasets), as well as the neglect of API usability evaluation. To address these shortcomings, we propose a novel graph analytics benchmark. | Lingkai Meng; Yu Shao; Long Yuan; Longbin Lai; Peng Cheng; Xue Li; Wenyuan Yu; Wenjie Zhang; Xuemin Lin; Jingren Zhou; |
| 223 | RLOMM: An Efficient and Robust Online Map Matching Framework with Reinforcement Learning. Highlight: This paper introduces a novel framework that achieves high accuracy and efficient matching while ensuring robustness in handling diverse scenarios. | Minxiao Chen; Haitao Yuan; Nan Jiang; Zhihan Zheng; Sai Wu; Ao Zhou; Shangguang Wang; |
| 224 | RM2: Answer Counting Queries Efficiently Under Shuffle Differential Privacy. Highlight: Our contributions include a baseline shuffle-DP mechanism that naively adapts the matrix mechanism, followed by an improved mechanism that reduces message complexity while maintaining error levels comparable to central-DP. | Qiyao Luo; Jianzhe Yu; Wei Dong; Quanqing Xu; Chuanhui Yang; Ke Yi; |
| 225 | Robust Privacy-Preserving Triangle Counting Under Edge Local Differential Privacy. Highlight: In this paper, we propose a vertex-centric triangle counting algorithm under edge LDP, which improves data utility by leveraging a larger part of the noisy adjacency matrix. | Yizhang He; Kai Wang; Wenjie Zhang; Xuemin Lin; Ying Zhang; Wei Ni; |
| 226 | RWalks: Random Walks As Attribute Diffusers for Filtered Vector Search. Highlight: Besides, many real applications require answers in a few milliseconds with high recall on large collections. Graph-based methods are considered the best choice for such applications, despite a lack of theoretical guarantees on query accuracy. | Anas Ait Aomar; Karima Echihabi; Marco Arnaboldi; Ioannis Alagiannis; Damien Hilloulin; Manal Cherkaoui; |
| 227 | SBSC: A Fast Self-tuned Bipartite Proximity Graph-based Spectral Clustering. Highlight: However, extrinsic parameters, such as the number of representatives and the number of nearby representatives of data instances, influence clustering performance, time, and memory usage. Therefore, in this work, we construct a parameter-free bipartite graph to further improve the clustering quality and computational cost of SC by introducing a locality-based sparsification technique. | Abdul Atif Khan; Rashmi Maheshwari; Mohammad Maksood Akhter; Sraban Kumar Mohanty; |
| 228 | Scalable Complex Event Processing on Video Streams. Highlight: In this paper, we present Bobsled, a novel video stream processing system designed to efficiently support complex event queries. | Chenxia Han; Chaokun Chang; Srijan Srivastava; Yao Lu; Eric Lo; |
| 229 | Self-Enhancing Video Data Management System for Compositional Events with Large Language Models. Highlight: We introduce VOCAL-UDF, a novel self-enhancing system that supports compositional queries over videos without the need for predefined modules. | Enhao Zhang; Nicole Sullivan; Brandon Haynes; Ranjay Krishna; Magdalena Balazinska; |
| 230 | Serf: Streaming Error-Bounded Floating-Point Compression. Highlight: In this paper, we propose Serf, the first Streaming ERror-bounded Floating-point compression method, which has two implementations: Serf-Qt and Serf-XOR. | Ruiyuan Li; Zechao Chen; Ruyun Lu; Xiaolong Xu; Guangchao Yang; Chao Chen; Jie Bao; Yu Zheng; |
| 231 | SHIELD: Encrypting Persistent Data of LSM-KVS from Monolithic to Disaggregated Storage. Highlight: We achieve our objective through three contributions: (1) A fine-grained integration of encryption into LSM-KVS write path to minimize performance overhead from exposure-limiting practices like using unique encryption keys per file and regularly re-encrypting using new encryption keys during compaction, (2) Mitigating performance degradation caused by recurring encryption of Write-Ahead Log (WAL) writes by using a buffering solution and (3) Extending confidentiality guarantees to DS by designing a metadata-enabled encryption-key-sharing mechanism and a secure local cache for high scalability and flexibility. | Viraj Thakkar; Dongha Kim; Yingchun Lai; Hokeun Kim; Zhichao Cao; |
| 232 | SPACE: Cardinality Estimation for Path Queries Using Cardinality-Aware Sequence-based Learning. Highlight: In this paper, we focus on the problem of estimating the cardinality of path patterns in graph databases, and we propose the Sequence-based Path Pattern Cardinality Estimator (SPACE). | Mehmet Aytimur; Theodoros Chondrogiannis; Michael Grossniklaus; |
| 233 | SpareLLM: Automatically Selecting Task-Specific Minimum-Cost Large Language Models Under Equivalence Constraint. Highlight: We introduce SpareLLM, Selecting Passable And Resource-Efficient LLMs, a novel LLM framework designed to minimize the inference costs (i.e., resource-efficient) of large-scale NLP tasks while ensuring sufficient result quality (i.e., passable). | Saehan Jo; Immanuel Trummer; |
| 234 | SPARTAN: Data-Adaptive Symbolic Time-Series Approximation. Highlight: Despite decades of development, existing approaches have several key limitations that often result in unsatisfactory performance: they (i) rely on data-agnostic numeric approximations, disregarding intrinsic properties of the time series; (ii) decompose dimensions into equal-sized subspaces, assuming independence among dimensions; and (iii) allocate a uniform encoding budget for discretizing each dimension or subspace, assuming balanced importance. To address these shortcomings, we propose SPARTAN, a novel data-adaptive symbolic approximation method that intelligently allocates the encoding budget according to the importance of the constructed uncorrelated dimensions. | Fan Yang; John Paparrizos; |
| 235 | Subgroup Discovery with Small and Alternative Feature Sets. Highlight: We describe how to integrate both constraint types into heuristic subgroup-discovery methods as well as a novel Satisfiability Modulo Theories (SMT) formulation, which enables a solver-based search for subgroups. | Jakob Bach; |
| 236 | SuSe: Summary Selection for Regular Expression Subsequence Aggregation Over Streams. Highlight: Rather, only an aggregate over the matches is typically fetched at specific, yet unknown time points. To cater for these scenarios, we present SuSe, a novel architecture for RegEx evaluation that is based on a query-specific summary of the stream. | Steven Purtzel; Matthias Weidlich; |
| 237 | SWASH: A Flexible Communication Framework with Sliding Window-Based Cache Sharing for Scalable DGNN Training. Highlight: To support distributed sliding window training, we present SWASH, a scalable and flexible communication framework that utilizes a Sliding Window-based cAche SHaring technique. | Zhen Song; Yu Gu; Tianyi Li; Yushuai Li; Qing Sun; Yanfeng Zhang; Christian S. Jensen; Ge Yu; |
| 238 | SwiftSpatial: Spatial Joins on Modern Hardware. Highlight: In this paper, we explore hardware acceleration for spatial joins by proposing SwiftSpatial, an FPGA-based accelerator that can be deployed in data centers and at the edge. | Wenqi Jiang; Oleh-Yevhen Khavrona; Martin Parvanov; Gustavo Alonso; |
| 239 | Synthesizing Third Normal Form Schemata That Minimize Integrity Maintenance and Update Overheads: Parameterizing 3NF By The Numbers of Minimal Keys and Functional Dependencies. Highlight: In particular, dependency-preservation ensures that data integrity can be maintained on individual relation schemata without having to join them, but may need to tolerate a priori unbounded levels of data redundancy and integrity faults. As our main contribution we parameterize 3NF schemata by the numbers of minimal keys and functional dependencies they exhibit. | Zhuoxing Zhang; Sebastian Link; |
| 240 | Styx: Transactional Stateful Functions on Streaming Dataflows. Highlight: In this paper, we present Styx, a novel dataflow-based SFaaS runtime that executes serializable transactions consisting of stateful functions that form arbitrary call-graphs with exactly-once guarantees. | Kyriakos Psarakis; George Christodoulou; Georgios Siachamis; Marios Fragkoulis; Asterios Katsifodimos; |
| 241 | T3: Accurate and Fast Performance Prediction for Relational Database Systems With Compiled Decision Trees. Highlight: In this work, we propose the Tuple Time Tree (T3), a new model that is both accurate and fast. | Maximilian Rieger; Thomas Neumann; |
| 242 | Table Overlap Estimation Through Graph Embeddings. Highlight: Candidate duplicate or related tables to support this task can be identified via the estimation of the largest table overlap. Unfortunately, current solutions for finding it present serious scalability issues for heavy workloads: Sloth, the state-of-the-art framework for its estimation, requires more than three days of machine time for computing 100k table overlaps. In this paper, we introduce ARMADILLO, an approach based on graph neural networks that learns table embeddings whose cosine similarity approximates the overlap ratio between tables, i.e., the ratio between the area of their largest table overlap and the area of the smaller table in the pair. | Francesco Pugnaloni; Luca Zecchini; Matteo Paganelli; Matteo Lissandrini; Felix Naumann; Giovanni Simonini; |
| 243 | TableDC: Deep Clustering for Tabular Data. Highlight: This paper presents a deep clustering algorithm for tabular data (TableDC) that reflects the properties of data management applications that cluster tables (schema inference), rows (entity resolution) and columns (domain discovery). | Hafiz Tayyab Rauf; André Freitas; Norman William Paton; |
| 244 | The Best of Both Worlds: On Repairing Timestamps and Attribute Values for Multivariate Time Series. Highlight: However, such a strategy may lead to over-repairing and introduce additional errors, by ignoring the mutual reference between timestamps and attribute values. Therefore, in this study, rather than repairing timestamps and attribute values separately by calling different methods in turn, we repair attribute values and timestamps simultaneously. | Jingyu Zhu; Weiwei Deng; Yu Sun; Shaoxu Song; Haiwei Zhang; Xiaojie Yuan; |
| 245 | Two Birds with One Stone: Efficient Deep Learning Over Mislabeled Data Through Subset Selection. Highlight: The existing approaches, targeting either speeding up the training by selecting a subset of representative training instances (subset selection) or eliminating the negative effect of mislabels during training (mislabel detection), do not perform well in this scenario due to overlooking one of these two problems. To fill this gap, we propose Deem, a novel data-efficient framework that selects a subset of representative training instances under label uncertainty. | Yuhao Deng; Chengliang Chai; Kaisen Jin; Linan Zheng; Lei Cao; Ye Yuan; Guoren Wang; |
| 246 | Understanding The Black Box: A Deep Empirical Dive Into Shapley Value Approximations for Tabular Data. Highlight: Through the study, we aim to encourage further research on Shapley value approximations, advancing data-centric explainable AI. | Suchit Gupte; John Paparrizos; |
| 247 | Using Process Calculus for Optimizing Data and Computation Sharing in Complex Stateful Parallel Computations. Highlight: We propose novel techniques that exploit data and computation sharing to improve the performance of complex stateful parallel computations, like agent-based simulations. | Zilu Tian; Dan Olteanu; Christoph Koch; |
| 248 | Wait and See: A Delayed Transactions Partitioning Approach in Deterministic Database Systems for Better Performance. Highlight: In this paper, we present DelayPart, a deterministic database transaction engine that employs a “wait and see” strategy to address contextual conflicts between transactions within each batch. | Yuan Sui; Xiaochun Yang; Bin Wang; Yujie Zhang; Baihua Zheng; |
| 249 | Yannakakis+: Practical Acyclic Query Evaluation with Theoretical Guarantees. Highlight: While the 40-year-old Yannakakis algorithm has strong theoretical running time guarantees, it has not been adopted in real systems due to its high hidden constant factor. In this paper, we strive to close this gap by proposing Yannakakis+, an improved version of the Yannakakis algorithm, which is more practically efficient while preserving its theoretical guarantees. | Qichen Wang; Bingnan Chen; Binyang Dai; Ke Yi; Feifei Li; Liang Lin; |
| 250 | Zombie Hashing: Reanimating Tombstones in Graveyard. Highlight: However, tombstones require periodic redistribution, which, in turn, requires a complete halt of regular operations. This makes linear probing unsuitable for practical applications where periodic halts are unacceptable. In this paper, we present a solution to forestall primary clustering in linear probing hash tables, ensuring high data locality and consistent performance even at high load factors. | Yuvaraj Chesetti; Benwei Shi; Jeff M. Phillips; Prashant Pandey; |
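The Zombie Hashing highlight above refers to tombstones in linear-probing hash tables and to the periodic redistribution that halts regular operations. The following minimal Python sketch makes that baseline concrete. It assumes a textbook design: deletions leave tombstones that later probes must walk past, tombstones count toward an assumed 75% load-factor threshold, and a stop-the-world `rebuild()` reinserts every live entry to clear them. All names (`LinearProbingTable`, `rebuild`, the sentinel objects) are hypothetical. This illustrates only the conventional scheme the highlight describes, not the paper's Zombie Hashing technique.

```python
from typing import Any, Iterator, List, Optional

# Sentinel markers for slot states (hypothetical names for this sketch).
_EMPTY = object()      # slot has never held a key
_TOMBSTONE = object()  # slot held a key that was later deleted


class LinearProbingTable:
    """Baseline linear-probing hash table whose deletions leave tombstones."""

    def __init__(self, capacity: int = 16) -> None:
        self.slots: List[Any] = [_EMPTY] * capacity
        self.size = 0        # number of live keys
        self.tombstones = 0  # number of tombstone slots

    def _probe(self, key: Any) -> Iterator[int]:
        # Visit slots starting at the key's home slot, wrapping around once.
        n = len(self.slots)
        start = hash(key) % n
        for i in range(n):
            yield (start + i) % n

    def get(self, key: Any) -> Optional[Any]:
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is _EMPTY:  # an empty slot ends the probe sequence
                return None
            if slot is not _TOMBSTONE and slot[0] == key:
                return slot[1]
        return None

    def insert(self, key: Any, value: Any) -> None:
        reuse = None  # first empty or tombstone slot seen along the probe
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is _EMPTY:
                if reuse is None:
                    reuse = idx
                break
            if slot is _TOMBSTONE:
                if reuse is None:
                    reuse = idx
            elif slot[0] == key:  # key already present: overwrite in place
                self.slots[idx] = (key, value)
                return
        if reuse is None:
            raise RuntimeError("table is full")
        if self.slots[reuse] is _TOMBSTONE:
            self.tombstones -= 1
        self.slots[reuse] = (key, value)
        self.size += 1
        # Tombstones lengthen probes, so count them toward the (assumed 75%) load factor.
        if 4 * (self.size + self.tombstones) >= 3 * len(self.slots):
            self.rebuild()

    def delete(self, key: Any) -> bool:
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is _EMPTY:
                return False
            if slot is not _TOMBSTONE and slot[0] == key:
                # Mark instead of clearing so later probes keep walking past.
                self.slots[idx] = _TOMBSTONE
                self.size -= 1
                self.tombstones += 1
                return True
        return False

    def rebuild(self) -> None:
        # Stop-the-world redistribution: the table serves no regular operation
        # until every live entry is reinserted into a fresh, tombstone-free array.
        live = [s for s in self.slots if s is not _EMPTY and s is not _TOMBSTONE]
        capacity = len(self.slots)
        if 2 * len(live) >= capacity:  # grow only if live keys alone are dense
            capacity *= 2
        self.slots = [_EMPTY] * capacity
        self.size = 0
        self.tombstones = 0
        for key, value in live:
            self.insert(key, value)
```

In this sketch, a workload with many deletions accumulates tombstones. The next insert that crosses the threshold then pays for a full `rebuild()` over the whole array. That pause is the kind of halt the highlight calls unacceptable in practice.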