Most Influential ArXiv (Databases) Papers (2024-10)
The field of Databases in arXiv covers database management, data mining, and data processing. Roughly, it includes material in ACM Subject Classes E.2, E.5, H.0, H.2, and J.1. The Paper Digest Team analyzes all papers published in this field in past years and presents up to 30 of the most influential papers for each year. This ranking list is automatically constructed based upon citations from both research papers and granted patents, and will be frequently updated to reflect the most recent changes. To find the latest version of this list or the most influential papers from other conferences/journals, please visit the Best Paper Digest page. Note: the most influential papers may or may not include the papers that won best paper awards. (Version: 2024-10).
This list is created by the Paper Digest Team.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Most Influential ArXiv (Databases) Papers (2024-10)
Year | Rank | Paper | Author(s) |
---|---|---|---|
2024 | 1 | An Analysis of XML Compression Efficiency (IF:4). Highlight: We present an XML test corpus and a combined efficiency metric integrating compression ratio and execution speed. | Christopher James Augeri; Barry E. Mullins; Leemon C. Baird III; Dursun A. Bulutoglu; Rusty O. Baldwin |
2024 | 2 | A Critique of Snapshot Isolation (IF:3). Highlight: For example, Google Percolator, implements lock-based snapshot isolation on top of BigTable. We show in this paper that this compromise is not necessary in lock-free implementations of transactional support. | Daniel Gómez Ferro; Maysam Yabandeh |
2024 | 3 | Mining Sequential Patterns in Uncertain Databases Using Hierarchical Index Structure (IF:3). Highlight: In this paper, we propose multiple theoretically tightened pruning upper bounds that remarkably reduce the mining space. | Kashob Kumar Roy; Md Hasibul Haque Moon; Md Mahmudur Rahman; Chowdhury Farhan Ahmed; Carson K. Leung |
2023 | 1 | Text-to-SQL Empowered By Large Language Models: A Benchmark Evaluation (IF:4). Highlight: To address this challenge, in this paper, we first conduct a systematical and extensive comparison over existing prompt engineering methods, including question representation, example selection and example organization, and with these experimental results, we elaborate their pros and cons. Based on these findings, we propose a new integrated solution, named DAIL-SQL, which refreshes the Spider leaderboard with 86.6% execution accuracy and sets a new bar. | Dawei Gao et al. |
2023 | 2 | LDPTrace: Locally Differentially Private Trajectory Synthesis (IF:3). Highlight: Despite its potential, existing point-based perturbation mechanisms are not suitable for real-world scenarios due to poor utility, dependence on external knowledge, high computational overhead, and vulnerability to attacks. To address these limitations, we introduce LDPTrace, a novel locally differentially private trajectory synthesis framework. | Yuntao Du et al. |
2023 | 3 | Lero: A Learning-to-Rank Query Optimizer (IF:3). Highlight: In this paper, we introduce a learning-to-rank query optimizer, called Lero, which builds on top of a native query optimizer and continuously learns to improve the optimization performance. | Rong Zhu et al. |
2023 | 4 | A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge (IF:3). Highlight: Although there are not many articles describing existing or introducing new vector database architectures, the approximate nearest neighbor search problem behind vector databases has been studied for a long time, and considerable related algorithmic articles can be found in the literature. This article attempts to comprehensively review relevant algorithms to provide a general understanding of this booming research area. | Yikun Han; Chunjiang Liu; Pengfei Wang |
2023 | 5 | From BERT to GPT-3 Codex: Harnessing The Potential of Very Large Language Models for Data Management (IF:3). Highlight: The goal of the tutorial is to introduce database researchers to the latest generation of language models, and to their use cases in the domain of data management. | Immanuel Trummer |
2023 | 6 | Neural Graph Reasoning: Complex Logical Query Answering Meets Graph Databases (IF:3). Highlight: In this paper, we provide a holistic survey of CLQA with a detailed taxonomy studying the field from multiple angles, including graph types (modality, reasoning domain, background semantics), modeling aspects (encoder, processor, decoder), supported queries (operators, patterns, projected variables), datasets, evaluation metrics, and applications. | Hongyu Ren; Mikhail Galkin; Michael Cochez; Zhaocheng Zhu; Jure Leskovec |
2022 | 1 | Privacy-Preserving Record Linkage (IF:4). Abstract: Given several databases containing person-specific data held by different organizations, Privacy-Preserving Record Linkage (PPRL) aims to identify and link records that correspond … | Dinusha Vatsalan; Dimitrios Karapiperis; Vassilios S. Verykios |
2022 | 2 | The Effects of Data Quality on Machine Learning Performance (IF:3). Highlight: We explore empirically the relationship between six of the traditional data quality dimensions and the performance of fifteen widely used machine learning (ML) algorithms covering the tasks of classification, regression, and clustering, with the goal of explaining their performance in terms of data quality. | Lukas Budach et al. |
2022 | 3 | Metaverse: Survey, Applications, Security, and Opportunities (IF:3). Highlight: In this article, we make the following contributions. We first introduce the basic concepts such as the development process, definition, and characteristics of the Metaverse. | Jiayi Sun; Wensheng Gan; Han-Chieh Chao; Philip S. Yu |
2022 | 4 | AIM: An Adaptive and Iterative Mechanism for Differentially Private Synthetic Data (IF:3). Highlight: We propose AIM, a new algorithm for differentially private synthetic data generation. | Ryan McKenna; Brett Mullins; Daniel Sheldon; Gerome Miklau |
2022 | 5 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning (IF:3). Highlight: In this paper, we propose Starmie, an end-to-end framework for dataset discovery from data lakes (with table union search as the main use case). | Grace Fan; Jin Wang; Yuliang Li; Dan Zhang; Renée Miller |
2022 | 6 | Biolink Model: A Universal Schema for Knowledge Graphs in Clinical, Biomedical, and Translational Science (IF:3). Highlight: Here, we highlight the need for a standardized data model for KGs, describe Biolink Model, and compare it with other models. | Deepak R. Unni et al. |
2022 | 7 | LDP-IDS: Local Differential Privacy for Infinite Data Streams (IF:3). Highlight: However, existing few LDP studies over streams are either applicable to finite streams only or suffering from insufficient protection. This paper investigates this problem by proposing LDP-IDS, a novel $w$-event LDP paradigm to provide practical privacy guarantee for infinite streams at users end, and adapting the popular budget division framework in centralized differential privacy (CDP). | Xuebin Ren et al. |
2022 | 8 | SANTOS: Relationship-based Semantic Table Union Search (IF:3). Highlight: In this work, we introduce the use of semantic relationships between pairs of columns in a table to improve the accuracy of union search. | Aamod Khatiwada et al. |
2022 | 9 | Towards Dynamic and Safe Configuration Tuning for Cloud Databases (IF:3). Highlight: To fill these gaps, we propose OnlineTune, which tunes the online databases safely in changing cloud environments. | Xinyi Zhang et al. |
2022 | 10 | Balsa: Learning A Query Optimizer Without Expert Demonstrations (IF:3). Highlight: In this work, we demonstrate for the first time that learning to optimize queries without learning from an expert optimizer is both possible and efficient. | Zongheng Yang et al. |
2022 | 11 | Representation Bias in Data: A Survey on Identification and Resolution Techniques (IF:3). Highlight: There is still a long way to fully address representation bias issues in data. The authors hope that this survey motivates researchers to approach these challenges in the future by observing existing work within their respective domains. | Nima Shahbazi; Yin Lin; Abolfazl Asudeh; H. V. Jagadish |
2022 | 12 | Zero-Shot Cost Models for Out-of-the-box Learned Cost Prediction (IF:3). Highlight: In this paper, we introduce zero-shot cost models which enable learned cost estimation that generalizes to unseen databases. | Benjamin Hilprecht; Carsten Binnig |
2022 | 13 | End-to-end Optimization of Machine Learning Prediction Queries (IF:3). Highlight: We present Raven, a production-ready system for optimizing prediction queries. | Kwanghyun Park et al. |
2022 | 14 | ClusterEA: Scalable Entity Alignment with Stochastic Training and Normalized Mini-batch Similarities (IF:3). Highlight: However, the increasing scale of KGs renders it hard for EA models to adopt the normalization processes, thus limiting their usage in real-world applications. To tackle this challenge, we present ClusterEA, a general framework that is capable of scaling up EA models and enhancing their results by leveraging normalization methods on mini-batches with a high entity equivalent rate. | Yunjun Gao et al. |
2022 | 15 | Towards Blockchain-Based Secure Data Management for Remote Patient Monitoring (IF:3). Highlight: Blockchain is an emerging distributed technology that can solve these issues due to its immutability and architectural nature that prevent records manipulation or alterations. In this paper, we discuss the progress and opportunities of remote patient monitoring using futuristic blockchain technologies and its two primary frameworks: Ethereum and Hyperledger Fabric. | Md Jobair Hossain Faruk; Hossain Shahriar; Maria Valero; Sweta Sneha; Sheikh I. Ahamed Mohammad Rahman |
2022 | 16 | Big Data Meets Metaverse: A Survey (IF:3). Highlight: In this survey, we provide a comprehensive review of how Metaverse is changing big data. | Jiayi Sun; Wensheng Gan; Zefeng Chen; Junhui Li; Philip S. Yu |
2022 | 17 | Manu: A Cloud Native Vector Database Management System (IF:3). Highlight: In the past three years, through interaction with our 1200+ industry users, we have sketched a vision for the features that next-generation vector databases should have, which include long-term evolvability, tunable consistency, good elasticity, and high performance. We present Manu, a cloud native vector database that implements these features. | Rentong Guo et al. |
2022 | 18 | PG-Schema: Schemas for Property Graphs (IF:3). Highlight: Aiming to inspire the development of GQL and enhance the capabilities of graph database systems, we propose PG-Schema, a simple yet powerful formalism for specifying property graph schemas. | Renzo Angles et al. |
2022 | 19 | LlamaTune: Sample-Efficient DBMS Configuration Tuning (IF:3). Highlight: LlamaTune employs an automated dimensionality reduction technique based on randomized projections, a biased-sampling approach to handle special values for certain knobs, and knob values bucketization, to reduce the size of the search space. | Konstantinos Kanellis et al. |
2022 | 20 | HIE-SQL: History Information Enhanced Network for Context-Dependent Text-to-SQL Semantic Parsing (IF:3). Highlight: In view of the mismatch, we treat natural language and SQL as two modalities and propose a bimodal pre-trained model to bridge the gap between them. | Yanzhao Zheng; Haibin Wang; Baohua Dong; Xingjun Wang; Changshan Li |
2022 | 21 | Query Processing on Tensor Computation Runtimes (IF:3). Highlight: In this paper, we explore how database management systems can ride the wave of innovation happening in the AI space. | Dong He et al. |
2021 | 1 | On Data Lake Architectures and Metadata Management (IF:4). Highlight: Thus, we provide in this paper a comprehensive state of the art of the different approaches to data lake design. | Pegdwendé Sawadogo; Jérôme Darmont |
2021 | 2 | Fairness in Rankings and Recommendations: An Overview (IF:4). Highlight: In this work, we aim at presenting a toolkit of definitions, models and methods used for ensuring fairness in rankings and recommendations. | Evaggelia Pitoura; Kostas Stefanidis; Georgia Koutrika |
2021 | 3 | Blockchain Transaction Processing (IF:3). Abstract: A blockchain is an append-only linked-list of blocks, which is maintained at each participating node. Each block records a set of transactions and their associated metadata. … | Suyash Gupta; Mohammad Sadoghi |
2021 | 4 | Graph Pattern Matching in GQL and SQL/PGQ (IF:3). Highlight: This paper, written by members of WG3 and LDBC, presents the key elements of the GPML of SQL/PGQ and GQL in advance of the publication of these new standards. | Alin Deutsch et al. |
2021 | 5 | Updatable Learned Index with Precise Positions (IF:3). Highlight: In this paper, we propose LIPP, a brand new framework of learned index to address such issues. | Jiacheng Wu et al. |
2021 | 6 | Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation (IF:3). Highlight: In this paper, we comprehensively and systematically compare the effectiveness of CardEst methods in a real DBMS. | Yuxing Han et al. |
2021 | 7 | A Survey of RDF Stores & SPARQL Engines for Querying Knowledge Graphs (IF:3). Highlight: This survey paper provides a comprehensive review of techniques and systems for querying RDF knowledge graphs. | Waqas Ali; Muhammad Saleem; Bin Yao; Aidan Hogan; Axel-Cyrille Ngonga Ngomo |
2021 | 8 | Annotating Columns with Pre-trained Language Models (IF:4). Highlight: In this paper, we study the problem of annotating table columns (i.e., predicting column types and the relationships between columns) using only information from the table itself. | Yoshihiko Suhara et al. |
2021 | 9 | A Survey on Advancing The DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration (IF:4). Highlight: Cost-based optimizer studied in this paper is adopted in almost all current database systems. | Hai Lan; Zhifeng Bao; Yuwei Peng |
2021 | 10 | GitTables: A Large-Scale Corpus of Relational Tables (IF:3). Highlight: The evaluation of our annotation pipeline on the T2Dv2 benchmark illustrates that our approach provides results on par with human annotations. We present three applications of GitTables, demonstrating its value for learned semantic type detection models, schema completion methods, and benchmarks for table-to-KG matching, data search, and preparation. | Madelon Hulsebos; Çağatay Demiralp; Paul Groth |
2021 | 11 | A Survey on Locality Sensitive Hashing Algorithms and Their Applications (IF:3). Highlight: In this survey paper, we provide a review of state-of-the-art LSH and Distributed LSH techniques. | Omid Jafari; Preeti Maurya; Parth Nagarkar; Khandker Mushfiqul Islam; Chidambaram Crushev |
2021 | 12 | Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities (IF:4). Highlight: Along the way, we introduce a specialized data model for representing and reasoning about repeatedly run components in these ML pipelines, which we call model graphlets. | Doris Xin; Hui Miao; Aditya Parameswaran; Neoklis Polyzotis |
2021 | 13 | A Simple Standard for Sharing Ontological Mappings (SSSOM) (IF:3). Highlight: In this paper, we present the SSSOM standard, describe several use cases, and survey some existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable, and Reusable (FAIR). | Nicolas Matentzoglu et al. |
2021 | 14 | Data Management in Microservices: State of The Practice, Challenges, and Research Directions (IF:3). Highlight: To bridge this gap, we conducted a systematic literature review of representative articles reporting the adoption of microservices, we analyzed a set of popular open-source microservice applications, and we conducted an online survey to cross-validate the findings of the previous steps with the perceptions and experiences of over 120 experienced practitioners and researchers. | Rodrigo Laigner; Yongluan Zhou; Marcos Antonio Vaz Salles; Yijian Liu; Marcos Kalinowski |
2021 | 15 | A Unified Deep Model of Learning from Both Data and Queries for Cardinality Estimation (IF:3). Highlight: In this work, we aim to close the gap between data-driven and query-driven methods by proposing a new unified deep autoregressive model, UAE, that learns the joint data distribution from both the data and query workload. | Peizhi Wu; Gao Cong |
2021 | 16 | Flow-Loss: Learning Cardinality Estimates That Matter (IF:3). Highlight: We introduce a new loss function, Flow-Loss, that explicitly optimizes for better query plans by approximating the optimizer’s cost model and dynamic programming search algorithm with analytical functions. To evaluate our approach, we introduce the Cardinality Estimation Benchmark, which contains the ground truth cardinalities for sub-plans of over 16K queries from 21 templates with up to 15 joins. | Parimarjan Negi et al. |
2021 | 17 | DB-BERT: A Database Tuning Tool That Reads The Manual (IF:3). Highlight: DB-BERT applies large, pre-trained language models (specifically, the BERT model) for text analysis. | Immanuel Trummer |
2021 | 18 | Facilitating Database Tuning with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation (IF:3). Highlight: To this end, this paper provides a comprehensive evaluation of configuration tuning techniques from a broader perspective, hoping to better benefit the database community. | Xinyi Zhang et al. |
2021 | 19 | A Unified Metamodel for NoSQL and Relational Databases (IF:3). Highlight: In this paper, we present the U-Schema unified metamodel able to represent logical schemas for the four most popular NoSQL paradigms (columnar, document, key-value, and graph) as well as relational schemas. | Carlos J. Fernández Candel; Diego Sevilla Ruiz; Jesús J. García-Molina |
2021 | 20 | KGTorrent: A Dataset of Python Jupyter Notebooks from Kaggle (IF:3). Highlight: In this paper, we fill this gap by introducing KGTorrent, a dataset of Python Jupyter notebooks with rich metadata retrieved from Kaggle, a platform hosting data science competitions for learners and practitioners with any levels of expertise. | Luigi Quaranta; Fabio Calefato; Filippo Lanubile |
2021 | 21 | Farview: Disaggregated Memory with Operator Off-loading for Database Engines (IF:3). Highlight: In this paper, we explore disaggregation by taking it one step further and applying it to memory (DRAM). | Dario Korolija et al. |
2021 | 22 | Reusable Templates and Guides For Documenting Datasets and Models for Natural Language Processing and Generation: A Case Study of The HuggingFace and GEM Data and Model Cards (IF:3). Highlight: To help with the standardization of documentation, we present two case studies of efforts that aim to develop reusable documentation templates — the HuggingFace data card, a general purpose card for datasets in NLP, and the GEM benchmark data and model cards with a focus on natural language generation. | Angelina McMillan-Major et al. |
2021 | 23 | APEX: A High-Performance Learned Index on Persistent Memory (IF:3). Highlight: This paper proposes APEX, a new PM-optimized learned index that offers high performance, persistence, concurrency, and instant recovery. | Baotong Lu; Jialin Ding; Eric Lo; Umar Farooq Minhas; Tianzheng Wang |
2021 | 24 | Lux: Always-on Visualization Recommendations for Exploratory Dataframe Workflows (IF:3). Highlight: We propose Lux, an always-on framework for accelerating visual insight discovery in dataframe workflows. | Doris Jung-Lin Lee et al. |
2021 | 25 | Real-World Trajectory Sharing with Local Differential Privacy (IF:3). Highlight: To address these concerns, we propose a local differentially private mechanism that is based on perturbing hierarchically-structured, overlapping $n$-grams (i.e., contiguous subsequences of length $n$) of trajectory data. | Teddy Cunningham; Graham Cormode; Hakan Ferhatosmanoglu; Divesh Srivastava |
2021 | 26 | HUGE: An Efficient and Scalable Subgraph Enumeration System (IF:3). Highlight: In this paper, we propose a system called HUGE to efficiently process subgraph enumeration at scale in the distributed context. | Zhengyi Yang; Longbin Lai; Xuemin Lin; Kongzhang Hao; Wenjie Zhang |
2021 | 27 | Data Quality Certification Using ISO/IEC 25012: Industrial Experiences (IF:3). Highlight: We present findings from the point of view of both the data quality evaluation team and the organizations that underwent the evaluation process. | Fernando Gualo; Moisés Rodríguez; Javier Verdugo; Ismael Caballero; Mario Piattini |
2021 | 28 | Data Acquisition for Improving Machine Learning Models (IF:3). Highlight: In this paper, we present research on the practical problem of obtaining data in order to improve the accuracy of ML models. We then propose two data acquisition strategies that consider a trade-off between exploration during which we obtain data to learn about the distribution of a provider’s data and exploitation during which we optimize our data inquiries utilizing the gained knowledge. | Yifan Li; Xiaohui Yu; Nick Koudas |
2021 | 29 | Group-Based Privacy Preservation Techniques for Process Mining (IF:3). Highlight: In this paper, we discuss the challenges regarding directly applying existing well-known group-based privacy preservation techniques, e.g., k-anonymity, l-diversity, etc, to event data. | Majid Rafiei; Wil M. P. van der Aalst |
2021 | 30 | Correlation Sketches for Approximate Join-Correlation Queries (IF:3). Highlight: In this paper, we introduce a new class of data augmentation queries: join-correlation queries. | Aécio Santos; Aline Bessa; Fernando Chirigati; Christopher Musco; Juliana Freire |
2020 | 1 | Deep Entity Matching With Pre-Trained Language Models IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Ditto, a novel entity matching system based on pre-trained Transformer-based language models. |
Yuliang Li; Jinfeng Li; Yoshihiko Suhara; AnHai Doan; Wang-Chiew Tan; |
2020 | 2 | Domain-specific Knowledge Graphs: A Survey IF:5 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Knowledge Graphs (KGs) have made a qualitative leap and effected a real revolution in knowledge representation. This is leveraged by the underlying structure of the KG which … |
Bilal Abu-Salih; |
2020 | 3 | A Survey On Trajectory Data Management, Analytics, And Learning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this survey, we comprehensively review recent research trends in trajectory data management, ranging from trajectory pre-processing, storage, common trajectory analytic tools, such as querying spatial-only and spatial-textual trajectory data, and trajectory clustering. |
Sheng Wang; Zhifeng Bao; J. Shane Culpepper; Gao Cong; |
2020 | 4 | RadixSpline: A Single-Pass Learned Index IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce RadixSpline (RS), a learned index that can be built in a single pass over the data and is competitive with state-of-the-art learned index models, like RMI, in size and lookup performance. |
ANDREAS KIPF et. al. |
2020 | 5 | Tsunami: A Learned Multi-dimensional Index For Correlated Data And Skewed Workloads IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Tsunami, which addresses these limitations to achieve up to 6X faster query performance and up to 8X smaller index size than existing learned multi-dimensional indexes, in addition to up to 11X faster query performance and 170X smaller index size than optimally-tuned traditional indexes. |
Jialin Ding; Vikram Nathan; Mohammad Alizadeh; Tim Kraska; |
2020 | 6 | Benchmarking Learned Indexes IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a unified benchmark that compares well-tuned implementations of three learned index structures against several state-of-the-art traditional baselines. |
RYAN MARCUS et. al. |
2020 | 7 | Dataset Discovery In Data Lakes IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We refer to this as the problem of dataset discovery in data lakes and this paper contributes an effective and efficient solution to it. |
Alex Bogatu; Alvaro A. A. Fernandes; Norman W. Paton; Nikolaos Konstantinou; |
2020 | 8 | Are We Ready For Learned Cardinality Estimation? IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we ask a forward-thinking question: Are we ready to deploy these learned cardinality models in production? |
Xiaoying Wang; Changbo Qu; Weiyuan Wu; Jiannan Wang; Qingqing Zhou; |
2020 | 9 | ResilientDB: Global Scale Resilient Blockchain Fabric IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this, we present the Geo-Scale Byzantine FaultTolerant consensus protocol (GeoBFT). |
Suyash Gupta; Sajjad Rahnama; Jelle Hellings; Mohammad Sadoghi; |
2020 | 10 | NeuroCard: One Cardinality Estimator For All Tables IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that it is possible to learn the correlations across all tables in a database without any independence assumptions. |
ZONGHENG YANG et. al. |
2020 | 11 | Discovering High Utility-Occupancy Patterns From Uncertain Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a novel algorithm, called High-Utility-Occupancy Pattern Mining in Uncertain databases (UHUOPM), is proposed. |
Chien-Ming Chen; Lili Chen; Wensheng Gan; Lina Qiu; Weiping Ding; |
2020 | 12 | Neural Networks for Entity Matching: A Survey IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this survey, we present how neural networks have been used for entity matching. |
Nils Barlaug; Jon Atle Gulla; |
2020 | 13 | Privacy Preserving Distributed Machine Learning with Federated Learning IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper addresses these issues by proposing a distributed perturbation algorithm named as DISTPAB, for privacy preservation of horizontally partitioned data. |
M. A. P. Chamikara; P. Bertok; I. Khalil; D. Liu; S. Camtepe; |
2020 | 14 | Towards Scalable Dataframe Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we lay out a vision and roadmap for scalable dataframe systems. |
DEVIN PETERSOHN et al. |
2020 | 15 | Qd-tree: Learning Data Layouts For Big Data Analytics IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new framework called a query-data routing tree, or qd-tree, to address this problem, and propose two algorithms for their construction based on greedy and deep reinforcement learning techniques. |
ZONGHENG YANG et al. |
2020 | 16 | Testing Database Engines Via Pivoted Query Synthesis IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we devised a novel and general approach that we have termed Pivoted Query Synthesis. |
Manuel Rigger; Zhendong Su; |
2020 | 17 | Return Of The Lernaean Hydra: Experimental Evaluation Of Data Series Approximate Similarity Search IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a taxonomy of similarity search techniques that reconciles the terminology used in these two domains, we describe modifications to data series indexing techniques enabling them to answer approximate similarity queries with quality guarantees, and we conduct a thorough experimental evaluation to compare approximate similarity search techniques under a unified framework, on synthetic and real datasets in memory and on disk. |
Karima Echihabi; Kostas Zoumpatianos; Themis Palpanas; Houda Benbrahim; |
2020 | 18 | SDM-RDFizer: An RML Interpreter For The Efficient Creation Of RDF Knowledge Graphs IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose the SDM-RDFizer, an interpreter of the RDF Mapping Language (RML), to transform raw data in various formats into an RDF knowledge graph. |
Enrique Iglesias; Samaneh Jozashoori; David Chaves-Fraga; Diego Collarana; Maria-Esther Vidal; |
2020 | 19 | Cost Models For Big Data Query Processing: Learning, Retrofitting, And Our Findings IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate two key questions: (i) can we learn accurate cost models for big data systems, and (ii) can we integrate the learned models within the query optimizer. |
Tarique Siddiqui; Alekh Jindal; Shi Qiao; Hiren Patel; Wangchao Le; |
2020 | 20 | On The Nature and Types of Anomalies: A Review of Deviations in Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, despite some 250 years of publications on the topic, no comprehensive and concrete overviews of the different types of anomalies have hitherto been published. By means of an extensive literature review this study therefore offers the first theoretically principled and domain-independent typology of data anomalies and presents a full overview of anomaly types and subtypes. |
Ralph Foorthuis; |
2020 | 21 | The Lernaean Hydra Of Data Series Similarity Search: An Experimental Evaluation Of The State Of The Art IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide definitions for the different flavors of similarity search that have been studied in the past, and present the first systematic experimental evaluation of the efficiency of data series similarity search techniques. |
Karima Echihabi; Kostas Zoumpatianos; Themis Palpanas; Houda Benbrahim; |
2020 | 22 | Efficient Bitruss Decomposition For Large-scale Bipartite Graphs IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the bitruss decomposition problem which aims to find all the k-bitrusses for k >= 0. |
Kai Wang; Xuemin Lin; Lu Qin; Wenjie Zhang; Ying Zhang; |
2020 | 23 | FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose FLAT, a CardEst method that is simultaneously fast in probability computation, lightweight in model size and accurate in estimation quality. |
RONG ZHU et al. |
2020 | 24 | Dash: Scalable Hashing On Persistent Memory IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Dash, a holistic approach to building dynamic and scalable hash tables on real PM hardware with all the aforementioned properties. |
Baotong Lu; Xiangpeng Hao; Tianzheng Wang; Eric Lo; |
2020 | 25 | Multi-Dimensional Event Data in Graph Databases IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a general data model for multi-dimensional event data based on labeled property graphs that allows storing structural and temporal relations in a single, integrated graph-based data structure in a systematic way. The queries allow for efficiently converting large real-life event data sets into our data model and we provide 5 converted data sets for further research. |
Stefan Esser; Dirk Fahland; |
2020 | 26 | TODS: An Automated Time Series Outlier Detection System IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present TODS, an automated Time Series Outlier Detection System for research and industrial applications. |
KWEI-HERNG LAI et al. |
2020 | 27 | Constant-Delay Enumeration For Nondeterministic Document Spanners IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Several recent works at PODS’18 proposed such algorithms but with linear delay in the document size or with an exponential dependency in size of the (generally nondeterministic) input VA. We pose this problem in the setting of enumeration algorithms, where we can first run a preprocessing phase and must then produce the results with a small delay between any two consecutive results. |
Antoine Amarilli; Pierre Bourhis; Stefan Mengel; Matthias Niewerth; |
2020 | 28 | Valentine: Evaluating Matching Techniques for Dataset Discovery IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to rectify the problem of evaluating the effectiveness and efficiency of schema matching methods for the specific needs of dataset discovery. |
CHRISTOS KOUTRAS et al. |
2020 | 29 | Efficient And Effective Community Search On Large-scale Bipartite Graphs IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the significant (alpha, beta)-community search problem on weighted bipartite graphs. |
KAI WANG et al. |
2020 | 30 | Data Market Platforms: Trading Data Assets To Solve Data Problems IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose data market platforms to address the lack of information and incentives and tackle the problems of data sharing, discovery, and integration. |
Raul Castro Fernandez; Pranav Subramaniam; Michael J. Franklin; |
2019 | 1 | Neo: A Learned Query Optimizer IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by this shortcoming and inspired by recent advances in applying machine learning to data management challenges, we introduce Neo (Neural Optimizer), a novel learning-based query optimizer that relies on deep neural networks to generate query execution plans. |
RYAN MARCUS et al. |
2019 | 2 | Approximate Queries And Representations For Large Data Sequences IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an algorithm for realizing our technique, and the results of applying it to medical cardiology data. |
Hagit Shatkay; Stanley B. Zdonik; |
2019 | 3 | ALEX: An Updatable Adaptive Learned Index IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a new learned index called ALEX which addresses practical issues that arise when implementing learned indexes for workloads that contain a mix of point lookups, short range queries, inserts, updates, and deletes. |
JIALIN DING et al. |
2019 | 4 | A Survey Of Community Search Over Big Graphs IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this survey, we conduct a thorough review of existing community search works. |
YIXIANG FANG et al. |
2019 | 5 | Deep Unsupervised Cardinality Estimation IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To capture the rich multivariate distributions of relational tables, we propose the use of a new type of high-capacity statistical model: deep autoregressive models. |
ZONGHENG YANG et al. |
2019 | 6 | An End-to-End Learning-based Cost Estimator IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these challenges, we propose an effective end-to-end learning-based cost estimation framework based on a tree-structured model, which can estimate both cost and cardinality simultaneously. |
Ji Sun; Guoliang Li; |
2019 | 7 | Learning Multi-dimensional Indexes IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Flood, a multi-dimensional in-memory index that automatically adapts itself to a particular dataset and workload by jointly optimizing the index structure and data storage. |
Vikram Nathan; Jialin Ding; Mohammad Alizadeh; Tim Kraska; |
2019 | 8 | Dataset Search: A Survey IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we survey the state of the art of research and commercial systems in dataset retrieval. |
ADRIANE CHAPMAN et al. |
2019 | 9 | SharPer: Sharding Permissioned Blockchains Over Network Clusters IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce SharPer, a permissioned blockchain system that improves scalability by clustering (partitioning) the nodes and assigning different data shards to different clusters where each data shard is replicated on the nodes of a cluster. |
Mohammad Javad Amiri; Divyakant Agrawal; Amr El Abbadi; |
2019 | 10 | Database Meets Deep Learning: Challenges And Opportunities IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we discuss research problems at the intersection of the two fields. |
WEI WANG et al. |
2019 | 11 | A Comparative Survey Of Recent Natural Language Interfaces For Databases IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we give an overview over 24 recently developed NLIs for databases. |
Katrin Affolter; Kurt Stockinger; Abraham Bernstein; |
2019 | 12 | Low-resource Deep Entity Resolution With Transfer And Active Learning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop a deep learning-based method that targets low-resource settings for ER through a novel combination of transfer learning and active learning. |
Jungo Kasai; Kun Qian; Sairam Gurajada; Yunyao Li; Lucian Popa; |
2019 | 13 | Plan-Structured Deep Neural Network Models For Query Performance Prediction IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we argue that deep learning can be applied to the query performance prediction problem, and we introduce a novel neural network architecture for the task: a plan-structured neural network. |
Ryan Marcus; Olga Papaemmanouil; |
2019 | 14 | Optimizing Subgraph Queries By Combining Binary And Worst-Case Optimal Joins IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of optimizing subgraph queries using the new worst-case optimal join plans. |
Amine Mhedhbi; Semih Salihoglu; |
2019 | 15 | HoloDetect: Few-Shot Learning For Error Detection IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a few-shot learning framework for error detection. |
Alireza Heidari; Joshua McGrath; Ihab F. Ilyas; Theodoros Rekatsinas; |
2019 | 16 | CityJSON: A Compact And Easy-to-use Encoding Of The CityGML Data Model IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present CityJSON, a new JSON-based exchange format for the CityGML data model (version 2.0.0). |
HUGO LEDOUX et al. |
2019 | 17 | SkinnerDB: Regret-Bounded Query Evaluation Via Reinforcement Learning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Along with SkinnerDB, we introduce a new quality criterion for query execution strategies. |
IMMANUEL TRUMMER et al. |
2019 | 18 | Atomic Commitment Across Blockchains IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present AC3WN, the first decentralized all-or-nothing atomic cross-chain commitment protocol. |
Victor Zakhary; Divyakant Agrawal; Amr El Abbadi; |
2019 | 19 | Fair Decision Making Using Privacy-Protected Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose novel measures of fairness in the context of randomized differentially private algorithms and identify a range of causes of outcome disparities. |
SATYA KUPPAM et al. |
2019 | 20 | Efficient Algorithms For Densest Subgraph Discovery IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Because DSD is difficult to solve, we propose a new solution paradigm in this paper. |
Yixiang Fang; Kaiqiang Yu; Reynold Cheng; Laks V. S. Lakshmanan; Xuemin Lin; |
2019 | 21 | Persistent Memory I/O Primitives IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide one of the first performance evaluations of PMem in terms of bandwidth and latency. |
Alexander van Renen; Lukas Vogel; Viktor Leis; Thomas Neumann; Alfons Kemper; |
2019 | 22 | A Survey Of Data Quality Measurement And Monitoring Tools IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we close the gap between research into data quality measurement and practical implementations by investigating the functional scope of current data quality tools. |
Lisa Ehrlinger; Elisa Rusz; Wolfram Wöß; |
2019 | 23 | ProUM: Projection-based Utility Mining On Sequence Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an efficient Projection-based Utility Mining (ProUM) approach to discover high-utility sequential patterns from sequence data. |
WENSHENG GAN et al. |
2019 | 24 | Mining Closed Strict Episodes IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we introduce a technique for discovering closed episodes. |
Nikolaj Tatti; Boris Cule; |
2019 | 25 | A Hybrid Approach To Hierarchical Density-based Cluster Selection IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show how the application of an additional threshold value can result in a combination of DBSCAN* and HDBSCAN clusters, and demonstrate potential benefits of this hybrid approach when clustering data of variable densities. |
Claudia Malzer; Marcus Baum; |
2019 | 26 | ZeroER: Entity Resolution Using Zero Labeled Examples IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we answer in the affirmative through our proposed approach dubbed ZeroER. |
Renzhi Wu; Sanya Chaba; Saurabh Sawlani; Xu Chu; Saravanan Thirumuruganathan; |
2019 | 27 | SOSD: A Benchmark For Learned Indexes IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To answer this question, we propose a new benchmarking framework that comes with a variety of real-world datasets and baseline implementations to compare against. |
ANDREAS KIPF et al. |
2019 | 28 | TigerGraph: A Native MPP Graph Database IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present TigerGraph, a graph database system built from the ground up to support massively parallel computation of queries and analytics. |
Alin Deutsch; Yu Xu; Mingxi Wu; Victor Lee; |
2019 | 29 | Efficient Privacy Preservation Of Big Data For Accurate Data Mining IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper addresses these issues by proposing an efficient and scalable nonreversible perturbation algorithm, PABIDOT, for privacy preservation of big data via optimal geometric transformations. |
M. A. P. Chamikara; P. Bertok; D. Liu; S. Camtepe; I. Khalil; |
2019 | 30 | Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to the sheer size of such datasets, combined with the irregular nature of graph processing, these systems face unique design challenges. To facilitate the understanding of this emerging domain, we present the first survey and taxonomy of graph database systems. |
MACIEJ BESTA et al. |
2018 | 1 | Datasheets for Datasets IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this gap, we propose datasheets for datasets. |
TIMNIT GEBRU et al. |
2018 | 2 | Data Synthesis Based On Generative Adversarial Networks IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a method that meets both requirements. |
NOSEONG PARK et al. |
2018 | 3 | Learned Cardinalities: Estimating Correlated Joins With Deep Learning IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We describe a new deep learning approach to cardinality estimation. |
ANDREAS KIPF et al. |
2018 | 4 | The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We discuss ways to move forward given the limitations identified. |
Sarah Holland; Ahmed Hosny; Sarah Newman; Joshua Joseph; Kasia Chmielinski; |
2018 | 5 | Focus: Querying Large Video Datasets With Low Latency And Low Cost IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We build Focus, a system for low-latency and low-cost querying on large video datasets. |
KEVIN HSIEH et al. |
2018 | 6 | A Survey Of Parallel Sequential Pattern Mining IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, an in-depth survey of the current status of parallel sequential pattern mining (PSPM) is investigated and provided, including detailed categorization of traditional serial SPM approaches, and state of the art parallel SPM. |
Wensheng Gan; Jerry Chun-Wei Lin; Philippe Fournier-Viger; Han-Chieh Chao; Philip S. Yu; |
2018 | 7 | A Survey Of Utility-Oriented Pattern Mining IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For identifying and evaluating the usefulness of different kinds of patterns, many techniques and constraints have been proposed, such as support, confidence, sequence order, and utility parameters (e.g., weight, price, profit, quantity, satisfaction, etc.). |
WENSHENG GAN et al. |
2018 | 8 | Learning To Optimize Join Queries With Deep Reinforcement Learning IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recognizing the link between classical Dynamic Programming enumeration methods and recent results in Reinforcement Learning (RL), we propose a new method for learning optimized join search strategies. |
Sanjay Krishnan; Zongheng Yang; Ken Goldberg; Joseph Hellerstein; Ion Stoica; |
2018 | 9 | Deep Reinforcement Learning For Join Order Enumeration IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we argue that existing deep reinforcement learning techniques can be applied to address this challenge. |
Ryan Marcus; Olga Papaemmanouil; |
2018 | 10 | FITing-Tree: A Data-aware Index Structure IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present FITing-Tree, a novel form of a learned index which uses piece-wise linear functions with a bounded error specified at construction time. |
Alex Galakatos; Michael Markovitch; Carsten Binnig; Rodrigo Fonseca; Tim Kraska; |
2018 | 11 | LSM-based Storage Techniques: A Survey IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide a survey of recent research efforts on LSM-trees so that readers can learn the state-of-the-art in LSM-based storage techniques. |
Chen Luo; Michael J. Carey; |
2018 | 12 | Benchmarking Distributed Stream Data Processing Systems IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a framework for benchmarking distributed stream processing engines. |
JEYHUN KARIMOV et al. |
2018 | 13 | VChain: Enabling Verifiable Boolean Range Queries Over Blockchain Databases IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we take the first step toward investigating the problem of verifiable query processing over blockchain databases. |
Cheng Xu; Ce Zhang; Jianliang Xu; |
2018 | 14 | Learning State Representations For Query Optimization With Deep Reinforcement Learning IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our goal in this work is to explore the capabilities of deep reinforcement learning in the context of query optimization. |
Jennifer Ortiz; Magdalena Balazinska; Johannes Gehrke; S. Sathiya Keerthi; |
2018 | 15 | Apache Calcite: A Foundational Framework For Optimized Query Processing Over Heterogeneous Data Sources IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems … |
Edmon Begoli; Jesús Camacho Rodríguez; Julian Hyde; Michael J. Mior; Daniel Lemire; |
2018 | 16 | The Vadalog System: Datalog-based Reasoning For Knowledge Graphs IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present the Vadalog system, a Datalog-based system for performing complex logic reasoning tasks, such as those required in advanced knowledge graphs. |
Luigi Bellomarini; Georg Gottlob; Emanuel Sallinger; |
2018 | 17 | BlazeIt: Optimizing Declarative Aggregation And Limit Queries For Neural Network-Based Video Analytics IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce two new query optimization techniques in BlazeIt that are not supported by prior work. |
Daniel Kang; Peter Bailis; Matei Zaharia; |
2018 | 18 | VerdictDB: Universalizing Approximate Query Processing IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, we argue that a universal solution is needed: a database-agnostic approximation engine that will widen the reach of this emerging technology across various platforms. |
Yongjoo Park; Barzan Mozafari; Joseph Sorenson; Junhao Wang; |
2018 | 19 | Achieving Data Truthfulness And Privacy Preservation In Data Markets IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose TPDM, which efficiently integrates data Truthfulness and Privacy preservation in Data Markets. |
Chaoyue Niu; Zhenzhe Zheng; Fan Wu; Xiaofeng Gao; Guihai Chen; |
2018 | 20 | ForkBase: An Efficient Storage Engine For Blockchain And Forkable Applications IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present ForkBase, a storage engine specifically designed to provide efficient support for blockchain and forkable applications. |
SHENG WANG et al. |
2018 | 21 | Optimizing Error Of High-dimensional Statistical Queries Under Differential Privacy IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we propose HDMM, a new differentially private algorithm for answering a workload of predicate counting queries, that is especially effective for higher-dimensional datasets. |
Ryan McKenna; Gerome Miklau; Michael Hay; Ashwin Machanavajjhala; |
2018 | 22 | Rafiki: Machine Learning As An Analytics Service System IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we develop and present a system, called Rafiki, to provide the training and inference service of machine learning models, and facilitate complex analytics on top of cloud platforms. |
WEI WANG et al. |
2018 | 23 | Accelerating Human-in-the-loop Machine Learning: Challenges And Opportunities IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe our vision for a human-in-the-loop ML system that accelerates this process: by intelligently tracking changes and intermediate results over time, such a system can enable rapid iteration, quick responsive feedback, introspection and debugging, and background execution and automation. |
DORIS XIN et al. |
2018 | 24 | Answering Range Queries Under Local Differential Privacy IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce and analyze methods to support range queries under the local variant of differential privacy, an emerging standard for privacy-preserving data analysis. |
Tejas Kulkarni; Graham Cormode; Divesh Srivastava; |
2018 | 25 | TaxoGen: Unsupervised Topic Taxonomy Construction By Adaptive Term Embedding And Clustering IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a method for constructing topic taxonomies, wherein every node represents a conceptual topic and is defined as a cluster of semantically coherent concept terms. |
CHAO ZHANG et al. |
2018 | 26 | Model-based Pricing For Machine Learning In A Data Marketplace IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a model-based pricing (MBP) framework, which instead of pricing the data, directly prices ML model instances. |
Lingjiao Chen; Paraschos Koutris; Arun Kumar; |
2018 | 27 | Assessing And Remedying Coverage For A Given Dataset IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we assess the coverage of a given dataset over multiple categorical attributes. |
Abolfazl Asudeh; Zhongjun Jin; H. V. Jagadish; |
2018 | 28 | Entity Resolution And Federated Learning Get A Federated Resolution IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide a thorough answer to this question, answering how optimal classifiers, empirical losses, margins and generalisation abilities are affected. |
RICHARD NOCK et al. |
2018 | 29 | Utility-Optimized Local Differential Privacy Mechanisms For Distribution Estimation IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the notion of ULDP (Utility-optimized LDP), which provides a privacy guarantee equivalent to LDP only for sensitive data. |
Takao Murakami; Yusuke Kawamoto; |
2018 | 30 | Wormhole: A Fast Ordered Index For In-memory Data Management IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we introduce a new ordered index structure, named Wormhole, that takes O(log L) worst-case time for looking up a key with a length of L. |
Xingbo Wu; Fan Ni; Song Jiang; |
2017 | 1 | The Case For Learned Index Structures IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. |
Tim Kraska; Alex Beutel; Ed H. Chi; Jeffrey Dean; Neoklis Polyzotis; |
2017 | 2 | Untangling Blockchain: A Data Processing View Of Blockchain Systems IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We analyze both in-production and research systems in four dimensions: distributed ledger, cryptography, consensus protocol and smart contract. |
TIEN TUAN ANH DINH et al. |
2017 | 3 | BLOCKBENCH: A Framework For Analyzing Private Blockchains IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper concerns recent private blockchain systems designed with stronger security (trust) assumption and performance requirement. |
TIEN TUAN ANH DINH et al. |
2017 | 4 | HoloClean: Holistic Data Repairs With Probabilistic Inference IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce HoloClean, a framework for holistic data repairing driven by probabilistic inference. |
Theodoros Rekatsinas; Xu Chu; Ihab F. Ilyas; Christopher Ré; |
2017 | 5 | Size Bounds And Query Plans For Relational Joins IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study these problems from a theoretical perspective, both in the worst-case model, and in an average-case model where the database is chosen according to a known probability distribution. |
Albert Atserias; Martin Grohe; Dániel Marx; |
2017 | 6 | NoScope: Optimizing Neural Network Queries Over Video At Scale IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In response, we present NoScope, a system for querying videos that can reduce the cost of neural network video analysis by up to three orders of magnitude via inference-optimized model search. |
Daniel Kang; John Emmons; Firas Abuzaid; Peter Bailis; Matei Zaharia; |
2017 | 7 | An Analytical Study Of Large SPARQL Query Logs IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we conduct an in-depth analytical study of the queries formulated by end-users and harvested from large and up-to-date query logs from a wide variety of RDF data sources. |
Angela Bonifati; Wim Martens; Thomas Timm; |
2017 | 8 | G-CORE: A Core For Future Graph Query Languages IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We report on a community effort between industry and academia to shape the future of graph query languages. |
RENZO ANGLES et al. |
2017 | 9 | Designing Fair Ranking Schemes IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop a system that helps users choose criterion weights that lead to greater fairness. |
Abolfazl Asudeh; H. V. Jagadish; Julia Stoyanovich; Gautam Das; |
2017 | 10 | Time Series Management Systems: A Survey IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a thorough analysis and classification of TSMSs developed through academic or industrial research and documented through publications. |
Søren Kejser Jensen; Torben Bach Pedersen; Christian Thomsen; |
2017 | 11 | Enabling Smart Data: Noise Filtering In Big Data Classification IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, two Big Data preprocessing approaches to remove noisy examples are proposed: an homogeneous ensemble and an heterogeneous ensemble filter, with special emphasis in their scalability and performance traits. |
Diego García-Gil; Julián Luengo; Salvador García; Francisco Herrera; |
2017 | 12 | Marginal Release Under Local Differential Privacy IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide a set of algorithms for materializing marginal statistics under the strong model of local differential privacy. |
Tejas Kulkarni; Graham Cormode; Divesh Srivastava; |
2017 | 13 | Foresight: Recommending Visual Insights IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Foresight, a system that helps the user rapidly discover visual insights from large high-dimensional datasets. |
Çağatay Demiralp; Peter J. Haas; Srinivasan Parthasarathy; Tejaswini Pedapati; |
2017 | 14 | Big Data: Challenges, Opportunities And Realities IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This chapter presents an overview of big data analytics, its application, advantages, and limitations. |
Abhay Bhadani; Dhanya Jothimani; |
2017 | 15 | Answering Conjunctive Queries Under Updates IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the task of enumerating and counting answers to $k$-ary conjunctive queries against relational databases that may be updated by inserting or deleting tuples. |
Christoph Berkholz; Jens Keppeler; Nicole Schweikardt; |
2017 | 16 | JSON: Data Model, Query Languages And Schema Specification IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore in this paper we propose a formal data model for JSON documents and, based on the common features present in available systems using JSON, we define a lightweight query language allowing us to navigate through JSON documents. |
Pierre Bourhis; Juan L. Reutter; Fernando Suárez; Domagoj Vrgoč; |
2017 | 17 | Composing Differential Privacy And Secure Computation: A Case Study On Scaling Private Record Linkage IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In light of this deficiency, we propose a novel privacy model, called output constrained differential privacy, that shares the strong privacy protection of DP, but allows for the truthful release of the output of a certain function applied to the data. |
Xi He; Ashwin Machanavajjhala; Cheryl Flynn; Divesh Srivastava; |
2017 | 18 | Fonduer: Knowledge Base Construction From Richly Formatted Data IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Fonduer, a machine-learning-based KBC system for richly formatted data. |
SEN WU et al. |
2017 | 19 | The Ubiquity Of Large Graphs And Surprising Challenges Of Graph Processing: Extended Survey IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe the participants’ responses to our questions highlighting common patterns and challenges. |
Siddhartha Sahu; Amine Mhedhbi; Semih Salihoglu; Jimmy Lin; M. Tamer Özsu; |
2017 | 20 | Big Data Systems Meet Machine Learning Challenges: Towards Big Data Science As A Service IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, we have been witnessing huge advancements in the scale of data we routinely generate and collect in pretty much everything we do, as well as our ability to exploit modern technologies to process, analyze and understand this data. |
Radwa Elshawi; Sherif Sakr; |
2017 | 21 | One Button Machine For Automating Feature Engineering In Relational Databases IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a system called One Button Machine, or OneBM for short, which automates feature discovery in relational databases. |
HOANG THANH LAM et al. |
2017 | 22 | Quantifying Differential Privacy In Continuous Data Release Under Temporal Correlations IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the potential privacy loss of a traditional DP mechanism under temporal correlations. Third, we propose data releasing mechanisms that convert any existing DP mechanism into one against TPL. |
Yang Cao; Masatoshi Yoshikawa; Yonghui Xiao; Li Xiong; |
2017 | 23 | BoostClean: Automated Error Detection And Repair For Machine Learning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present BoostClean which automatically selects an ensemble of error detection and repair combinations using statistical boosting. |
Sanjay Krishnan; Michael J. Franklin; Ken Goldberg; Eugene Wu; |
2017 | 24 | Comparing Dataset Characteristics That Favor The Apriori, Eclat Or FP-Growth Frequent Itemset Mining Algorithms IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper explores the effects that two dataset characteristics can have on the performance of these three frequent itemset algorithms. |
Jeff Heaton; |
2017 | 25 | A Survey Of State Management In Big Data Processing Systems IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given the pivotal role that state management plays in various use cases, in this survey, we present some of the most important uses of state as an enabler, discuss the alternative approaches used to handle and implement state, propose a taxonomy to capture the many facets of state management, and highlight new research directions. |
Quoc-Cuong To; Juan Soto; Volker Markl; |
2017 | 26 | Database Learning: Toward A Database That Becomes Smarter Every Time IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We exploit the principle of maximum entropy to produce answers, which are in expectation guaranteed to be more accurate than existing sample-based approximations. |
Yongjoo Park; Ahmad Shahab Tajik; Michael Cafarella; Barzan Mozafari; |
2017 | 27 | Discovering More Precise Process Models From Event Logs By Filtering Out Chaotic Activities IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that the presence of such chaotic activities in an event log heavily impacts the quality of the process models that can be discovered with process discovery techniques. |
Niek Tax; Natalia Sidorova; Wil M. P. van der Aalst; |
2017 | 28 | Ease.ml: Towards Multi-tenant Resource Sharing For Machine Learning Workloads IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we describe the ease.ml architecture and focus on a novel technical problem introduced by ease.ml regarding resource allocation. |
Tian Li; Jie Zhong; Ji Liu; Wentao Wu; Ce Zhang; |
2017 | 29 | A Survey On Geographically Distributed Big-Data Processing Using MapReduce IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate and discuss challenges and requirements in designing geographically distributed data processing frameworks and protocols. |
Shlomi Dolev; Patricia Florissi; Ehud Gudes; Shantanu Sharma; Ido Singer; |
2017 | 30 | Event Stream-Based Process Discovery Using Abstract Representations IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we focus on process discovery relying on online streams of business process execution events. |
Sebastiaan J. van Zelst; Boudewijn F. van Dongen; Wil M. P. van der Aalst; |
2016 | 1 | Foundations Of Modern Query Languages For Graph Databases IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We survey foundational features underlying modern graph query languages. |
RENZO ANGLES et al. |
2016 | 2 | Measuring Fairness In Ranked Outputs IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we propose fairness measures for ranked outputs. |
Ke Yang; Julia Stoyanovich; |
2016 | 3 | Collecting And Analyzing Data From Smart Device Users With Local Differential Privacy IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by this, we propose Harmony, a practical, accurate and efficient system for collecting and analyzing data from smart device users, while satisfying LDP. |
THÔNG T. NGUYÊN et al. |
2016 | 4 | PrivTree: A Differentially Private Algorithm For Hierarchical Decompositions IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To remedy the deficiency of existing solutions, we present PrivTree, a histogram construction algorithm that also applies hierarchical decomposition but features a crucial (and somewhat surprising) improvement: when deciding whether or not to split a sub-domain, the amount of noise required in the corresponding tuple count is independent of the recursive depth. |
Jun Zhang; Xiaokui Xiao; Xing Xie; |
2016 | 5 | Building Efficient Query Engines In A High-Level Language IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this article, we realize this vision in the domain of analytical query processing. |
Amir Shaikhha; Yannis Klonatos; Christoph Koch; |
2016 | 6 | Effortless Data Exploration With Zenvisage: An Expressive And Interactive Visual Analytics System IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose zenvisage, a platform for effortlessly visualizing interesting patterns, trends, or insights from large datasets. |
Tarique Siddiqui; Albert Kim; John Lee; Karrie Karahalios; Aditya Parameswaran; |
2016 | 7 | LSH Ensemble: Internet-Scale Domain Search IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new index structure, Locality Sensitive Hashing (LSH) Ensemble, that solves the domain search problem using set containment at Internet scale. |
Erkang Zhu; Fatemeh Nargesian; Ken Q. Pu; Renée J. Miller; |
2016 | 8 | MacroBase: Prioritizing Attention In Fast Data IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we present MacroBase, a data analytics engine that prioritizes end-user attention in high-volume fast data streams. |
PETER BAILIS et al. |
2016 | 10 | Predicting Completeness In Knowledge Bases IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate different signals to identify the areas where a knowledge base is complete. |
Luis Galárraga; Simon Razniewski; Antoine Amarilli; Fabian M. Suchanek; |
2016 | 11 | Mining Local Process Models IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we describe a method to discover frequent behavioral patterns in event logs. |
Niek Tax; Natalia Sidorova; Reinder Haakma; Wil M. P. van der Aalst; |
2016 | 12 | Quantifying Differential Privacy Under Temporal Correlations IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the potential privacy loss of a traditional DP mechanism under temporal correlations in the context of continuous data release. |
Yang Cao; Masatoshi Yoshikawa; Yonghui Xiao; Li Xiong; |
2016 | 13 | The BigDAWG Polystore System And Architecture IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this article, we will describe polystore databases, the current BigDAWG architecture and its application on the MIMIC II medical dataset, initial performance results and our future development plans. |
VIJAY GADEPALLY et al. |
2016 | 14 | A Survey Of RDF Data Management Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we provide an overview of these works. |
M. Tamer Özsu; |
2016 | 15 | An Automatic Identification System (AIS) Database For Maritime Trajectory Prediction And Data Mining IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper is devoted to construct a standard AIS database for maritime trajectory learning, prediction and data mining. |
SHANGBO MAO et al. |
2016 | 16 | A Fast Order-Based Approach For Core Maintenance IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new order-based approach to maintain an order, called k-order, among vertices, while a graph is updated. |
Yikai Zhang; Jeffrey Xu Yu; Ying Zhang; Lu Qin; |
2016 | 17 | Towards Linear Algebra Over Normalized Data IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that it is possible to mitigate this overhead by leveraging a popular formal algebra to represent the computations of many ML algorithms: linear algebra. |
Lingjiao Chen; Arun Kumar; Jeffrey Naughton; Jignesh M. Patel; |
2016 | 18 | Data Mining : Past Present And Future – A Typical Survey On Data Streams IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we give the algorithm for finding frequent patterns from data streams with a case study and identify the research issues in handling data streams. |
M. S. B. PhridviRaja; C. V. GuruRao; |
2016 | 19 | Security And Privacy Aspects In MapReduce On Clouds: A Survey IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate and discuss security and privacy challenges and requirements, considering a variety of adversarial capabilities, and characteristics in the scope of MapReduce. |
Philip Derbeko; Shlomi Dolev; Ehud Gudes; Shantanu Sharma; |
2016 | 20 | What Do Shannon-type Inequalities, Submodular Width, And Disjunctive Datalog Have To Do With One Another? IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent works on bounding the output size of a conjunctive query with functional dependencies and degree constraints have shown a deep connection between fundamental questions in information theory and database theory. |
Mahmoud Abo Khamis; Hung Q. Ngo; Dan Suciu; |
2016 | 21 | SMCQL: Secure Querying For Federated Databases IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a framework for executing PDN queries named SMCQL. |
JOHES BATER et al. |
2016 | 22 | Controlling False Discoveries During Interactive Data Exploration IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose solutions to integrate multiple hypothesis testing control into interactive data exploration tools. |
ZHEGUANG ZHAO et al. |
2016 | 23 | Decision Tree Classification With Differential Privacy: A Survey IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this survey, we focus on one particular data mining algorithm — decision trees — and how differential privacy interacts with each of the components that constitute decision tree algorithms. |
Sam Fletcher; Md Zahidul Islam; |
2016 | 24 | Sampling-Based Query Re-Optimization IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a low-cost post-processing step that can take a plan produced by the optimizer, detect when it is likely to have made such a mistake, and take steps to fix it. |
Wentao Wu; Jeffrey F. Naughton; Harneet Singh; |
2016 | 25 | Data Polygamy: The Many-Many Relationships Among Urban Spatio-Temporal Data Sets IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address these challenges, we propose Data Polygamy, a scalable topology-based framework that allows users to query for statistically significant relationships between spatio-temporal data sets. |
Fernando Chirigati; Harish Doraiswamy; Theodoros Damoulas; Juliana Freire; |
2016 | 26 | Worst-Case Optimal Algorithms For Parallel Query Processing IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the communication complexity for the problem of computing a conjunctive query on a large database in a parallel setting with $p$ servers. |
Paul Beame; Paraschos Koutris; Dan Suciu; |
2016 | 27 | RECOME: A New Density-Based Clustering Algorithm Using Relative KNN Kernel Density IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose the RElative COre MErge (RECOME) clustering algorithm. |
YANGLI-AO GENG et al. |
2016 | 28 | Consistently Faster And Smaller Compressed Bitmaps With Roaring IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Many bitmap compression techniques have been proposed, almost all relying primarily on run-length encoding (RLE). |
Daniel Lemire; Gregory Ssi-Yan-Kai; Owen Kaser; |
2016 | 29 | Effective And Complete Discovery Of Order Dependencies Via Set-based Axiomatization IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We improve significantly on complexity, offer completeness, and define a compact canonical form. |
Jaroslaw Szlichta; Parke Godfrey; Lukasz Golab; Mehdi Kargar; Divesh Srivastava; |
2016 | 30 | Top-k Spatial-keyword Publish/Subscribe Over Sliding Window IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate a novel real-time top-k monitoring problem over sliding window of streaming data; that is, we continuously maintain the top-k most relevant geo-textual messages (e.g., geo-tagged tweets) for a large number of spatial-keyword subscriptions (e.g., registered users interested in local events) simultaneously. |
Xiang Wang; Ying Zhang; Wenjie Zhang; Xuemin Lin; Zengfeng Huang; |
2015 | 1 | Converting Static Image Datasets To Spiking Neuromorphic Datasets Using Saccades IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we propose a method for converting existing Computer Vision static image datasets into Neuromorphic Vision datasets using an actuated pan-tilt camera platform. |
Garrick Orchard; Ajinkya Jayawant; Gregory Cohen; Nitish Thakor; |
2015 | 2 | A Survey On Truth Discovery IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this survey, we focus on providing a comprehensive overview of truth discovery methods, and summarizing them from different aspects. |
YALIANG LI et al. |
2015 | 3 | Truth Finding On The Deep Web: Is The Problem Solved? IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study truthfulness of Deep Web data in two domains where we believed data are fairly clean and data quality is important to people’s lives: Stock and Flight. |
Xian Li; Xin Luna Dong; Kenneth Lyons; Weiyi Meng; Divesh Srivastava; |
2015 | 4 | Incremental Knowledge Base Construction Using DeepDive IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we describe DeepDive, a system that combines database and machine learning ideas to help develop KBC systems, and we present techniques to make the KBC process more efficient. |
JAEHO SHIN et al. |
2015 | 5 | Big Data Analytics For Dynamic Energy Management In Smart Grids IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This research aims to highlight the big data issues and challenges faced by the DEM employed in SG networks. |
Panagiotis D. Diamantoulakis; Vasileios M. Kapinas; George K. Karagiannidis; |
2015 | 6 | EmptyHeaded: A Relational Engine For Graph Processing IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present EmptyHeaded, a high-level engine that supports a rich datalog-like query language and achieves performance comparable to that of low-level engines. |
Christopher R. Aberger; Susan Tu; Kunle Olukotun; Christopher Ré; |
2015 | 7 | From Data Fusion To Knowledge Fusion IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the applicability and limitations of different fusion techniques on a more challenging problem: knowledge fusion. |
XIN LUNA DONG et al. |
2015 | 8 | Knowledge-Based Trust: Estimating The Trustworthiness Of Web Sources IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new approach that relies on endogenous signals, namely, the correctness of factual information provided by the source. |
XIN LUNA DONG et al. |
2015 | 9 | S2RDF: RDF Querying With SPARQL On Spark IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we describe a novel relational partitioning schema for RDF data called ExtVP that uses a semi-join based preprocessing, akin to the concept of Join Indices in relational databases, to efficiently minimize query input size regardless of its pattern shape and diameter. |
Alexander Schätzle; Martin Przyjaciel-Zablocki; Simon Skilevic; Georg Lausen; |
2015 | 10 | The End Of Slow Networks: It’s Time For A Redesign IF:4 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Next generation high-performance RDMA-capable networks will require a fundamental rethinking of the design and architecture of modern distributed DBMSs. These systems are commonly … |
Carsten Binnig; Andrew Crotty; Alex Galakatos; Tim Kraska; Erfan Zamanian; |
2015 | 11 | FAQ: Questions Asked Frequently IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The main technical contribution of this work is a precise characterization of when a variable ordering is ‘semantically equivalent’ to the variable ordering given by the input FAQ expression. |
Mahmoud Abo Khamis; Hung Q. Ngo; Atri Rudra; |
2015 | 12 | Discriminative Predicate Path Mining For Fact Checking In Knowledge Graphs IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We view this problem as a link-prediction task in a knowledge graph, and present a discriminative path-based method for fact checking in knowledge graphs that incorporates connectivity, type information, and predicate interactions. |
Baoxu Shi; Tim Weninger; |
2015 | 13 | Task Assignment On Multi-Skill Oriented Spatial Crowdsourcing IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider a spatial crowdsourcing scenario, in which each worker has a set of qualified skills, whereas each spatial task (e.g., repairing a house, decorating a room, and performing entertainment shows for a ceremony) is time-constrained, under the budget constraint, and required a set of skills. |
Peng Cheng; Xiang Lian; Lei Chen; Jinsong Han; Jizhong Zhao; |
2015 | 14 | Principled Evaluation Of Differentially Private Algorithms Using DPBench IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we propose a set of evaluation principles which we argue are essential for sound evaluation. |
Michael Hay; Ashwin Machanavajjhala; Gerome Miklau; Yan Chen; Dan Zhang; |
2015 | 15 | Km4City Ontology Building Vs Data Harvesting And Cleaning For Smart-city Services IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a system for data ingestion and reconciliation of smart cities related aspects as road graph, services available on the roads, traffic sensors etc., is proposed. |
Pierfrancesco Bellini; Monica Benigni; Riccardo Billero; Paolo Nesi; Nadia Rauch; |
2015 | 16 | Fusing Data With Correlations IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present novel techniques modeling correlations between sources and applying it in truth finding. |
Ravali Pochampally; Anish Das Sarma; Xin Luna Dong; Alexandra Meliou; Divesh Srivastava; |
2015 | 17 | Visualization-Aware Sampling For Very Large Databases IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a visualization-aware sampling (VAS) that guarantees high quality visualizations with a small subset of the entire dataset. |
Yongjoo Park; Michael Cafarella; Barzan Mozafari; |
2015 | 18 | GMark: Schema-Driven Generation Of Graphs And Queries IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present the design and engineering principles of gMark, a domain- and query language-independent graph instance and query workload generator. |
GUILLAUME BAGAN et al. |
2015 | 19 | The Gremlin Graph Traversal Machine And Language IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This article provides a mathematical description of Gremlin and details its automaton and functional properties. |
Marko A. Rodriguez; |
2015 | 20 | I/O Efficient Core Graph Decomposition At Web Scale IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study I/O efficient core decomposition following a semi-external model, which only allows node information to be loaded in memory. |
Dong Wen; Lu Qin; Ying Zhang; Xuemin Lin; Jeffrey Xu Yu; |
2015 | 21 | S-Store: Streaming Meets Transaction Processing IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we attempt to fuse the two computational paradigms in a single system called S-Store. |
JOHN MEEHAN et al. |
2015 | 22 | High-Speed Query Processing Over High-Speed Networks IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents the blueprint for a distributed query engine that addresses these problems by considering both levels of networks holistically. |
Wolf Roediger; Tobias Muehlbauer; Alfons Kemper; Thomas Neumann; |
2015 | 23 | NXgraph: An Efficient Graph Processing System On A Single Machine IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present NXgraph, an efficient graph processing system on a single machine. |
YUZE CHI et al. |
2015 | 24 | A Selectivity Based Approach To Continuous Pattern Detection In Streaming Graphs IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a Lazy Search algorithm where the search strategy is decided on a vertex-to-vertex basis depending on the likelihood of a match in the vertex neighborhood. |
Sutanay Choudhury; Lawrence Holder; George Chin; Khushbu Agarwal; John Feo; |
2015 | 25 | Taming Subgraph Isomorphism For RDF Query Processing IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, based on the state-of-the-art subgraph isomorphism algorithm, we propose an in-memory solution, TurboHOM++, which is tamed for the RDF processing, and we compare it with the representative RDF processing engines for several RDF benchmarks in a server machine where billions of triples can be loaded in memory. |
Jinha Kim; Hyungyu Shin; Wook-Shin Han; Sungpack Hong; Hassan Chafi; |
2015 | 26 | Principles Of Dataset Versioning: Exploring The Recreation/Storage Tradeoff IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study this trade-off in a principled manner: we formulate six problems under various settings, trading off these quantities in various ways, demonstrate that most of the problems are intractable, and propose a suite of inexpensive heuristics drawing from techniques in delay-constrained scheduling, and spanning tree literature, to solve these problems. |
Souvik Bhattacherjee; Amit Chavan; Silu Huang; Amol Deshpande; Aditya Parameswaran; |
2015 | 27 | Exposing The Probabilistic Causal Structure Of Discrimination IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we take a principled causal approach to the data mining problem of discrimination detection in databases. |
Francesco Bonchi; Sara Hajian; Bud Mishra; Daniele Ramazzotti; |
2015 | 28 | Less Is More: Building Selective Anomaly Ensembles IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we tap into this gap and propose a new ensemble approach for anomaly mining, with application to event detection in temporal graphs. |
Shebuti Rayana; Leman Akoglu; |
2015 | 29 | Join Processing For Graph Patterns: An Old Dog With New Tricks IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These new algorithms match or improve on those used in specialized graph-processing systems. |
DUNG NGUYEN et al. |
2015 | 30 | Multiple Query Optimization On The D-Wave 2X Adiabatic Quantum Computer IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we tackle the problem of multiple query optimization (MQO). |
Immanuel Trummer; Christoph Koch; |
2014 | 1 | BigDataBench: A Big Data Benchmark Suite From Internet Services IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents our joint research efforts on this issue with several industrial partners. |
LEI WANG et al. |
2014 | 2 | Protecting Locations With Differential Privacy Under Temporal Correlations IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a systematic solution to preserve location privacy with rigorous privacy guarantee. |
Yonghui Xiao; Li Xiong; |
2014 | 3 | Differential Privacy: An Economic Method For Choosing Epsilon IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we examine the role that these parameters play in concrete applications, identifying the key questions that must be addressed when choosing specific values. |
JUSTIN HSU et. al. |
2014 | 4 | AsterixDB: A Scalable, Open Source BDMS IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Covered herein are the system’s data model, its query language, and its software architecture. |
SATTAM ALSUBAIEE et. al. |
2014 | 5 | Leveraging Transitive Relations For Crowdsourced Joins IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the crowdsourced join query which aims to utilize humans to find all pairs of matching objects from two collections. |
Jiannan Wang; Guoliang Li; Tim Kraska; Michael J. Franklin; Jianhua Feng; |
2014 | 6 | DataHub: Collaborative Data Science & Dataset Version Management At Scale IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference and search large, divergent collections of datasets, and (b) a platform, DataHub, that gives users the ability to perform collaborative data analysis building on this version control system. |
ANANT BHARDWAJ et. al. |
2014 | 7 | An Improved Apriori Algorithm For Association Rules IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on this algorithm, this paper indicates the limitation of the original Apriori algorithm of wasting time for scanning the whole database searching on the frequent itemsets, and presents an improvement on Apriori by reducing that wasted time depending on scanning only some transactions. |
Mohammed Al-Maolegi; Bassam Arkok; |
2014 | 8 | Reliable Diversity-Based Spatial Crowdsourcing By Moving Workers IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, we propose three effective approximation approaches, including greedy, sampling, and divide-and-conquer algorithms. |
PENG CHENG et. al. |
2014 | 9 | Rethinking Serializable Multiversion Concurrency Control IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Bohm, a new concurrency control protocol for main-memory multi-versioned database systems. |
Jose M. Faleiro; Daniel J. Abadi; |
2014 | 10 | PRESS: A Novel Framework Of Trajectory Compression In Road Networks IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the trajectory data, and propose a new framework, namely PRESS (Paralleled Road-Network-Based Trajectory Compression), to effectively compress trajectory data under road network constraints. |
Renchu Song; Weiwei Sun; Baihua Zheng; Yu Zheng; |
2014 | 11 | Better Bitmap Performance With Roaring Bitmaps IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Building on prior work, we introduce the Roaring compressed bitmap format: it uses packed arrays for compression instead of RLE. |
Samy Chambi; Daniel Lemire; Owen Kaser; Robert Godin; |
2014 | 12 | DimmWitted: A Study Of Main-Memory Statistical Analytics IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our goal is to understand tradeoffs in accessing the data in row- or column-order and at what granularity one should share the model and data for a statistical task. |
Ce Zhang; Christopher Ré; |
2014 | 13 | NoSQL Databases IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this document, I present the main notions of NoSQL databases and compare four selected products (Riak, MongoDB, Cassandra, Neo4J) according to their capabilities with respect to consistency, availability, and partition tolerance, as well as performance. |
Massimo Carro; |
2014 | 14 | A Comparison Of Blocking Methods For Record Linkage IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We compare these approaches in terms of their recall, reduction ratio, and computational complexity. |
Rebecca C. Steorts; Samuel L. Ventura; Mauricio Sadinle; Stephen E. Fienberg; |
2014 | 15 | Scalable Density-Based Distributed Clustering IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a scalable density-based distributed clustering algorithm which allows a user-defined trade-off between clustering quality and the number of transmitted objects from the different local sites to a global server site. |
Eshref Januzaj; Hans-Peter Kriegel; Martin Pfeifle; |
2014 | 16 | Skew In Parallel Query Processing IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of computing a conjunctive query q in parallel, using p servers, on a large database. |
Paul Beame; Paraschos Koutris; Dan Suciu; |
2014 | 17 | Acyclicity Notions For Existential Rules And Their Application To Query Answering In Ontologies IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present two new acyclicity notions called model-faithful acyclicity (MFA) and model-summarising acyclicity (MSA). |
BERNARDO CUENCA GRAU et. al. |
2014 | 18 | Pregelix: Big(ger) Graph Analytics On A Dataflow Engine IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As such, Pregelix offers improved performance characteristics and scaling properties over current open source systems (e.g., we have seen up to 15x speedup compared to Apache Giraph and up to 35x speedup compared to distributed GraphLab), and makes more effective use of available machine resources to support Big(ger) Graph Analytics. |
Yingyi Bu; Vinayak Borkar; Jianfeng Jia; Michael J. Carey; Tyson Condie; |
2014 | 19 | The Missing Piece In Complex Analytics: Low Latency, Scalable Model Management And Serving With Velox IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Velox, a new component of the Berkeley Data Analytics Stack. |
DANIEL CRANKSHAW et. al. |
2014 | 20 | Rapid Sampling For Visualizations With Ordering Guarantees IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the problem of rapidly generating approximate visualizations while preserving crucial visual properties of interest to analysts. |
ALBERT KIM et. al. |
2014 | 21 | Evaluating The Crowd With Confidence IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we devise techniques to generate confidence intervals for worker error rate estimates, thereby enabling a better evaluation of worker quality. |
Manas Joglekar; Hector Garcia-Molina; Aditya Parameswaran; |
2014 | 22 | BDGS: A Scalable Big Data Generator Suite In Big Data Benchmarking IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This gives rise to various new challenges about how we design generators efficiently and successfully. |
ZIJIAN MING et. al. |
2014 | 23 | Improvised Apriori Algorithm Using Frequent Pattern Tree For Real Time Applications In Data Mining IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on this algorithm, this paper indicates the limitation of the original Apriori algorithm of wasting time and space for scanning the whole database searching on the frequent itemsets, and presents an improvement on Apriori. |
Akshita Bhandari; Ashutosh Gupta; Debasis Das; |
2014 | 24 | Processing SPARQL Queries Over Distributed RDF Graphs IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose techniques for processing SPARQL queries over a large RDF graph in a distributed environment. |
Peng Peng; Lei Zou; M. Tamer Özsu; Lei Chen; Dongyan Zhao; |
2014 | 25 | Query Rewriting And Optimization For Ontological Databases IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we discuss two important aspects of this problem: query rewriting and query optimization. |
Georg Gottlob; Giorgio Orsi; Andreas Pieris; |
2014 | 26 | A Data- And Workload-Aware Algorithm For Range Queries Under Differential Privacy IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe a new algorithm for answering a given set of range queries under $\epsilon$-differential privacy which often achieves substantially lower error than competing methods. |
Chao Li; Michael Hay; Gerome Miklau; Yue Wang; |
2014 | 27 | GraphX: Unifying Data-Parallel And Graph-Parallel Analytics IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these challenges we introduce GraphX, a distributed graph computation framework that unifies graph-parallel and data-parallel computation. |
REYNOLD S. XIN et. al. |
2014 | 28 | Metadata For Energy Disaggregation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a metadata schema for representing appliances, meters, buildings, datasets, prior knowledge about appliances and appliance models. |
Jack Kelly; William Knottenbelt; |
2014 | 29 | Aber-OWL: A Framework For Ontology-based Data Access In Biology IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We have developed the Aber-OWL infrastructure that provides reasoning services for bio-ontologies. |
Robert Hoehndorf; Luke Slater; Paul N. Schofield; Georgios V. Gkoutos; |
2014 | 30 | The Homeostasis Protocol: Avoiding Transaction Coordination Through Program Analysis IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes a new approach to achieving strong consistency in distributed systems while minimizing communication between nodes. |
SUDIP ROY et. al. |
2013 | 1 | NoSQL Database: New Era Of Databases For Big Data Analytics – Classification, Characteristics And Comparison IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This report is intended to help users, especially organizations, obtain an independent understanding of the strengths and weaknesses of various NoSQL database approaches to supporting applications that process huge volumes of data. |
A B M Moniruzzaman; Syed Akhter Hossain; |
2013 | 2 | Undefined By Data: A Survey Of Big Data Definitions IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This short paper attempts to collate the various definitions which have gained some degree of traction and to furnish a clear and concise definition of an otherwise ambiguous term. |
Jonathan Stuart Ward; Adam Barker; |
2013 | 3 | Communication Steps For Parallel Query Processing IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For multiple rounds of communication, we present lower bounds in a model where routing decisions for a tuple are tuple-based. |
Paul Beame; Paraschos Koutris; Dan Suciu; |
2013 | 4 | Ontology-based Data Access: A Study Through Disjunctive Datalog, CSP, And MMSNP IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study several classes of ontology-mediated queries, where the database queries are given as some form of conjunctive query and the ontologies are formulated in description logics or other relevant fragments of first-order logic, such as the guarded fragment and the unary-negation fragment. |
Meghyn Bienvenu; Balder ten Cate; Carsten Lutz; Frank Wolter; |
2013 | 5 | Skew Strikes Back: New Developments In The Theory Of Join Algorithms IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In spite of this study of join queries, the textbook description of join processing is suboptimal. |
Hung Q. Ngo; Christopher Re; Atri Rudra; |
2013 | 6 | Blowfish Privacy: Tuning Privacy-Utility Trade-offs Using Policies IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Blowfish, a class of privacy definitions inspired by the Pufferfish framework, that provides a rich interface for this trade-off. |
Xi He; Ashwin Machanavajjhala; Bolin Ding; |
2013 | 7 | Querying Knowledge Graphs By Example Entity Tuples IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As an initial step toward improving the usability of knowledge graphs, we propose to query such data by example entity tuples, without requiring users to form complex graph queries. |
Nandish Jayaram; Arijit Khan; Chengkai Li; Xifeng Yan; Ramez Elmasri; |
2013 | 8 | Mining Frequent Graph Patterns With Differential Privacy IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose the first differentially private algorithm for mining frequent graph patterns. |
Entong Shen; Ting Yu; |
2013 | 9 | CrowdPlanner: A Crowd-Based Route Recommendation System IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our system addresses two critical issues in its core components: a) task generation component generates a series of informative and concise questions with optimized ordering for a given candidate route set so that workers feel comfortable and easy to answer; and b) worker selection component utilizes a set of selection criteria and an efficient algorithm to find the most eligible workers to answer the questions with high accuracy. |
Han Su; |
2013 | 10 | Aggregation And Ordering In Factorised Databases IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we extend FDB to support a larger class of practical queries with aggregates and ordering. |
Nurzhan Bakibayev; Tomáš Kočiský; Dan Olteanu; Jakub Závodný; |
2013 | 11 | Algorithm And Approaches To Handle Large Data- A Survey IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a review of various algorithms from 1994-2013 necessary for handling such large data set. |
Chanchal Yadav; Shuliang Wang; Manoj Kumar; |
2013 | 12 | The Operad Of Wiring Diagrams: Formalizing A Graphical Language For Databases, Recursion, And Plug-and-play Circuits IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that wiring diagrams form the morphisms of an operad $\mathcal{T}$, capturing this self-similarity. |
David I. Spivak; |
2013 | 13 | Learning And Verifying Quantified Boolean Queries By Example IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the number of questions needed to learn or verify qhorn queries, a special class of Boolean quantified queries whose underlying form is conjunctions of quantified Horn expressions. |
Azza Abouzied; Dana Angluin; Christos Papadimitriou; Joseph M. Hellerstein; Avi Silberschatz; |
2013 | 14 | Simple, Fast, And Scalable Reachability Oracle IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present two simple and efficient labeling algorithms, Hierarchical-Labeling and Distribution-Labeling, which can work on massive real-world graphs: their construction time is an order of magnitude faster than the setcover based labeling approach, and transitive closure materialization is not needed. |
Ruoming Jin; Guan Wang; |
2013 | 15 | Managing Schema Evolution In NoSQL Data Stores IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We discuss the recommendations of the developer community on handling schema changes, and introduce a simple, declarative schema evolution language. |
Stefanie Scherzinger; Meike Klettke; Uta Störl; |
2013 | 16 | Probabilistic Nearest Neighbor Queries On Uncertain Moving Object Trajectories IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we fill this gap by addressing probabilistic nearest neighbor queries in databases with uncertain trajectories modeled by stochastic processes, specifically the Markov chain model. |
JOHANNES NIEDERMAYER et. al. |
2013 | 17 | Oblivious Query Processing IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present oblivious query processing algorithms for a rich class of database queries involving selections, joins, grouping and aggregation. |
Arvind Arasu; Raghav Kaushik; |
2013 | 18 | Approximate K-nearest Neighbour Based Spatial Clustering Using K-d Tree IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, an implementation of Approximate kNN-based spatial clustering algorithm using the K-d tree is proposed. |
Dr. Mohammed Otair; |
2013 | 19 | Census Data Mining And Data Analysis Using WEKA IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we have made an attempt to demonstrate how one can extract the local (district) level census, socio-economic and population related other data for knowledge discovery and their analysis using the powerful data mining tool Weka. |
Sudhir B Jagtap; Kodge B. G; |
2013 | 20 | Beyond Worst-Case Analysis For Joins With Minesweeper IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe a new algorithm, Minesweeper, that is able to satisfy stronger runtime guarantees than previous join algorithms (colloquially, ‘beyond worst-case guarantees’) for data in indexed search trees. |
Hung Q. Ngo; Dung T. Nguyen; Christopher Ré; Atri Rudra; |
2013 | 21 | Parallel Triangle Counting In Massive Streaming Graphs IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Driven by these applications and the trend that modern graph datasets are both large and dynamic, we present the design and implementation of a fast and cache-efficient parallel algorithm for estimating the number of triangles in a massive undirected graph whose edges arrive as a stream. |
Kanat Tangwongsan; A. Pavan; Srikanta Tirthapura; |
2013 | 22 | Data Placement And Replica Selection For Improving Co-location In Distributed Environments IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we exploit the fact that most distributed environments need to use replication for fault tolerance, and we devise workload-driven replica selection and placement algorithms that attempt to minimize the average query span. |
K. Ashwin Kumar; Amol Deshpande; Samir Khuller; |
2013 | 23 | Privacy Preserving Social Network Publication Against Mutual Friend Attacks IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel privacy attack model and refer to it as a mutual friend attack. |
Chongjing Sun; Philip S. Yu; Xiangnan Kong; Yan Fu; |
2013 | 24 | Transparent Data Encryption — Solution For Security Of Database Contents IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: The present study deals with Transparent Data Encryption which is a technology used to solve the problems of security of data. Transparent Data Encryption means encrypting … |
Dr. Anwar Pasha Deshmukh; Dr. Riyazuddin Qureshi; |
2013 | 25 | A Survey On Array Storage, Query Languages, And Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this survey, we provide a guide for past, present, and future research in array processing. |
Florin Rusu; Yu Cheng; |
2013 | 26 | Want A Good Answer? Ask A Good Question First! IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the problem of inferring the quality of questions and answers through a case study of a software CQA (Stack Overflow). |
YUAN YAO et. al. |
2013 | 27 | On Graph Deltas For Historical Queries IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the problem of evaluating historical queries on graphs. |
Georgia Koloniari; Dimitris Souravlias; Evaggelia Pitoura; |
2013 | 28 | Context-based Diversification For Keyword Queries Over XML Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this challenging problem, in this paper we propose an approach that automatically diversifies XML keyword search based on its different contexts in the XML data. |
Jianxin Li; Chengfei Liu; Liang Yao; Jeffrey Xu Yu; |
2013 | 29 | Efficient Single-Source Shortest Path And Distance Queries On Large Graphs IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the deficiency of existing work, this paper presents Highways-on-Disk (HoD), a disk-based index that supports both SSD and SSSP queries on directed and weighted graphs. |
Andy Diwen Zhu; Xiaokui Xiao; Sibo Wang; Wenqing Lin; |
2013 | 30 | First-Order Provenance Games IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new model of provenance, based on a game-theoretic approach to query evaluation. |
Sven Köhler; Bertram Ludäscher; Daniel Zinn; |
2012 | 1 | Distributed GraphLab: A Framework For Machine Learning In The Cloud IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we extend the GraphLab framework to the substantially more challenging distributed setting while preserving strong data consistency guarantees. |
YUCHENG LOW et. al. |
2012 | 2 | BlinkDB: Queries With Bounded Errors And Bounded Response Times On Very Large Data IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present BlinkDB, a massively parallel, sampling-based approximate query engine for running ad-hoc, interactive SQL queries on large volumes of data. |
Sameer Agarwal; Aurojit Panda; Barzan Mozafari; Samuel Madden; Ion Stoica; |
2012 | 3 | Scalable K-Means++ IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we show how to drastically reduce the number of passes needed to obtain, in parallel, a good initialization. |
Bahman Bahmani; Benjamin Moseley; Andrea Vattani; Ravi Kumar; Sergei Vassilvitskii; |
2012 | 4 | CrowdER: Crowdsourcing Entity Resolution IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we propose a hybrid human-machine approach in which machines are used to do an initial, coarse pass over all the data, and people are used to verify only the most likely matching pairs. |
Jiannan Wang; Tim Kraska; Michael J. Franklin; Jianhua Feng; |
2012 | 5 | Interactive Analytical Processing In Big Data Systems: A Cross-Industry Study Of MapReduce Workloads IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our key contribution is a characterization of new MapReduce workloads which are driven in part by interactive analysis, and which make heavy use of query-like programming frameworks on top of MapReduce. |
Yanpei Chen; Sara Alspaugh; Randy Katz; |
2012 | 6 | Shark: SQL And Rich Analytics At Scale IF:6 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Shark is a new data analysis system that marries query processing with complex analytics on large clusters. It leverages a novel distributed memory abstraction to provide a … |
REYNOLD XIN et. al. |
2012 | 7 | The MADlib Analytics Library Or MAD Skills, The SQL IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we introduce the MADlib project, including the background that led to its beginnings, and the motivation for its open source nature. |
JOE HELLERSTEIN et. al. |
2012 | 8 | Functional Mechanism: Regression Analysis Under Differential Privacy IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by this, we propose the Functional Mechanism, a differentially private method designed for a large class of optimization-based analyses. |
Jun Zhang; Zhenjie Zhang; Xiaokui Xiao; Yin Yang; Marianne Winslett; |
2012 | 9 | Efficient Subgraph Matching On Billion Node Graphs IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the problem of subgraph matching on billion-node graphs. |
Zhao Sun; Hongzhi Wang; Haixun Wang; Bin Shao; Jianzhong Li; |
2012 | 10 | Truss Decomposition In Massive Networks IF:6 Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: The k-truss is a type of cohesive subgraphs proposed recently for the study of networks. While the problem of computing most cohesive subgraphs is NP-hard, there exists a … |
Jia Wang; James Cheng; |
2012 | 11 | A Bayesian Approach To Discovering Truth From Conflicting Sources For Data Integration IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a probabilistic graphical model that can automatically infer true records and source quality without any supervision. |
Bo Zhao; Benjamin I. P. Rubinstein; Jim Gemmell; Jiawei Han; |
2012 | 12 | The Vertica Analytic Database: C-Store 7 Years Later IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes the system architecture of the Vertica Analytic Database (Vertica), a commercialization of the design of the C-Store research prototype. |
ANDREW LAMB et. al. |
2012 | 13 | CDAS: A Crowdsourcing Data Analytics System IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the principles of our quality-sensitive model. |
XUAN LIU et. al. |
2012 | 14 | Efficient Processing Of K Nearest Neighbor Joins Using MapReduce IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate how to perform kNN join using MapReduce which is a well-accepted framework for data-intensive applications over clusters of computers. |
Wei Lu; Yanyan Shen; Su Chen; Beng Chin Ooi; |
2012 | 15 | Challenging The Long Tail Recommendation IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel suite of graph-based algorithms for the long tail recommendation. |
Hongzhi Yin; Bin Cui; Jing Li; Junjie Yao; Chen Chen; |
2012 | 16 | MDCC: Multi-Data Center Consistency IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With MDCC (Multi-Data Center Consistency), we describe the first optimistic commit protocol, that does not require a master or partitioning, and is strongly consistent at a cost similar to eventually consistent protocols. |
Tim Kraska; Gene Pang; Michael J. Franklin; Samuel Madden; |
2012 | 17 | Solving Big Data Challenges For Enterprise Application Performance Management IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present our experience and a comprehensive performance evaluation of six modern (open-source) data stores in the context of application performance monitoring as part of CA Technologies initiative. |
TILMANN RABL et. al. |
2012 | 18 | The Survey Of Data Mining Applications And Feature Scope IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we have focused on a variety of techniques, approaches and different areas of the research which are helpful and marked as the important field of data mining Technologies. |
Neelamadhab Padhy; Dr. Pragnyaban Mishra; Rasmita Panigrahi; |
2012 | 19 | Massively Parallel Sort-Merge Joins In Main Memory Multi-Core Database Systems IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we take a new look at the well-known sort-merge join which, so far, has not been in the focus of research in scalable massively parallel multi-core data processing as it was deemed inferior to hash joins. |
Martina-Cezara Albutiu; Alfons Kemper; Thomas Neumann; |
2012 | 20 | Densest Subgraph In Streaming And MapReduce IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present new algorithms for finding the densest subgraph in the streaming model. |
Bahman Bahmani; Ravi Kumar; Sergei Vassilvitskii; |
2012 | 21 | Using Data Mining Techniques For Diagnosis And Prognosis Of Cancer Disease IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we have discussed various data mining approaches that have been utilized for breast cancer diagnosis and prognosis. |
Shweta Kharya; |
2012 | 22 | Probabilistically Bounded Staleness For Practical Partial Quorums IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we examine this trade-off in the context of quorum-replicated data stores. |
Peter Bailis; Shivaram Venkataraman; Michael J. Franklin; Joseph M. Hellerstein; Ion Stoica; |
2012 | 23 | DBToaster: Higher-order Delta Processing For Dynamic, Frequently Fresh Views IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present viewlet transforms, a recursive finite differencing technique applied to queries. |
Yanif Ahmad; Oliver Kennedy; Christoph Koch; Milos Nikolic; |
2012 | 24 | Don’t Thrash: How To Cache Your Hash On Flash IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents new alternatives to the well-known Bloom filter data structure. |
MICHAEL A. BENDER et. al. |
2012 | 25 | Towards A Unified Architecture For In-RDBMS Analytics IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our main contribution in this work is to take a step towards such a unified architecture. |
Xixuan Feng; Arun Kumar; Ben Recht; Christopher Ré; |
2012 | 26 | Dense Subgraph Maintenance Under Streaming Edge Weight Updates For Real-time Story Identification IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on these, we propose a novel algorithm, DYNDENS, which outperforms adaptations of existing techniques to this setting, and yields meaningful results. |
Albert Angel; Nick Koudas; Nikos Sarkas; Divesh Srivastava; |
2012 | 27 | PrivBasis: Frequent Itemset Mining With Differential Privacy IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the problem of how to perform frequent itemset mining on transaction databases while satisfying differential privacy. |
Ninghui Li; Wahbeh Qardaji; Dong Su; Jianneng Cao; |
2012 | 28 | Verification Of Relational Data-Centric Dynamic Systems With External Services IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we study verification of (first-order) mu-calculus variants over relational data-centric dynamic systems, where data are represented by a full-fledged relational database, and the process is described in terms of atomic actions that evolve the database. |
Babak Bagheri Hariri; Diego Calvanese; Giuseppe De Giacomo; Alin Deutsch; Marco Montali; |
2012 | 29 | Mining Frequent Itemsets Over Uncertain Databases IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through extensive experiments, we verify that the two definitions have a tight connection and can be unified together when the size of data is large enough. |
Yongxin Tong; Lei Chen; Yurong Cheng; Philip S. Yu; |
2012 | 30 | V-SMART-Join: A Scalable MapReduce Framework For All-Pair Similarity Joins Of Multisets And Vectors IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes V-SMART-Join, a scalable MapReduce-based framework for discovering all pairs of similar entities. |
Ahmed Metwally; Christos Faloutsos; |
2011 | 1 | A Data-Based Approach To Social Influence Maximization IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study influence maximization from a novel data-based perspective. |
Amit Goyal; Francesco Bonchi; Laks V. S. Lakshmanan; |
2011 | 2 | PARIS: Probabilistic Alignment Of Relations, Instances, And Schema IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present PARIS, an approach for the automatic alignment of ontologies. |
Fabian M. Suchanek; Serge Abiteboul; Pierre Senellart; |
2011 | 3 | Differentially Private Spatial Decompositions IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on spatial data such as locations and more generally any data that can be indexed by a tree structure. |
Graham Cormode; Magda Procopiuc; Entong Shen; Divesh Srivastava; Ting Yu; |
2011 | 4 | High-Performance Concurrency Control Mechanisms For Main-Memory Databases IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we introduce two efficient concurrency control methods specifically designed for main-memory databases. |
Per-Åke Larson et al.
2011 | 5 | Human-powered Sorts And Joins IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on how to use humans to compare items for sorting and joining data, two of the most common operations in DBMSs. |
Adam Marcus; Eugene Wu; David Karger; Samuel Madden; Robert Miller; |
2011 | 6 | Tuffy: Scaling Up Statistical Inference In Markov Logic Networks Using An RDBMS IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Tuffy that achieves scalability via three novel contributions: (1) a bottom-up approach to grounding that allows us to leverage the full power of the relational optimizer, (2) a novel hybrid architecture that allows us to perform AI-style local search efficiently using an RDBMS, and (3) a theoretical insight that shows when one can (exponentially) improve the efficiency of stochastic local search. |
Feng Niu; Christopher Ré; AnHai Doan; Jude Shavlik; |
2011 | 7 | RTED: A Robust Algorithm For The Tree Edit Distance IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present RTED, a robust tree edit distance algorithm. |
Mateusz Pawlik; Nikolaus Augsten; |
2011 | 8 | Guided Data Repair IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present GDR, a Guided Data Repair framework that incorporates user feedback in the cleaning process to enhance and accelerate existing automatic repair techniques while minimizing user involvement. |
Mohamed Yakout; Ahmed K. Elmagarmid; Jennifer Neville; Mourad Ouzzani; Ihab F. Ilyas; |
2011 | 9 | Personalized Social Recommendations – Accurate Or Private? IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The main contribution of this work is in formalizing these expected trade-offs between the accuracy and privacy of personalized social recommendations. |
Ashwin Machanavajjhala; Aleksandra Korolova; Atish Das Sarma; |
2011 | 10 | Automatic Optimization For MapReduce Programs IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper covers Manimal, which automatically analyzes MapReduce programs and applies appropriate data-aware optimizations, thereby requiring no additional help at all from the programmer. |
Eaman Jahani; Michael J. Cafarella; Christopher Ré; |
2011 | 11 | PASS-JOIN: A Partition-based Method For Similarity Joins IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study string similarity joins with edit-distance constraints, which find similar string pairs from two large sets of strings whose edit distance is within a given threshold. |
Guoliang Li; Dong Deng; Jiannan Wang; Jianhua Feng; |
2011 | 12 | Using Paxos To Build A Scalable, Consistent, And Highly Available Datastore IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes Spinnaker’s Paxos-based replication protocol. |
Jun Rao; Eugene J. Shekita; Sandeep Tata; |
2011 | 13 | Provenance For Aggregate Queries IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, we propose a new approach, where we annotate with provenance information not just tuples but also the individual values within tuples, using provenance to describe the values' computation. |
Yael Amsterdamer; Daniel Deutch; Val Tannen; |
2011 | 14 | Capturing Topology In Graph Pattern Matching IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: (3) We present the locality property of strong simulation, which allows us to effectively conduct pattern matching on distributed graphs. |
Shuai Ma; Yang Cao; Wenfei Fan; Jinpeng Huai; Tianyu Wo; |
2011 | 15 | Fast Updates On Read-Optimized Databases Using Multi-Core CPUs IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the second half, we present an optimized merge process reducing the merge overhead of current systems by a factor of 30. |
Jens Krueger et al.
2011 | 16 | Bayesian Locality Sensitive Hashing For Fast Similarity Search IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present BayesLSH, a principled Bayesian algorithm for the subsequent phase of similarity search – performing candidate pruning and similarity estimation using LSH. |
Venu Satuluri; Srinivasan Parthasarathy; |
2011 | 17 | Putting Lipstick On Pig: Enabling Database-style Workflow Provenance IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel provenance framework that marries database-style and workflow-style provenance, by using Pig Latin to expose the functionality of modules, thus capturing internal state and fine-grained dependencies. |
Yael Amsterdamer et al.
2011 | 18 | Data Mining : A Prediction Of Performer Or Underperformer Using Classification IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a data mining technique, the Bayes classification method, is applied to these data to help an institution. |
Umesh Kumar Pandey; Saurabh Pal; |
2011 | 19 | Human-Assisted Graph Search: It’s Okay To Ask Questions IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of human-assisted graph search: given a directed acyclic graph with some (unknown) target node(s), we consider the problem of finding the target node(s) by asking an omniscient human questions of the form “Is there a target node that is reachable from the current node?” |
Aditya Parameswaran; Anish Das Sarma; Hector Garcia-Molina; Neoklis Polyzotis; Jennifer Widom; |
2011 | 20 | A General Framework For Representing, Reasoning And Querying With Annotated Semantic Web Data IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe a generic framework for representing and reasoning with annotated Semantic Web data, a task becoming more important with the recent increased amount of inconsistent and non-reliable meta-data on the web. |
Antoine Zimmermann; Nuno Lopes; Axel Polleres; Umberto Straccia; |
2011 | 21 | Automatic Wrappers For Large Scale Web Extraction IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a generic framework to make wrapper induction algorithms tolerant to noise in the training data. |
Nilesh Dalvi; Ravi Kumar; Mohamed Soliman; |
2011 | 22 | Column-Oriented Storage Techniques For MapReduce IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes how column-oriented storage techniques can be incorporated in Hadoop in a way that preserves its popular programming APIs. |
Avrilia Floratou; Jignesh Patel; Eugene Shekita; Sandeep Tata; |
2011 | 23 | Secure Mining Of Association Rules In Horizontally Distributed Databases IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a protocol for secure mining of association rules in horizontally distributed databases. |
Tamir Tassa; |
2011 | 24 | Large-Scale Collective Entity Matching IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Towards this end, we propose a principled framework to scale any generic EM algorithm. |
Vibhor Rastogi; Nilesh Dalvi; Minos Garofalakis; |
2011 | 25 | Analysis Of Web Logs And Web User In Web Mining IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper gives a detailed discussion about these log files, their formats, their creation, access procedures, their uses, various algorithms used and the additional parameters that can be used in the log files which in turn gives way to an effective mining. |
L. K. Joshila Grace; V. Maheswari; Dhinaharan Nagamalai; |
2011 | 26 | REX: Explaining Relationships Between Entity Pairs IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel problem called entity relationship explanation, which seeks to explain why a pair of entities are connected, and solve this challenging problem by integrating the above two complementary approaches, i.e., we leverage the knowledge base to explain the connections discovered between entity pairs. |
Lujun Fang; Anish Das Sarma; Cong Yu; Philip Bohannon; |
2011 | 27 | Query-time Entity Resolution IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We validate our approach on two large real-world publication databases where we show the usefulness of collective resolution and at the same time demonstrate the need for adaptive strategies for query processing. |
I. Bhattacharya; L. Getoor; |
2011 | 28 | Fast Set Intersection In Memory IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces linear space data structures to represent sets such that their intersection can be computed in a worst-case efficient way. |
Bolin Ding; Arnd Christian König; |
2011 | 29 | View Selection In Semantic Web Databases IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To account for implicit triples in query answers, we propose a novel RDF query reformulation algorithm and an innovative way of incorporating it into view selection in order to avoid a combinatorial explosion in the complexity of the selection process. |
François Goasdoué; Konstantinos Karanasos; Julien Leblay; Ioana Manolescu; |
2011 | 30 | Customer Data Clustering Using Data Mining Technique IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The objectives of this paper are to identify the high-profit, high-value and low-risk customers by one of the data mining techniques – customer clustering. |
Dr. Sankar Rajagopal; |
2010 | 1 | Discovery Of Convoys In Trajectory Databases IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by this, we develop three efficient algorithms for convoy discovery that adopt the well-known filter-refinement framework. |
Hoyoung Jeung; Man Lung Yiu; Xiaofang Zhou; Christian S. Jensen; Heng Tao Shen; |
2010 | 2 | The Complexity Of Causality And Responsibility For Query Answers And Non-Answers IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we adapt Halpern, Pearl, and Chockler’s recent definitions of causality and responsibility to define the causes of answers and non-answers to queries, and their degree of responsibility. |
Alexandra Meliou; Wolfgang Gatterbauer; Katherine F. Moore; Dan Suciu; |
2010 | 3 | ElasTraS: An Elastic Transactional Data Store In The Cloud IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose ElasTraS which addresses this issue of scalability and elasticity of the data store in a cloud computing environment to leverage from the elastic nature of the underlying infrastructure, while providing scalable transactional data access. |
Sudipto Das; Divyakant Agrawal; Amr El Abbadi; |
2010 | 4 | Privacy In Geo-social Networks: Proximity Notification With Untrusted Service Providers And Curious Buddies IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The paper presents two new protocols providing complete privacy with respect to the SP, and controllable privacy with respect to the buddies. |
Sergio Mascetti; Dario Freni; Claudio Bettini; X. Sean Wang; Sushil Jajodia; |
2010 | 5 | Data Cleaning And Query Answering With Matching Dependencies And Matching Functions IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Assuming the existence of matching functions for making two attribute values equal, we formally introduce the process of cleaning an instance using matching dependencies, as a chase-like procedure. |
Leopoldo Bertossi; Solmaz Kolahi; Laks V. S. Lakshmanan; |
2010 | 6 | Learning Deterministic Regular Expressions For The Inference Of Schemas From XML Data IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by this observation, we provide a probabilistic algorithm that learns k-OREs for increasing values of k, and selects the deterministic one that best describes the sample based on a Minimum Description Length argument. |
Geert Jan Bex; Wouter Gelade; Frank Neven; Stijn Vansummeren; |
2010 | 7 | Scalable Probabilistic Databases With Factor Graphs And MCMC IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an alternative approach where the underlying relational database always represents a single world, and an external factor graph encodes a distribution over possible worlds; Markov chain Monte Carlo (MCMC) inference is then used to recover this uncertainty to a desired level of fidelity. |
Michael Wick; Andrew McCallum; Gerome Miklau; |
2010 | 8 | Functorial Data Migration IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present a simple database definition language: that of categories and functors. |
David I. Spivak; |
2010 | 9 | Relational Transducers For Declarative Networking IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by a recent conjecture concerning the expressiveness of declarative networking, we propose a formal computation model for eventually consistent distributed querying, based on relational transducers. |
Tom Ameloot; Frank Neven; Jan Van den Bussche; |
2010 | 10 | Data Stream Clustering: Challenges And Issues IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this survey, we try to clarify: first, the different problem definitions related to data stream clustering in general; second, the specific difficulties encountered in this field of research; third, the varying assumptions, heuristics, and intuitions forming the basis of different approaches; and how several prominent solutions tackle different problems. |
Madjid Khalilian; Norwati Mustapha; |
2010 | 11 | Provenance Views For Module Privacy IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The problem we address in this paper is the following: Given a workflow, abstractly modeled by a relation R, a privacy requirement Γ and costs associated with data. |
Susan B. Davidson; Sanjeev Khanna; Tova Milo; Debmalya Panigrahi; Sudeepa Roy; |
2010 | 12 | Behavioral Simulations In MapReduce IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present BRACE (Big Red Agent-based Computation Engine), which extends the MapReduce framework to process these simulations efficiently across a cluster. |
Guozhang Wang et al.
2010 | 13 | Mining Frequent Itemsets Using Genetic Algorithm IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The main aim of this paper is to find all the frequent itemsets from given data sets using genetic algorithm. |
Soumadip Ghosh; Sushanta Biswas; Debasree Sarkar; Partha Pratim Sarkar; |
2010 | 14 | Semi-Automatic Index Tuning: Keeping DBAs In The Loop IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new index recommendation technique, termed semi-automatic tuning, that keeps the DBA in the loop by generating recommendations that use feedback about the DBA’s preferences. |
Karl Schnaitter; Neoklis Polyzotis; |
2010 | 15 | Transparent Anonymization: Thwarting Adversaries Who Know The Algorithm IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Numerous generalization techniques have been proposed for privacy preserving data publishing. |
Xiaokui Xiao; Yufei Tao; Nick Koudas; |
2010 | 16 | Page-Differential Logging: An Efficient And DBMS-independent Approach For Storing Data Into Flash Memory IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new method of storing data, called page-differential logging, for flash-based storage systems that solves the drawbacks of the two methods. |
Yi-Reun Kim; Kyu-Young Whang; Il-Yeol Song; |
2010 | 17 | An Efficient Rigorous Approach For Identifying Statistically Significant Frequent Itemsets IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address significance in the context of frequent itemset mining. |
Adam Kirsch et al.
2010 | 18 | Preference Elicitation In Prioritized Skyline Queries IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study p-skyline queries that generalize skyline queries by allowing varying attribute importance in preference relations. |
Denis Mindolin; Jan Chomicki; |
2010 | 19 | Finding Sequential Patterns From Large Sequence Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide a brief theoretical overview of three types of sequential pattern models. |
Mahdi Esmaeili; Fazekas Gabor; |
2010 | 20 | A Logical Temporal Relational Data Model IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a conceptual model for handling time varying attributes in the relational database model with minimal temporal attributes. |
Nadeem Mahmood; Aqil Burney; Kamran Ahsan; |
2010 | 21 | Automating Fine Concurrency Control In Object-Oriented Databases IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Several propositions were done to provide adapted concurrency control to object-oriented databases. However, most of these proposals miss the fact that considering solely read and … |
Carmelo Malta; José Martinez; |
2010 | 22 | Active Integrity Constraints And Revision Programming IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our main goal is to establish a comprehensive framework of semantics for active integrity constraints, to find a parallel framework for revision programs, and to relate the two. |
L. Caroprese; M. Truszczynski; |
2010 | 23 | Faster Query Answering In Probabilistic Databases Using Read-Once Functions IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we tell a better story for a large subclass of boolean event expressions: those that are generated by conjunctive queries without self-joins and on tuple-independent probabilistic databases. |
Sudeepa Roy; Vittorio Perduca; Val Tannen; |
2010 | 24 | Discovering Potential User Browsing Behaviors Using Custom-built Apriori Algorithm IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We have proposed a custom-built apriori algorithm to find the effective pattern analysis. |
Sandeep Singh Rawat; Lakshmi Rajamani; |
2010 | 25 | Mining Target-Oriented Sequential Patterns With Time-Intervals IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present an algorithm to discover target-oriented sequential pattern with time-intervals. |
Hao-En Chueh; |
2010 | 26 | Clustering High Dimensional Data Using Subspace And Projected Clustering Algorithms IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we analyze in detail the properties of different data clustering methods. |
Rahmat Widia Sembiring; Jasni Mohamad Zain; Abdullah Embong; |
2010 | 27 | Data Conflict Resolution Using Trust Mappings IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes the first principled solution to the automatic conflict resolution problem in a community database. |
Wolfgang Gatterbauer; Dan Suciu; |
2010 | 28 | Towards An Incremental Maintenance Of Cyclic Association Rules IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an incremental algorithm for cyclic association rules maintenance. |
Eya ben Ahmed; Mohamed Salah Gouider; |