Most Influential ArXiv (Databases) Papers (2026-04 Version)
The field of Databases in arXiv covers database management, data mining, and data processing. Roughly, it includes material in ACM Subject Classes E.2, E.5, H.0, H.2, and J.1. The Paper Digest Team analyzes all papers published in this field in past years and presents up to 30 of the most influential papers for each year. This ranking list is automatically constructed based on citations from both research papers and granted patents, and will be frequently updated to reflect the most recent changes. To find the latest version of this list or the most influential papers from other conferences/journals, please visit the Best Paper Digest page. Note: the most influential papers may or may not include the papers that won best paper awards. (Version: 2026-04).
As a pioneer in the field since 2018, Paper Digest has curated thousands of such lists, drawing on years of accumulated data across decades of conferences and research topics. To ensure users never miss a breakthrough, our daily digest service sifts through tens of thousands of new papers, clinical trials, news articles, and community posts every day, delivering only what matters most to your specific interests. Beyond discovery, Paper Digest offers built-in research tools to help users read articles, write articles, get answers, conduct literature reviews, and generate research reports more efficiently.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Most Influential ArXiv (Databases) Papers (2026-04 Version)
| Year | Rank | Paper | Author(s) |
|---|---|---|---|
| 2025 | 1 | SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning (IF:3). Highlight: Current methodologies primarily utilize supervised fine-tuning (SFT) to train the NL2SQL model, which may limit adaptability and interpretability in new environments (e.g., finance and healthcare). In order to enhance the reasoning performance of the NL2SQL model in the above complex situations, we introduce SQL-R1, a novel NL2SQL reasoning model trained by the reinforcement learning (RL) algorithms. | PEIXIAN MA et al. |
| 2025 | 2 | Foundation Models for Spatio-Temporal Data Science: A Tutorial and Survey (IF:3). Highlight: This survey aims to provide a comprehensive review of STFMs, categorizing existing methodologies and identifying key research directions to advance ST general intelligence. | YUXUAN LIANG et al. |
| 2024 | 1 | An Analysis of XML Compression Efficiency (IF:4). Highlight: We present an XML test corpus and a combined efficiency metric integrating compression ratio and execution speed. | Christopher James Augeri; Barry E. Mullins; Leemon C. Baird III; Dursun A. Bulutoglu; Rusty O. Baldwin |
| 2024 | 2 | A Unified Replay-based Continuous Learning Framework for Spatio-Temporal Prediction on Streaming Data (IF:3). Highlight: To enable spatio-temporal prediction on streaming data, we propose a unified replay-based continuous learning framework. | HAO MIAO et al. |
| 2024 | 3 | When Large Language Models Meet Vector Databases: A Survey (IF:3). Highlight: VecDBs emerge as a compelling solution to these issues by offering an efficient means to store, retrieve, and manage the high-dimensional vector representations intrinsic to LLM operations. Through this nuanced review, we delineate the foundational principles of LLMs and VecDBs and critically analyze their integration’s impact on enhancing LLM functionalities. | ZHI JING et al. |
| 2024 | 4 | BOLD V4: A Centralized Bioinformatics Platform for DNA-based Biodiversity Data (IF:3). Abstract: BOLD, the Barcode of Life Data System, supports the acquisition, storage, validation, analysis, and publication of DNA barcodes, activities requiring the integration of molecular, … | SUJEEVAN RATNASINGHAM et al. |
| 2024 | 5 | Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment (IF:3). Highlight: In this paper, we present Starling, an I/O-efficient disk-resident graph index framework that optimizes data layout and search strategy within the segment. | MENGZHAO WANG et al. |
| 2024 | 6 | A Critique of Snapshot Isolation (IF:3). Highlight: For example, Google Percolator, implements lock-based snapshot isolation on top of BigTable. We show in this paper that this compromise is not necessary in lock-free implementations of transactional support. | Daniel Gómez Ferro; Maysam Yabandeh |
| 2024 | 7 | GriDB: Scaling Blockchain Database Via Sharding and Off-Chain Cross-Shard Mechanism (IF:3). Highlight: To tackle the challenge, this paper presents GriDB, the first scalable blockchain database, by designing a novel off-chain cross-shard mechanism for efficient cross-shard database services. | ZICONG HONG et al. |
| 2024 | 8 | KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks (IF:3). Highlight: In this paper, we introduce Knowledge-Orthogonal Reasoning (KOR), a concept aimed at minimizing reliance on domain-specific knowledge, enabling more accurate evaluation of models’ reasoning abilities in out-of-distribution settings. | KAIJING MA et al. |
| 2024 | 9 | PURPLE: Making A Large Language Model A Better SQL Writer (IF:3). Highlight: In this paper, we propose PURPLE (Pre-trained models Utilized to Retrieve Prompts for Logical Enhancement), which improves accuracy by retrieving demonstrations containing the requisite logical operator composition for the NL2SQL task on hand, thereby guiding LLMs to produce better SQL translation. | TONGHUI REN et al. |
| 2024 | 10 | A Survey of Text-to-SQL in The Era of LLMs: Where Are We, and Where Are We Going? (IF:3). Highlight: The performance of Text-to-SQL has been greatly enhanced with the emergence of Large Language Models (LLMs). In this survey, we provide a comprehensive review of Text-to-SQL techniques powered by LLMs, covering its entire lifecycle from the following four aspects: (1) Model: Text-to-SQL translation techniques that tackle not only NL ambiguity and under-specification, but also properly map NL with database schema and instances; (2) Data: From the collection of training data, data synthesis due to training data scarcity, to Text-to-SQL benchmarks; (3) Evaluation: Evaluating Text-to-SQL methods from multiple angles using different metrics and granularities; and (4) Error Analysis: analyzing Text-to-SQL errors to find the root cause and guiding Text-to-SQL models to evolve. | XINYU LIU et al. |
| 2024 | 11 | RaBitQ: Quantizing High-Dimensional Vectors with A Theoretical Error Bound for Approximate Nearest Neighbor Search (IF:3). Highlight: Despite their empirical success, we note that these methods do not have a theoretical error bound and are observed to fail disastrously on some real-world datasets. Motivated by this, we propose a new randomized quantization method named RaBitQ, which quantizes $D$-dimensional vectors into $D$-bit strings. | Jianyang Gao; Cheng Long |
| 2024 | 12 | Automated Data Visualization from Natural Language Via Large Language Models: An Exploratory Study (IF:3). Highlight: Despite the considerable efforts made by these approaches, challenges persist in visualizing data sourced from unseen databases or spanning multiple tables. Taking inspiration from the remarkable generation capabilities of Large Language Models (LLMs), this paper conducts an empirical study to evaluate their potential in generating visualizations, and explore the effectiveness of in-context learning prompts for enhancing this task. | YANG WU et al. |
| 2024 | 13 | Text2SQL Is Not Enough: Unifying AI and Databases with TAG (IF:3). Highlight: We propose Table-Augmented Generation (TAG), a unified and general-purpose paradigm for answering natural language questions over databases. | ASIM BISWAL et al. |
| 2024 | 14 | Taurus Database: How to Be Fast, Available, and Frugal in The Cloud (IF:3). Highlight: In this paper, we describe the design of Taurus, a new multi-tenant cloud database system. | ALEX DEPOUTOVITCH et al. |
| 2024 | 15 | Large Language Model Enhanced Text-to-SQL Generation: A Survey (IF:3). Highlight: Existing survey work mainly focuses on rule-based and neural-based approaches, but it still lacks a survey of Text-to-SQL with LLMs. In this paper, we survey the large language model enhanced text-to-SQL generations, classifying them into prompt engineering, fine-tuning, pre-trained, and Agent groups according to training strategies. | Xiaohu Zhu; Qian Li; Lizhen Cui; Yongkang Liu |
| 2024 | 16 | LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency (IF:3). Highlight: Secondly, current query rewrite methods usually rely highly on DBMS cost estimators which are often not accurate. In this paper, we address these problems by proposing a novel method of query rewrite named LLM-R2, adopting a large language model (LLM) to propose possible rewrite rules for a database rewrite system. | Zhaodonghui Li; Haitao Yuan; Huiming Wang; Gao Cong; Lidong Bing |
| 2024 | 17 | HAIChart: Human and AI Paired Visualization System (IF:3). Highlight: In this paper, we aim to achieve the best of both worlds. | Yupeng Xie; Yuyu Luo; Guoliang Li; Nan Tang |
| 2024 | 18 | Schema Matching with Large Language Models: An Experimental Study (IF:3). Highlight: In this paper, we investigate the use of an off-the-shelf LLM for schema matching. | Marcel Parciak; Brecht Vandevoort; Frank Neven; Liesbet M. Peeters; Stijn Vansummeren |
| 2024 | 19 | Mining Sequential Patterns in Uncertain Databases Using Hierarchical Index Structure (IF:3). Highlight: In this paper, we propose multiple theoretically tightened pruning upper bounds that remarkably reduce the mining space. | Kashob Kumar Roy; Md Hasibul Haque Moon; Md Mahmudur Rahman; Chowdhury Farhan Ahmed; Carson K. Leung |
| 2024 | 20 | DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing (IF:3). Highlight: We present DocETL, a system that optimizes complex document processing pipelines, while accounting for LLM shortcomings. | Shreya Shankar; Tristan Chambers; Tarak Shah; Aditya G. Parameswaran; Eugene Wu |
| 2024 | 21 | CXL and The Return of Scale-Up Database Engines (IF:3). Highlight: In a nutshell, while the cloud favored scale-out approaches that grew in capacity by adding full servers to a rack, CXL brings back scale-up architectures that can grow by fine-tuning individual resources, all while transforming the rack into a large shared-memory machine. In this paper, we describe why such architectural transformations are now possible, how they benefit emerging heterogeneous hardware platforms for data-intensive systems, and the associated research challenges. | Alberto Lerner; Gustavo Alonso |
| 2024 | 22 | The Shapley Value in Database Management (IF:3). Highlight: More specifically, the article highlights lower and upper bounds on the complexity of calculating the Shapley value, either exactly or approximately, as well as solutions for realizing the calculation in practice. | Leopoldo Bertossi; Benny Kimelfeld; Ester Livshits; Mikaël Monet |
| 2024 | 23 | Semantic Operators: A Declarative Model for Rich, AI-based Data Processing (IF:3). Highlight: Using this approach, we propose several novel optimizations to accelerate semantic filtering, joining, group-by and top-k operations by up to $1,000\times$. | LIANA PATEL et al. |
| 2024 | 24 | The Design of An LLM-powered Unstructured Analytics System (IF:3). Highlight: LLMs demonstrate an uncanny ability to process unstructured data, and as such, have the potential to go beyond search and run complex, semantic analyses at scale. We describe the design of an unstructured analytics system, Aryn, and the tenets and use cases that motivate its design. | ERIC ANDERSON et al. |
| 2024 | 25 | ReMatch: Retrieval Enhanced Schema Matching with LLMs (IF:3). Highlight: In this paper we present a novel method, named ReMatch, for matching schemas using retrieval-enhanced Large Language Models (LLMs). | Eitam Sheetrit; Menachem Brief; Moshik Mishaeli; Oren Elisha |
| 2024 | 26 | Geospatial Big Data: Survey and Challenges (IF:3). Highlight: Our goal is to give readers a clear view of where GBD mining stands today and where it might go next. | Jiayang Wu; Wensheng Gan; Han-Chieh Chao; Philip S. Yu |
| 2024 | 27 | LLMClean: Context-Aware Tabular Data Cleaning Via LLM-Generated OFDs (IF:3). Highlight: Nevertheless, crafting these context models is a demanding task, both in terms of resources and expertise, often necessitating specialized knowledge from domain experts. In light of these challenges, this paper introduces an innovative approach, called LLMClean, for the automated generation of context models, utilizing Large Language Models to analyze and understand various datasets. | Fabian Biester; Mohamed Abdelaal; Daniel Del Gaudio |
| 2024 | 28 | DET-LSH: A Locality-Sensitive Hashing Scheme with Dynamic Encoding Tree for Approximate Nearest Neighbor Search (IF:3). Highlight: In this paper, we design an encoding-based tree called Dynamic Encoding Tree (DE-Tree) to improve the indexing efficiency and support efficient range queries based on Euclidean distance. | Jiuqi Wei; Botao Peng; Xiaodong Lee; Themis Palpanas |
| 2024 | 29 | Mining Weighted Sequential Patterns in Incremental Uncertain Databases (IF:3). Highlight: However, these algorithms are confined to mine the precise ones. Therefore, we have developed an algorithm to mine frequent sequences in an uncertain database in this work. | Kashob Kumar Roy; Md Hasibul Haque Moon; Md Mahmudur Rahman; Chowdhury Farhan Ahmed; Carson Kai-Sang Leung |
| 2024 | 30 | Development and Evaluation of Artificial Intelligence Techniques for IoT Data Quality Assessment and Curation (IF:3). Highlight: Improving data quality is of utmost importance for any domain since data are the basis for any decision-making system and decisions will not be accurate if they are based on inadequate low-quality data. In this paper we are presenting a solution for assessing several quality dimensions of IoT data streams as they are generated. | Laura Martín; Luis Sánchez; Jorge Lanza; Pablo Sotres |
| 2023 | 1 | Text-to-SQL Empowered By Large Language Models: A Benchmark Evaluation (IF:6). Highlight: To address this challenge, in this paper, we first conduct a systematical and extensive comparison over existing prompt engineering methods, including question representation, example selection and example organization, and with these experimental results, we elaborate their pros and cons. Based on these findings, we propose a new integrated solution, named DAIL-SQL, which refreshes the Spider leaderboard with 86.6% execution accuracy and sets a new bar. | DAWEI GAO et al. |
| 2023 | 2 | A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge (IF:4). Highlight: Although relatively few studies describe existing or introduce new vector database architectures, the core technologies underlying VDBs, such as approximate nearest neighbor search, have been extensively studied and are well documented in the literature. In this work, we present a comprehensive review of the relevant algorithms to provide a general understanding of this booming research area. | LE MA et al. |
| 2023 | 3 | Survey of Vector Database Management Systems (IF:4). Highlight: For query processing, a variety of similarity scores and query types are now well understood; for storage and indexing, techniques include vector compression, namely quantization, and partitioning based on randomization, learning partitioning, and navigable partitioning; for query optimization and execution, we describe new operators for hybrid queries, as well as techniques for plan enumeration, plan selection, and hardware accelerated execution. | James Jie Pan; Jianguo Wang; Guoliang Li |
| 2023 | 4 | ReAcTable: Enhancing ReAct for Table Question Answering (IF:4). Highlight: Nonetheless, a conspicuous gap exists in the research landscape, where there is limited exploration of how innovative foundational research, which integrates incremental reasoning with external tools in the context of LLMs, as exemplified by the ReAct paradigm, could potentially bring advantages to the TQA task. In this paper, we aim to fill this gap, by introducing ReAcTable (ReAct for Table Question Answering tasks), a framework inspired by the ReAct paradigm that is carefully enhanced to address the challenges uniquely appearing in TQA tasks such as interpreting complex data semantics, dealing with errors generated by inconsistent data and generating intricate data transformations. | YUNJIA ZHANG et al. |
| 2023 | 5 | Lero: A Learning-to-Rank Query Optimizer (IF:3). Highlight: In this paper, we introduce a learning-to-rank query optimizer, called Lero, which builds on top of a native query optimizer and continuously learns to improve the optimization performance. | RONG ZHU et al. |
| 2023 | 6 | LDPTrace: Locally Differentially Private Trajectory Synthesis (IF:3). Highlight: Despite its potential, existing point-based perturbation mechanisms are not suitable for real-world scenarios due to poor utility, dependence on external knowledge, high computational overhead, and vulnerability to attacks. To address these limitations, we introduce LDPTrace, a novel locally differentially private trajectory synthesis framework. | YUNTAO DU et al. |
| 2023 | 7 | The FormAI Dataset: Generative AI in Software Security Through The Lens of Formal Verification (IF:3). Highlight: This paper presents the FormAI dataset, a large collection of 112,000 AI-generated compilable and independent C programs with vulnerability classification. | NORBERT TIHANYI et al. |
| 2023 | 8 | Vector Database Management Systems: Fundamental Concepts, Use-cases, and Current Challenges (IF:3). Abstract: Vector database management systems have emerged as an important component in modern data management, driven by the growing importance for the need to computationally describe rich … | Toni Taipalus |
| 2023 | 9 | GPTuner: A Manual-Reading Database Tuning System Via GPT-Guided Bayesian Optimization (IF:3). Highlight: Hence, we propose GPTuner, a manual-reading database tuning system. | JIALE LAO et al. |
| 2023 | 10 | High-Throughput Vector Similarity Search in Knowledge Graphs (IF:3). Highlight: In this work, we explore vector similarity search in the context of Knowledge Graphs (KGs). | JASON MOHONEY et al. |
| 2023 | 11 | ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems (IF:3). Highlight: In this paper, we introduce ScienceBenchmark, a new complex NL-to-SQL benchmark for three real-world, highly domain-specific databases. | YI ZHANG et al. |
| 2023 | 12 | An Empirical Evaluation of Columnar Storage Formats (IF:3). Highlight: In this paper, we revisit the most widely adopted open-source columnar storage formats (Parquet and ORC) with a deep dive into their internals. | XINYU ZENG et al. |
| 2023 | 13 | Abstractions, Scenarios, and Prompt Definitions for Process Mining with LLMs: A Case Study (IF:3). Highlight: In this paper, we apply LLMs in the context of process mining by i) abstracting the information of standard process mining artifacts and ii) describing the prompting strategies. | Alessandro Berti; Daniel Schuster; Wil M. P. van der Aalst |
| 2023 | 14 | Neural Graph Reasoning: Complex Logical Query Answering Meets Graph Databases (IF:3). Highlight: In this paper, we provide a holistic survey of CLQA with a detailed taxonomy studying the field from multiple angles, including graph types (modality, reasoning domain, background semantics), modeling aspects (encoder, processor, decoder), supported queries (operators, patterns, projected variables), datasets, evaluation metrics, and applications. | Hongyu Ren; Mikhail Galkin; Michael Cochez; Zhaocheng Zhu; Jure Leskovec |
| 2023 | 15 | CHORUS: Foundation Models for Unified Data Discovery and Exploration (IF:3). Highlight: We apply foundation models to data discovery and exploration tasks. | MOE KAYALI et al. |
| 2023 | 16 | Querying Large Language Models with SQL (IF:3). Highlight: Thus, we envision the use of SQL queries to cover a broad range of data that is not captured by traditional databases by tapping the information in LLMs. To ground this vision, we present Galois, a prototype based on a traditional database architecture, but with new physical operators for querying the underlying LLM. | Mohammed Saeed; Nicola De Cao; Paolo Papotti |
| 2023 | 17 | CAESURA: Language Models As Multi-Modal Query Planners (IF:3). Highlight: In this paper, we propose Language-Model-Driven Query Planning, a new paradigm of query planning that uses Language Models to translate natural language queries into executable query plans. | Matthias Urban; Carsten Binnig |
| 2023 | 18 | GuP: Fast Subgraph Matching By Guard-based Pruning (IF:3). Highlight: In this paper, we propose GuP, a subgraph matching algorithm with pruning based on guards. | Junya Arai; Yasuhiro Fujiwara; Makoto Onizuka |
| 2023 | 19 | From BERT to GPT-3 Codex: Harnessing The Potential of Very Large Language Models for Data Management (IF:3). Highlight: The goal of the tutorial is to introduce database researchers to the latest generation of language models, and to their use cases in the domain of data management. | Immanuel Trummer |
| 2023 | 20 | DB-GPT: Empowering Database Interactions with Private Large Language Models (IF:3). Highlight: In this paper, we present DB-GPT, a revolutionary and production-ready project that integrates LLMs with traditional database systems to enhance user experience and accessibility. | SIQIAO XUE et al. |
| 2023 | 21 | D-Bot: Database Diagnosis System Using Large Language Models (IF:3). Highlight: Thus, we propose D-Bot, an LLM-based database diagnosis system that can automatically acquire knowledge from diagnosis documents, and generate reasonable and well-founded diagnosis report (i.e., identifying the root causes and solutions) within acceptable time (e.g., under 10 minutes compared to hours by a DBA). | XUANHE ZHOU et al. |
| 2023 | 22 | REIN: A Comprehensive Benchmark Framework for Data Cleaning Methods in ML Pipelines (IF:3). Highlight: In this work, we introduce a comprehensive benchmark, called REIN, to thoroughly investigate the impact of data cleaning methods on various ML models. | Mohamed Abdelaal; Christian Hammacher; Harald Schoening |
| 2023 | 23 | Free Join: Unifying Worst-Case Optimal and Traditional Joins (IF:3). Highlight: In this paper we propose a new framework, called Free Join, that unifies the two paradigms. | Yisu Remy Wang; Max Willsey; Dan Suciu |
| 2023 | 24 | Trajectory Data Collection with Local Differential Privacy (IF:3). Highlight: Existing approaches to private trajectory data collection in a local setting typically use relaxed versions of LDP, which cannot provide a strict privacy guarantee, or require some external knowledge that is impractical to obtain and update in a timely manner. To tackle these problems, we propose a novel trajectory perturbation mechanism that relies solely on an underlying location set and satisfies pure $\epsilon$-LDP to provide a stringent privacy guarantee. | Yuemin Zhang; Qingqing Ye; Rui Chen; Haibo Hu; Qilong Han |
| 2023 | 25 | A Comprehensive Review of Visualization Methods for Association Rule Mining: Taxonomy, Challenges, Open Problems and Future Ideas (IF:3). Highlight: Defining the future steps of this research area is another goal of this review paper. | Iztok Fister Jr.; Iztok Fister; Dušan Fister; Vili Podgorelec; Sancho Salcedo-Sanz |
| 2023 | 26 | City Foundation Models for Learning General Purpose Representations from OpenStreetMap (IF:3). Highlight: In this paper, we present CityFM, a self-supervised framework to train a foundation model within a selected geographical area of interest, such as a city. | Pasquale Balsebre; Weiming Huang; Gao Cong; Yi Li |
| 2023 | 27 | Demonstration of InsightPilot: An LLM-Empowered Automated Data Exploration System (IF:3). Highlight: Not being familiar with either can create obstacles that make the process time-consuming and overwhelming for data analysts. To address this issue, we introduce InsightPilot, an LLM (Large Language Model)-based, automated data exploration system designed to simplify the data exploration process. | Pingchuan Ma; Rui Ding; Shuai Wang; Shi Han; Dongmei Zhang |
| 2023 | 28 | Learning to Optimize LSM-trees: Towards A Reinforcement Learning Based Key-Value Store for Dynamic Workloads (IF:3). Highlight: To fill the gap, we present RusKey, a key-value store with the following new features: (1) RusKey is a first attempt to orchestrate LSM-tree structures online to enable robust performance under the context of dynamic workloads; (2) RusKey is the first study to use Reinforcement Learning (RL) to guide LSM-tree transformations; (3) RusKey includes a new LSM-tree design, named FLSM-tree, for an efficient transition between different compaction policies, the bottleneck of dynamic key-value stores. | Dingheng Mo; Fanchao Chen; Siqiang Luo; Caihua Shan |
| 2023 | 29 | Generations of Knowledge Graphs: The Crazy Ideas and The Business Impact (IF:3). Highlight: In this paper, we describe three generations of knowledge graphs: entity-based KGs, which have been supporting general search and question answering (e.g., at Google and Bing); text-rich KGs, which have been supporting search and recommendations for products, bio-informatics, etc. (e.g., at Amazon and Alibaba); and the emerging integration of KGs and LLMs, which we call dual neural KGs. | Xin Luna Dong |
| 2023 | 30 | PrivLava: Synthesizing Relational Data with Foreign Keys Under Differential Privacy (IF:3). Highlight: Therefore, multi-relational data synthesis with strong privacy guarantees is an open problem. In this paper, we address the above open problem by proposing PrivLava, the first solution for synthesizing relational data with foreign keys under differential privacy, a rigorous privacy framework widely adopted in both academia and industry. | Kuntai Cai; Xiaokui Xiao; Graham Cormode |
| 2022 | 1 | PG-Schema: Schemas for Property Graphs (IF:3). Highlight: Aiming to inspire the development of GQL and enhance the capabilities of graph database systems, we propose PG-Schema, a simple yet powerful formalism for specifying property graph schemas. | RENZO ANGLES et al. |
| 2022 | 2 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning IF:4 Highlight: In this paper, we propose Starmie, an end-to-end framework for dataset discovery from data lakes (with table union search as the main use case). | Grace Fan; Jin Wang; Yuliang Li; Dan Zhang; Renée Miller; |
| 2022 | 3 | Representation Bias in Data: A Survey on Identification and Resolution Techniques IF:4 Highlight: There is still a long way to fully address representation bias issues in data. The authors hope that this survey motivates researchers to approach these challenges in the future by observing existing work within their respective domains. | Nima Shahbazi; Yin Lin; Abolfazl Asudeh; H. V. Jagadish; |
| 2022 | 4 | AIM: An Adaptive and Iterative Mechanism for Differentially Private Synthetic Data IF:4 Highlight: We propose AIM, a new algorithm for differentially private synthetic data generation. | Ryan McKenna; Brett Mullins; Daniel Sheldon; Gerome Miklau; |
| 2022 | 5 | SANTOS: Relationship-based Semantic Table Union Search IF:3 Highlight: In this work, we introduce the use of semantic relationships between pairs of columns in a table to improve the accuracy of union search. | AAMOD KHATIWADA et al. |
| 2022 | 6 | Balsa: Learning A Query Optimizer Without Expert Demonstrations IF:3 Highlight: In this work, we demonstrate for the first time that learning to optimize queries without learning from an expert optimizer is both possible and efficient. | ZONGHENG YANG et al. |
| 2022 | 7 | Biolink Model: A Universal Schema for Knowledge Graphs in Clinical, Biomedical, and Translational Science IF:3 Highlight: Here, we highlight the need for a standardized data model for KGs, describe Biolink Model, and compare it with other models. | DEEPAK R. UNNI et al. |
| 2022 | 8 | Contrastive Trajectory Similarity Learning with Dual-Feature Attention IF:3 Highlight: An ideal measure should have the capability to accurately evaluate the similarity between any two trajectories in a very short amount of time. Towards this aim, we propose a contrastive learning-based trajectory modeling method named TrajCL. | Yanchuan Chang; Jianzhong Qi; Yuxuan Liang; Egemen Tanin; |
| 2022 | 9 | The Effects of Data Quality on Machine Learning Performance on Tabular Data IF:3 Highlight: We explore empirically the relationship between six data quality dimensions and the performance of 19 popular machine learning algorithms covering the tasks of classification, regression, and clustering, with the goal of explaining their performance in terms of data quality. | SEDIR MOHAMMED et al. |
| 2022 | 10 | Zero-Shot Cost Models for Out-of-the-box Learned Cost Prediction IF:3 Highlight: In this paper, we introduce zero-shot cost models which enable learned cost estimation that generalizes to unseen databases. | Benjamin Hilprecht; Carsten Binnig; |
| 2022 | 11 | LDP-IDS: Local Differential Privacy for Infinite Data Streams IF:3 Highlight: However, the few existing LDP studies over streams are either applicable only to finite streams or suffer from insufficient protection. This paper investigates this problem by proposing LDP-IDS, a novel $w$-event LDP paradigm to provide a practical privacy guarantee for infinite streams at the user end, and adapting the popular budget division framework in centralized differential privacy (CDP). | XUEBIN REN et al. |
| 2022 | 12 | Towards Dynamic and Safe Configuration Tuning for Cloud Databases IF:3 Highlight: To fill these gaps, we propose OnlineTune, which tunes the online databases safely in changing cloud environments. | XINYI ZHANG et al. |
| 2022 | 13 | Metaverse: Survey, Applications, Security, and Opportunities IF:3 Highlight: In this article, we make the following contributions. We first introduce the basic concepts such as the development process, definition, and characteristics of the Metaverse. | Jiayi Sun; Wensheng Gan; Han-Chieh Chao; Philip S. Yu; |
| 2022 | 14 | FactorJoin: A New Cardinality Estimation Framework for Join Queries IF:3 Highlight: In this paper, we propose a new framework FactorJoin for estimating join queries. | Ziniu Wu; Parimarjan Negi; Mohammad Alizadeh; Tim Kraska; Samuel Madden; |
| 2022 | 15 | LlamaTune: Sample-Efficient DBMS Configuration Tuning IF:3 Highlight: LlamaTune employs an automated dimensionality reduction technique based on randomized projections, a biased-sampling approach to handle special values for certain knobs, and knob values bucketization, to reduce the size of the search space. | KONSTANTINOS KANELLIS et al. |
| 2022 | 16 | Manu: A Cloud Native Vector Database Management System IF:3 Highlight: In the past three years, through interaction with our 1200+ industry users, we have sketched a vision for the features that next-generation vector databases should have, which include long-term evolvability, tunable consistency, good elasticity, and high performance. We present Manu, a cloud native vector database that implements these features. | RENTONG GUO et al. |
| 2022 | 17 | Are Updatable Learned Indexes Ready? IF:3 Highlight: This makes practitioners still wary about how these new indexes would actually behave in practice. To fill this gap, this paper conducts the first comprehensive evaluation on updatable learned indexes. | CHAICHON WONGKHAM et al. |
| 2022 | 18 | Sudowoodo: Contrastive Self-supervised Learning for Multi-purpose Data Integration and Preparation IF:3 Highlight: In this paper, we propose Sudowoodo, a multi-purpose DI&P framework based on contrastive representation learning. | Runhui Wang; Yuliang Li; Jin Wang; |
| 2022 | 19 | End-to-end Optimization of Machine Learning Prediction Queries IF:3 Highlight: We present Raven, a production-ready system for optimizing prediction queries. | KWANGHYUN PARK et al. |
| 2022 | 20 | Query Processing on Tensor Computation Runtimes IF:3 Highlight: In this paper, we explore how database management systems can ride the wave of innovation happening in the AI space. | DONG HE et al. |
| 2022 | 21 | ClusterEA: Scalable Entity Alignment with Stochastic Training and Normalized Mini-batch Similarities IF:3 Highlight: However, the increasing scale of KGs renders it hard for EA models to adopt the normalization processes, thus limiting their usage in real-world applications. To tackle this challenge, we present ClusterEA, a general framework that is capable of scaling up EA models and enhancing their results by leveraging normalization methods on mini-batches with a high entity equivalent rate. | YUNJUN GAO et al. |
| 2022 | 22 | Big Data Meets Metaverse: A Survey IF:3 Highlight: In this survey, we provide a comprehensive review of how Metaverse is changing big data. | Jiayi Sun; Wensheng Gan; Zefeng Chen; Junhui Li; Philip S. Yu; |
| 2022 | 23 | TxAllo: Dynamic Transaction Allocation in Sharded Blockchain Systems IF:3 Highlight: In particular, we systematically formulate the transaction allocation problem and convert it to the community detection problem on a graph. | Yuanzhe Zhang; Shirui Pan; Jiangshan Yu; |
| 2022 | 24 | The Case for Distributed Shared-Memory Databases with RDMA-Enabled Memory Disaggregation IF:3 Highlight: We present a list of challenges and opportunities that can inspire next steps in system design making the case for DSM-DB. | Ruihong Wang; Jianguo Wang; Stratos Idreos; M. Tamer Özsu; Walid G. Aref; |
| 2022 | 25 | Towards Blockchain-Based Secure Data Management for Remote Patient Monitoring IF:3 Highlight: Blockchain is an emerging distributed technology that can solve these issues due to its immutability and architectural nature that prevent record manipulation or alteration. In this paper, we discuss the progress and opportunities of remote patient monitoring using futuristic blockchain technologies and its two primary frameworks: Ethereum and Hyperledger Fabric. | Md Jobair Hossain Faruk; Hossain Shahriar; Maria Valero; Sweta Sneha; Sheikh I. Ahamed; Mohammad Rahman; |
| 2022 | 26 | DeepJoin: Joinable Table Discovery with Pre-trained Language Models IF:3 Highlight: In this paper, we propose DeepJoin, a deep learning model for accurate and efficient joinable table discovery. | Yuyang Dong; Chuan Xiao; Takuma Nozawa; Masafumi Enomoto; Masafumi Oyamada; |
| 2022 | 27 | Hercules Against Data Series Similarity Search IF:3 Highlight: We propose Hercules, a parallel tree-based technique for exact similarity search on massive disk-based data series collections. | Karima Echihabi; Panagiota Fatourou; Kostas Zoumpatianos; Themis Palpanas; Houda Benbrahim; |
| 2022 | 28 | Dissecting BFT Consensus: In Trusted Components We Trust! IF:3 Highlight: We show that our FlexiTrust protocols achieve up to 185% more throughput than their Trust-BFT counterparts. | Suyash Gupta; Sajjad Rahnama; Shubham Pandey; Natacha Crooks; Mohammad Sadoghi; |
| 2022 | 29 | HIE-SQL: History Information Enhanced Network for Context-Dependent Text-to-SQL Semantic Parsing IF:3 Highlight: In view of the mismatch, we treat natural language and SQL as two modalities and propose a bimodal pre-trained model to bridge the gap between them. | Yanzhao Zheng; Haibin Wang; Baohua Dong; Xingjun Wang; Changshan Li; |
| 2022 | 30 | PIM-tree: A Skew-resistant Index for Processing-in-Memory IF:3 Highlight: This paper presents PIM-tree, an ordered index for PIM systems that achieves both low communication and high load balance, regardless of the degree of skew in the data and the queries. | HONGBO KANG et al. |
| 2021 | 1 | On Data Lake Architectures and Metadata Management IF:4 Highlight: Thus, we provide in this paper a comprehensive state of the art of the different approaches to data lake design. | Pegdwendé Sawadogo; Jérôme Darmont; |
| 2021 | 2 | Fairness in Rankings and Recommendations: An Overview IF:4 Highlight: In this work, we aim at presenting a toolkit of definitions, models and methods used for ensuring fairness in rankings and recommendations. | Evaggelia Pitoura; Kostas Stefanidis; Georgia Koutrika; |
| 2021 | 3 | Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation IF:4 Highlight: In this paper, we comprehensively and systematically compare the effectiveness of CardEst methods in a real DBMS. | YUXING HAN et al. |
| 2021 | 4 | Graph Pattern Matching in GQL and SQL/PGQ IF:4 Highlight: This paper, written by members of WG3 and LDBC, presents the key elements of the GPML of SQL/PGQ and GQL in advance of the publication of these new standards. | ALIN DEUTSCH et al. |
| 2021 | 5 | Updatable Learned Index with Precise Positions IF:4 Highlight: In this paper, we propose LIPP, a brand new framework of learned index to address such issues. | JIACHENG WU et al. |
| 2021 | 6 | A Survey on Locality Sensitive Hashing Algorithms and Their Applications IF:4 Highlight: In this survey paper, we provide a review of state-of-the-art LSH and Distributed LSH techniques. | Omid Jafari; Preeti Maurya; Parth Nagarkar; Khandker Mushfiqul Islam; Chidambaram Crushev; |
| 2021 | 7 | A Survey of RDF Stores & SPARQL Engines for Querying Knowledge Graphs IF:4 Highlight: This survey paper provides a comprehensive review of techniques and systems for querying RDF knowledge graphs. | Waqas Ali; Muhammad Saleem; Bin Yao; Aidan Hogan; Axel-Cyrille Ngonga Ngomo; |
| 2021 | 8 | Annotating Columns with Pre-trained Language Models IF:4 Highlight: In this paper, we study the problem of annotating table columns (i.e., predicting column types and the relationships between columns) using only information from the table itself. | YOSHIHIKO SUHARA et al. |
| 2021 | 9 | GitTables: A Large-Scale Corpus of Relational Tables IF:4 Highlight: The evaluation of our annotation pipeline on the T2Dv2 benchmark illustrates that our approach provides results on par with human annotations. We present three applications of GitTables, demonstrating its value for learned semantic type detection models, schema completion methods, and benchmarks for table-to-KG matching, data search, and preparation. | Madelon Hulsebos; Çağatay Demiralp; Paul Groth; |
| 2021 | 10 | A Survey on Advancing The DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration IF:4 Highlight: The cost-based optimizer studied in this paper is adopted in almost all current database systems. | Hai Lan; Zhifeng Bao; Yuwei Peng; |
| 2021 | 11 | Data Management in Microservices: State of The Practice, Challenges, and Research Directions IF:3 Highlight: To bridge this gap, we conducted a systematic literature review of representative articles reporting the adoption of microservices, we analyzed a set of popular open-source microservice applications, and we conducted an online survey to cross-validate the findings of the previous steps with the perceptions and experiences of over 120 experienced practitioners and researchers. | Rodrigo Laigner; Yongluan Zhou; Marcos Antonio Vaz Salles; Yijian Liu; Marcos Kalinowski; |
| 2021 | 12 | Blockchain Transaction Processing IF:3 Abstract: A blockchain is an append-only linked-list of blocks, which is maintained at each participating node. Each block records a set of transactions and their associated metadata. … | Suyash Gupta; Mohammad Sadoghi; |
| 2021 | 13 | Flow-Loss: Learning Cardinality Estimates That Matter IF:3 Highlight: We introduce a new loss function, Flow-Loss, that explicitly optimizes for better query plans by approximating the optimizer’s cost model and dynamic programming search algorithm with analytical functions. To evaluate our approach, we introduce the Cardinality Estimation Benchmark, which contains the ground truth cardinalities for sub-plans of over 16K queries from 21 templates with up to 15 joins. | PARIMARJAN NEGI et al. |
| 2021 | 14 | A Simple Standard for Sharing Ontological Mappings (SSSOM) IF:3 Highlight: In this paper, we present the SSSOM standard, describe several use cases, and survey some existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable, and Reusable (FAIR). | NICOLAS MATENTZOGLU et al. |
| 2021 | 15 | DB-BERT: A Database Tuning Tool That Reads The Manual IF:3 Highlight: DB-BERT applies large, pre-trained language models (specifically, the BERT model) for text analysis. | Immanuel Trummer; |
| 2021 | 16 | Data Lakes: A Survey of Functions and Systems IF:3 Highlight: We hope that the thorough comparison of existing solutions and the discussion of open research challenges in this survey will motivate the future development of data lake research and practice. | Rihan Hai; Christos Koutras; Christoph Quix; Matthias Jarke; |
| 2021 | 17 | A Unified Deep Model of Learning from Both Data and Queries for Cardinality Estimation IF:3 Highlight: In this work, we aim to close the gap between data-driven and query-driven methods by proposing a new unified deep autoregressive model, UAE, that learns the joint data distribution from both the data and query workload. | Peizhi Wu; Gao Cong; |
| 2021 | 18 | Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities IF:4 Highlight: Along the way, we introduce a specialized data model for representing and reasoning about repeatedly run components in these ML pipelines, which we call model graphlets. | Doris Xin; Hui Miao; Aditya Parameswaran; Neoklis Polyzotis; |
| 2021 | 19 | Facilitating Database Tuning with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation IF:3 Highlight: To this end, this paper provides a comprehensive evaluation of configuration tuning techniques from a broader perspective, hoping to better benefit the database community. | XINYI ZHANG et al. |
| 2021 | 20 | KGTorrent: A Dataset of Python Jupyter Notebooks from Kaggle IF:3 Highlight: In this paper, we fill this gap by introducing KGTorrent, a dataset of Python Jupyter notebooks with rich metadata retrieved from Kaggle, a platform hosting data science competitions for learners and practitioners at any level of expertise. | Luigi Quaranta; Fabio Calefato; Filippo Lanubile; |
| 2021 | 21 | Real-World Trajectory Sharing with Local Differential Privacy IF:3 Highlight: To address these concerns, we propose a local differentially private mechanism that is based on perturbing hierarchically-structured, overlapping $n$-grams (i.e., contiguous subsequences of length $n$) of trajectory data. | Teddy Cunningham; Graham Cormode; Hakan Ferhatosmanoglu; Divesh Srivastava; |
| 2021 | 22 | APEX: A High-Performance Learned Index on Persistent Memory IF:3 Highlight: This paper proposes APEX, a new PM-optimized learned index that offers high performance, persistence, concurrency, and instant recovery. | Baotong Lu; Jialin Ding; Eric Lo; Umar Farooq Minhas; Tianzheng Wang; |
| 2021 | 23 | Lux: Always-on Visualization Recommendations for Exploratory Dataframe Workflows IF:4 Highlight: We propose Lux, an always-on framework for accelerating visual insight discovery in dataframe workflows. | DORIS JUNG-LIN LEE et al. |
| 2021 | 24 | Farview: Disaggregated Memory with Operator Off-loading for Database Engines IF:3 Highlight: In this paper, we explore disaggregation by taking it one step further and applying it to memory (DRAM). | DARIO KOROLIJA et al. |
| 2021 | 25 | A Unified Metamodel for NoSQL and Relational Databases IF:3 Highlight: In this paper, we present the U-Schema unified metamodel able to represent logical schemas for the four most popular NoSQL paradigms (columnar, document, key-value, and graph) as well as relational schemas. | Carlos J. Fernández Candel; Diego Sevilla Ruiz; Jesús J. García-Molina; |
| 2021 | 26 | HUGE: An Efficient and Scalable Subgraph Enumeration System IF:3 Highlight: In this paper, we propose a system called HUGE to efficiently process subgraph enumeration at scale in the distributed context. | Zhengyi Yang; Longbin Lai; Xuemin Lin; Kongzhang Hao; Wenjie Zhang; |
| 2021 | 27 | Reusable Templates and Guides For Documenting Datasets and Models for Natural Language Processing and Generation: A Case Study of The HuggingFace and GEM Data and Model Cards IF:3 Highlight: To help with the standardization of documentation, we present two case studies of efforts that aim to develop reusable documentation templates: the HuggingFace data card, a general purpose card for datasets in NLP, and the GEM benchmark data and model cards with a focus on natural language generation. | ANGELINA MCMILLAN-MAJOR et al. |
| 2021 | 28 | Query Driven-Graph Neural Networks for Community Search: From Non-Attributed, Attributed, to Interactive Attributed IF:3 Highlight: In this paper, we propose Graph Neural Network models for both CS and ACS problems, i.e., Query Driven-GNN and Attributed Query Driven-GNN. | YULI JIANG et al. |
| 2021 | 29 | Correlation Sketches for Approximate Join-Correlation Queries IF:3 Highlight: In this paper, we introduce a new class of data augmentation queries: join-correlation queries. | Aécio Santos; Aline Bessa; Fernando Chirigati; Christopher Musco; Juliana Freire; |
| 2021 | 30 | Data Quality Certification Using ISO/IEC 25012: Industrial Experiences IF:3 Highlight: We present findings from the point of view of both the data quality evaluation team and the organizations that underwent the evaluation process. | Fernando Gualo; Moisés Rodríguez; Javier Verdugo; Ismael Caballero; Mario Piattini; |
| 2020 | 1 | Deep Entity Matching With Pre-Trained Language Models IF:6 Highlight: We present Ditto, a novel entity matching system based on pre-trained Transformer-based language models. | Yuliang Li; Jinfeng Li; Yoshihiko Suhara; AnHai Doan; Wang-Chiew Tan; |
| 2020 | 2 | Domain-specific Knowledge Graphs: A Survey IF:6 Abstract: Knowledge Graphs (KGs) have made a qualitative leap and effected a real revolution in knowledge representation. This is leveraged by the underlying structure of the KG which … | Bilal Abu-Salih; |
| 2020 | 3 | A Survey On Trajectory Data Management, Analytics, And Learning IF:5 Highlight: In this survey, we comprehensively review recent research trends in trajectory data management, ranging from trajectory pre-processing, storage, common trajectory analytic tools, such as querying spatial-only and spatial-textual trajectory data, and trajectory clustering. | Sheng Wang; Zhifeng Bao; J. Shane Culpepper; Gao Cong; |
| 2020 | 4 | RadixSpline: A Single-Pass Learned Index IF:5 Highlight: We introduce RadixSpline (RS), a learned index that can be built in a single pass over the data and is competitive with state-of-the-art learned index models, like RMI, in size and lookup performance. | ANDREAS KIPF et al. |
| 2020 | 5 | Privacy Preserving Distributed Machine Learning with Federated Learning IF:4 Highlight: This paper addresses these issues by proposing a distributed perturbation algorithm named DISTPAB, for privacy preservation of horizontally partitioned data. | M. A. P. Chamikara; P. Bertok; I. Khalil; D. Liu; S. Camtepe; |
| 2020 | 6 | Benchmarking Learned Indexes IF:4 Highlight: In this work, we present a unified benchmark that compares well-tuned implementations of three learned index structures against several state-of-the-art traditional baselines. | RYAN MARCUS et al. |
| 2020 | 7 | Tsunami: A Learned Multi-dimensional Index For Correlated Data And Skewed Workloads IF:4 Highlight: In this paper, we introduce Tsunami, which addresses these limitations to achieve up to 6X faster query performance and up to 8X smaller index size than existing learned multi-dimensional indexes, in addition to up to 11X faster query performance and 170X smaller index size than optimally-tuned traditional indexes. | Jialin Ding; Vikram Nathan; Mohammad Alizadeh; Tim Kraska; |
| 2020 | 8 | Dataset Discovery In Data Lakes IF:4 Highlight: We refer to this as the problem of dataset discovery in data lakes and this paper contributes an effective and efficient solution to it. | Alex Bogatu; Alvaro A. A. Fernandes; Norman W. Paton; Nikolaos Konstantinou; |
| 2020 | 9 | Are We Ready For Learned Cardinality Estimation? IF:4 Highlight: In this paper, we ask a forward-thinking question: Are we ready to deploy these learned cardinality models in production? | Xiaoying Wang; Changbo Qu; Weiyuan Wu; Jiannan Wang; Qingqing Zhou; |
| 2020 | 10 | Testing Database Engines Via Pivoted Query Synthesis IF:4 Highlight: To this end, we devised a novel and general approach that we have termed Pivoted Query Synthesis. | Manuel Rigger; Zhendong Su; |
| 2020 | 11 | Neural Networks for Entity Matching: A Survey IF:4 Highlight: In this survey, we present how neural networks have been used for entity matching. | Nils Barlaug; Jon Atle Gulla; |
| 2020 | 12 | FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation IF:4 Highlight: In this paper, we propose FLAT, a CardEst method that is simultaneously fast in probability computation, lightweight in model size and accurate in estimation quality. | RONG ZHU et al. |
| 2020 | 13 | Qd-tree: Learning Data Layouts For Big Data Analytics IF:4 Highlight: In this paper, we propose a new framework called a query-data routing tree, or qd-tree, to address this problem, and propose two algorithms for their construction based on greedy and deep reinforcement learning techniques. | ZONGHENG YANG et al. |
| 2020 | 14 | On The Nature and Types of Anomalies: A Review of Deviations in Data IF:4 Highlight: Moreover, despite some 250 years of publications on the topic, no comprehensive and concrete overviews of the different types of anomalies have hitherto been published. By means of an extensive literature review this study therefore offers the first theoretically principled and domain-independent typology of data anomalies and presents a full overview of anomaly types and subtypes. | Ralph Foorthuis; |
| 2020 | 15 | Towards Scalable Dataframe Systems IF:4 Highlight: In this paper we lay out a vision and roadmap for scalable dataframe systems. | DEVIN PETERSOHN et al. |
| 2020 | 16 | Valentine: Evaluating Matching Techniques for Dataset Discovery IF:4 Highlight: In this paper, we aim to rectify the problem of evaluating the effectiveness and efficiency of schema matching methods for the specific needs of dataset discovery. | CHRISTOS KOUTRAS et al. |
| 2020 | 17 | The LDBC Social Network Benchmark IF:3 Highlight: The Linked Data Benchmark Council’s Social Network Benchmark (LDBC SNB) is an effort intended to test various functionalities of systems used for graph-like data management. | RENZO ANGLES et al. |
| 2020 | 18 | Cost Models For Big Data Query Processing: Learning, Retrofitting, And Our Findings IF:4 Highlight: In this work, we investigate two key questions: (i) can we learn accurate cost models for big data systems, and (ii) can we integrate the learned models within the query optimizer. | Tarique Siddiqui; Alekh Jindal; Shi Qiao; Hiren Patel; Wangchao Le; |
| 2020 | 19 | Return Of The Lernaean Hydra: Experimental Evaluation Of Data Series Approximate Similarity Search IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a taxonomy of similarity search techniques that reconciles the terminology used in these two domains, we describe modifications to data series indexing techniques enabling them to answer approximate similarity queries with quality guarantees, and we conduct a thorough experimental evaluation to compare approximate similarity search techniques under a unified framework, on synthetic and real datasets in memory and on disk. |
Karima Echihabi; Kostas Zoumpatianos; Themis Palpanas; Houda Benbrahim; |
| 2020 | 20 | Discovering High Utility-Occupancy Patterns From Uncertain Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, a novel algorithm, called High-Utility-Occupancy Pattern Mining in Uncertain databases (UHUOPM), is proposed. |
Chien-Ming Chen; Lili Chen; Wensheng Gan; Lina Qiu; Weiping Ding; |
| 2020 | 21 | Efficient Bitruss Decomposition For Large-scale Bipartite Graphs IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we study the bitruss decomposition problem which aims to find all the k-bitrusses for k >= 0. |
Kai Wang; Xuemin Lin; Lu Qin; Wenjie Zhang; Ying Zhang; |
| 2020 | 22 | SDM-RDFizer: An RML Interpreter For The Efficient Creation Of RDF Knowledge Graphs IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we propose the SDM-RDFizer, an interpreter of the RDF Mapping Language (RML), to transform raw data in various formats into an RDF knowledge graph. |
Enrique Iglesias; Samaneh Jozashoori; David Chaves-Fraga; Diego Collarana; Maria-Esther Vidal; |
| 2020 | 23 | Multi-Dimensional Event Data in Graph Databases IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we propose a general data model for multi-dimensional event data based on labeled property graphs that allows storing structural and temporal relations in a single, integrated graph-based data structure in a systematic way. The queries allow for efficiently converting large real-life event data sets into our data model and we provide 5 converted data sets for further research. |
Stefan Esser; Dirk Fahland; |
| 2020 | 24 | The Lernaean Hydra Of Data Series Similarity Search: An Experimental Evaluation Of The State Of The Art IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we provide definitions for the different flavors of similarity search that have been studied in the past, and present the first systematic experimental evaluation of the efficiency of data series similarity search techniques. |
Karima Echihabi; Kostas Zoumpatianos; Themis Palpanas; Houda Benbrahim; |
| 2020 | 25 | Elle: Inferring Isolation Anomalies From Experimental Observations IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present Elle: a novel checker which infers an Adya-style dependency graph between client-observed transactions. |
Kyle Kingsbury; Peter Alvaro; |
| 2020 | 26 | TODS: An Automated Time Series Outlier Detection System IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We present TODS, an automated Time Series Outlier Detection System for research and industrial applications. |
KWEI-HERNG LAI et al. |
| 2020 | 27 | Efficient And Effective Community Search On Large-scale Bipartite Graphs IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we study the significant (alpha, beta)-community search problem on weighted bipartite graphs. |
KAI WANG et al. |
| 2020 | 28 | Matrix Profile Goes MAD: Variable-Length Motif And Discord Discovery In Data Series IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce a new framework, which provides an exact and scalable motif and discord discovery algorithm that efficiently finds all motifs and discords in a given range of lengths. |
Michele Linardi; Yan Zhu; Themis Palpanas; Eamonn Keogh; |
| 2020 | 29 | A Comprehensive Benchmark Framework For Active Learning Methods In Entity Matching IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we build a unified active learning benchmark framework for EM that allows users to easily combine different learning algorithms with applicable example selection algorithms. |
Venkata Vamsikrishna Meduri; Lucian Popa; Prithviraj Sen; Mohamed Sarwat; |
| 2020 | 30 | Constant-Delay Enumeration For Nondeterministic Document Spanners IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Several recent works at PODS’18 proposed such algorithms but with linear delay in the document size or with an exponential dependency in size of the (generally nondeterministic) input VA. We pose this problem in the setting of enumeration algorithms, where we can first run a preprocessing phase and must then produce the results with a small delay between any two consecutive results. |
Antoine Amarilli; Pierre Bourhis; Stefan Mengel; Matthias Niewerth; |
| 2019 | 1 | Neo: A Learned Query Optimizer IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Motivated by this shortcoming and inspired by recent advances in applying machine learning to data management challenges, we introduce Neo (Neural Optimizer), a novel learning-based query optimizer that relies on deep neural networks to generate query execution plans. |
RYAN MARCUS et al. |
| 2019 | 2 | ALEX: An Updatable Adaptive Learned Index IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present a new learned index called ALEX which addresses practical issues that arise when implementing learned indexes for workloads that contain a mix of point lookups, short range queries, inserts, updates, and deletes. |
JIALIN DING et al. |
| 2019 | 3 | A Survey Of Community Search Over Big Graphs IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this survey, we conduct a thorough review of existing community search works. |
YIXIANG FANG et al. |
| 2019 | 4 | Approximate Queries And Representations For Large Data Sequences IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present an algorithm for realizing our technique, and the results of applying it to medical cardiology data. |
Hagit Shatkay; Stanley B. Zdonik; |
| 2019 | 5 | An End-to-End Learning-based Cost Estimator IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address these challenges, we propose an effective end-to-end learning-based cost estimation framework based on a tree-structured model, which can estimate both cost and cardinality simultaneously. |
Ji Sun; Guoliang Li; |
| 2019 | 6 | Dataset Search: A Survey IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Here, we survey the state of the art of research and commercial systems in dataset retrieval. |
ADRIANE CHAPMAN et al. |
| 2019 | 7 | Learning Multi-dimensional Indexes IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce Flood, a multi-dimensional in-memory index that automatically adapts itself to a particular dataset and workload by jointly optimizing the index structure and data storage. |
Vikram Nathan; Jialin Ding; Mohammad Alizadeh; Tim Kraska; |
| 2019 | 8 | Deep Unsupervised Cardinality Estimation IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To capture the rich multivariate distributions of relational tables, we propose the use of a new type of high-capacity statistical model: deep autoregressive models. |
ZONGHENG YANG et al. |
| 2019 | 9 | SharPer: Sharding Permissioned Blockchains Over Network Clusters IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce SharPer, a permissioned blockchain system that improves scalability by clustering (partitioning) the nodes and assigning different data shards to different clusters where each data shard is replicated on the nodes of a cluster. |
Mohammad Javad Amiri; Divyakant Agrawal; Amr El Abbadi; |
| 2019 | 10 | A Comparative Survey Of Recent Natural Language Interfaces For Databases IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we give an overview over 24 recently developed NLIs for databases. |
Katrin Affolter; Kurt Stockinger; Abraham Bernstein; |
| 2019 | 11 | Plan-Structured Deep Neural Network Models For Query Performance Prediction IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we argue that deep learning can be applied to the query performance prediction problem, and we introduce a novel neural network architecture for the task: a plan-structured neural network. |
Ryan Marcus; Olga Papaemmanouil; |
| 2019 | 12 | Optimizing Subgraph Queries By Combining Binary And Worst-Case Optimal Joins IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We study the problem of optimizing subgraph queries using the new worst-case optimal join plans. |
Amine Mhedhbi; Semih Salihoglu; |
| 2019 | 13 | HoloDetect: Few-Shot Learning For Error Detection IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We introduce a few-shot learning framework for error detection. |
Alireza Heidari; Joshua McGrath; Ihab F. Ilyas; Theodoros Rekatsinas; |
| 2019 | 14 | A Hybrid Approach To Hierarchical Density-based Cluster Selection IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We show how the application of an additional threshold value can result in a combination of DBSCAN* and HDBSCAN clusters, and demonstrate potential benefits of this hybrid approach when clustering data of variable densities. |
Claudia Malzer; Marcus Baum; |
| 2019 | 15 | A Survey Of Data Quality Measurement And Monitoring Tools IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we close the gap between research into data quality measurement and practical implementations by investigating the functional scope of current data quality tools. |
Lisa Ehrlinger; Elisa Rusz; Wolfram Wöß; |
| 2019 | 16 | Database Meets Deep Learning: Challenges And Opportunities IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we discuss research problems at the intersection of the two fields. |
WEI WANG et al. |
| 2019 | 17 | Low-resource Deep Entity Resolution With Transfer And Active Learning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we develop a deep learning-based method that targets low-resource settings for ER through a novel combination of transfer learning and active learning. |
Jungo Kasai; Kun Qian; Sairam Gurajada; Yunyao Li; Lucian Popa; |
| 2019 | 18 | CleanML: A Study for Evaluating The Impact of Data Cleaning on ML Classification Tasks IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a CleanML study that systematically investigates the impact of data cleaning on ML classification tasks. |
PENG LI et al. |
| 2019 | 19 | CityJSON: A Compact And Easy-to-use Encoding Of The CityGML Data Model IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We present CityJSON, a new JSON-based exchange format for the CityGML data model (version 2.0.0). |
HUGO LEDOUX et al. |
| 2019 | 20 | Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Due to the sheer size of such datasets, combined with the irregular nature of graph processing, these systems face unique design challenges. To facilitate the understanding of this emerging domain, we present the first survey and taxonomy of graph database systems. |
MACIEJ BESTA et al. |
| 2019 | 21 | SkinnerDB: Regret-Bounded Query Evaluation Via Reinforcement Learning IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Along with SkinnerDB, we introduce a new quality criterion for query execution strategies. |
IMMANUEL TRUMMER et al. |
| 2019 | 22 | Efficient Algorithms For Densest Subgraph Discovery IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Because DSD is difficult to solve, we propose a new solution paradigm in this paper. |
Yixiang Fang; Kaiqiang Yu; Reynold Cheng; Laks V. S. Lakshmanan; Xuemin Lin; |
| 2019 | 23 | Fair Decision Making Using Privacy-Protected Data IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We propose novel measures of fairness in the context of randomized differentially private algorithms and identify a range of causes of outcome disparities. |
SATYA KUPPAM et al. |
| 2019 | 24 | ZeroER: Entity Resolution Using Zero Labeled Examples IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we answer in the affirmative through our proposed approach dubbed ZeroER. |
Renzhi Wu; Sanya Chaba; Saurabh Sawlani; Xu Chu; Saravanan Thirumuruganathan; |
| 2019 | 25 | SOSD: A Benchmark For Learned Indexes IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To answer this question, we propose a new benchmarking framework that comes with a variety of real-world datasets and baseline implementations to compare against. |
ANDREAS KIPF et al. |
| 2019 | 26 | Atomic Commitment Across Blockchains IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present AC3WN, the first decentralized all-or-nothing atomic cross-chain commitment protocol. |
Victor Zakhary; Divyakant Agrawal; Amr El Abbadi; |
| 2019 | 27 | ProUM: Projection-based Utility Mining On Sequence Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose an efficient Projection-based Utility Mining (ProUM) approach to discover high-utility sequential patterns from sequence data. |
WENSHENG GAN et al. |
| 2019 | 28 | A Survey On Map-Matching Algorithms IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a new categorisation of the solutions according to their map-matching models and working scenarios. |
Pingfu Chao; Yehong Xu; Wen Hua; Xiaofang Zhou; |
| 2019 | 29 | RCC: Resilient Concurrent Consensus For High-Throughput Secure Transaction Processing IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To push throughput beyond this single-replica limit, we propose concurrent consensus. |
Suyash Gupta; Jelle Hellings; Mohammad Sadoghi; |
| 2019 | 30 | TigerGraph: A Native MPP Graph Database IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present TigerGraph, a graph database system built from the ground up to support massively parallel computation of queries and analytics. |
Alin Deutsch; Yu Xu; Mingxi Wu; Victor Lee; |
| 2018 | 1 | Datasheets for Datasets IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To address this gap, we propose datasheets for datasets. |
TIMNIT GEBRU et al. |
| 2018 | 2 | Data Synthesis Based On Generative Adversarial Networks IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a method that meets both requirements. |
NOSEONG PARK et al. |
| 2018 | 3 | Learned Cardinalities: Estimating Correlated Joins With Deep Learning IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We describe a new deep learning approach to cardinality estimation. |
ANDREAS KIPF et al. |
| 2018 | 4 | The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We discuss ways to move forward given the limitations identified. |
Sarah Holland; Ahmed Hosny; Sarah Newman; Joshua Joseph; Kasia Chmielinski; |
| 2018 | 5 | Focus: Querying Large Video Datasets With Low Latency And Low Cost IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We build Focus, a system for low-latency and low-cost querying on large video datasets. |
KEVIN HSIEH et al. |
| 2018 | 6 | A Survey Of Utility-Oriented Pattern Mining IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: For identifying and evaluating the usefulness of different kinds of patterns, many techniques and constraints have been proposed, such as support, confidence, sequence order, and utility parameters (e.g., weight, price, profit, quantity, satisfaction, etc.). |
WENSHENG GAN et al. |
| 2018 | 7 | Deep Reinforcement Learning For Join Order Enumeration IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we argue that existing deep reinforcement learning techniques can be applied to address this challenge. |
Ryan Marcus; Olga Papaemmanouil; |
| 2018 | 8 | A Survey Of Parallel Sequential Pattern Mining IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, an in-depth survey of the current status of parallel sequential pattern mining (PSPM) is investigated and provided, including detailed categorization of traditional serial SPM approaches, and state of the art parallel SPM. |
Wensheng Gan; Jerry Chun-Wei Lin; Philippe Fournier-Viger; Han-Chieh Chao; Philip S. Yu; |
| 2018 | 9 | FITing-Tree: A Data-aware Index Structure IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present FITing-Tree, a novel form of a learned index which uses piece-wise linear functions with a bounded error specified at construction time. |
Alex Galakatos; Michael Markovitch; Carsten Binnig; Rodrigo Fonseca; Tim Kraska; |
| 2018 | 10 | Learning To Optimize Join Queries With Deep Reinforcement Learning IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Recognizing the link between classical Dynamic Programming enumeration methods and recent results in Reinforcement Learning (RL), we propose a new method for learning optimized join search strategies. |
Sanjay Krishnan; Zongheng Yang; Ken Goldberg; Joseph Hellerstein; Ion Stoica; |
| 2018 | 11 | LSM-based Storage Techniques: A Survey IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we provide a survey of recent research efforts on LSM-trees so that readers can learn the state-of-the-art in LSM-based storage techniques. |
Chen Luo; Michael J. Carey; |
| 2018 | 12 | Apache Calcite: A Foundational Framework For Optimized Query Processing Over Heterogeneous Data Sources IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Abstract: Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems … |
Edmon Begoli; Jesús Camacho Rodríguez; Julian Hyde; Michael J. Mior; Daniel Lemire; |
| 2018 | 13 | VChain: Enabling Verifiable Boolean Range Queries Over Blockchain Databases IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we take the first step toward investigating the problem of verifiable query processing over blockchain databases. |
Cheng Xu; Ce Zhang; Jianliang Xu; |
| 2018 | 14 | Benchmarking Distributed Stream Data Processing Systems IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we propose a framework for benchmarking distributed stream processing engines. |
JEYHUN KARIMOV et al. |
| 2018 | 15 | Learning State Representations For Query Optimization With Deep Reinforcement Learning IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our goal in this work is to explore the capabilities of deep reinforcement learning in the context of query optimization. |
Jennifer Ortiz; Magdalena Balazinska; Johannes Gehrke; S. Sathiya Keerthi; |
| 2018 | 16 | BlazeIt: Optimizing Declarative Aggregation And Limit Queries For Neural Network-Based Video Analytics IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce two new query optimization techniques in BlazeIt that are not supported by prior work. |
Daniel Kang; Peter Bailis; Matei Zaharia; |
| 2018 | 17 | Model-based Pricing For Machine Learning In A Data Marketplace IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a model-based pricing (MBP) framework, which instead of pricing the data, directly prices ML model instances. |
Lingjiao Chen; Paraschos Koutris; Arun Kumar; |
| 2018 | 18 | The Vadalog System: Datalog-based Reasoning For Knowledge Graphs IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper we present the Vadalog system, a Datalog-based system for performing complex logic reasoning tasks, such as those required in advanced knowledge graphs. |
Luigi Bellomarini; Georg Gottlob; Emanuel Sallinger; |
| 2018 | 19 | VerdictDB: Universalizing Approximate Query Processing IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Therefore, we argue that a universal solution is needed: a database-agnostic approximation engine that will widen the reach of this emerging technology across various platforms. |
Yongjoo Park; Barzan Mozafari; Joseph Sorenson; Junhao Wang; |
| 2018 | 20 | Accelerating Human-in-the-loop Machine Learning: Challenges And Opportunities IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We describe our vision for a human-in-the-loop ML system that accelerates this process: by intelligently tracking changes and intermediate results over time, such a system can enable rapid iteration, quick responsive feedback, introspection and debugging, and background execution and automation. |
DORIS XIN et al. |
| 2018 | 21 | Rafiki: Machine Learning As An Analytics Service System IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we develop and present a system, called Rafiki, to provide the training and inference service of machine learning models, and facilitate complex analytics on top of cloud platforms. |
WEI WANG et al. |
| 2018 | 22 | Answering Range Queries Under Local Differential Privacy IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce and analyze methods to support range queries under the local variant of differential privacy, an emerging standard for privacy-preserving data analysis. |
Tejas Kulkarni; Graham Cormode; Divesh Srivastava; |
| 2018 | 23 | Optimizing Error Of High-dimensional Statistical Queries Under Differential Privacy IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work we propose HDMM, a new differentially private algorithm for answering a workload of predicate counting queries, that is especially effective for higher-dimensional datasets. |
Ryan McKenna; Gerome Miklau; Michael Hay; Ashwin Machanavajjhala; |
| 2018 | 24 | Achieving Data Truthfulness And Privacy Preservation In Data Markets IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose TPDM, which efficiently integrates data Truthfulness and Privacy preservation in Data Markets. |
Chaoyue Niu; Zhenzhe Zheng; Fan Wu; Xiaofeng Gao; Guihai Chen; |
| 2018 | 25 | TaxoGen: Unsupervised Topic Taxonomy Construction By Adaptive Term Embedding And Clustering IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we propose a method for constructing topic taxonomies, wherein every node represents a conceptual topic and is defined as a cluster of semantically coherent concept terms. |
CHAO ZHANG et al. |
| 2018 | 26 | Assessing And Remedying Coverage For A Given Dataset IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we assess the coverage of a given dataset over multiple categorical attributes. |
Abolfazl Asudeh; Zhongjun Jin; H. V. Jagadish; |
| 2018 | 27 | ForkBase: An Efficient Storage Engine For Blockchain And Forkable Applications IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present ForkBase, a storage engine specifically designed to provide efficient support for blockchain and forkable applications. |
SHENG WANG et al. |
| 2018 | 28 | Utility-Optimized Local Differential Privacy Mechanisms For Distribution Estimation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce the notion of ULDP (Utility-optimized LDP), which provides a privacy guarantee equivalent to LDP only for sensitive data. |
Takao Murakami; Yusuke Kawamoto; |
| 2018 | 29 | HD-Index: Pushing The Scalability-Accuracy Boundary For Approximate KNN Search In High-Dimensional Spaces IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a novel yet simple indexing scheme, HD-Index, to solve the problem of approximate k-nearest neighbor queries in massive high-dimensional databases. |
Akhil Arora; Sakshi Sinha; Piyush Kumar; Arnab Bhattacharya; |
| 2018 | 30 | Entity Resolution And Federated Learning Get A Federated Resolution IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we provide a thorough answer to this question, answering how optimal classifiers, empirical losses, margins and generalisation abilities are affected. |
RICHARD NOCK et al. |
| 2017 | 1 | The Case For Learned Index Structures IF:9 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. |
Tim Kraska; Alex Beutel; Ed H. Chi; Jeffrey Dean; Neoklis Polyzotis; |
| 2017 | 2 | Untangling Blockchain: A Data Processing View Of Blockchain Systems IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We analyze both in-production and research systems in four dimensions: distributed ledger, cryptography, consensus protocol and smart contract. |
TIEN TUAN ANH DINH et al. |
| 2017 | 3 | BLOCKBENCH: A Framework For Analyzing Private Blockchains IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: This paper concerns recent private blockchain systems designed with stronger security (trust) assumption and performance requirement. |
TIEN TUAN ANH DINH et al. |
| 2017 | 4 | HoloClean: Holistic Data Repairs With Probabilistic Inference IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce HoloClean, a framework for holistic data repairing driven by probabilistic inference. |
Theodoros Rekatsinas; Xu Chu; Ihab F. Ilyas; Christopher Ré; |
| 2017 | 5 | Size Bounds And Query Plans For Relational Joins IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We study these problems from a theoretical perspective, both in the worst-case model, and in an average-case model where the database is chosen according to a known probability distribution. |
Albert Atserias; Martin Grohe; Dániel Marx; |
| 2017 | 6 | An Analytical Study Of Large SPARQL Query Logs IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we conduct an in-depth analytical study of the queries formulated by end-users and harvested from large and up-to-date query logs from a wide variety of RDF data sources. |
Angela Bonifati; Wim Martens; Thomas Timm; |
| 2017 | 7 | G-CORE: A Core For Future Graph Query Languages IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We report on a community effort between industry and academia to shape the future of graph query languages. |
RENZO ANGLES et al. |
| 2017 | 8 | Time Series Management Systems: A Survey IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we present a thorough analysis and classification of TSMSs developed through academic or industrial research and documented through publications. |
Søren Kejser Jensen; Torben Bach Pedersen; Christian Thomsen; |
| 2017 | 9 | Designing Fair Ranking Schemes IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we develop a system that helps users choose criterion weights that lead to greater fairness. |
Abolfazl Asudeh; H. V. Jagadish; Julia Stoyanovich; Gautam Das; |
| 2017 | 10 | Enabling Smart Data: Noise Filtering In Big Data Classification IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, two Big Data preprocessing approaches to remove noisy examples are proposed: a homogeneous ensemble and a heterogeneous ensemble filter, with special emphasis on their scalability and performance traits. |
Diego García-Gil; Julián Luengo; Salvador García; Francisco Herrera; |
| 2017 | 11 | Marginal Release Under Local Differential Privacy IF:4 Highlight: In this paper, we provide a set of algorithms for materializing marginal statistics under the strong model of local differential privacy. | Tejas Kulkarni; Graham Cormode; Divesh Srivastava |
| 2017 | 12 | Foresight: Recommending Visual Insights IF:4 Highlight: We introduce Foresight, a system that helps the user rapidly discover visual insights from large high-dimensional datasets. | Çağatay Demiralp; Peter J. Haas; Srinivasan Parthasarathy; Tejaswini Pedapati |
| 2017 | 13 | The Ubiquity Of Large Graphs And Surprising Challenges Of Graph Processing: Extended Survey IF:4 Highlight: We describe the participants’ responses to our questions highlighting common patterns and challenges. | Siddhartha Sahu; Amine Mhedhbi; Semih Salihoglu; Jimmy Lin; M. Tamer Özsu |
| 2017 | 14 | JSON: Data Model, Query Languages And Schema Specification IF:4 Highlight: Therefore in this paper we propose a formal data model for JSON documents and, based on the common features present in available systems using JSON, we define a lightweight query language allowing us to navigate through JSON documents. | Pierre Bourhis; Juan L. Reutter; Fernando Suárez; Domagoj Vrgoč |
| 2017 | 15 | Big Data: Challenges, Opportunities And Realities IF:4 Highlight: This chapter presents an overview of big data analytics, its application, advantages, and limitations. | Abhay Bhadani; Dhanya Jothimani |
| 2017 | 16 | BoostClean: Automated Error Detection And Repair For Machine Learning IF:4 Highlight: We present BoostClean which automatically selects an ensemble of error detection and repair combinations using statistical boosting. | Sanjay Krishnan; Michael J. Franklin; Ken Goldberg; Eugene Wu |
| 2017 | 17 | Quantifying Differential Privacy In Continuous Data Release Under Temporal Correlations IF:4 Highlight: In this paper, we investigate the potential privacy loss of a traditional DP mechanism under temporal correlations. Third, we propose data releasing mechanisms that convert any existing DP mechanism into one against TPL. | Yang Cao; Masatoshi Yoshikawa; Yonghui Xiao; Li Xiong |
| 2017 | 18 | Answering Conjunctive Queries Under Updates IF:4 Highlight: We consider the task of enumerating and counting answers to $k$-ary conjunctive queries against relational databases that may be updated by inserting or deleting tuples. | Christoph Berkholz; Jens Keppeler; Nicole Schweikardt |
| 2017 | 19 | Big Data Systems Meet Machine Learning Challenges: Towards Big Data Science As A Service IF:4 Highlight: Recently, we have been witnessing huge advancements in the scale of data we routinely generate and collect in pretty much everything we do, as well as our ability to exploit modern technologies to process, analyze and understand this data. | Radwa Elshawi; Sherif Sakr |
| 2017 | 20 | One Button Machine For Automating Feature Engineering In Relational Databases IF:5 Highlight: In this paper, we introduce a system called One Button Machine, or OneBM for short, which automates feature discovery in relational databases. | HOANG THANH LAM et al. |
| 2017 | 21 | Composing Differential Privacy And Secure Computation: A Case Study On Scaling Private Record Linkage IF:4 Highlight: In light of this deficiency, we propose a novel privacy model, called output constrained differential privacy, that shares the strong privacy protection of DP, but allows for the truthful release of the output of a certain function applied to the data. | Xi He; Ashwin Machanavajjhala; Cheryl Flynn; Divesh Srivastava |
| 2017 | 22 | Fonduer: Knowledge Base Construction From Richly Formatted Data IF:4 Highlight: We introduce Fonduer, a machine-learning-based KBC system for richly formatted data. | SEN WU et al. |
| 2017 | 23 | A Survey Of State Management In Big Data Processing Systems IF:4 Highlight: Given the pivotal role that state management plays in various use cases, in this survey, we present some of the most important uses of state as an enabler, discuss the alternative approaches used to handle and implement state, propose a taxonomy to capture the many facets of state management, and highlight new research directions. | Quoc-Cuong To; Juan Soto; Volker Markl |
| 2017 | 24 | Comparing Dataset Characteristics That Favor The Apriori, Eclat Or FP-Growth Frequent Itemset Mining Algorithms IF:3 Highlight: This paper explores the effects that two dataset characteristics can have on the performance of these three frequent itemset algorithms. | Jeff Heaton |
| 2017 | 25 | Database Learning: Toward A Database That Becomes Smarter Every Time IF:4 Highlight: We exploit the principle of maximum entropy to produce answers, which are in expectation guaranteed to be more accurate than existing sample-based approximations. | Yongjoo Park; Ahmad Shahab Tajik; Michael Cafarella; Barzan Mozafari |
| 2017 | 26 | Discovering More Precise Process Models From Event Logs By Filtering Out Chaotic Activities IF:3 Highlight: We show that the presence of such chaotic activities in an event log heavily impacts the quality of the process models that can be discovered with process discovery techniques. | Niek Tax; Natalia Sidorova; Wil M. P. van der Aalst |
| 2017 | 27 | Ease.ml: Towards Multi-tenant Resource Sharing For Machine Learning Workloads IF:3 Highlight: In this paper, we describe the ease.ml architecture and focus on a novel technical problem introduced by ease.ml regarding resource allocation. | Tian Li; Jie Zhong; Ji Liu; Wentao Wu; Ce Zhang |
| 2017 | 28 | Event Stream-Based Process Discovery Using Abstract Representations IF:3 Highlight: In this paper we focus on process discovery relying on online streams of business process execution events. | Sebastiaan J. van Zelst; Boudewijn F. van Dongen; Wil M. P. van der Aalst |
| 2017 | 29 | Computing Optimal Repairs For Functional Dependencies IF:3 Highlight: For computing an optimal S-repair, we present a polynomial-time algorithm that succeeds on certain sets of FDs and fails on others. | Ester Livshits; Benny Kimelfeld; Sudeepa Roy |
| 2017 | 30 | A Survey On Geographically Distributed Big-Data Processing Using MapReduce IF:3 Highlight: In this paper, we investigate and discuss challenges and requirements in designing geographically distributed data processing frameworks and protocols. | Shlomi Dolev; Patricia Florissi; Ehud Gudes; Shantanu Sharma; Ido Singer |
| 2016 | 1 | Foundations Of Modern Query Languages For Graph Databases IF:6 Highlight: We survey foundational features underlying modern graph query languages. | RENZO ANGLES et al. |
| 2016 | 2 | Measuring Fairness In Ranked Outputs IF:6 Highlight: In this paper we propose fairness measures for ranked outputs. | Ke Yang; Julia Stoyanovich |
| 2016 | 3 | Collecting And Analyzing Data From Smart Device Users With Local Differential Privacy IF:5 Highlight: Motivated by this, we propose Harmony, a practical, accurate and efficient system for collecting and analyzing data from smart device users, while satisfying LDP. | THÔNG T. NGUYÊN et al. |
| 2016 | 4 | PrivTree: A Differentially Private Algorithm For Hierarchical Decompositions IF:4 Highlight: To remedy the deficiency of existing solutions, we present PrivTree, a histogram construction algorithm that also applies hierarchical decomposition but features a crucial (and somewhat surprising) improvement: when deciding whether or not to split a sub-domain, the amount of noise required in the corresponding tuple count is independent of the recursive depth. | Jun Zhang; Xiaokui Xiao; Xing Xie |
| 2016 | 5 | LSH Ensemble: Internet-Scale Domain Search IF:4 Highlight: We present a new index structure, Locality Sensitive Hashing (LSH) Ensemble, that solves the domain search problem using set containment at Internet scale. | Erkang Zhu; Fatemeh Nargesian; Ken Q. Pu; Renée J. Miller |
| 2016 | 6 | Building Efficient Query Engines In A High-Level Language IF:4 Highlight: In this article, we realize this vision in the domain of analytical query processing. | Amir Shaikhha; Yannis Klonatos; Christoph Koch |
| 2016 | 7 | Effortless Data Exploration With Zenvisage: An Expressive And Interactive Visual Analytics System IF:4 Highlight: We propose zenvisage, a platform for effortlessly visualizing interesting patterns, trends, or insights from large datasets. | Tarique Siddiqui; Albert Kim; John Lee; Karrie Karahalios; Aditya Parameswaran |
| 2016 | 8 | MacroBase: Prioritizing Attention In Fast Data IF:4 Highlight: In response, we present MacroBase, a data analytics engine that prioritizes end-user attention in high-volume fast data streams. | PETER BAILIS et al. |
| 2016 | 9 | What Do Shannon-type Inequalities, Submodular Width, And Disjunctive Datalog Have To Do With One Another? IF:4 Highlight: Recent works on bounding the output size of a conjunctive query with functional dependencies and degree constraints have shown a deep connection between fundamental questions in information theory and database theory. | Mahmoud Abo Khamis; Hung Q. Ngo; Dan Suciu |
| 2016 | 10 | Predicting Completeness In Knowledge Bases IF:4 Highlight: In this work, we investigate different signals to identify the areas where a knowledge base is complete. | Luis Galárraga; Simon Razniewski; Antoine Amarilli; Fabian M. Suchanek |
| 2016 | 11 | Knowledge-infused And Consistent Complex Event Processing Over Real-time And Persistent Streams IF:4 Highlight: In this article, we introduce a Knowledge-infused CEP (X-CEP) framework that provides domain-aware knowledge query constructs along with temporal operators that allow end-to-end queries to span across real-time and persistent streams. | Qunzhi Zhou; Yogesh Simmhan; Viktor Prasanna |
| 2016 | 12 | Quantifying Differential Privacy Under Temporal Correlations IF:4 Highlight: In this paper, we investigate the potential privacy loss of a traditional DP mechanism under temporal correlations in the context of continuous data release. | Yang Cao; Masatoshi Yoshikawa; Yonghui Xiao; Li Xiong |
| 2016 | 13 | A Fast Order-Based Approach For Core Maintenance IF:4 Highlight: We propose a new order-based approach to maintain an order, called k-order, among vertices, while a graph is updated. | Yikai Zhang; Jeffrey Xu Yu; Ying Zhang; Lu Qin |
| 2016 | 14 | Mining Local Process Models IF:4 Highlight: In this paper we describe a method to discover frequent behavioral patterns in event logs. | Niek Tax; Natalia Sidorova; Reinder Haakma; Wil M. P. van der Aalst |
| 2016 | 15 | A Survey Of RDF Data Management Systems IF:4 Highlight: In this paper we provide an overview of these works. | M. Tamer Özsu |
| 2016 | 16 | The BigDAWG Polystore System And Architecture IF:4 Highlight: In this article, we will describe polystore databases, the current BigDAWG architecture and its application on the MIMIC II medical dataset, initial performance results and our future development plans. | VIJAY GADEPALLY et al. |
| 2016 | 17 | An Automatic Identification System (AIS) Database For Maritime Trajectory Prediction And Data Mining IF:3 Highlight: This paper is devoted to construct a standard AIS database for maritime trajectory learning, prediction and data mining. | SHANGBO MAO et al. |
| 2016 | 18 | Towards Linear Algebra Over Normalized Data IF:4 Highlight: In this paper, we show that it is possible to mitigate this overhead by leveraging a popular formal algebra to represent the computations of many ML algorithms: linear algebra. | Lingjiao Chen; Arun Kumar; Jeffrey Naughton; Jignesh M. Patel |
| 2016 | 19 | Decision Tree Classification With Differential Privacy: A Survey IF:3 Highlight: In this survey, we focus on one particular data mining algorithm — decision trees — and how differential privacy interacts with each of the components that constitute decision tree algorithms. | Sam Fletcher; Md Zahidul Islam |
| 2016 | 20 | Security And Privacy Aspects In MapReduce On Clouds: A Survey IF:4 Highlight: In this paper, we investigate and discuss security and privacy challenges and requirements, considering a variety of adversarial capabilities, and characteristics in the scope of MapReduce. | Philip Derbeko; Shlomi Dolev; Ehud Gudes; Shantanu Sharma |
| 2016 | 21 | Controlling False Discoveries During Interactive Data Exploration IF:4 Highlight: In this paper, we propose solutions to integrate multiple hypothesis testing control into interactive data exploration tools. | ZHEGUANG ZHAO et al. |
| 2016 | 22 | Sampling-Based Query Re-Optimization IF:4 Highlight: In this paper, we propose a low-cost post-processing step that can take a plan produced by the optimizer, detect when it is likely to have made such a mistake, and take steps to fix it. | Wentao Wu; Jeffrey F. Naughton; Harneet Singh |
| 2016 | 23 | Data Polygamy: The Many-Many Relationships Among Urban Spatio-Temporal Data Sets IF:3 Highlight: To address these challenges, we propose Data Polygamy, a scalable topology-based framework that allows users to query for statistically significant relationships between spatio-temporal data sets. | Fernando Chirigati; Harish Doraiswamy; Theodoros Damoulas; Juliana Freire |
| 2016 | 24 | A Memory Bandwidth-Efficient Hybrid Radix Sort On GPUs IF:4 Highlight: Our work proposes a novel approach that almost halves the amount of memory transfers and, therefore, considerably lifts the memory bandwidth limitation. | Elias Stehle; Hans-Arno Jacobsen |
| 2016 | 25 | Consistently Faster And Smaller Compressed Bitmaps With Roaring IF:4 Highlight: Many bitmap compression techniques have been proposed, almost all relying primarily on run-length encoding (RLE). | Daniel Lemire; Gregory Ssi-Yan-Kai; Owen Kaser |
| 2016 | 26 | RECOME: A New Density-Based Clustering Algorithm Using Relative KNN Kernel Density IF:3 Highlight: In this paper, we propose the RElative COre MErge (RECOME) clustering algorithm. | YANGLI-AO GENG et al. |
| 2016 | 27 | Worst-Case Optimal Algorithms For Parallel Query Processing IF:3 Highlight: In this paper, we study the communication complexity for the problem of computing a conjunctive query on a large database in a parallel setting with $p$ servers. | Paul Beame; Paraschos Koutris; Dan Suciu |
| 2016 | 28 | Effective And Complete Discovery Of Order Dependencies Via Set-based Axiomatization IF:3 Highlight: We improve significantly on complexity, offer completeness, and define a compact canonical form. | Jaroslaw Szlichta; Parke Godfrey; Lukasz Golab; Mehdi Kargar; Divesh Srivastava |
| 2016 | 29 | Inferring Uncertain Trajectories From Partial Observations IF:3 Highlight: In this paper, we develop a technique called InferTra to infer uncertain trajectories from network-constrained partial observations. | Prithu Banerjee; Sayan Ranu; Sriram Raghavan |
| 2016 | 30 | DB-Nets: On The Marriage Of Colored Petri Nets And Relational Databases IF:3 Highlight: In this position paper, we focus on the foundations of the problem, arguing that contemporary approaches struggle to find a suitable equilibrium between data- and process-related aspects. | Marco Montali; Andrey Rivkin |
| 2015 | 1 | Converting Static Image Datasets To Spiking Neuromorphic Datasets Using Saccades IF:8 Highlight: Here we propose a method for converting existing Computer Vision static image datasets into Neuromorphic Vision datasets using an actuated pan-tilt camera platform. | Garrick Orchard; Ajinkya Jayawant; Gregory Cohen; Nitish Thakor |
| 2015 | 2 | A Survey On Truth Discovery IF:6 Highlight: In this survey, we focus on providing a comprehensive overview of truth discovery methods, and summarizing them from different aspects. | YALIANG LI et al. |
| 2015 | 3 | Truth Finding On The Deep Web: Is The Problem Solved? IF:6 Highlight: In this paper, we study truthfulness of Deep Web data in two domains where we believed data are fairly clean and data quality is important to people’s lives: Stock and Flight. | Xian Li; Xin Luna Dong; Kenneth Lyons; Weiyi Meng; Divesh Srivastava |
| 2015 | 4 | Big Data Analytics For Dynamic Energy Management In Smart Grids IF:6 Highlight: This research aims to highlight the big data issues and challenges faced by the DEM employed in SG networks. | Panagiotis D. Diamantoulakis; Vasileios M. Kapinas; George K. Karagiannidis |
| 2015 | 5 | Incremental Knowledge Base Construction Using DeepDive IF:5 Highlight: In this work, we describe DeepDive, a system that combines database and machine learning ideas to help develop KBC systems, and we present techniques to make the KBC process more efficient. | JAEHO SHIN et al. |
| 2015 | 6 | EmptyHeaded: A Relational Engine For Graph Processing IF:5 Highlight: We present EmptyHeaded, a high-level engine that supports a rich datalog-like query language and achieves performance comparable to that of low-level engines. | Christopher R. Aberger; Susan Tu; Kunle Olukotun; Christopher Ré |
| 2015 | 7 | From Data Fusion To Knowledge Fusion IF:5 Highlight: In this paper, we study the applicability and limitations of different fusion techniques on a more challenging problem: knowledge fusion. | XIN LUNA DONG et al. |
| 2015 | 8 | Knowledge-Based Trust: Estimating The Trustworthiness Of Web Sources IF:5 Highlight: We propose a new approach that relies on endogenous signals, namely, the correctness of factual information provided by the source. | XIN LUNA DONG et al. |
| 2015 | 9 | FAQ: Questions Asked Frequently IF:5 Highlight: The main technical contribution of this work is a precise characterization of when a variable ordering is ‘semantically equivalent’ to the variable ordering given by the input FAQ expression. | Mahmoud Abo Khamis; Hung Q. Ngo; Atri Rudra |
| 2015 | 10 | S2RDF: RDF Querying With SPARQL On Spark IF:5 Highlight: In this paper, we describe a novel relational partitioning schema for RDF data called ExtVP that uses a semi-join based preprocessing, akin to the concept of Join Indices in relational databases, to efficiently minimize query input size regardless of its pattern shape and diameter. | Alexander Schätzle; Martin Przyjaciel-Zablocki; Simon Skilevic; Georg Lausen |
| 2015 | 11 | The End Of Slow Networks: It’s Time For A Redesign IF:5 Abstract: Next generation high-performance RDMA-capable networks will require a fundamental rethinking of the design and architecture of modern distributed DBMSs. These systems are commonly … | Carsten Binnig; Andrew Crotty; Alex Galakatos; Tim Kraska; Erfan Zamanian |
| 2015 | 12 | Discriminative Predicate Path Mining For Fact Checking In Knowledge Graphs IF:4 Highlight: We view this problem as a link-prediction task in a knowledge graph, and present a discriminative path-based method for fact checking in knowledge graphs that incorporates connectivity, type information, and predicate interactions. | Baoxu Shi; Tim Weninger |
| 2015 | 13 | Task Assignment On Multi-Skill Oriented Spatial Crowdsourcing IF:4 Highlight: In this paper, we consider a spatial crowdsourcing scenario, in which each worker has a set of qualified skills, whereas each spatial task (e.g., repairing a house, decorating a room, and performing entertainment shows for a ceremony) is time-constrained, under the budget constraint, and required a set of skills. | Peng Cheng; Xiang Lian; Lei Chen; Jinsong Han; Jizhong Zhao |
| 2015 | 14 | Principled Evaluation Of Differentially Private Algorithms Using DPBench IF:4 Highlight: In this paper we propose a set of evaluation principles which we argue are essential for sound evaluation. | Michael Hay; Ashwin Machanavajjhala; Gerome Miklau; Yan Chen; Dan Zhang |
| 2015 | 15 | Km4City Ontology Building Vs Data Harvesting And Cleaning For Smart-city Services IF:4 Highlight: In this paper, a system for data ingestion and reconciliation of smart cities related aspects as road graph, services available on the roads, traffic sensors etc., is proposed. | Pierfrancesco Bellini; Monica Benigni; Riccardo Billero; Paolo Nesi; Nadia Rauch |
| 2015 | 16 | Fusing Data With Correlations IF:4 Highlight: In this paper we present novel techniques modeling correlations between sources and applying it in truth finding. | Ravali Pochampally; Anish Das Sarma; Xin Luna Dong; Alexandra Meliou; Divesh Srivastava |
| 2015 | 17 | Visualization-Aware Sampling For Very Large Databases IF:4 Highlight: We propose a visualization-aware sampling (VAS) that guarantees high quality visualizations with a small subset of the entire dataset. | Yongjoo Park; Michael Cafarella; Barzan Mozafari |
| 2015 | 18 | GMark: Schema-Driven Generation Of Graphs And Queries IF:4 Highlight: In this paper, we present the design and engineering principles of gMark, a domain- and query language-independent graph instance and query workload generator. | GUILLAUME BAGAN et al. |
| 2015 | 19 | High-Speed Query Processing Over High-Speed Networks IF:3 Highlight: This paper presents the blueprint for a distributed query engine that addresses these problems by considering both levels of networks holistically. | Wolf Roediger; Tobias Muehlbauer; Alfons Kemper; Thomas Neumann |
| 2015 | 20 | I/O Efficient Core Graph Decomposition At Web Scale IF:4 Highlight: In this paper, we study I/O efficient core decomposition following a semi-external model, which only allows node information to be loaded in memory. | Dong Wen; Lu Qin; Ying Zhang; Xuemin Lin; Jeffrey Xu Yu |
| 2015 | 21 | S-Store: Streaming Meets Transaction Processing IF:4 Highlight: In this work, we attempt to fuse the two computational paradigms in a single system called S-Store. | JOHN MEEHAN et al. |
| 2015 | 22 | A Selectivity Based Approach To Continuous Pattern Detection In Streaming Graphs IF:4 Highlight: We introduce a Lazy Search algorithm where the search strategy is decided on a vertex-to-vertex basis depending on the likelihood of a match in the vertex neighborhood. | Sutanay Choudhury; Lawrence Holder; George Chin; Khushbu Agarwal; John Feo |
| 2015 | 23 | NXgraph: An Efficient Graph Processing System On A Single Machine IF:4 Highlight: In this paper, we present NXgraph, an efficient graph processing system on a single machine. | YUZE CHI et al. |
| 2015 | 24 | Taming Subgraph Isomorphism For RDF Query Processing IF:3 Highlight: In this paper, based on the state-of-the-art subgraph isomorphism algorithm, we propose an in-memory solution, TurboHOM++, which is tamed for the RDF processing, and we compare it with the representative RDF processing engines for several RDF benchmarks in a server machine where billions of triples can be loaded in memory. | Jinha Kim; Hyungyu Shin; Wook-Shin Han; Sungpack Hong; Hassan Chafi |
| 2015 | 25 | Multiple Query Optimization On The D-Wave 2X Adiabatic Quantum Computer IF:4 Highlight: In this paper, we tackle the problem of multiple query optimization (MQO). | Immanuel Trummer; Christoph Koch |
| 2015 | 26 | Less Is More: Building Selective Anomaly Ensembles IF:3 Highlight: In this work, we tap into this gap and propose a new ensemble approach for anomaly mining, with application to event detection in temporal graphs. | Shebuti Rayana; Leman Akoglu |
| 2015 | 27 | Principles Of Dataset Versioning: Exploring The Recreation/Storage Tradeoff IF:3 Highlight: In this paper, we study this trade-off in a principled manner: we formulate six problems under various settings, trading off these quantities in various ways, demonstrate that most of the problems are intractable, and propose a suite of inexpensive heuristics drawing from techniques in delay-constrained scheduling, and spanning tree literature, to solve these problems. | Souvik Bhattacherjee; Amit Chavan; Silu Huang; Amol Deshpande; Aditya Parameswaran |
| 2015 | 28 | Exposing The Probabilistic Causal Structure Of Discrimination IF:3 Highlight: In this paper we take a principled causal approach to the data mining problem of discrimination detection in databases. | Francesco Bonchi; Sara Hajian; Bud Mishra; Daniele Ramazzotti |
| 2015 | 29 | Join Processing For Graph Patterns: An Old Dog With New Tricks IF:3 Highlight: These new algorithms match or improve on those used in specialized graph-processing systems. | DUNG NGUYEN et al. |
| 2015 | 30 | Design Principles For Scaling Multi-core OLTP Under High Contention IF:4 Highlight: In this paper we identify two prevalent design principles that limit the multi-core scalability of many (but not all) transactional database systems on contended workloads: the multi-purpose nature of execution threads in these systems, and the lack of advanced planning of data access. | Kun Ren; Jose M. Faleiro; Daniel J. Abadi |
| 2014 | 1 | BigDataBench: A Big Data Benchmark Suite From Internet Services IF:7 Highlight: This paper presents our joint research efforts on this issue with several industrial partners. | LEI WANG et al. |
| 2014 | 2 | Protecting Locations With Differential Privacy Under Temporal Correlations IF:6 Highlight: In this paper, we propose a systematic solution to preserve location privacy with rigorous privacy guarantee. | Yonghui Xiao; Li Xiong |
| 2014 | 3 | Differential Privacy: An Economic Method For Choosing Epsilon IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we examine the role that these parameters play in concrete applications, identifying the key questions that must be addressed when choosing specific values. |
JUSTIN HSU et al. |
| 2014 | 4 | AsterixDB: A Scalable, Open Source BDMS IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Covered herein are the system’s data model, its query language, and its software architecture. |
SATTAM ALSUBAIEE et al. |
| 2014 | 5 | Leveraging Transitive Relations For Crowdsourced Joins IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we focus on the crowdsourced join query which aims to utilize humans to find all pairs of matching objects from two collections. |
Jiannan Wang; Guoliang Li; Tim Kraska; Michael J. Franklin; Jianhua Feng; |
| 2014 | 6 | DataHub: Collaborative Data Science & Dataset Version Management At Scale IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference and search large, divergent collections of datasets, and (b) a platform, DataHub, that gives users the ability to perform collaborative data analysis building on this version control system. |
ANANT BHARDWAJ et al. |
| 2014 | 7 | An Improved Apriori Algorithm For Association Rules IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Based on this algorithm, this paper points out a limitation of the original Apriori algorithm, namely the time wasted scanning the whole database in search of frequent itemsets, and presents an improvement on Apriori that reduces this wasted time by scanning only some transactions. |
Mohammed Al-Maolegi; Bassam Arkok; |
| 2014 | 8 | Rethinking Serializable Multiversion Concurrency Control IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Bohm, a new concurrency control protocol for main-memory multi-versioned database systems. |
Jose M. Faleiro; Daniel J. Abadi; |
| 2014 | 9 | Better Bitmap Performance With Roaring Bitmaps IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Building on prior work, we introduce the Roaring compressed bitmap format: it uses packed arrays for compression instead of RLE. |
Samy Chambi; Daniel Lemire; Owen Kaser; Robert Godin; |
| 2014 | 10 | Reliable Diversity-Based Spatial Crowdsourcing By Moving Workers IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Thus, we propose three effective approximation approaches, including greedy, sampling, and divide-and-conquer algorithms. |
PENG CHENG et al. |
| 2014 | 11 | PRESS: A Novel Framework Of Trajectory Compression In Road Networks IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we focus on the trajectory data, and propose a new framework, namely PRESS (Paralleled Road-Network-Based Trajectory Compression), to effectively compress trajectory data under road network constraints. |
Renchu Song; Weiwei Sun; Baihua Zheng; Yu Zheng; |
| 2014 | 12 | DimmWitted: A Study Of Main-Memory Statistical Analytics IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Our goal is to understand tradeoffs in accessing the data in row- or column-order and at what granularity one should share the model and data for a statistical task. |
Ce Zhang; Christopher Ré; |
| 2014 | 13 | A Comparison Of Blocking Methods For Record Linkage IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We compare these approaches in terms of their recall, reduction ratio, and computational complexity. |
Rebecca C. Steorts; Samuel L. Ventura; Mauricio Sadinle; Stephen E. Fienberg; |
| 2014 | 14 | Acyclicity Notions For Existential Rules And Their Application To Query Answering In Ontologies IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present two new acyclicity notions called model-faithful acyclicity (MFA) and model-summarising acyclicity (MSA). |
BERNARDO CUENCA GRAU et al. |
| 2014 | 15 | Scalable Density-Based Distributed Clustering IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a scalable density-based distributed clustering algorithm which allows a user-defined trade-off between clustering quality and the number of transmitted objects from the different local sites to a global server site. |
Eshref Januzaj; Hans-Peter Kriegel; Martin Pfeifle; |
| 2014 | 16 | Skew In Parallel Query Processing IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We study the problem of computing a conjunctive query q in parallel, using p servers, on a large database. |
Paul Beame; Paraschos Koutris; Dan Suciu; |
| 2014 | 17 | Pregelix: Big(ger) Graph Analytics On A Dataflow Engine IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: As such, Pregelix offers improved performance characteristics and scaling properties over current open source systems (e.g., we have seen up to 15x speedup compared to Apache Giraph and up to 35x speedup compared to distributed GraphLab), and makes more effective use of available machine resources to support Big(ger) Graph Analytics. |
Yingyi Bu; Vinayak Borkar; Jianfeng Jia; Michael J. Carey; Tyson Condie; |
| 2014 | 18 | The Missing Piece In Complex Analytics: Low Latency, Scalable Model Management And Serving With Velox IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we present Velox, a new component of the Berkeley Data Analytics Stack. |
DANIEL CRANKSHAW et al. |
| 2014 | 19 | Rapid Sampling For Visualizations With Ordering Guarantees IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we focus on the problem of rapidly generating approximate visualizations while preserving crucial visual properties of interest to analysts. |
ALBERT KIM et al. |
| 2014 | 20 | BDGS: A Scalable Big Data Generator Suite In Big Data Benchmarking IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This gives rise to various new challenges about how we design generators efficiently and successfully. |
ZIJIAN MING et al. |
| 2014 | 21 | Improvised Apriori Algorithm Using Frequent Pattern Tree For Real Time Applications In Data Mining IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Based on this algorithm, this paper points out a limitation of the original Apriori algorithm, namely the time and space wasted scanning the whole database in search of frequent itemsets, and presents an improvement on Apriori. |
Akshita Bhandari; Ashutosh Gupta; Debasis Das; |
| 2014 | 22 | Evaluating The Crowd With Confidence IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we devise techniques to generate confidence intervals for worker error rate estimates, thereby enabling a better evaluation of worker quality. |
Manas Joglekar; Hector Garcia-Molina; Aditya Parameswaran; |
| 2014 | 23 | Processing SPARQL Queries Over Distributed RDF Graphs IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose techniques for processing SPARQL queries over a large RDF graph in a distributed environment. |
Peng Peng; Lei Zou; M. Tamer Özsu; Lei Chen; Dongyan Zhao; |
| 2014 | 24 | Query Rewriting And Optimization For Ontological Databases IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we discuss two important aspects of this problem: query rewriting and query optimization. |
Georg Gottlob; Giorgio Orsi; Andreas Pieris; |
| 2014 | 25 | GraphX: Unifying Data-Parallel And Graph-Parallel Analytics IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address these challenges we introduce GraphX, a distributed graph computation framework that unifies graph-parallel and data-parallel computation. |
REYNOLD S. XIN et al. |
| 2014 | 26 | Metadata For Energy Disaggregation IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We propose a metadata schema for representing appliances, meters, buildings, datasets, prior knowledge about appliances and appliance models. |
Jack Kelly; William Knottenbelt; |
| 2014 | 27 | Aber-OWL: A Framework For Ontology-based Data Access In Biology IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We have developed the Aber-OWL infrastructure that provides reasoning services for bio-ontologies. |
Robert Hoehndorf; Luke Slater; Paul N. Schofield; Georgios V. Gkoutos; |
| 2014 | 28 | Reconciliation Of RDF* And Property Graphs IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The aim of this document is to reconcile both models formally. |
Olaf Hartig; |
| 2014 | 29 | Hop Doubling Label Indexing For Point-to-Point Distance Querying On Scale-Free Networks IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Given a directed or undirected graph, we propose to build an index for answering such queries based on a hop-doubling labeling technique. |
Minhao Jiang; Ada Wai-Chee Fu; Raymond Chi-Wing Wong; Yanyan Xu; |
| 2014 | 30 | The Homeostasis Protocol: Avoiding Transaction Coordination Through Program Analysis IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper describes a new approach to achieving strong consistency in distributed systems while minimizing communication between nodes. |
SUDIP ROY et al. |
| 2013 | 1 | NoSQL Database: New Era Of Databases For Big Data Analytics – Classification, Characteristics And Comparison IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This report is intended to help users, especially organizations, obtain an independent understanding of the strengths and weaknesses of various NoSQL database approaches to supporting applications that process huge volumes of data. |
A B M Moniruzzaman; Syed Akhter Hossain; |
| 2013 | 2 | Undefined By Data: A Survey Of Big Data Definitions IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This short paper attempts to collate the various definitions which have gained some degree of traction and to furnish a clear and concise definition of an otherwise ambiguous term. |
Jonathan Stuart Ward; Adam Barker; |
| 2013 | 3 | Communication Steps For Parallel Query Processing IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: For multiple rounds of communication, we present lower bounds in a model where routing decisions for a tuple are tuple-based. |
Paul Beame; Paraschos Koutris; Dan Suciu; |
| 2013 | 4 | Skew Strikes Back: New Developments In The Theory Of Join Algorithms IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In spite of this study of join queries, the textbook description of join processing is suboptimal. |
Hung Q. Ngo; Christopher Re; Atri Rudra; |
| 2013 | 5 | Ontology-based Data Access: A Study Through Disjunctive Datalog, CSP, And MMSNP IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we study several classes of ontology-mediated queries, where the database queries are given as some form of conjunctive query and the ontologies are formulated in description logics or other relevant fragments of first-order logic, such as the guarded fragment and the unary-negation fragment. |
Meghyn Bienvenu; Balder ten Cate; Carsten Lutz; Frank Wolter; |
| 2013 | 6 | Blowfish Privacy: Tuning Privacy-Utility Trade-offs Using Policies IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present Blowfish, a class of privacy definitions inspired by the Pufferfish framework, that provides a rich interface for this trade-off. |
Xi He; Ashwin Machanavajjhala; Bolin Ding; |
| 2013 | 7 | Querying Knowledge Graphs By Example Entity Tuples IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: As an initial step toward improving the usability of knowledge graphs, we propose to query such data by example entity tuples, without requiring users to form complex graph queries. |
Nandish Jayaram; Arijit Khan; Chengkai Li; Xifeng Yan; Ramez Elmasri; |
| 2013 | 8 | Mining Frequent Graph Patterns With Differential Privacy IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper we propose the first differentially private algorithm for mining frequent graph patterns. |
Entong Shen; Ting Yu; |
| 2013 | 9 | Learning And Verifying Quantified Boolean Queries By Example IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we analyze the number of questions needed to learn or verify qhorn queries, a special class of Boolean quantified queries whose underlying form is conjunctions of quantified Horn expressions. |
Azza Abouzied; Dana Angluin; Christos Papadimitriou; Joseph M. Hellerstein; Avi Silberschatz; |
| 2013 | 10 | Aggregation And Ordering In Factorised Databases IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we extend FDB to support a larger class of practical queries with aggregates and ordering. |
Nurzhan Bakibayev; Tomáš Kočiský; Dan Olteanu; Jakub Závodný; |
| 2013 | 11 | CrowdPlanner: A Crowd-Based Route Recommendation System IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our system addresses two critical issues in its core components: a) the task generation component generates a series of informative and concise questions with optimized ordering for a given candidate route set, so that workers find them comfortable and easy to answer; and b) the worker selection component utilizes a set of selection criteria and an efficient algorithm to find the most eligible workers to answer the questions with high accuracy. |
Han Su; |
| 2013 | 12 | The Operad Of Wiring Diagrams: Formalizing A Graphical Language For Databases, Recursion, And Plug-and-play Circuits IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We show that wiring diagrams form the morphisms of an operad $\mathcal{T}$, capturing this self-similarity. |
David I. Spivak; |
| 2013 | 13 | Oblivious Query Processing IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present oblivious query processing algorithms for a rich class of database queries involving selections, joins, grouping and aggregation. |
Arvind Arasu; Raghav Kaushik; |
| 2013 | 14 | Algorithm And Approaches To Handle Large Data- A Survey IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents a review of various algorithms from 1994-2013 necessary for handling such large data sets. |
Chanchal Yadav; Shuliang Wang; Manoj Kumar; |
| 2013 | 15 | Simple, Fast, And Scalable Reachability Oracle IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present two simple and efficient labeling algorithms, Hierarchical-Labeling and Distribution-Labeling, which can work on massive real-world graphs: their construction time is an order of magnitude faster than the set-cover-based labeling approach, and transitive closure materialization is not needed. |
Ruoming Jin; Guan Wang; |
| 2013 | 16 | Managing Schema Evolution In NoSQL Data Stores IF:4 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We discuss the recommendations of the developer community on handling schema changes, and introduce a simple, declarative schema evolution language. |
Stefanie Scherzinger; Meike Klettke; Uta Störl; |
| 2013 | 17 | Probabilistic Nearest Neighbor Queries On Uncertain Moving Object Trajectories IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we fill this gap by addressing probabilistic nearest neighbor queries in databases with uncertain trajectories modeled by stochastic processes, specifically the Markov chain model. |
JOHANNES NIEDERMAYER et al. |
| 2013 | 18 | Beyond Worst-Case Analysis For Joins With Minesweeper IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We describe a new algorithm, Minesweeper, that is able to satisfy stronger runtime guarantees than previous join algorithms (colloquially, ‘beyond worst-case guarantees’) for data in indexed search trees. |
Hung Q. Ngo; Dung T. Nguyen; Christopher Ré; Atri Rudra; |
| 2013 | 19 | Approximate K-nearest Neighbour Based Spatial Clustering Using K-d Tree IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, an implementation of Approximate kNN-based spatial clustering algorithm using the K-d tree is proposed. |
Dr. Mohammed Otair; |
| 2013 | 20 | Census Data Mining And Data Analysis Using WEKA IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we have made an attempt to demonstrate how one can extract local (district-level) census, socio-economic, and other population-related data for knowledge discovery and analysis using the powerful data mining tool Weka. |
Sudhir B Jagtap; Kodge B. G; |
| 2013 | 21 | Parallel Triangle Counting In Massive Streaming Graphs IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Driven by these applications and the trend that modern graph datasets are both large and dynamic, we present the design and implementation of a fast and cache-efficient parallel algorithm for estimating the number of triangles in a massive undirected graph whose edges arrive as a stream. |
Kanat Tangwongsan; A. Pavan; Srikanta Tirthapura; |
| 2013 | 22 | Data Placement And Replica Selection For Improving Co-location In Distributed Environments IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we exploit the fact that most distributed environments need to use replication for fault tolerance, and we devise workload-driven replica selection and placement algorithms that attempt to minimize the average query span. |
K. Ashwin Kumar; Amol Deshpande; Samir Khuller; |
| 2013 | 23 | Privacy Preserving Social Network Publication Against Mutual Friend Attacks IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce a novel privacy attack model and refer to it as a mutual friend attack. |
Chongjing Sun; Philip S. Yu; Xiangnan Kong; Yan Fu; |
| 2013 | 24 | Anatomy Of The Chase IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper we take a closer look at recent developments, and provide additional results. |
Gosta Grahne; Adrian Onet; |
| 2013 | 25 | Transparent Data Encryption — Solution For Security Of Database Contents IF:3 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Save Abstract: The present study deals with Transparent Data Encryption which is a technology used to solve the problems of security of data. Transparent Data Encryption means encrypting … |
Dr. Anwar Pasha Deshmukh; Dr. Riyazuddin Qureshi; |
| 2013 | 26 | A Survey On Array Storage, Query Languages, And Systems IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this survey, we provide a guide for past, present, and future research in array processing. |
Florin Rusu; Yu Cheng; |
| 2013 | 27 | Want A Good Answer? Ask A Good Question First! IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we study the problem of inferring the quality of questions and answers through a case study of a software CQA (Stack Overflow). |
YUAN YAO et al. |
| 2013 | 28 | On Graph Deltas For Historical Queries IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we address the problem of evaluating historical queries on graphs. |
Georgia Koloniari; Dimitris Souravlias; Evaggelia Pitoura; |
| 2013 | 29 | Context-based Diversification For Keyword Queries Over XML Data IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address this challenging problem, in this paper we propose an approach that automatically diversifies XML keyword search based on its different contexts in the XML data. |
Jianxin Li; Chengfei Liu; Liang Yao; Jeffrey Xu Yu; |
| 2013 | 30 | Efficient Single-Source Shortest Path And Distance Queries On Large Graphs IF:3 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address the deficiency of existing work, this paper presents Highways-on-Disk (HoD), a disk-based index that supports both SSD and SSSP queries on directed and weighted graphs. |
Andy Diwen Zhu; Xiaokui Xiao; Sibo Wang; Wenqing Lin; |
| 2012 | 1 | Distributed GraphLab: A Framework For Machine Learning In The Cloud IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we extend the GraphLab framework to the substantially more challenging distributed setting while preserving strong data consistency guarantees. |
YUCHENG LOW et al. |
| 2012 | 2 | BlinkDB: Queries With Bounded Errors And Bounded Response Times On Very Large Data IF:8 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present BlinkDB, a massively parallel, sampling-based approximate query engine for running ad-hoc, interactive SQL queries on large volumes of data. |
Sameer Agarwal; Aurojit Panda; Barzan Mozafari; Samuel Madden; Ion Stoica; |
| 2012 | 3 | Scalable K-Means++ IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work we show how to drastically reduce the number of passes needed to obtain, in parallel, a good initialization. |
Bahman Bahmani; Benjamin Moseley; Andrea Vattani; Ravi Kumar; Sergei Vassilvitskii; |
| 2012 | 4 | CrowdER: Crowdsourcing Entity Resolution IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Instead, we propose a hybrid human-machine approach in which machines are used to do an initial, coarse pass over all the data, and people are used to verify only the most likely matching pairs. |
Jiannan Wang; Tim Kraska; Michael J. Franklin; Jianhua Feng; |
| 2012 | 5 | Interactive Analytical Processing In Big Data Systems: A Cross-Industry Study Of MapReduce Workloads IF:7 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our key contribution is a characterization of new MapReduce workloads which are driven in part by interactive analysis, and which make heavy use of query-like programming frameworks on top of MapReduce. |
Yanpei Chen; Sara Alspaugh; Randy Katz; |
| 2012 | 6 | Shark: SQL And Rich Analytics At Scale IF:6 Summary Related Papers Related Patents Related Grants Related Venues Related Experts View Save Abstract: Shark is a new data analysis system that marries query processing with complex analytics on large clusters. It leverages a novel distributed memory abstraction to provide a … |
REYNOLD XIN et al. |
| 2012 | 7 | Functional Mechanism: Regression Analysis Under Differential Privacy IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Motivated by this, we propose the Functional Mechanism, a differentially private method designed for a large class of optimization-based analyses. |
Jun Zhang; Zhenjie Zhang; Xiaokui Xiao; Yin Yang; Marianne Winslett; |
| 2012 | 8 | The MADlib Analytics Library Or MAD Skills, The SQL IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper we introduce the MADlib project, including the background that led to its beginnings, and the motivation for its open source nature. |
JOE HELLERSTEIN et al. |
| 2012 | 9 | Efficient Subgraph Matching On Billion Node Graphs IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we study the problem of subgraph matching on billion-node graphs. |
Zhao Sun; Hongzhi Wang; Haixun Wang; Bin Shao; Jianzhong Li; |
| 2012 | 10 | Truss Decomposition In Massive Networks IF:6 Summary Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Abstract: The k-truss is a type of cohesive subgraph proposed recently for the study of networks. While the problem of computing most cohesive subgraphs is NP-hard, there exists a … |
Jia Wang; James Cheng; |
| 2012 | 11 | The Vertica Analytic Database: C-Store 7 Years Later IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper describes the system architecture of the Vertica Analytic Database (Vertica), a commercialization of the design of the C-Store research prototype. |
ANDREW LAMB et al. |
| 2012 | 12 | A Bayesian Approach To Discovering Truth From Conflicting Sources For Data Integration IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose a probabilistic graphical model that can automatically infer true records and source quality without any supervision. |
Bo Zhao; Benjamin I. P. Rubinstein; Jim Gemmell; Jiawei Han; |
| 2012 | 13 | Challenging The Long Tail Recommendation IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a novel suite of graph-based algorithms for the long tail recommendation. |
Hongzhi Yin; Bin Cui; Jing Li; Junjie Yao; Chen Chen; |
| 2012 | 14 | CDAS: A Crowdsourcing Data Analytics System IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce the principles of our quality-sensitive model. |
XUAN LIU et al. |
| 2012 | 15 | Efficient Processing Of K Nearest Neighbor Joins Using MapReduce IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we investigate how to perform kNN join using MapReduce which is a well-accepted framework for data-intensive applications over clusters of computers. |
Wei Lu; Yanyan Shen; Su Chen; Beng Chin Ooi; |
| 2012 | 16 | MDCC: Multi-Data Center Consistency IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: With MDCC (Multi-Data Center Consistency), we describe the first optimistic commit protocol that does not require a master or partitioning, and is strongly consistent at a cost similar to eventually consistent protocols. |
Tim Kraska; Gene Pang; Michael J. Franklin; Samuel Madden; |
| 2012 | 17 | Solving Big Data Challenges For Enterprise Application Performance Management IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present our experience and a comprehensive performance evaluation of six modern (open-source) data stores in the context of application performance monitoring as part of a CA Technologies initiative. |
TILMANN RABL et al. |
| 2012 | 18 | The Survey Of Data Mining Applications And Feature Scope IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper we have focused on a variety of techniques, approaches, and areas of research that are helpful and regarded as important in the field of data mining technologies. |
Neelamadhab Padhy; Dr. Pragnyaban Mishra; Rasmita Panigrahi; |
| 2012 | 19 | Densest Subgraph In Streaming And MapReduce IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper we present new algorithms for finding the densest subgraph in the streaming model. |
Bahman Bahmani; Ravi Kumar; Sergei Vassilvitskii; |
| 2012 | 20 | Massively Parallel Sort-Merge Joins In Main Memory Multi-Core Database Systems IF:6 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work we take a new look at the well-known sort-merge join which, so far, has not been in the focus of research in scalable massively parallel multi-core data processing as it was deemed inferior to hash joins. |
Martina-Cezara Albutiu; Alfons Kemper; Thomas Neumann; |
| 2012 | 21 | DBToaster: Higher-order Delta Processing For Dynamic, Frequently Fresh Views IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present viewlet transforms, a recursive finite differencing technique applied to queries. |
Yanif Ahmad; Oliver Kennedy; Christoph Koch; Milos Nikolic; |
| 2012 | 22 | Don’t Thrash: How To Cache Your Hash On Flash IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents new alternatives to the well-known Bloom filter data structure. |
MICHAEL A. BENDER et al. |
| 2012 | 23 | Using Data Mining Techniques For Diagnosis And Prognosis Of Cancer Disease IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper we have discussed various data mining approaches that have been utilized for breast cancer diagnosis and prognosis. |
Shweta Kharya; |
| 2012 | 24 | Dense Subgraph Maintenance Under Streaming Edge Weight Updates For Real-time Story Identification IF:5 Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Based on these, we propose a novel algorithm, DYNDENS, which outperforms adaptations of existing techniques to this setting, and yields meaningful results. |
Albert Angel; Nick Koudas; Nikos Sarkas; Divesh Srivastava; |
| 2012 | 25 | Probabilistically Bounded Staleness For Practical Partial Quorums IF:6 Highlight: In this paper, we examine this trade-off in the context of quorum-replicated data stores. | Peter Bailis; Shivaram Venkataraman; Michael J. Franklin; Joseph M. Hellerstein; Ion Stoica |
| 2012 | 26 | Towards A Unified Architecture For In-RDBMS Analytics IF:5 Highlight: Our main contribution in this work is to take a step towards such a unified architecture. | Xixuan Feng; Arun Kumar; Ben Recht; Christopher Ré |
| 2012 | 27 | PrivBasis: Frequent Itemset Mining With Differential Privacy IF:4 Highlight: In this paper, we study the problem of how to perform frequent itemset mining on transaction databases while satisfying differential privacy. | Ninghui Li; Wahbeh Qardaji; Dong Su; Jianneng Cao |
| 2012 | 28 | Verification Of Relational Data-Centric Dynamic Systems With External Services IF:4 Highlight: In this paper we study verification of (first-order) mu-calculus variants over relational data-centric dynamic systems, where data are represented by a full-fledged relational database, and the process is described in terms of atomic actions that evolve the database. | Babak Bagheri Hariri; Diego Calvanese; Giuseppe De Giacomo; Alin Deutsch; Marco Montali |
| 2012 | 29 | Efficient Snapshot Retrieval Over Historical Graph Data IF:4 Highlight: We introduce DeltaGraph, a novel, extensible, highly tunable, and distributed hierarchical index structure that enables compactly recording the historical information, and that supports efficient retrieval of historical graph snapshots for single-site or parallel processing. | Udayan Khurana; Amol Deshpande |
| 2012 | 30 | Mining Frequent Itemsets Over Uncertain Databases IF:4 Highlight: Through extensive experiments, we verify that the two definitions have a tight connection and can be unified together when the size of data is large enough. | Yongxin Tong; Lei Chen; Yurong Cheng; Philip S. Yu |
| 2011 | 1 | A Data-Based Approach To Social Influence Maximization IF:6 Highlight: In this paper, we study influence maximization from a novel data-based perspective. | Amit Goyal; Francesco Bonchi; Laks V. S. Lakshmanan |
| 2011 | 2 | PARIS: Probabilistic Alignment Of Relations, Instances, And Schema IF:6 Highlight: In this work, we present PARIS, an approach for the automatic alignment of ontologies. | Fabian M. Suchanek; Serge Abiteboul; Pierre Senellart |
| 2011 | 3 | Differentially Private Spatial Decompositions IF:6 Highlight: In this paper, we focus on spatial data such as locations and more generally any data that can be indexed by a tree structure. | Graham Cormode; Magda Procopiuc; Entong Shen; Divesh Srivastava; Ting Yu |
| 2011 | 4 | High-Performance Concurrency Control Mechanisms For Main-Memory Databases IF:6 Highlight: In this paper we introduce two efficient concurrency control methods specifically designed for main-memory databases. | Per-Åke Larson et al. |
| 2011 | 5 | Human-powered Sorts And Joins IF:6 Highlight: In this paper, we focus on how to use humans to compare items for sorting and joining data, two of the most common operations in DBMSs. | Adam Marcus; Eugene Wu; David Karger; Samuel Madden; Robert Miller |
| 2011 | 6 | Tuffy: Scaling Up Statistical Inference In Markov Logic Networks Using An RDBMS IF:6 Highlight: We present Tuffy that achieves scalability via three novel contributions: (1) a bottom-up approach to grounding that allows us to leverage the full power of the relational optimizer, (2) a novel hybrid architecture that allows us to perform AI-style local search efficiently using an RDBMS, and (3) a theoretical insight that shows when one can (exponentially) improve the efficiency of stochastic local search. | Feng Niu; Christopher Ré; AnHai Doan; Jude Shavlik |
| 2011 | 7 | RTED: A Robust Algorithm For The Tree Edit Distance IF:5 Highlight: In this paper we present RTED, a robust tree edit distance algorithm. | Mateusz Pawlik; Nikolaus Augsten |
| 2011 | 8 | Guided Data Repair IF:5 Highlight: In this paper we present GDR, a Guided Data Repair framework that incorporates user feedback in the cleaning process to enhance and accelerate existing automatic repair techniques while minimizing user involvement. | Mohamed Yakout; Ahmed K. Elmagarmid; Jennifer Neville; Mourad Ouzzani; Ihab F. Ilyas |
| 2011 | 9 | PASS-JOIN: A Partition-based Method For Similarity Joins IF:4 Highlight: In this paper, we study string similarity joins with edit-distance constraints, which find similar string pairs from two large sets of strings whose edit distance is within a given threshold. | Guoliang Li; Dong Deng; Jiannan Wang; Jianhua Feng |
| 2011 | 10 | Fast Updates On Read-Optimized Databases Using Multi-Core CPUs IF:4 Highlight: In the second half, we present an optimized merge process reducing the merge overhead of current systems by a factor of 30. | Jens Krueger et al. |
| 2011 | 11 | Personalized Social Recommendations – Accurate Or Private? IF:4 Highlight: The main contribution of this work is in formalizing these expected trade-offs between the accuracy and privacy of personalized social recommendations. | Ashwin Machanavajjhala; Aleksandra Korolova; Atish Das Sarma |
| 2011 | 12 | Provenance For Aggregate Queries IF:4 Highlight: Consequently, we propose a new approach, where we annotate with provenance information not just tuples but also the individual values within tuples, using provenance to describe the values computation. | Yael Amsterdamer; Daniel Deutch; Val Tannen |
| 2011 | 13 | Automatic Optimization For MapReduce Programs IF:4 Highlight: This paper covers Manimal, which automatically analyzes MapReduce programs and applies appropriate data-aware optimizations, thereby requiring no additional help at all from the programmer. | Eaman Jahani; Michael J. Cafarella; Christopher Ré |
| 2011 | 14 | Using Paxos To Build A Scalable, Consistent, And Highly Available Datastore IF:5 Highlight: This paper describes Spinnaker’s Paxos-based replication protocol. | Jun Rao; Eugene J. Shekita; Sandeep Tata |
| 2011 | 15 | Capturing Topology In Graph Pattern Matching IF:4 Highlight: (3) We present the locality property of strong simulation, which allows us to effectively conduct pattern matching on distributed graphs. | Shuai Ma; Yang Cao; Wenfei Fan; Jinpeng Huai; Tianyu Wo |
| 2011 | 16 | Bayesian Locality Sensitive Hashing For Fast Similarity Search IF:4 Highlight: In this paper, we present BayesLSH, a principled Bayesian algorithm for the subsequent phase of similarity search – performing candidate pruning and similarity estimation using LSH. | Venu Satuluri; Srinivasan Parthasarathy |
| 2011 | 17 | Putting Lipstick On Pig: Enabling Database-style Workflow Provenance IF:4 Highlight: We present a novel provenance framework that marries database-style and workflow-style provenance, by using Pig Latin to expose the functionality of modules, thus capturing internal state and fine-grained dependencies. | Yael Amsterdamer et al. |
| 2011 | 18 | Human-Assisted Graph Search: It’s Okay To Ask Questions IF:4 Highlight: We consider the problem of human-assisted graph search: given a directed acyclic graph with some (unknown) target node(s), we consider the problem of finding the target node(s) by asking an omniscient human questions of the form "Is there a target node that is reachable from the current node?" | Aditya Parameswaran; Anish Das Sarma; Hector Garcia-Molina; Neoklis Polyzotis; Jennifer Widom |
| 2011 | 19 | Data Mining: A Prediction Of Performer Or Underperformer Using Classification IF:4 Highlight: In this paper, a data mining technique, namely the Bayes classification method, is applied to these data to help an institution. | Umesh Kumar Pandey; Saurabh Pal |
| 2011 | 20 | A General Framework For Representing, Reasoning And Querying With Annotated Semantic Web Data IF:4 Highlight: We describe a generic framework for representing and reasoning with annotated Semantic Web data, a task becoming more important with the recent increased amount of inconsistent and non-reliable meta-data on the web. | Antoine Zimmermann; Nuno Lopes; Axel Polleres; Umberto Straccia |
| 2011 | 21 | Automatic Wrappers For Large Scale Web Extraction IF:5 Highlight: We present a generic framework to make wrapper induction algorithms tolerant to noise in the training data. | Nilesh Dalvi; Ravi Kumar; Mohamed Soliman |
| 2011 | 22 | Column-Oriented Storage Techniques For MapReduce IF:4 Highlight: This paper describes how column-oriented storage techniques can be incorporated in Hadoop in a way that preserves its popular programming APIs. | Avrilia Floratou; Jignesh Patel; Eugene Shekita; Sandeep Tata |
| 2011 | 23 | Secure Mining Of Association Rules In Horizontally Distributed Databases IF:4 Highlight: We propose a protocol for secure mining of association rules in horizontally distributed databases. | Tamir Tassa |
| 2011 | 24 | Large-Scale Collective Entity Matching IF:4 Highlight: Towards this end, we propose a principled framework to scale any generic EM algorithm. | Vibhor Rastogi; Nilesh Dalvi; Minos Garofalakis |
| 2011 | 25 | Analysis Of Web Logs And Web User In Web Mining IF:4 Highlight: This paper gives a detailed discussion of these log files, their formats, their creation, access procedures, their uses, the various algorithms used, and the additional parameters that can be used in the log files, which in turn enable effective mining. | L. K. Joshila Grace; V. Maheswari; Dhinaharan Nagamalai |
| 2011 | 26 | REX: Explaining Relationships Between Entity Pairs IF:4 Highlight: In this paper, we propose a novel problem called entity relationship explanation, which seeks to explain why a pair of entities are connected, and solve this challenging problem by integrating the above two complementary approaches, i.e., we leverage the knowledge base to explain the connections discovered between entity pairs. | Lujun Fang; Anish Das Sarma; Cong Yu; Philip Bohannon |
| 2011 | 27 | Query-time Entity Resolution IF:5 Highlight: We validate our approach on two large real-world publication databases where we show the usefulness of collective resolution and at the same time demonstrate the need for adaptive strategies for query processing. | I. Bhattacharya; L. Getoor |
| 2011 | 28 | Fast Set Intersection In Memory IF:4 Highlight: This paper introduces linear space data structures to represent sets such that their intersection can be computed in a worst-case efficient way. | Bolin Ding; Arnd Christian König |
| 2011 | 29 | GSketch: On Query Estimation In Graph Streams IF:4 Highlight: In this paper, we propose a new graph sketch method, gSketch, which combines well-studied synopses for traditional data streams with a sketch partitioning technique, to estimate and optimize the responses to basic queries on graph streams. | Peixiang Zhao; Charu C. Aggarwal; Min Wang |
| 2011 | 30 | Customer Data Clustering Using Data Mining Technique IF:4 Highlight: The objectives of this paper are to identify the high-profit, high-value and low-risk customers by one of the data mining techniques – customer clustering. | Dr. Sankar Rajagopal |
| 2010 | 1 | Discovery Of Convoys In Trajectory Databases IF:6 Highlight: Motivated by this, we develop three efficient algorithms for convoy discovery that adopt the well-known filter-refinement framework. | Hoyoung Jeung; Man Lung Yiu; Xiaofang Zhou; Christian S. Jensen; Heng Tao Shen |
| 2010 | 2 | The Complexity Of Causality And Responsibility For Query Answers And Non-Answers IF:5 Highlight: In this paper, we adapt Halpern, Pearl, and Chockler’s recent definitions of causality and responsibility to define the causes of answers and non-answers to queries, and their degree of responsibility. | Alexandra Meliou; Wolfgang Gatterbauer; Katherine F. Moore; Dan Suciu |
| 2010 | 3 | ElasTraS: An Elastic Transactional Data Store In The Cloud IF:5 Highlight: In this paper, we propose ElasTraS, which addresses this issue of scalability and elasticity of the data store in a cloud computing environment to leverage the elastic nature of the underlying infrastructure, while providing scalable transactional data access. | Sudipto Das; Divyakant Agrawal; Amr El Abbadi |
| 2010 | 4 | Privacy In Geo-social Networks: Proximity Notification With Untrusted Service Providers And Curious Buddies IF:4 Highlight: The paper presents two new protocols providing complete privacy with respect to the SP, and controllable privacy with respect to the buddies. | Sergio Mascetti; Dario Freni; Claudio Bettini; X. Sean Wang; Sushil Jajodia |
| 2010 | 5 | Data Cleaning And Query Answering With Matching Dependencies And Matching Functions IF:4 Highlight: Assuming the existence of matching functions for making two attribute values equal, we formally introduce the process of cleaning an instance using matching dependencies, as a chase-like procedure. | Leopoldo Bertossi; Solmaz Kolahi; Laks V. S. Lakshmanan |
| 2010 | 6 | Learning Deterministic Regular Expressions For The Inference Of Schemas From XML Data IF:4 Highlight: Motivated by this observation, we provide a probabilistic algorithm that learns k-OREs for increasing values of k, and selects the deterministic one that best describes the sample based on a Minimum Description Length argument. | Geert Jan Bex; Wouter Gelade; Frank Neven; Stijn Vansummeren |
| 2010 | 7 | Functorial Data Migration IF:4 Highlight: In this paper we present a simple database definition language: that of categories and functors. | David I. Spivak |
| 2010 | 8 | Scalable Probabilistic Databases With Factor Graphs And MCMC IF:4 Highlight: We propose an alternative approach where the underlying relational database always represents a single world, and an external factor graph encodes a distribution over possible worlds; Markov chain Monte Carlo (MCMC) inference is then used to recover this uncertainty to a desired level of fidelity. | Michael Wick; Andrew McCallum; Gerome Miklau |
| 2010 | 9 | Relational Transducers For Declarative Networking IF:3 Highlight: Motivated by a recent conjecture concerning the expressiveness of declarative networking, we propose a formal computation model for eventually consistent distributed querying, based on relational transducers. | Tom Ameloot; Frank Neven; Jan Van den Bussche |
| 2010 | 10 | Data Stream Clustering: Challenges And Issues IF:4 Highlight: In this survey, we try to clarify: first, the different problem definitions related to data stream clustering in general; second, the specific difficulties encountered in this field of research; third, the varying assumptions, heuristics, and intuitions forming the basis of different approaches; and how several prominent solutions tackle different problems. | Madjid Khalilian; Norwati Mustapha |
| 2010 | 11 | Behavioral Simulations In MapReduce IF:3 Highlight: In this paper we present BRACE (Big Red Agent-based Computation Engine), which extends the MapReduce framework to process these simulations efficiently across a cluster. | Guozhang Wang et al. |
| 2010 | 12 | Provenance Views For Module Privacy IF:3 Highlight: The problem we address in this paper is the following: Given a workflow, abstractly modeled by a relation R, a privacy requirement Γ and costs associated with data. | Susan B. Davidson; Sanjeev Khanna; Tova Milo; Debmalya Panigrahi; Sudeepa Roy |
| 2010 | 13 | Semi-Automatic Index Tuning: Keeping DBAs In The Loop IF:4 Highlight: We propose a new index recommendation technique, termed semi-automatic tuning, that keeps the DBA in the loop by generating recommendations that use feedback about the DBA’s preferences. | Karl Schnaitter; Neoklis Polyzotis |
| 2010 | 14 | Transparent Anonymization: Thwarting Adversaries Who Know The Algorithm IF:3 Highlight: Numerous generalization techniques have been proposed for privacy preserving data publishing. | Xiaokui Xiao; Yufei Tao; Nick Koudas |
| 2010 | 15 | Mining Frequent Itemsets Using Genetic Algorithm IF:3 Highlight: The main aim of this paper is to find all the frequent itemsets from given data sets using genetic algorithm. | Soumadip Ghosh; Sushanta Biswas; Debasree Sarkar; Partha Pratim Sarkar |
| 2010 | 16 | Page-Differential Logging: An Efficient And DBMS-independent Approach For Storing Data Into Flash Memory IF:3 Highlight: In this paper, we propose a new method of storing data, called page-differential logging, for flash-based storage systems that solves the drawbacks of the two methods. | Yi-Reun Kim; Kyu-Young Whang; Il-Yeol Song |
| 2010 | 17 | An Efficient Rigorous Approach For Identifying Statistically Significant Frequent Itemsets IF:3 Highlight: In this work, we address significance in the context of frequent itemset mining. | Adam Kirsch et al. |
| 2010 | 18 | Finding Sequential Patterns From Large Sequence Data IF:3 Highlight: In this paper, we provide a brief theoretical overview of three types of sequential pattern models. | Mahdi Esmaeili; Fazekas Gabor |
| 2010 | 19 | A Logical Temporal Relational Data Model IF:3 Highlight: We propose a conceptual model for handling time varying attributes in the relational database model with minimal temporal attributes. | Nadeem Mahmood; Aqil Burney; Kamran Ahsan |
| 2010 | 20 | Automating Fine Concurrency Control In Object-Oriented Databases IF:3 Abstract: Several propositions were done to provide adapted concurrency control to object-oriented databases. However, most of these proposals miss the fact that considering solely read and … | Carmelo Malta; José Martinez |
| 2010 | 21 | Mining Target-Oriented Sequential Patterns With Time-Intervals IF:3 Highlight: In this paper we present an algorithm to discover target-oriented sequential patterns with time-intervals. | Hao-En Chueh |
| 2010 | 22 | Faster Query Answering In Probabilistic Databases Using Read-Once Functions IF:3 Highlight: In this paper, we tell a better story for a large subclass of boolean event expressions: those that are generated by conjunctive queries without self-joins and on tuple-independent probabilistic databases. | Sudeepa Roy; Vittorio Perduca; Val Tannen |
| 2010 | 23 | Discovering Potential User Browsing Behaviors Using Custom-built Apriori Algorithm IF:3 Highlight: We have proposed a custom-built apriori algorithm to find the effective pattern analysis. | Sandeep Singh Rawat; Lakshmi Rajamani |
| 2010 | 24 | Clustering High Dimensional Data Using Subspace And Projected Clustering Algorithms IF:3 Highlight: Conclusions/Recommendations: In this study, we analyze in detail the properties of different data clustering methods. | Rahmat Widia Sembiring; Jasni Mohamad Zain; Abdullah Embong |
| 2010 | 25 | Data Conflict Resolution Using Trust Mappings IF:3 Highlight: This paper proposes the first principled solution to the automatic conflict resolution problem in a community database. | Wolfgang Gatterbauer; Dan Suciu |
| 2010 | 26 | A Framework To Model Real-time Databases IF:3 Highlight: In this paper, we give an overview of different aspects of real-time databases and we clarify requirements of their modelling. | Nizar Idoudi et al. |