Cache-oblivious Nested-loop Joins
Abstract
We propose to adapt the recently proposed cache-oblivious model to relational query processing. Our goal is to automatically achieve overall performance comparable to that of fine-tuned algorithms on a multi-level memory hierarchy. This is possible without tuning because cache-oblivious algorithms assume no knowledge of specific parameter values, such as the capacity and block size of each level of the hierarchy. As a first step, we propose recursive partitioning to implement cache-oblivious nested-loop joins (NLJs) without indexes, and recursive clustering and buffering to implement cache-oblivious NLJs with indexes. Our theoretical results and empirical evaluation on three different architectures show that our cache-oblivious NLJs match the performance of their manually optimized, cache-conscious counterparts.
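
To make the idea of recursive partitioning for a non-indexed NLJ concrete, the following is a minimal sketch rather than the implementation evaluated in this paper. It assumes in-memory relations stored as Python lists, a base case of one tuple per side, and a rule that always halves the larger input; the names `co_nlj`, `pred`, and `out` are hypothetical and chosen only for illustration. No cache capacity or block size appears anywhere in the code: the recursion eventually produces sub-problems small enough to fit in each level of the hierarchy, whatever its parameters are.

```python
def co_nlj(r, s, pred, out, r_lo=0, r_hi=None, s_lo=0, s_hi=None):
    """Join r[r_lo:r_hi] with s[s_lo:s_hi] by recursive halving.

    Illustrative sketch: the base-case size (1 x 1) and the
    "split the larger side" rule are assumptions, not details
    taken from the abstract.
    """
    if r_hi is None:
        r_hi = len(r)
    if s_hi is None:
        s_hi = len(s)
    n_r, n_s = r_hi - r_lo, s_hi - s_lo

    # An empty side produces no join results.
    if n_r == 0 or n_s == 0:
        return

    # Base case: one tuple on each side, evaluate the join predicate.
    if n_r == 1 and n_s == 1:
        if pred(r[r_lo], s[s_lo]):
            out.append((r[r_lo], s[s_lo]))
        return

    # Recursive case: halve the larger range and join each half
    # against the other (unsplit) range.
    if n_r >= n_s:
        mid = r_lo + n_r // 2
        co_nlj(r, s, pred, out, r_lo, mid, s_lo, s_hi)
        co_nlj(r, s, pred, out, mid, r_hi, s_lo, s_hi)
    else:
        mid = s_lo + n_s // 2
        co_nlj(r, s, pred, out, r_lo, r_hi, s_lo, mid)
        co_nlj(r, s, pred, out, r_lo, r_hi, mid, s_hi)


# Usage: an equi-join on the first field of each tuple.
R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
result = []
co_nlj(R, S, lambda x, y: x[0] == y[0], result)
```

The sketch visits every pair of tuples exactly once, so it returns the same result as a plain double loop; only the order in which pairs are visited changes, and that order is what gives the locality benefit.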