PAPER DIGEST
Most Influential SIGMOD 2016 Paper · 2026-03 edition

Simba: Efficient In-Memory Spatial Analytics

Dong Xie, Feifei Li, Bin Yao, Gefei Li, Liang Zhou, Minyi Guo

Venue: ACM SIGMOD Conference (SIGMOD) 2016
Recognition: Most Influential SIGMOD 2016 Paper (Rank No. 6)
Edition: 2026-03
Impact factor: 6
Certificate ID: 8577e2acc1454e05

Abstract

Large spatial data has become ubiquitous. As a result, it is critical to provide fast, scalable, and high-throughput spatial queries and analytics for numerous applications in location-based services (LBS). Traditional spatial databases and spatial analytics systems are disk-based and optimized for I/O efficiency. But increasingly, data are stored and processed in memory to achieve low latency, and CPU time becomes the new bottleneck. We present the Simba (Spatial In-Memory Big data Analytics) system, which offers scalable and efficient in-memory spatial query processing and analytics for big spatial data. Simba is based on Spark and runs over a cluster of commodity machines. In particular, Simba extends the Spark SQL engine to support rich spatial queries and analytics through both SQL and the DataFrame API. It introduces indexes over RDDs in order to work with big spatial data and complex spatial operations. Lastly, Simba implements an effective query optimizer, which leverages its indexes and novel spatial-aware optimizations to achieve both low latency and high throughput. Extensive experiments over large data sets demonstrate Simba's superior performance compared with other spatial analytics systems.
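
Illustrative sketch

The abstract's claim that in-memory indexes cut CPU time for spatial queries can be illustrated with a toy example. The sketch below is not Simba's code or API: it assumes a simple uniform grid over 2D points in a single Python process, whereas the abstract only says that Simba builds indexes over RDDs on a Spark cluster. The names `GridIndex`, `cell_size`, and `range_query` are hypothetical and chosen for illustration only. The point it shows is that a range query inspects only the grid cells overlapping the query rectangle instead of scanning every point.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


class GridIndex:
    """Toy in-memory uniform grid over 2D points (hypothetical, not Simba's API)."""

    def __init__(self, points: List[Point], cell_size: float) -> None:
        if cell_size <= 0:
            raise ValueError("cell_size must be positive")
        self.cell_size = cell_size
        # Map each grid cell to the points that fall inside it.
        self.cells: Dict[Cell, List[Point]] = defaultdict(list)
        for p in points:
            self.cells[self._cell_of(p)].append(p)

    def _cell_of(self, p: Point) -> Cell:
        # Floor division keeps negative coordinates in the correct cell.
        return (int(p[0] // self.cell_size), int(p[1] // self.cell_size))

    def range_query(self, low: Point, high: Point) -> List[Point]:
        """Return all points inside the closed rectangle [low, high]."""
        cx_lo, cy_lo = self._cell_of(low)
        cx_hi, cy_hi = self._cell_of(high)
        result: List[Point] = []
        # Visit only the cells that overlap the query rectangle.
        for cx in range(cx_lo, cx_hi + 1):
            for cy in range(cy_lo, cy_hi + 1):
                for p in self.cells.get((cx, cy), ()):
                    # Boundary cells may hold points outside the rectangle.
                    if low[0] <= p[0] <= high[0] and low[1] <= p[1] <= high[1]:
                        result.append(p)
        return result


points = [(0.5, 0.5), (1.5, 2.5), (3.2, 3.9), (7.0, 1.0)]
index = GridIndex(points, cell_size=1.0)
hits = index.range_query((1.0, 2.0), (4.0, 4.0))
```

Here `hits` contains the two points that lie inside the rectangle from (1.0, 2.0) to (4.0, 4.0). In a distributed setting one would expect a similar pruning step to apply first across partitions and then within each partition, but that layering is an assumption of this sketch and is not described in the abstract.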
