Paper review: A Scalable Distributed Spatial Index for the Internet-of-Things

This is an article about partitioning geo-index at SOCC17 from UCB by Anand.

Main idea

The era of the Internet of things is imminent. Big data of IoT is inherently geospatial in nature and has unpredictable skews in space and time.

In order to overcome the skews and give the best performance, partition and re-balance are critical to the database and key-value store. However, frequent re-balance operation will harm insertion performance.

This paper proposes a method called sift which claims can migrate skew at ingest time. By utilizing additional dimension, sift can distribute spatial data which originally be put in one node now be put in several nodes. This is how it overcomes spatial data skew.

Future relevance

I have some doubts about this paper.

  1. MongoDB does not support spatial index partition, and this paper says in order to query a spatial index, MongoDB has to query all partition based on other index and aggregate all results together to serve one spatial query.
    Yes, MongoDB does not support spatial index sharding, however, we can use another field to reduce the range of spatial query. For example, use state, city and then go to that partition to do a spatial query.

  2. Sift does not limit the number of objects that one node can hold (only limited by its memory) and never move objects once they are inserted. I really doubt this can work, even with its hyper-dimension mechanism to migrate the data skew. Because I think if data in one place explode, the data could overflow even though they can be stored in several nodes instead of one node.

Following is an illustration of its hyper-dimension mechanism.

Actually, the main idea is that both leaf node and inter node can store object data.

0%