Summary
NoSql databases have fast write throughput and fast lookup on primary key. And they also have good scalability and reliability.
However, in order to search non-primary attributes, secondary index
should be maintained and this could hurts write performance
.
This paper compares two types of secondary index:
Embedded Indexes: lightweight filters embedded inside the primary table
Stand-Alone Indexes: separate data structure
by implementing 2 Embedded Indexes
and 3 Stand-Alone Indexes
on top of LevelDB.
They conclude that : the embedded indexes
offer superior write throughput
and are more space efficient, whereas the stand-alone secondary indexes
achieve faster query
response times. So the optimal choice depends on the application workload.
Details
Stand-Alone Indexes
Using tweets as example: PUT(t1, {u1, text1})
- t1: tweet id
- u1: user id
- text1: tweet content
There are three types
of Stand-Alone Indexes update:
Eager updates
: Earlier version of Cassandra
to executePUT(t4, {u1, text4})
on an Eager Index, we must retrieve the list for u1, add t4 and save it back.
Draw Back: degrades the write performance.
Lazy updates
:
simply issue a PUT(u1, {t4}) on the user id index table without retrieving the existing posting list for u1. The old postings list of u1 is merged with (u1, {t4}) later, during the periodic compaction phase.Composite
: AsterixDB and Spanner
each entry in the secondary indexes is a composite key consisting of (secondary key + primary key). The secondary lookup is a prefix search on secondary key, which can be implemented using regularrange search
on the index table.
Write and compaction can be faster thanLazy
, but the secondary attribute lookup may be slower because of range scan.
Embedded Secondary Indexes
There is no separate secondary index structure, secondary attribute information is stored inside the original (primary) data blocks. As shown in figure (b).