Dynamo: Amazon's Highly Available Key-value Store

Summary

This is the paper presented at 2007’s SOSP by Amazon.

In order to provide high availability Dynamo sacrifices consistency under certain failure scenarios ().

Amazon’s primary usecase for Dynamo is its shopping cart, where it’s really important to be highly available, or customers should be able to view and add items to shopping cart dispite of datacenter failure.

In a cluster consists of millions of components, it always have small but significant number of server or network components failing at any giving time. Dynamo should be able to treats these as normal case without impacting availability or performance.

For many service, such as shopping carts, customer preference, sales rank, they only need a primary-key to retrieve data and not require complex querying, and this is how Dynamo works. It provides primary-key only interface.

  • consistent hashing(no need to redistribution when storage nodes added or removed). Differ from traditional consistent hashing, it assigns key to multiple point in the ring.(virtual nodes)
  • Lamport algorithm for synchronization (clock sync)
  • quorum-based protocol (Replicated-write protocol)
  • gossip failure detection in which each node regularly announces to its neighbors that it is still up and running.

Dynamo provides eventual consistency. Some replica servers may fail before all updates come. So, for example, the shopping cart may have different version. Dynamo merges different version shopping cart to avoid any “add to cart” operation lost, however, some deleted items can resurface. Dynamo uses vector clock to decide whether to implement a merge operation.

#

0%