Four Kitchens
Insights

The CAP theorem is like physics to airplanes: Every database must design around it

2 Min. ReadDevelopment

Back in 2000, Eric Brewer introduced the CAP theorem, an explanation of inherent tradeoffs in distributed database design. In short: you can’t have it all. (Okay, so there’s some debate about that, but alternative theories generally introduce other caveats.)

On Twitter, I recently critiqued a presentation by Bryan Fink on the Riak system for claiming that Riak is “influenced” by CAP. This sparked a short conversation with Justin Sheehy, also from the project. 140 characters isn’t enough to explain my objection in depth, so I’m taking it here.

While I give Riak credit for having a great architecture and pushing innovation in the NoSQL (non-relational database) space, it can no more claim to be “influenced” by CAP than an airplane design can claim influence from physics. Like physics to an airplane, CAP lays out the rules for distributed databases. With that reality in mind, a distributed database designed without regard for CAP is like an airplane designed without regard for physics. So, claiming unique influence from CAP is tantamount to claiming competing systems have a dangerous disconnect with reality. Or, to carry on the analogy, it’s like Boeing making a claim that their plane designs are uniquely influenced by physics.

But we all know Airbus designs their planes with physics in mind, too, even if they pick different tradeoffs compared to Boeing. And traditional databases were influenced by CAP and its ancestors, like BASE (warning: PDF) and Bayou from Xerox PARC. CAP says “pick two.” And they did: generally C and P. This traditional — and inflexible — design of picking only one point on the CAP triangle for a database system doesn’t indicate lack of influence.

What Riak actually does is quite novel: it allows operation at more than one point on the triangle of CAP tradeoffs. This is valuable because applications value different parts of CAP for different types of data or operations on data.

For example, a banking application may value availability for viewing bank balances. Lots of transactions happen asynchronously in the real world, so a slightly outdated balance is probably better than refusing any access if there’s a net split between data centers.

In contrast, transferring from one account to another of the same person at the same bank (say, checking to savings) generally happens synchronously. A bank would rather enforce consistency above availability. If there’s a net split, they’d rather disable transfers than have one go awry or, worse, invite fraud.

A system like Riak allows making these compromises within a single system. Something like MySQL NDB, which always enforces consistency, would either unnecessarily take down balance viewing during a net split or require use of a second storage system to provide the desired account-viewing functionality.