The Architecture Review
Episode 05

CAP and PACELC in Practice

CAP is the start of the conversation, not the end. PACELC is what you actually design against.


slug: 005-cap-pacelc
number: 5
title: "CAP and PACELC in Practice"
description: "CAP is the start of the conversation, not the end. PACELC is what you actually design against."
youtubeId: null
publishedAt: null
anchor:
  authors: "Eric Brewer (CAP); Gilbert & Lynch (proof); Daniel Abadi (PACELC)"
  year: 2010
  title: "Consistency Tradeoffs in Modern Distributed Database System Design"
  institution: "Yale University"
  venue: "IEEE Computer"

The pattern at a glance

Long-form article coming soon. The narration below is the spoken version of this episode — read it as a quick transcript while the written companion is in draft.

Transcript

A region goes down. Your primary database is gone. The replica in another region is healthy, but stale by two seconds.

A customer hits checkout. Two options. Return an error until the primary comes back, making the system unavailable. Or serve possibly-stale data from the replica, and accept that the last two seconds of writes may be missing.

You have to pick one. There is no third option.

This is CAP.

CAP stands for Consistency, Availability, and Partition tolerance. Eric Brewer proposed it as a conjecture at PODC in 2000. Two years later, Seth Gilbert and Nancy Lynch at MIT proved it.

The theorem says: in a distributed system, when the network partitions — meaning some nodes can no longer talk to others — you cannot simultaneously guarantee both consistency and availability. One has to give. The system either refuses requests it can't safely answer, sacrificing availability, or answers requests with potentially stale data, sacrificing consistency.

Partition tolerance is not optional. Networks fail.

The most common framing of CAP is "pick two of three." This is wrong. You don't get to skip partition tolerance. Partitions are not a design choice — they happen.

The real choice is between consistency and availability during a partition. That's it. CAP is one trade-off, made under one specific condition.

Strongly-consistent systems choose consistency. During a partition, the minority side stops accepting writes. Reads to the minority side either fail or stall. The system is unavailable to some clients, but the data it does return is correct.

Highly-available systems choose availability. Every replica keeps accepting writes during a partition. The replicas diverge. After the partition heals, the system reconciles the divergent histories through conflict resolution — last-write-wins, vector clocks, or application-level merge logic.
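The reconciliation strategies named above can be sketched in a few lines. This is an illustrative toy, not a library API; the names lww_merge and vv_dominates are hypothetical:

```python
def lww_merge(a, b):
    """Last-write-wins: keep the (value, timestamp) pair with the later
    wall-clock timestamp. Simple, but silently drops the losing write."""
    return a if a[1] >= b[1] else b

def vv_dominates(vc_a, vc_b):
    """True if vector clock vc_a has seen every event vc_b has seen."""
    return all(vc_a.get(node, 0) >= n for node, n in vc_b.items())

# Two replicas accepted conflicting writes during a partition.
a = ("blue",  1700000002.0)   # (value, timestamp)
b = ("green", 1700000001.5)
print(lww_merge(a, b))        # later timestamp wins: ('blue', 1700000002.0)

# Vector clocks detect true concurrency instead of guessing by clock.
vc_a = {"r1": 3, "r2": 1}     # replica counters per node
vc_b = {"r1": 2, "r2": 2}
concurrent = not vv_dominates(vc_a, vc_b) and not vv_dominates(vc_b, vc_a)
print(concurrent)             # True: neither history saw the other;
                              # application-level merge logic must decide
```

Last-write-wins is cheap but lossy; vector clocks surface the conflict so the application can merge it deliberately.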

Both choices are defensible. The wrong choice is pretending you didn't make one.

CAP only covers what happens during a partition. The rest of the time, when the network is fine, there's still a trade-off, and CAP says nothing about it. Daniel Abadi formalized this in 2010, in a framework called PACELC.

PACELC reads: if there is a Partition, choose between Availability and Consistency. Else, choose between Latency and Consistency.

The "else" half is the important part. Even with no partition, a strongly-consistent system pays in latency. Every write must reach a quorum of nodes. Every read might need to confirm the latest commit before returning. A globally-consistent database with replicas in five regions has minimum hundred-millisecond writes — by physics alone, replicating across continents.

The PACELC framing makes the latency-consistency tax visible. Most CAP-aware architects still over-promise consistency because they only count the partition-time cost.

Three traps every distributed system hits.

One: assuming partitions are rare. They are not. The major cloud providers all publish post-mortems of region-level partition events. Submarine cable cuts, BGP misconfigurations, DNS outages — partitions happen at every scale. Design as if they will happen to you.

Two: false confidence in consistency. "We're using a strongly-consistent database" often means strongly-consistent within a single region. Cross-region reads of the same database may be eventually consistent — meaning the read model converges over time, not instantly. Every consistency model has a scope. Know what yours is.

Three: silent eventual consistency. A system that mostly behaves consistently in normal conditions can hide its true behavior under partition. Users see correct data in tests and demos. The bug surfaces in production when the network has a bad day. Test with simulated partitions before you ship.
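A partition test doesn't need real infrastructure to be useful. A toy model with hypothetical Replica and Cluster classes is enough to show the bug that tests and demos hide:

```python
class Replica:
    def __init__(self):
        self.data = {}

class Cluster:
    """Primary with one asynchronously-replicated secondary."""
    def __init__(self):
        self.primary, self.secondary = Replica(), Replica()
        self.partitioned = False

    def write(self, key, value):
        self.primary.data[key] = value
        if not self.partitioned:       # replication stops under partition
            self.secondary.data[key] = value

    def read_secondary(self, key):
        return self.secondary.data.get(key)

c = Cluster()
c.write("cart", ["book"])
c.partitioned = True                    # simulate the network breaking
c.write("cart", ["book", "laptop"])
print(c.read_secondary("cart"))         # stale read: ['book']
```

With the partition flag off, every read looks correct. Flip it on and the stale read appears immediately. Real tooling (fault-injection proxies, network namespace rules) does the same thing at the packet level, but the assertion is identical: read after partition, compare against the primary.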

The right architecture is rarely the strongest possible consistency. It's the weakest consistency your business can defend.

Some workloads require consistency at any cost. Financial ledgers. Inventory counts. Distributed locks. Wrong answers here are unrecoverable.

Some workloads can absorb staleness. Product catalogs. Social feeds. Analytics dashboards. A few seconds of staleness is invisible to users.

Most real systems sit between. They choose consistency for the writes that must be exact — the transactions, the balances — and availability for everything else. Different consistency models for different operations within the same application.
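One way to keep that per-operation choice honest is to declare it in code rather than leave it implicit in infrastructure. A sketch, with illustrative operation names and a hypothetical level_for helper:

```python
from enum import Enum

class Consistency(Enum):
    STRONG = "strong"      # quorum read/write; correct but slower
    EVENTUAL = "eventual"  # any replica; fast but possibly stale

CONSISTENCY_POLICY = {
    "debit_account":  Consistency.STRONG,    # balances must be exact
    "acquire_lock":   Consistency.STRONG,    # wrong answers unrecoverable
    "browse_catalog": Consistency.EVENTUAL,  # staleness invisible to users
    "render_feed":    Consistency.EVENTUAL,
}

def level_for(operation):
    """Default to STRONG: an unnamed operation fails safe, not fast."""
    return CONSISTENCY_POLICY.get(operation, Consistency.STRONG)

print(level_for("browse_catalog").value)   # eventual
print(level_for("transfer_funds").value)   # strong: not listed, fails safe
```

The table is the point: every operation's trade-off is written down, reviewable, and defensible — which is exactly what the next line demands.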

CAP doesn't say one answer is right. It says you must answer.

CAP is not a permission slip to give up consistency. It's a contract that says you cannot have both consistency and availability when the network is broken.

PACELC adds: even when the network works, choosing consistency costs latency. There is no system without trade-offs. There is only the system that names them honestly.

Next episode: multi-region failover. Putting CAP and PACELC to work when an entire region goes dark.