In any distributed system, you can fully guarantee at most two of three properties at any given time: Consistency, Availability, and Partition Tolerance. This fundamental result is known as the CAP Theorem.
The CAP Theorem: Balancing Distributed Data
This content is adapted from Mastering System Design from Basics to Cracking Interviews (Udemy). It has been curated and organized for educational purposes on this portfolio. No copyright infringement is intended.
Breaking Down the Pillars
- Consistency: Every read gets the latest write. If you update your profile, anyone visiting your page immediately sees the update.
- Availability: Every request receives a non-error response, even if the data it returns is stale. From the user's point of view, the system "never goes down".
- Partition Tolerance: The system continues to operate even if a network failure (partition) occurs between two nodes. In modern distributed systems, Partition Tolerance is non-negotiable.
Choosing Your Trade-off
Since network partitions are unavoidable in large systems, the real choice is between CP and AP.
1. CP (Consistency + Partition Tolerance)
Prioritizes data correctness over uptime. During a network split, the system will reject requests to avoid serving inconsistent data.
- When to use: Financial systems, banking apps, atomic transactions.
- Examples: HBase, MongoDB (with majority write and read concerns).
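One common way a CP store decides whether to reject a request is to require acknowledgement from a majority of replicas before accepting a write. The following is a minimal sketch of that idea, not a description of how HBase or MongoDB actually implement it; `Replica`, `QuorumWriteError`, and `cp_write` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


class QuorumWriteError(Exception):
    """Raised when a write cannot reach a majority of replicas."""


@dataclass
class Replica:
    name: str
    reachable: bool = True  # False simulates being cut off by a partition
    data: dict = field(default_factory=dict)


def cp_write(replicas, key, value):
    """Apply the write only if a majority of replicas can be reached."""
    reachable = [r for r in replicas if r.reachable]
    majority = len(replicas) // 2 + 1
    if len(reachable) < majority:
        # Consistency over availability: refuse rather than risk divergence.
        raise QuorumWriteError(
            f"only {len(reachable)} of {len(replicas)} replicas reachable"
        )
    for r in reachable:
        r.data[key] = value
    return len(reachable)


replicas = [Replica("a"), Replica("b"), Replica("c")]
acks = cp_write(replicas, "balance", 100)  # all three reachable: write succeeds

# Simulate a partition that leaves replica "a" alone on the minority side.
replicas[1].reachable = False
replicas[2].reachable = False
try:
    cp_write(replicas, "balance", 50)
    rejected = False
except QuorumWriteError:
    rejected = True  # the write is refused, so the old value stays intact
```

In this sketch the rejected write leaves `balance` at 100 on every replica: the client sees an error, but no replica ever holds a value the others do not know about.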
2. AP (Availability + Partition Tolerance)
Prioritizes system uptime over absolute correctness. During a split, nodes will serve whatever data they have, even if it might be stale.
- When to use: Social media feeds, product catalogs, shopping carts.
- Examples: DynamoDB, Cassandra, CouchDB.
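Because AP nodes keep accepting reads and writes during a split, they need some way to reconcile once the partition heals. The sketch below uses last-write-wins on a logical timestamp, which is one common strategy and not necessarily what any of the databases listed here do; `lww_merge` and the `(timestamp, value)` layout are assumptions made for illustration.

```python
def lww_merge(local, remote):
    """Merge two replica states, keeping the newer (timestamp, value) per key."""
    merged = dict(local)
    for key, (ts, value) in remote.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


# Each replica maps key -> (logical timestamp, value).
node_a = {"cart": (1, ["book"])}
node_b = {"cart": (1, ["book"])}

# During the partition, node A accepts a write that node B never hears about.
node_a["cart"] = (2, ["book", "pen"])
stale_read = node_b["cart"][1]  # node B still answers, with the old cart

# After the partition heals, the replicas exchange state and converge.
node_a = lww_merge(node_a, node_b)
node_b = lww_merge(node_b, node_a)
```

Last-write-wins is simple but lossy: if both sides had written to the same key during the split, only the write with the higher timestamp would survive the merge.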
3. CA (Consistency + Availability): "The Unicorn"
Technically only possible in single-node systems or clusters where the network is guaranteed never to fail. In practice, this is rarely achievable in truly distributed environments.
- Example: A standalone PostgreSQL or MySQL instance.
Real-World Scenarios
The "Write" Conflict (Network Partition)
Imagine Server A and Server B lose connection.
- Scenario AP, "Always Up": User 1 writes to Server A. User 2 reads from Server B. Since B can't talk to A, it returns the old value. The system stays up, but data is temporarily inconsistent.
- Scenario CP, "Strict Correctness": User 1 writes to Server A. Server A tries to sync with B and fails. It returns an Error to User 1. The system is unavailable, but no one sees "wrong" data.
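The two scenarios can be replayed in a few lines of code. This is a toy model under stated assumptions: `Server`, `Unavailable`, `link_up`, and `run_scenario` are hypothetical names, replication is a direct assignment to the peer, and reads always return the local value.

```python
class Unavailable(Exception):
    """Error returned to the client when a CP server refuses a request."""


class Server:
    def __init__(self, mode):
        self.mode = mode      # "AP" or "CP"
        self.value = "old"
        self.peer = None
        self.link_up = True   # False simulates the network partition

    def write(self, value):
        if self.link_up:
            self.value = value
            self.peer.value = value  # replicate to the peer immediately
            return "ok"
        if self.mode == "CP":
            # Cannot sync with the peer, so refuse the write.
            raise Unavailable("cannot replicate to peer")
        self.value = value           # AP: accept locally, sync later
        return "ok"

    def read(self):
        return self.value


def run_scenario(mode):
    a, b = Server(mode), Server(mode)
    a.peer, b.peer = b, a
    a.link_up = b.link_up = False    # Server A and Server B lose connection
    try:
        write_result = a.write("new")  # User 1 writes to Server A
    except Unavailable:
        write_result = "error"
    read_result = b.read()             # User 2 reads from Server B
    return write_result, read_result, a.value


ap_result = run_scenario("AP")  # ("ok", "old", "new"): up, but B is stale
cp_result = run_scenario("CP")  # ("error", "old", "old"): rejected, still consistent
```

In the AP run the two servers disagree until they can sync again; in the CP run User 1 gets an error, but A and B never diverge.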
Interview Questions & Answers
1. What is the CAP theorem and why is it important?
It states that a distributed system can only guarantee two out of three: Consistency, Availability, and Partition Tolerance. It guides architects in choosing trade-offs during network failures.
2. Explain the difference between CP and AP systems.
- CP (Consistency + Partition Tolerance): Prioritizes data accuracy. System rejects requests if it can't guarantee consistency. E.g., HBase.
- AP (Availability + Partition Tolerance): Prioritizes uptime. System serves stale data if node syncing fails. E.g., DynamoDB.
3. Where do SQL and NoSQL databases fit within CAP?
- SQL (Postgres/MySQL): A standalone instance is effectively CA; replicated deployments typically lean CP (prioritize consistency).
- NoSQL (DynamoDB/Cassandra/CouchDB): Often AP (prioritize availability).
- MongoDB: Tunable between CP and AP based on settings.
- Neo4j: CP (integrity is vital).
4. Why is CA considered rare in distributed systems?
Because network partitions are inevitable in real-world distributed environments. Without partition tolerance, a system can't be fault-tolerant across multiple servers.
Final Thoughts
Partition Tolerance is a requirement for distributed scale. CP systems are for data integrity; AP systems are for user experience. The CAP theorem forces architects to decide: Is it better to give an error or stale data?