NoSQL Databases Part 2 - CAP Theorem

Data Engineer’s Lunch #14: NoSQL Databases Part 2 – CAP Theorem

In Data Engineer’s Lunch #14: NoSQL Databases Part 2 – CAP Theorem, we discuss the CAP theorem as well as ACID vs. BASE. The live recording of the Data Engineer’s Lunch, which includes a more in-depth discussion, is also embedded below in case you were not able to attend live. If you would like to attend Data Engineer’s Lunch in person, it is hosted every Monday at 12 PM EST. Register here now!

We cover the fundamental difference between relational and most non-relational databases: ACID vs. BASE. We also discuss the CAP theorem and give examples of popular databases, along with which areas of the CAP theorem they fall under.

ACID vs. BASE – The fundamental difference between relational and most non-relational databases

ACID

  • Atomicity – A unit of work (a transaction) is either executed completely or not executed at all.
  • Consistency – Each transaction moves the DB from one valid state to another valid state.
  • Isolation – Transactions executed in parallel leave the DB in the same state as if they had been executed sequentially.
  • Durability – Committed transactions survive machine failure.
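
To make atomicity concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names, and the transfer helper are made up for illustration; the point is only that when the second statement of a transaction fails, the first one is rolled back too.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as one atomic unit of work."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit
            # above is rolled back as well.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# Hypothetical schema: balances may never go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )

overdraft_ok = transfer(conn, "alice", "bob", 500)  # debit fails, credit undone
after_overdraft = balances(conn)
payment_ok = transfer(conn, "alice", "bob", 30)     # both updates commit
after_payment = balances(conn)
```

The failed transfer leaves both balances untouched, which is the "all or nothing" behavior described above. The CHECK constraint also gives a small taste of consistency: the DB refuses to enter an invalid state.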

BASE

  • Basically Available – Reads / writes are served without a guarantee of consistency, so a read may not reflect the latest write.
  • Soft state – The state of the system can change over time as replicas catch up, so the state at any given point on any given node is not known with certainty.
  • Eventually consistent – Once writes stop arriving, all replicas eventually converge to the same value.
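
The BASE properties above can be sketched as a toy simulation. This is not how any particular database implements replication; the Replica class, the last-write-wins rule, and the anti_entropy function are illustrative assumptions chosen to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A toy replica that stores key -> (timestamp, value)."""
    name: str
    store: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        # Last-write-wins: keep the entry with the newest timestamp.
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None


def anti_entropy(replicas):
    """Push every entry to every replica so all keep the newest version."""
    for source in replicas:
        for key, (ts, value) in list(source.store.items()):
            for target in replicas:
                target.write(key, value, ts)


a, b = Replica("a"), Replica("b")
for replica in (a, b):
    replica.write("cart", ["book"], ts=1)

# A new write lands on replica a only; b has not heard about it yet.
a.write("cart", ["book", "pen"], ts=2)
stale_read = b.read("cart")      # basically available, but possibly stale

anti_entropy([a, b])             # background repair between replicas
converged_read = b.read("cart")  # eventually consistent
```

Before the repair step, replica b happily answers with the old cart. After it, both replicas agree on the newest value.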

CAP Theorem – CAP stands for Consistency, Availability and Partition Tolerance.

  • Consistency – If you write data to the distributed system, any later read from any node should return that same data, or return an error if the data is in an inconsistent state. The system never returns inconsistent data.
  • Availability – The system should always perform reads/writes on any non-failing node of the cluster successfully, without error. Availability matters mainly during a network partition, i.e. whether a node returns a success response or an error for a read/write operation while the partition lasts.
  • Partition Tolerance – If there is a partition between nodes, or parts of the cluster in a distributed system are not able to talk to each other, the system should still function.
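
The tradeoff between these three properties shows up once a partition happens. The following is a deliberately simplified two-node model, not the behavior of any real database: Node, make_pair, set_partition, and the "CP" / "AP" mode flag are all hypothetical names used only for this sketch.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request rather than risk inconsistency."""


class Node:
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (illustrative flag)
        self.value = None
        self.peer = None
        self.partitioned = False  # True when the peer is unreachable

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # Give up availability to stay consistent.
                raise Unavailable("cannot replicate to peer")
            # AP: accept the write locally and let the replicas diverge.
            self.value = value
        else:
            self.value = value
            self.peer.value = value

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm latest value")
        return self.value


def make_pair(mode):
    a, b = Node(mode), Node(mode)
    a.peer, b.peer = b, a
    return a, b


def set_partition(a, b, partitioned):
    a.partitioned = b.partitioned = partitioned


# CP pair: during the partition the write is rejected with an error.
cp_a, cp_b = make_pair("CP")
cp_a.write("v1")
set_partition(cp_a, cp_b, True)
try:
    cp_a.write("v2")
    cp_outcome = "accepted"
except Unavailable:
    cp_outcome = "rejected"

# AP pair: the write succeeds, but the two nodes now disagree.
ap_a, ap_b = make_pair("AP")
ap_a.write("v1")
set_partition(ap_a, ap_b, True)
ap_a.write("v2")
ap_views = (ap_a.read(), ap_b.read())
```

In this model the CP pair answers with an error instead of stale data, while the AP pair keeps answering and serves two different values until the partition heals. Real CP systems usually keep the majority side of a partition available, which this two-node toy does not attempt to model.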

CA – Consistency / Availability (MySQL, Oracle, MSSQL, Postgres)

  • Pros
    • Global Consistency
    • Immediately Available
  • Cons
    • May not be able to scale as well
    • Requires full replication

CP – Consistency / Partition Tolerance (MongoDB, HBase, Redis)

  • Pros
    • Consistent because of master node or set of coordinators
    • Partition tolerant because of how data is replicated
    • Tunable consistency / availability
  • Cons
    • Scales only in some cases
    • Not immediately consistent in some use cases
    • Requires full replication

AP – Availability / Partition Tolerance (Cassandra, CouchDB, DynamoDB, Cosmos DB, Riak)

  • Pros
    • Scales horizontally by adding nodes / Stays available during partitions
    • Partition tolerant because of how data is distributed, replicated, and repaired
    • Tunable consistency / Eventually consistent
  • Cons
    • Not immediately consistent for certain use cases
    • Tradeoffs in consistency for certain use cases
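
"Tunable consistency" in AP systems usually comes down to simple arithmetic: with N replicas, a read is guaranteed to overlap the latest successful write when the number of replicas acknowledging the write plus the number consulted on the read is greater than N. Below is a small sketch of that rule. The level names ONE, QUORUM, and ALL follow Cassandra's consistency levels, while the helper functions are made up for this example.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas, e.g. 2 of 3 or 3 of 5."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor, write_acks, read_acks):
    """True when every read set must share a replica with every write set."""
    return write_acks + read_acks > replication_factor


rf = 3  # assumed replication factor for the example
levels = {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}

# Check every write level / read level combination.
results = {
    (w, r): read_overlaps_write(rf, levels[w], levels[r])
    for w in levels
    for r in levels
}

for (w, r), overlaps in results.items():
    label = "overlap guaranteed" if overlaps else "stale reads possible"
    print(f"write={w:<6} read={r:<6} -> {label}")
```

Writing and reading at ONE is the fastest and most available option but can return stale data, while QUORUM for both trades some latency and availability for overlapping read and write sets. This is the consistency tradeoff listed in the cons above.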

If you missed last week’s Data Engineer’s Lunch #13: Introduction to Airflow, be sure to check it out! As mentioned above, the live recording of Data Engineer’s Lunch #14 is embedded below. Also, check out our YouTube page for more videos and the Data Engineer’s Lunch playlist here! Don’t forget to subscribe while you are there!

Resources

Cassandra.Link

Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was to not only fill the gap of Planet Cassandra but to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.

We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!

