Disaster tolerance with Apache Cassandra
Highly Available

© Lead Image © Igor Zakharevich, 123RF.com
The size and scope of today's Internet companies require more than your average SQL. Apache Cassandra is one of the NoSQL systems filling the need for high availability at scale.
Apache Cassandra is an open source NoSQL distributed database that stores and manages large volumes of data on standard servers. Cloud providers use Cassandra for configurations with many data centers spread across global networks.
The story of Apache Cassandra began in 2007 when Facebook engineers Prashant Malik and Avinash Lakshman developed a very early version for Facebook's inbox search. The challenge was to store the data for huge datasets residing on hundreds of servers. A year later, Facebook released Cassandra on Google Code, making it an open source project. In 2009, it joined the Apache incubator, paving the way to it becoming a top-level Apache Foundation project. Since then, many well-known companies have implemented Cassandra or a commercial version (DataStax Enterprise), including Apple, Netflix, Twitter, Sony, eBay, Walmart, and FedEx. Cassandra and other NoSQL alternatives are part of a new generation of data tools designed to fulfill the massive storage needs of the Internet era. A conventional relational database, such as an SQL database, is difficult to cluster, subdivide, or scale horizontally. Companies can either keep their data at a single location and let their customers contend with long wait times to access it remotely, or they can operate two instances of the database. Neither of these scenarios is viable for a modern international company that needs both global data availability and the ability to grow without incurring additional costs. NoSQL systems are built to be extremely scalable. To increase performance, you can simply add additional nodes to the cluster on the fly. To double the performance of the database, you just need to add the same number of nodes as the cluster already has. Apache Cassandra is based on Java and has symmetrical nodes organized in clusters, rather than the master and named nodes used with SQL implementations. Cassandra is useful for real-time data storage for online applications with multiple transactions. You can also use Cassandra as a read-intensive database for business intelligence systems. If you're accustomed to SQL, you'll find that the Cassandra Query Language (CQL) is strongly reminiscent of SQL in terms of syntax and keywords. Cassandra is designed for a distributed environment. To fully implement Cassandra's disaster tolerance capabilities on a massive scale, companies need to distribute the data across different regions or even different cloud providers. If one instance fails, some latency may occur, but the data remains available.
CAP Theorem
The CAP theorem is a principle of computer science that helps to explain why NoSQL systems like Cassandra differ from conventional data tools. The CAP theorem (or Brewer's theorem), which describes the relationship between consistency (C), availability (A), and partition tolerance (P), was first articulated by Eric Allen Brewer, Professor Emeritus of Computer Science at University of California, Berkeley and Vice President of Infrastructure at Google. CAP forms the basis for planning a distributed architecture. The basic parts of the CAP decision framework are:
[...]
Buy this article as PDF
(incl. VAT)
Buy Linux Magazine
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters
Support Our Work
Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.

News
-
System76 Releases COSMIC Alpha 7
With scores of bug fixes and a really cool workspaces feature, COSMIC is looking to soon migrate from alpha to beta.
-
OpenMandriva Lx 6.0 Available for Installation
The latest release of OpenMandriva has arrived with a new kernel, an updated Plasma desktop, and a server edition.
-
TrueNAS 25.04 Arrives with Thousands of Changes
One of the most popular Linux-based NAS solutions has rolled out the latest edition, based on Ubuntu 25.04.
-
Fedora 42 Available with Two New Spins
The latest release from the Fedora Project includes the usual updates, a new kernel, an official KDE Plasma spin, and a new System76 spin.
-
So Long, ArcoLinux
The ArcoLinux distribution is the latest Linux distribution to shut down.
-
What Open Source Pros Look for in a Job Role
Learn what professionals in technical and non-technical roles say is most important when seeking a new position.
-
Asahi Linux Runs into Issues with M4 Support
Due to Apple Silicon changes, the Asahi Linux project is at odds with adding support for the M4 chips.
-
Plasma 6.3.4 Now Available
Although not a major release, Plasma 6.3.4 does fix some bugs and offer a subtle change for the Plasma sidebar.
-
Linux Kernel 6.15 First Release Candidate Now Available
Linux Torvalds has announced that the release candidate for the final release of the Linux 6.15 series is now available.
-
Akamai Will Host kernel.org
The organization dedicated to cloud-based solutions has agreed to host kernel.org to deliver long-term stability for the development team.