Disaster tolerance with Apache Cassandra

Highly Available

© Lead Image © Igor Zakharevich, 123RF.com

© Lead Image © Igor Zakharevich, 123RF.com

Article from Issue 233/2020

The size and scope of today's Internet companies require more than your average SQL. Apache Cassandra is one of the NoSQL systems filling the need for high availability at scale.

Apache Cassandra is an open source NoSQL distributed database that stores and manages large volumes of data on standard servers. Cloud providers use Cassandra for configurations with many data centers spread across global networks.

The story of Apache Cassandra began in 2007 when Facebook engineers Prashant Malik and Avinash Lakshman developed a very early version for Facebook's inbox search. The challenge was to store the data for huge datasets residing on hundreds of servers. A year later, Facebook released Cassandra on Google Code, making it an open source project. In 2009, it joined the Apache incubator, paving the way to it becoming a top-level Apache Foundation project. Since then, many well-known companies have implemented Cassandra or a commercial version (DataStax Enterprise), including Apple, Netflix, Twitter, Sony, eBay, Walmart, and FedEx. Cassandra and other NoSQL alternatives are part of a new generation of data tools designed to fulfill the massive storage needs of the Internet era. A conventional relational database, such as an SQL database, is difficult to cluster, subdivide, or scale horizontally. Companies can either keep their data at a single location and let their customers contend with long wait times to access it remotely, or they can operate two instances of the database. Neither of these scenarios is viable for a modern international company that needs both global data availability and the ability to grow without incurring additional costs. NoSQL systems are built to be extremely scalable. To increase performance, you can simply add additional nodes to the cluster on the fly. To double the performance of the database, you just need to add the same number of nodes as the cluster already has. Apache Cassandra is based on Java and has symmetrical nodes organized in clusters, rather than the master and named nodes used with SQL implementations. Cassandra is useful for real-time data storage for online applications with multiple transactions. You can also use Cassandra as a read-intensive database for business intelligence systems. If you're accustomed to SQL, you'll find that the Cassandra Query Language (CQL) is strongly reminiscent of SQL in terms of syntax and keywords. Cassandra is designed for a distributed environment. To fully implement Cassandra's disaster tolerance capabilities on a massive scale, companies need to distribute the data across different regions or even different cloud providers. If one instance fails, some latency may occur, but the data remains available.

CAP Theorem

The CAP theorem is a principle of computer science that helps to explain why NoSQL systems like Cassandra differ from conventional data tools. The CAP theorem (or Brewer's theorem), which describes the relationship between consistency (C), availability (A), and partition tolerance (P), was first articulated by Eric Allen Brewer, Professor Emeritus of Computer Science at University of California, Berkeley and Vice President of Infrastructure at Google. CAP forms the basis for planning a distributed architecture. The basic parts of the CAP decision framework are:


Use Express-Checkout link below to read the full article (PDF).

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy Linux Magazine

Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • Linux News

    Samba 4.0

    • FTC ends Google investigation
    • Samba implements Windows AD
    • News Bites

    Linux Phones

    • Ubuntu launches a new phone OS
    • Samsung announces Tizen phone
    • Perl turns 25

    Big Data DB

    • vert.x project leader
    • Apache Cassandra v1.2 released
    • HPC app contest
  • Apache Announces Cassandra 2.0

    Big Data database rolls out new features and adds new powers to its query language.

  • Hadoop 2 and Apache Spark

    Hadoop version 2 has transitioned from an application to a Big Data platform. Reports of its demise are premature at best.

  • Samba for Clusters

    Samba Version 3.3 and the CTDB lock manager provide full cluster support.

  • Proxmox VE

    The Proxmox Virtual Environment has developed from an insider’s tip to a free VMware ESXi/ vSphere clone. We show you how to get started setting up a PVE high-availability cluster.

comments powered by Disqus

Direct Download

Read full article as PDF:

Price $2.95