Disaster tolerance with Apache Cassandra

Highly Available

© Lead Image © Igor Zakharevich, 123RF.com

© Lead Image © Igor Zakharevich, 123RF.com

Article from Issue 233/2020
Author(s):

The size and scope of today's Internet companies require more than your average SQL. Apache Cassandra is one of the NoSQL systems filling the need for high availability at scale.

Apache Cassandra is an open source NoSQL distributed database that stores and manages large volumes of data on standard servers. Cloud providers use Cassandra for configurations with many data centers spread across global networks.

The story of Apache Cassandra began in 2007 when Facebook engineers Prashant Malik and Avinash Lakshman developed a very early version for Facebook's inbox search. The challenge was to store the data for huge datasets residing on hundreds of servers. A year later, Facebook released Cassandra on Google Code, making it an open source project. In 2009, it joined the Apache incubator, paving the way to it becoming a top-level Apache Foundation project. Since then, many well-known companies have implemented Cassandra or a commercial version (DataStax Enterprise), including Apple, Netflix, Twitter, Sony, eBay, Walmart, and FedEx. Cassandra and other NoSQL alternatives are part of a new generation of data tools designed to fulfill the massive storage needs of the Internet era. A conventional relational database, such as an SQL database, is difficult to cluster, subdivide, or scale horizontally. Companies can either keep their data at a single location and let their customers contend with long wait times to access it remotely, or they can operate two instances of the database. Neither of these scenarios is viable for a modern international company that needs both global data availability and the ability to grow without incurring additional costs. NoSQL systems are built to be extremely scalable. To increase performance, you can simply add additional nodes to the cluster on the fly. To double the performance of the database, you just need to add the same number of nodes as the cluster already has. Apache Cassandra is based on Java and has symmetrical nodes organized in clusters, rather than the master and named nodes used with SQL implementations. Cassandra is useful for real-time data storage for online applications with multiple transactions. You can also use Cassandra as a read-intensive database for business intelligence systems. If you're accustomed to SQL, you'll find that the Cassandra Query Language (CQL) is strongly reminiscent of SQL in terms of syntax and keywords. Cassandra is designed for a distributed environment. To fully implement Cassandra's disaster tolerance capabilities on a massive scale, companies need to distribute the data across different regions or even different cloud providers. If one instance fails, some latency may occur, but the data remains available.

CAP Theorem

The CAP theorem is a principle of computer science that helps to explain why NoSQL systems like Cassandra differ from conventional data tools. The CAP theorem (or Brewer's theorem), which describes the relationship between consistency (C), availability (A), and partition tolerance (P), was first articulated by Eric Allen Brewer, Professor Emeritus of Computer Science at University of California, Berkeley and Vice President of Infrastructure at Google. CAP forms the basis for planning a distributed architecture. The basic parts of the CAP decision framework are:

  • Consistency: "Each read operation accesses the last write operation or an error." A consistent system returns the same value from each node that is requested.
  • Availability: "Each request receives an error-free response." Whatever happens within the cluster does not affect the clients. A highly available system always sends an answer, even if half of the cluster is already dead.
  • Partition tolerance: "The system continues to work despite network problems between nodes." A partition-tolerant system continues to run even if there are serious communication problems within the cluster.

The CAP theorem states that no distributed system can fully achieve all three of these objectives. Because a distributed database must continue to operate if the network stops or part of the system is down, the third objective (partition tolerance) is required. That means a distributed database can either be consistent and partition-tolerant (CP) but less available; or it can be highly available and partition-tolerant (AP) and less consistent (Figure 1). These two mutually exclusive options are best understood if you consider the basic trade-offs between consistency and availability. If you wish to maximize availability, the system must continue to receive data when the distributed nodes are not able to communicate with each other (e.g., it can't just stop working and send an error message). But a scenario that calls for a node to provide data to a user when it is unable to verify that the data is up to date does not fulfill the ideal for consistency. On the other hand, if you wish to maximize consistency, the system will need to ensure that the data provided to the user is the latest version or else return an error, which means that, if the nodes cannot communicate, the system would not be fully available.

Figure 1: The CAP theorem states that distributed IT architectures cannot simultaneously prioritize consistency (C), availability (A), and partition tolerance (P).

Cassandra is known as an AP system, because it maximizes availability and partition tolerance at the expense of consistency. Cassandra's developers are willing to tolerate some inconsistency in order to ensure that the database remains available when operating in a partitioned state. This emphasis on availability over consistency is one reason why Cassandra is so highly scalable compared to many conventional database options. The steps necessary to maximize consistency do not scale for multiple nodes and large datasets. However, although Cassandra emphasizes availability in the partitioned state, it does include synchronization features that provide consistency among the distributed nodes in normal operating conditions.

No SPoF

Cassandra's AP design, with its emphasis on availability, requires that the system eliminate all Single Points of Failure (SPoF). (In a relational database, by contrast, each master node is a potential SPoF.) In order to eliminate SPoF, either all components must be designed redundantly or the design must reflect a masterless architecture, in which the nodes are peers. In the case of Cassandra, every node can process a request, no matter if it needs to read or write. If one of these nodes fails, its data must be available at another location – waiting to restore a backup is not an option for a system designed to achieve zero downtime. Instead, the data is provisionally replicated before anyone needs it. Cassandra lets you define a replication factor. If you set the replication to 3 or 5, each data element is replicated in the corresponding number of nodes. Redundant replication causes additional costs; however, the cost of storage is small compared to the loss of reputation and long-term economic damage associated with lost data. It is also important to remember that replicas should not reside next to each other on the servers. Servers in the same rack tend to fail together. Any event can paralyze not only the rack, but the entire data center. It is therefore advisable to opt for georedundant replication.

Automatic Recovery

More servers means the higher the probability of a failure. A cluster must be capable of restoring itself independently. There are two mechanisms for this restoration:

  • Announced redirects: When one node fails, other nodes start keeping updates for the failed node. If the node is regenerated within a reasonable amount of time (typically one to four hours, depending on the configuration), these handover packages are reinstalled on the node, restoring it completely and autonomously.
  • Repair/NodeSync: If network delays or similar issues cause problems, a cluster performs a health check and recovery operation. In Apache Cassandra, this is known as a "Repair."

The nodes constantly communicate with one another in order to implement workaround options when needed. In Cassandra, nodes immediately report when a new node enters the cluster or an old node fails. When a client application connects to the database using the specified IP address, it loads the database metadata and prepares to send a request to each subsequent node if the actual target node is not reached. Depending on the setting, an application may repeatedly request other nodes. Each node in the cluster can receive a request for data. A node receiving the request acts as a "coordinator node" and sends the request to the nodes responsible for data. This system requires knowledge of the cluster metadata, which the nodes constantly exchange. Each node knows the cluster schema and the position of all usable nodes.

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy Linux Magazine

SINGLE ISSUES
 
SUBSCRIPTIONS
 
TABLET & SMARTPHONE APPS
Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • Linux News

    Samba 4.0

    • FTC ends Google investigation
    • Samba implements Windows AD
    • News Bites

    Linux Phones

    • Ubuntu launches a new phone OS
    • Samsung announces Tizen phone
    • Perl turns 25

    Big Data DB

    • vert.x project leader
    • Apache Cassandra v1.2 released
    • HPC app contest
  • Apache Announces Cassandra 2.0

    Big Data database rolls out new features and adds new powers to its query language.

  • Hadoop 2 and Apache Spark

    Hadoop version 2 has transitioned from an application to a Big Data platform. Reports of its demise are premature at best.

  • Samba for Clusters

    Samba Version 3.3 and the CTDB lock manager provide full cluster support.

  • Proxmox VE

    The Proxmox Virtual Environment has developed from an insider’s tip to a free VMware ESXi/ vSphere clone. We show you how to get started setting up a PVE high-availability cluster.

comments powered by Disqus

Direct Download

Read full article as PDF:

Price $2.95

News