Ceph and OpenStack join forces

Swift Versus Ceph

If you have worked with OpenStack before, you are probably asking what role Swift, OpenStack's in-house storage service, plays in this scenario. It seems absurd at first glance to integrate an external solution if OpenStack purportedly offers a comparable function – "purportedly," because, in fact, Swift and Ceph have a few big differences.

The similarities, however, are quickly listed: Like Ceph, Swift is also an object store; it stores data in the form of binary objects and thus enables horizontal storage scaling. The native Swift interface provides an API that understands both the in-house Swift protocol and the protocol of Amazon's Simple Storage Service (S3). On the Ceph side, admins can achieve similar behavior by additionally deploying the Ceph Gateway [4], which provides almost the same functions as the Swift proxy.

What is striking, however, are the differences between the two solutions: Ceph does quite different things under the hood from Swift. The complete CRUSH (Controlled Replication Under Scalable Hashing) functionality, which in Ceph handles the task of splitting and distributing data to different drives, is something that OpenStack's native object store totally lacks.

Instead, Swift uses an algorithm of its own to decide exactly where to put an uploaded file. Individual objects are neither split nor read simultaneously from multiple target servers, so if you read an object from a Swift cluster, you always read it from a single disk. In terms of performance, you have to make do with what that disk can give you.
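The difference between splitting an object and storing it whole can be sketched in a few lines of Python. This is only an illustration: The hash-based `place_chunk` function is a simple stand-in and is not CRUSH, and the function names, chunk size, and disk count are assumptions made for this example.

```python
import hashlib


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut a blob into fixed-size pieces; the last piece may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_chunk(object_name: str, index: int, num_disks: int) -> int:
    """Pick a disk for one chunk with a plain hash (a stand-in, NOT CRUSH)."""
    digest = hashlib.sha256(f"{object_name}.{index}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_disks


def striped_layout(object_name: str, data: bytes,
                   chunk_size: int, num_disks: int) -> dict[int, list[int]]:
    """Map each disk number to the chunk indexes it would hold."""
    layout: dict[int, list[int]] = {}
    for index, _ in enumerate(split_into_chunks(data, chunk_size)):
        disk = place_chunk(object_name, index, num_disks)
        layout.setdefault(disk, []).append(index)
    return layout


if __name__ == "__main__":
    # Hypothetical 16MB image, 4MB chunks, six disks
    print(striped_layout("vm-image", bytes(16 * 1024 * 1024),
                         4 * 1024 * 1024, 6))
```

If the chunks end up on several disks, a client can fetch them in parallel; an object stored whole is limited to the throughput of the one disk that holds it.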

Ceph and Swift also differ in terms of the number of interfaces they offer: Besides the two RESTful protocols I mentioned earlier, Swift does not understand any other interface.

In contrast, Ceph itself supports the RBD (RADOS Block Device) driver, either based on the kernel module rbd, or via the RBD library. Ceph thus provides an interface at the block level and is useful as direct block storage for virtual machines. Swift completely lacks this option, which turns out to be important in the OpenStack context.


Ceph relies on a concept in which all data are always 100 percent consistent. To achieve this, it internally uses various features that protect the integrity of the data. It also imposes a quorum: If a Ceph cluster falls apart, only those parts that reside in the majority partition remain functional (i.e., those that know that the majority of the nodes in the cluster are on their side). All other nodes refuse to work.

In the worst case, this approach can mean that a Ceph cluster fails and is unusable because the remaining nodes do not achieve a quorum. The administrator, on the other hand, can always be sure that access to storage is coordinated at all times; in other words, there will never be a split-brain scenario in which undesirable inconsistencies arise.
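The majority rule described here can be written down as a small Python sketch. It only illustrates the principle of a strict majority and is not Ceph's actual implementation; the function names and node names are hypothetical.

```python
from typing import Optional


def has_quorum(reachable: int, total: int) -> bool:
    """A partition has quorum only if it holds a strict majority of nodes."""
    return reachable > total // 2


def surviving_partition(partitions: list[set[str]],
                        total_nodes: int) -> Optional[set[str]]:
    """Return the one partition allowed to keep working, or None."""
    for partition in partitions:
        if has_quorum(len(partition), total_nodes):
            return partition
    return None  # no majority anywhere: the whole cluster stops


if __name__ == "__main__":
    # Five hypothetical nodes split three to two: the group of three carries on
    print(surviving_partition([{"a", "b", "c"}, {"d", "e"}], 5))
    # Four nodes split two to two: nobody has a majority
    print(surviving_partition([{"a", "b"}, {"c", "d"}], 4))
```

Because two partitions can never both hold more than half of the nodes, at most one side continues to accept requests, which is what rules out split brain. The even split shows the price: With no majority on either side, the cluster is unusable.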

In contrast, Swift follows an "eventually consistent" approach: In a Swift cluster, 95 of 100 nodes can fail. The five remaining nodes would still allow write access and serve read requests if they have the requested object locally.

If a Swift cluster fell apart, users could write divergently to the nodes of different cluster partitions. At the moment of reunification, Swift would then simply replace the divergent data throughout the cluster with the latest version of a record. Swift clusters thus almost always remain available, but do not guarantee data consistency, which often becomes a problem in enterprise setups.
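The reunification step can be sketched as a "latest version wins" merge. The following Python example is a simplified model under stated assumptions, not Swift's replication code: The `Replica` class, its fields, and the sample timestamps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str         # object name
    timestamp: float  # time of the last write
    data: bytes


def merge_partitions(a: dict[str, Replica],
                     b: dict[str, Replica]) -> dict[str, Replica]:
    """Reunite two partitions: for each object, keep the newest version."""
    merged = dict(a)
    for name, replica in b.items():
        if name not in merged or replica.timestamp > merged[name].timestamp:
            merged[name] = replica
    return merged


if __name__ == "__main__":
    # Both partitions accepted a write to the same object while separated
    left = {"report.txt": Replica("report.txt", 100.0, b"written on the left")}
    right = {"report.txt": Replica("report.txt", 105.0, b"written on the right")}
    print(merge_partitions(left, right)["report.txt"].data)
```

In this model, the write accepted on the left side is silently discarded after the merge. That is the consistency trade-off the text describes.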

OpenStack and Ceph Basics

How exactly cloud operators manage the OpenStack and Ceph installation is left up to them; all roads lead to Rome, as they say. From the OpenStack side, the options – besides manual installation – include Packstack [5] or even Kickstack [6]. You can install Ceph quite conveniently with the help of the ceph-deploy tool – and even automatically with Chef and Puppet.

Special care should be taken when planning the cluster for Ceph. As always with storage, it is predominantly a question of performance, and only the performance the cluster is capable of delivering will actually reach the virtual machines later on. Staging a high-performance Ceph cluster is not too complicated, despite all the doomsayers (see the box "Cluster Performance").

Cluster Performance

The easiest trick for increasing the performance of a cluster is to give each Ceph OSD (Object-based Storage Device) a high-performance journal. Each OSD has a journal anyway, much like the journal of a filesystem: It caches the data to be written until it can be written out to the physical disk.

Ceph OSDs can move their journals to SSDs (Figure 1); however, you should not put more than four OSD journals on one SSD. If you keep to this limit, you can achieve write rates of 250MBps and more per OSD.

Figure 1: Easy to see: The OSD journal is on a separate partition, which belongs to a solid state disk; this usually leads to significant performance gains.
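The four-journals-per-SSD rule translates directly into a little planning arithmetic. The following Python sketch assumes nothing beyond that limit; the function names and the OSD numbering are made up for the example.

```python
import math

MAX_JOURNALS_PER_SSD = 4  # limit recommended in the text


def ssds_needed(num_osds: int,
                max_per_ssd: int = MAX_JOURNALS_PER_SSD) -> int:
    """Number of journal SSDs a host needs for the given number of OSDs."""
    return math.ceil(num_osds / max_per_ssd)


def assign_journals(osd_ids: list[int],
                    max_per_ssd: int = MAX_JOURNALS_PER_SSD) -> list[list[int]]:
    """Group OSD IDs so that no SSD carries more than max_per_ssd journals."""
    return [osd_ids[i:i + max_per_ssd]
            for i in range(0, len(osd_ids), max_per_ssd)]


if __name__ == "__main__":
    # A hypothetical host with ten OSDs
    print(ssds_needed(10))
    print(assign_journals(list(range(10))))
```

A host with 10 OSDs would therefore need three journal SSDs, with the third carrying only two journals.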

The network also plays a role – after all, data from the VMs on the virtualization hosts crosses the network to reach the Ceph cluster. If you give the Ceph hosts a private network for the OSD data stream, you can be sure that the replication traffic between the OSDs does not impair the performance of data traffic between the Ceph cluster and the VM hosts. If you want a particularly exemplary setup, you can use 10Gb Ethernet for the network between Ceph and the VM hosts, thus avoiding bandwidth bottlenecks.

Once OpenStack and Ceph are professionally installed – and assuming the Ceph cluster has the power it needs under the hood – the rest of the setup only involves integrating the OpenStack Cinder and Glance services with Ceph. Both services come with a native back end for Ceph, which the administrator then enables in the configuration. However, this step also affects security: Inktank recommends creating separate cephX users for Cinder and Glance (Figure 2).

Figure 2: The ceph.conf file contains entries for the keyrings of the Cinder and Glance users. The admin copies them along with the key files themselves to the affected hosts.
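Keyring entries of the kind Figure 2 describes can be generated with a short Python script. This is a sketch under assumptions: The user names `cinder` and `glance` and the `/etc/ceph` directory are illustrative choices, the file name pattern follows Ceph's usual `ceph.client.<user>.keyring` convention, and `keyring_sections` is a hypothetical helper, not part of any Ceph tool.

```python
import configparser
import io


def keyring_sections(users: list[str], keyring_dir: str = "/etc/ceph") -> str:
    """Render one [client.<user>] section with a keyring path per user."""
    config = configparser.ConfigParser()
    for user in users:
        # Assumed naming scheme: ceph.client.<user>.keyring
        config[f"client.{user}"] = {
            "keyring": f"{keyring_dir}/ceph.client.{user}.keyring"
        }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical user names for the two OpenStack services
    print(keyring_sections(["cinder", "glance"]))
```

The resulting sections could be appended to ceph.conf on each host that runs Cinder or Glance; the key files themselves still have to be copied to the paths named there.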
