Run Samba in clustered mode with Ceph
Double Sure
Fail-safe operation is a major topic for file server admins. Thanks to CTDB and Ceph, you can put Samba in a cluster with minimal complications.
The popularity of Samba means file server admins have to think about how they can protect the service against loss. Samba is now mature and runs without any problems in most cases, but if the server on which Samba is running crashes, the service is no longer available.
The Samba developers are aware of the need for some fault tolerance and have responded to the problem with a genuine cluster option. Samba's cluster mode means you can use several Samba servers to process incoming requests. A single Samba server crash will not stop the show because other servers in the cluster will keep working.
Configuring Samba's cluster mode is not entirely intuitive, especially considering that the Samba cluster implementation has changed radically several times in the past few years. This article offers a quick look at high availability with Samba.
The Challenge
Why is a Samba cluster such a challenge? A little excursion into the world of storage theory will offer some answers. In particular, the issue of locking is very important. How does the application handle concurrent access to the same file? "Application," in this case, can mean a simple filesystem on a disk or a complex application. In any case, just imagine the chaos if two clients simultaneously access the same file and change parts of it. The file would end up corrupted, and neither client A nor client B could do anything with the contents.
Various filesystems have tried practically every conceivable solution for file locking: Older filesystems rigorously deny access to a file if it is already open. Modern filesystems follow the principle that the last write wins and determines the contents of the file.
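The "last write wins" behavior is easy to reproduce in a few lines. The following Python sketch is a toy model only: the file is a list held in memory, and the names shared_file, read_file, and write_file are made up for illustration. It shows how one client's change disappears when two clients edit the same file without any locking.

```python
# Toy model: a "file" is a list of lines; each client works on its own copy.
shared_file = ["header", "body", "footer"]

def read_file():
    # Each client gets a private snapshot of the current contents.
    return list(shared_file)

def write_file(contents):
    # Saving replaces the whole file, as a naive client would do.
    shared_file[:] = contents

# Both clients open the file before either one saves.
copy_a = read_file()
copy_b = read_file()

copy_a[0] = "header edited by A"
copy_b[2] = "footer edited by B"

write_file(copy_a)
write_file(copy_b)  # B saves last and silently discards A's change

print(shared_file)
```

Client A's edit to the first line is gone after client B saves, although neither client received an error.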
Because Samba offers a network filesystem, it also has internal locking functions. Samba uses the TDB (Trivial Database) format for storing internal metadata. One of the most important databases is locking.tdb, which tracks which client is currently accessing which file.
Samba relies on opportunistic locking, which means a client tells the server that it has claimed exclusive access rights to a file on the Samba share for itself. Once the Samba server has complied with the request, it writes a corresponding note to locking.tdb and stops other clients from accessing the same file.
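The bookkeeping described here can be pictured as a simple table that maps each file to the client holding it. The Python sketch below is a deliberately simplified stand-in, not Samba's actual TDB record format or locking code; the LockTable class, its methods, and the client and path names are all hypothetical.

```python
class LockTable:
    """Hypothetical stand-in for a lock database; not Samba's real TDB format."""

    def __init__(self):
        # Maps a file path to the client currently holding exclusive access.
        self._owners = {}

    def request_exclusive(self, client, path):
        owner = self._owners.get(path)
        if owner is not None and owner != client:
            return False  # another client already holds the file
        self._owners[path] = client
        return True

    def release(self, client, path):
        # Only the current owner can give up its claim.
        if self._owners.get(path) == client:
            del self._owners[path]

table = LockTable()
first = table.request_exclusive("client_a", "/share/report.odt")
second = table.request_exclusive("client_b", "/share/report.odt")
table.release("client_a", "/share/report.odt")
third = table.request_exclusive("client_b", "/share/report.odt")

print(first, second, third)
```

The second request is refused while client_a holds the file and succeeds only after the release.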
As long as the process is limited to a single instance of Samba, everything works fine: The single Samba server can reliably assume that its version of locking.tdb is authoritative.
But a clustered configuration adds a challenge: Multiple Samba instances need to sync the contents of their locking.tdb files with each other. The cluster must therefore have some means for managing client access to files on the Samba volume.
The solution for this problem, say the Samba developers, is CTDB (Clustered Trivial Database), an extension of TDB that lets many instances of Samba dynamically share TDB content.
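Why shared lock state matters can be shown with a small comparison. The Python sketch below uses plain dictionaries as lock tables and a hypothetical request_exclusive helper; it is an assumption-laden illustration of the problem, not of how CTDB works internally (CTDB distributes the records across nodes rather than keeping one central dictionary).

```python
def request_exclusive(lock_table, client, path):
    """Grant exclusive access if no other client holds the path in this table."""
    # setdefault records the client only if the path is not yet claimed.
    owner = lock_table.setdefault(path, client)
    return owner == client

path = "/share/report.odt"

# Case 1: each node keeps a private table and never compares notes.
alice_table, bob_table = {}, {}
independent = (
    request_exclusive(alice_table, "client_a", path),
    request_exclusive(bob_table, "client_b", path),
)

# Case 2: both nodes consult the same cluster-wide table.
cluster_table = {}
shared = (
    request_exclusive(cluster_table, "client_a", path),
    request_exclusive(cluster_table, "client_b", path),
)

print(independent)
print(shared)
```

With private tables, both clients are granted exclusive access to the same file; with a common table, the second request is refused.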
Requirements for Clustered Samba
A few years ago, the only option for a cluster file server was some form of clustered filesystem: Solutions such as GFS or OCFS2 (Oracle Cluster Filesystem 2) could manage cluster-wide access to the same filesystem on shared storage connected via iSCSI. But solutions of this sort required a cluster manager, preferably Pacemaker, and configuring and managing Pacemaker can be a very complicated task, especially when you are using it with GFS or OCFS2.
Luckily, distributed storage solutions have led to a simpler approach. Distributed storage tools such as GlusterFS and Ceph work differently: A large filesystem comprises many small segments on the participating servers, and consistency issues are addressed internally. Access occurs through designated, independent mechanisms via simple interfaces. In truth, distributed storage is no less complex than Pacemaker with OCFS2, but it does a better job of hiding the complexity. The barrier to entry is thus lower.
Two rival distributed storage solutions dominate the market, and both are sponsored by Red Hat: On one hand, GlusterFS offers a classic distributed filesystem; on the other, Ceph is an object store that can offer its contents in the form of a POSIX-compatible filesystem, CephFS. CephFS was stuck at the beta stage for several years, but the latest version of Ceph, "Jewel," promises a higher level of maturity: According to the developers, CephFS is now suitable for production use.
Three servers are available in the following Ceph example: Alice, Bob, and Charlie. Each of these servers has a hard drive that it contributes to the Ceph object store. Although the performance benefits of Ceph are best realized when the cluster runs on real hardware, you can easily emulate this configuration on virtual machines if you only want to try things out.
Even the most attractive Samba cluster will be no help if you ignore fundamental rules of high availability (HA). Basically, an HA cluster with Samba faces the same challenges that all other services on a server need to take on: Clustering at the software level only checks one box on the list. The loss of infrastructure that is not controlled by Samba can still trip Samba up.
Network and power are the two classic infrastructure issues you'll need to address: Several Samba servers in a combined cluster are good, but if they are all connected to the same electrical circuit and the circuit fails, all of the servers are dead. The problem is the same for Ethernet: If all nodes in the cluster are connected to the same switch and it fails, the Samba service is still running, but its clients can no longer reach it.
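A quick inventory check can reveal this kind of shared dependency before it causes an outage. The Python sketch below is only an illustration: The node names follow the Alice, Bob, and Charlie example, whereas the circuit and switch labels and the single_points_of_failure function are invented for the purpose.

```python
from collections import defaultdict

# Hypothetical inventory: which circuit and switch each node depends on.
nodes = {
    "alice": {"circuit": "circuit-1", "switch": "switch-1"},
    "bob": {"circuit": "circuit-1", "switch": "switch-2"},
    "charlie": {"circuit": "circuit-2", "switch": "switch-2"},
}

def single_points_of_failure(inventory):
    """Return the components that every node in the inventory depends on."""
    users = defaultdict(set)
    for node, deps in inventory.items():
        for kind, name in deps.items():
            users[(kind, name)].add(node)
    all_nodes = set(inventory)
    return sorted(c for c, dependents in users.items() if dependents == all_nodes)

# Same nodes, but every one of them plugged into switch-1.
one_switch = {name: {**deps, "switch": "switch-1"} for name, deps in nodes.items()}

print(single_points_of_failure(nodes))
print(single_points_of_failure(one_switch))
```

The first layout has no component shared by all three nodes, whereas the second layout reports switch-1 as a single point of failure.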
Creating the Necessary Infrastructure
The degree of redundancy depends on the budget for the project. Redundancy at the power and network levels can cause significant additional costs, because you'll need to duplicate many components. Admins face a compromise: The more parts you make redundant, the lower the risk of failure, but the setup is more expensive.