Linux containers with systemd-nspawn and rkt
Container Time
The systemd project has given rise to lots of other interesting tools and technologies. Meet systemd-nspawn, a container tool that serves as a simple Docker alternative.
Systemd-nspawn [1] is a lightweight container tool that can run a command or full operating system in a contained environment on Linux. According to the systemd-nspawn man page, systemd-nspawn is "…similar to chroot(1) but more powerful since it fully virtualizes the filesystem hierarchy as well as the process tree, the various IPC subsystems, and the host and domain names." (See also the "Other Container Tools" box.)
Other Container Tools
To understand systemd-nspawn, it can be helpful to contrast it with a few different but related tools.
Chroot [3] is one of the oldest and simplest ways to provide some process isolation on Linux. The chroot system call allows the calling process to switch to an isolated filesystem environment. After that, any filesystem path reference that the application makes is considered relative to the chroot directory. An example of this behavior is:
chroot /home/editorial/images/jessie/ /bin/ls
The second part of the line attempts to run the ls command in the chroot environment set up by the first part (Figure 1). The new root directory thus resides on the host below /home/editorial/images/jessie. After using the chroot command, the process on the host does not see any files outside of /home/editorial/images/jessie.
Fundamentally, all chroot does is change the mechanism that's used to resolve pathnames when a process tries to access the filesystem. Chroot thus provides a basic level of isolation at the filesystem level. Unfortunately, the simple isolation provided by chroot is easy to break: Various methods exist for "escaping" a chroot jail (e.g., if a process is already holding onto a file descriptor pointing outside of the chroot before the call is made), so chroot alone does not provide sufficient security. Chroot also doesn't offer any of the other types of isolation that can be desirable on Linux, such as limits on memory usage or separate network interfaces.
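The way paths inside a chroot map onto the host can be sketched with a small shell helper. The function name host_path is hypothetical, and the mapping is purely textual: it ignores symbolic links and ".." components, which the kernel does take into account.

```shell
# host_path ROOT PATH
# Print the host-side location of PATH as seen from inside a chroot at ROOT.
# Illustrative only: plain string handling, no symlink or ".." resolution.
host_path() {
    # ${1%/} drops one trailing slash from ROOT; ${2#/} drops the leading slash from PATH.
    printf '%s/%s\n' "${1%/}" "${2#/}"
}

# Inside the chroot from the example, /bin/ls refers to this file on the host:
host_path /home/editorial/images/jessie/ /bin/ls
```

For the example directory, the helper prints /home/editorial/images/jessie/bin/ls, which is the binary that the chroot command above actually executes.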
The past five years have seen the emergence of more powerful containment tools, like systemd-nspawn, rkt [4], and Docker [5], that take advantage of Linux kernel features to provide much greater isolation between processes on a system.
Rkt and Docker are both targeted at end users and admins wanting to run applications in containers. Systemd-nspawn is a lower-level tool, targeted more at developers and testers.
Rkt is an application container runtime developed at CoreOS, and it is an implementation of the App Container Specification (appc) [6]. When running application containers, rkt internally uses a staged architecture. The first stage, stage0, is the rkt command line itself, which is responsible for things like discovering application container images on the Internet or from repositories, downloading them across the network, and managing a local disk cache. Stage1 is responsible for setting up the actual isolated environment, using the necessary kernel features to isolate the applications from the host. Finally, stage2 refers to the user-specified applications themselves; in the case of rkt, multiple applications can run in a single pod.
The rkt version delivered with CoreOS directly leverages systemd-nspawn to do all of the heavy lifting when it comes to setting up the container. Another version of rkt uses kvmtool [7], which sets up a lightweight virtual machine (VM) that takes advantage of the hardware isolation provided by the Linux kernel's KVM driver.
Docker is a container platform that consists of many parts, with duties ranging from executing individual containers on a host to scheduling and orchestrating containers across large clusters of servers. For the purposes of this comparison, Docker consists of two key modes encapsulated in the docker command-line tool:
- daemon mode, which performs all of the heavy lifting involved in running and managing containers
- client mode, which is how most users interact with the Docker engine [8]. For example, a simple docker run command is translated into an API call that is passed on to the local Docker engine, which is then responsible for setting up and running the container that the user specified.
The Docker engine is responsible for a huge number of different functions: from retrieving container images over the Internet, to managing the lifecycle of containers on a system, to serving the aforementioned REST HTTP API (whether to the actual "docker" client, or any other HTTP client). The Docker engine is thus necessarily long-running (because it directly manages the lifecycle of all "Docker containers" on a system).
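Because the engine's API is plain HTTP, any HTTP client can talk to it. The following sketch uses curl over the engine's default Unix socket. The helper name engine_api, the DRY_RUN switch (which only prints the command), and the debian:jessie image are assumptions made for illustration; a docker run roughly corresponds to a create request followed by a start request.

```shell
SOCK=/var/run/docker.sock   # default socket of a local Docker engine

# engine_api METHOD PATH [JSON_BODY]
# Send one request to the local Docker engine over its Unix socket.
# With DRY_RUN set, the curl command is printed instead of executed.
engine_api() {
    if [ -n "${3:-}" ]; then
        ${DRY_RUN:+echo} curl --silent --unix-socket "$SOCK" -X "$1" \
            -H 'Content-Type: application/json' -d "$3" "http://localhost$2"
    else
        ${DRY_RUN:+echo} curl --silent --unix-socket "$SOCK" -X "$1" "http://localhost$2"
    fi
}

# Preview two requests: a version query and the creation of a container.
DRY_RUN=1
engine_api GET /version
engine_api POST /containers/create '{"Image": "debian:jessie", "Cmd": ["/bin/ls"]}'
```

Unsetting DRY_RUN sends the requests for real, which requires a running engine and permission to access the socket.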
The systemd-nspawn container tool began as a means for systemd developers [2] to test building and running systemd itself without affecting the host operating system. Systemd-nspawn lets you launch an application in an isolated container with a single command, making it quite handy for developers who want to run buggy pre-release code without risking damage to the system.
Since the first release, systemd-nspawn has evolved to include a swath of functionality, ranging from advanced networking configurations to SELinux integration and native overlay filesystem support. Modern systemd-nspawn is a versatile and full-featured tool you can use for a variety of different Linux use cases, but its primary purpose is to serve as a tool for developing and testing.
Namespaces and Cgroups
Internally, systemd-nspawn uses several features of the Linux kernel to provide process and resource isolation. The first and foremost of these features is namespaces [9].
Linux namespaces wrap a system resource in an abstraction, so that processes inside a namespace appear to have their own isolated instance of that resource. For example, if a process is in its own unique PID (process ID) namespace, it will not see any other processes on the system that aren't in that same namespace. In this way, users can restrict processes from interacting with each other along various different axes. The Linux kernel provides a number of different namespaces (Table 1).
Table 1
Kernel Namespaces
Namespace | Function |
---|---|
IPC | System V IPC, POSIX message queues |
Network | Network devices, stacks, ports, etc. |
Mount | Mount points |
PID | Process IDs |
User | User and group IDs |
UTS | Hostname and NIS domain name |
A process generates a namespace by issuing the system call unshare(). This call detaches the calling process from its existing namespace and creates a new namespace. A process can also use the setns() system call to change to an existing namespace on the system.
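On a running system, the namespaces a process belongs to are visible as symbolic links under /proc/<pid>/ns. The sketch below lists them for the current shell. The helper name ns_inode is hypothetical, and the loop skips any namespace type that cannot be read.

```shell
# ns_inode LINK
# Extract the inode number from a namespace link target such as "uts:[4026531838]".
ns_inode() {
    printf '%s\n' "$1" | sed -n 's/^[a-z_]*:\[\([0-9]*\)]$/\1/p'
}

# List the namespaces of the current shell ($$ is its process ID).
for ns in ipc mnt net pid user uts; do
    link=$(readlink "/proc/$$/ns/$ns") || continue
    printf '%-5s %s\n' "$ns" "$(ns_inode "$link")"
done
```

Two processes that show the same number for a given type share that namespace. A process started in a container will typically show different numbers for the types the container tool isolates.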
Systemd-nspawn's extensive use of namespaces is reflected in its name. "Nspawn" refers to the fact that the tool generates new namespaces. By default, systemd-nspawn will run processes in their own IPC, mount, PID, and UTS namespaces. Optional flags give the container an independent network namespace or enable rudimentary user namespace support. For more information on namespaces, refer to the excellent series of introductory articles on LWN [10].
Another key container technology for Linux is cgroups [11]. (When people use the term "Linux containers," they're typically referring to a combination of cgroups and namespaces.) The name cgroups is an abbreviation for "control groups." Cgroups are a means for organizing processes on a Linux system into a hierarchical tree, and then optionally applying different resource parameters to sections of the hierarchy. For example, you can use cgroups to apply memory limits to a particular process or group of processes, and these limits are then enforced by the kernel.
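The cgroup a process belongs to can be read from /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:path. A minimal sketch, with cgroup_path as a hypothetical helper name:

```shell
# cgroup_path LINE
# Print the path field of one /proc/<pid>/cgroup line.
# The path is everything after the second colon.
cgroup_path() {
    printf '%s\n' "$1" | cut -d: -f3-
}

# Show the cgroup path(s) of the current shell, if the file is readable.
if [ -r "/proc/$$/cgroup" ]; then
    while IFS= read -r line; do
        cgroup_path "$line"
    done < "/proc/$$/cgroup"
fi
```

The printed path shows where the process sits in the hierarchical tree; resource limits applied to that node or its ancestors are the ones the kernel enforces for the process.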
Now, systemd-nspawn itself doesn't do a whole lot with cgroups; it just makes sure the cgroup tree is available within the mount namespace it sets up.
Getting Started with systemd-nspawn
Systemd-nspawn is provided out of the box on any modern Linux distribution that uses systemd as its init system (which these days is almost all of them). In its most basic invocation, you can point systemd-nspawn at a directory and tell it to execute a binary in that directory, but systemd-nspawn also provides over 30 command-line flags to customize different aspects of the containers it creates.
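In its simplest form, an invocation mirrors the chroot example. The sketch below wraps it in a hypothetical helper, nspawn_exec. The DRY_RUN variable is an assumption of this sketch, not a systemd-nspawn feature: when set, the command is printed instead of executed. The real call needs root privileges and an existing directory tree.

```shell
# nspawn_exec TREE COMMAND [ARGS...]
# Run COMMAND inside the directory tree TREE using systemd-nspawn.
nspawn_exec() {
    tree="$1"
    shift
    # ${DRY_RUN:+echo} expands to "echo" when DRY_RUN is non-empty, otherwise to nothing.
    ${DRY_RUN:+echo} systemd-nspawn --directory="$tree" "$@"
}

# Preview the command for the directory used in the chroot example.
DRY_RUN=1 nspawn_exec /home/editorial/images/jessie /bin/ls
```

Unlike the chroot call, the process started this way also runs in separate IPC, mount, PID, and UTS namespaces.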
Recent versions of systemd-nspawn also introduced a configuration file, which you can use to encode most of the settings that are available through the flags in a reusable format.
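Such a settings file uses an INI-style syntax with sections like [Exec], [Files], and [Network], as described in the systemd.nspawn(5) man page. A minimal sketch follows; the helper name write_nspawn_config and the /srv/shared path are illustrative assumptions.

```shell
# write_nspawn_config FILE
# Write a small example settings file for a container to FILE.
write_nspawn_config() {
    cat > "$1" <<'EOF'
[Exec]
# Boot the container's init system instead of running a single command
Boot=yes

[Files]
# Make a host directory visible read-only inside the container (example path)
BindReadOnly=/srv/shared

[Network]
# Give the container its own network namespace
Private=yes
EOF
}
```

Calling write_nspawn_config /etc/systemd/nspawn/jessie.nspawn as root would place the settings where systemd-nspawn looks them up for a container named jessie.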
The simple example in Listing 1 shows systemd-nspawn in action. The example downloads an image of the Debian Jessie [12] distribution and then launches it in a container (Figure 2). You need to run these steps with root privileges.
Listing 1
Retrieving and Starting Jessie
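A minimal sketch of the two steps, assuming the debootstrap tool is installed and reusing the /home/redaktion/jessie directory mentioned in the text. The function names and the DRY_RUN switch are hypothetical, and the exact commands are an assumption rather than a reproduction of the listing.

```shell
TREE=/home/redaktion/jessie

# Install a minimal Debian Jessie system into a directory (requires root and network access).
fetch_jessie() {
    ${DRY_RUN:+echo} debootstrap --arch=amd64 jessie "$1"
}

# Boot the tree as a container: --boot starts its init system, --directory selects the tree.
boot_jessie() {
    ${DRY_RUN:+echo} systemd-nspawn --boot --directory="$1"
}

# Print both commands without running them.
DRY_RUN=1
fetch_jessie "$TREE"
boot_jessie "$TREE"
```

On current systems, the Jessie packages may only be available from an archive mirror, which debootstrap accepts as an optional argument after the target directory.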
Additionally, you need to delete the root password in the /home/redaktion/jessie/etc/passwd file to use Jessie. Incidentally, the process looks very similar with rkt (Listing 2).
Listing 2
Rkt in Action
The commands shown in Listing 2 download an App Container Image (ACI) of etcd version 2.0.0 and launch it (Figure 3). In this scenario, rkt has set up the required filesystem in the directory, including a copy of systemd, which it calls using systemd-nspawn [...].
Conclusion
Systemd-nspawn is very much production ready. Many Linux users – on CoreOS and other platforms – are actively using both rkt and systemd-nspawn directly in production and seeing great success.
Having said that, the systemd developers are still careful about how they position systemd-nspawn. For example, the man page states that systemd-nspawn is not suitable for secure container setups and explains that the intended use is more for debugging and testing.
Although systemd-nspawn is quite fully featured, it still needs some work. One of the areas that could use some improvement is user namespaces [13], which are not very usable in their current form.
With mature and configurable tools like rkt, Docker, and systemd-nspawn, developers and systems administrators have plenty of options for running application containers.
All of the projects described in this article are completely open source and have active, vibrant communities. Anyone interested in helping to define and implement the future of containers on Linux is encouraged to get involved!
Infos
[1] systemd-nspawn: http://www.freedesktop.org/software/systemd/man/systemd-nspawn.html
[2] systemd: https://wiki.freedesktop.org/www/Software/systemd/
[3] Chroot: https://www.gnu.org/software/coreutils/coreutils.html
[4] Rkt: https://github.com/coreos/rkt
[5] Docker: http://www.docker.com
[6] Appc: https://github.com/appc
[7] Kvmtool: https://kernel.googlesource.com/pub/scm/linux/kernel/git/will/kvmtool/+/master/README
[8] Docker Engine: https://www.docker.com/docker-engine
[9] Namespaces overview: http://man7.org/linux/man-pages/man7/namespaces.7.html
[10] Namespace series on LWN.net: https://lwn.net/Articles/531114/
[11] Cgroups: https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt
[12] Debian Jessie: https://www.debian.org/releases/stable/
[13] User namespaces: http://man7.org/linux/man-pages/man7/user_namespaces.7.html