Improving Linux package management

Delivery Service

© Photo by Mika Baumeister on Unsplash

Linux package managers work too slowly. The experimental distri research project investigates ways to speed up package management.

Package managers differ from each other not only in terms of the package formats they use, but also in their execution speed. Developer Michael Stapelberg has been working on how to streamline package managers such as Debian's Apt or Fedora's DNF to make them faster. He has written blog posts on the subject, given talks, and created an experimental distribution, distri [1], to explore the problem.

Distri is a minimal, command-line distribution for reviewing package management concepts in Linux. This is purely a feasibility study and is not suitable for production use. Distri seeks to be the simplest distribution that is still useful.

Criticism of Debian

Stapelberg, currently a Google developer, was a package maintainer at Debian from 2012 to 2019. Besides maintaining Debian packages, he wrote the i3 window manager and the Debian Code Search engine.

In March 2019, frustrated, he announced his withdrawal from Debian development and sharply criticized the Debian project [2]. He said that the practices and tools used to develop, manage, and support the software in the distribution were often more of a hindrance than a help.

In Stapelberg's opinion, there is a lack of effective tools to implement comprehensive changes in a timely manner. Also, according to Stapelberg, the specifications laid down in the Debian guidelines and pushed by the quality assurance tool Lintian [3] unduly hinder the implementation of necessary technical changes.

Too Much, Too Slow

In particular, Stapelberg vehemently criticizes package management, not only in Debian but in Linux in general. Above all, he dislikes that package managers do too much and do it too slowly. To back this up, he first conducted a series of tests with small and larger packages using the Apt, DNF, pacman, Nix, and apk package managers, contrasting the metadata downloaded and the time and bandwidth used [4].

What he discovered was that the metadata downloaded was often out of proportion to the package's size, and even more so the smaller the package. Any Debian user can easily check this by looking in the terminal at the end of an apt update to see how many megabytes of metadata the operation downloads. This amount often exceeds the total size of the actual packages to be updated (Figure 1).

Figure 1: When installing and updating packages, the package manager also downloads metadata. The volume of metadata is particularly high in Fedora and Debian, which delays package installation.

The maintainer scripts that the package manager runs during installation also slow down the process, as well as prevent parallel installation of packages. In Stapelberg's view, these scripts can just as easily run the first time the application is launched. Debian processes maintainer scripts using files such as preinst and postinst when installing or updating packages; they contain distribution-specific customizations [5]. Debian has 8,620 maintainer scripts. The Debian Policy Manual web page provides flowcharts that exemplify the underlying complexity here [6]. In Fedora, scriptlets perform this task [7].

Inefficient

At Google, Stapelberg learned a great deal about efficiently updating large amounts of data. In his research, distribution updates proved fairly inefficient. He ran a test on common distributions, installing one small package and one large package and recording the times.

Although the test computer's network connection supported speeds of around 115MBps, none of the distributions achieved more than just under 11MBps of throughput, with most achieving around 3MBps. Alpine Linux performed best, being the fastest in both tests at 10.8MBps and also requiring the lowest volume of metadata to complete the task. Arch Linux and NixOS were in the middle, while Debian and Fedora performed worst.

As an example, for the small 75KB ack package, Fedora had to download a massive 114MB, and the install took 33 seconds. Alpine, on the other hand, made do with a 10MB download and finished the install in one second. When installing the virtualization software Qemu, Alpine managed with 26MB, while Fedora needed nearly nine times the amount of data at 226MB.
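The gap can be checked with simple arithmetic. The following is a minimal shell sketch, not part of Stapelberg's test setup: the function names throughput and ratio are made up for this illustration, and the inputs are the rounded figures quoted above, so the results are approximate.

```shell
# throughput MB SECONDS: data volume divided by install time, in MB per second.
throughput() {
    LC_ALL=C awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", mb / s }'
}

# ratio A B: how many times larger A is than B.
ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

throughput 114 33   # Fedora installing ack: prints 3.5
throughput 10 1     # Alpine installing ack: prints 10.0
ratio 226 26        # Fedora vs. Alpine data volume for Qemu: prints 8.7
```

Both throughput values sit far below the roughly 115MBps that the test computer's connection could deliver.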

In view of such numbers, it is little wonder that Fedora is pretty sluggish when it comes to updates and installations. In both scenarios, Debian came in second to last, because downloading the large volume of metadata also delayed the process. Besides Alpine, Arch Linux also had one of the faster package managers in the test.

Hooks and Triggers

Fedora is also slow because its unpacked package list alone is 60MB, while Alpine's list is a lean 734KB. Fedora offers over 20,000 packages, which is three times more packages than the very small Alpine, but the difference is still striking.

However, Stapelberg thinks even the best results in the test are too slow. He sees another reason for this in the frequently used hooks and triggers that the package manager executes during installation, which trigger the aforementioned maintainer scripts, create daemon user accounts (such as an FTP or WWW account), or create cache files.

One of the most commonly used triggers, the man package trigger, ensures that a man page for the package is included on the system with each package installation. In his blog, Stapelberg explains why all this should not happen during installation and how it prevents parallel installation of packages [8].

In Stapelberg's opinion, these interruptions of the actual installation should preferably take place when the app is first launched. For example, if an application is never started between its installation and one or more subsequent updates, the adjustments would run only once rather than after every update.
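The idea of deferring setup work to the first launch can be illustrated with a small wrapper. This is a minimal sketch under stated assumptions, not how distri or any package manager implements it: run_once, first_launch_setup, launch_app, and the stamp file name are all hypothetical.

```shell
# run_once STAMP COMMAND...: run COMMAND only if the STAMP file does not exist yet,
# and create STAMP after COMMAND succeeds.
run_once() {
    stamp=$1
    shift
    if [ ! -e "$stamp" ]; then
        "$@" && : > "$stamp"
    fi
}

state=$(mktemp -d)   # stand-in for a per-application state directory
setup_calls=0

# Stand-in for work a maintainer script would otherwise do at install time.
first_launch_setup() {
    setup_calls=$((setup_calls + 1))
}

# Hypothetical launcher: performs the deferred setup, then starts the application.
launch_app() {
    run_once "$state/myapp.setup-done" first_launch_setup
    echo "myapp running"
}

launch_app
launch_app
echo "setup ran $setup_calls time(s)"
```

The setup step runs on the first launch only. A package that is installed and updated several times but never started would not trigger it at all.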

Image Instead of Archive

In Stapelberg's opinion, a package manager should only do what is absolutely necessary to anchor a package in the system so that it is ready for use (i.e., start the program or load a kernel module). Unpacking during installation is not necessary if packages are available as filesystem images that the distribution mounts at startup, as is the case with AppImage or Snap.

According to Stapelberg, no package manager in a Linux distribution currently uses this approach, although it could still increase the speed to above the level achieved by Alpine's apk, the fastest package manager in his test series. Images are currently only used by the Haiku operating system project.

In Stapelberg's experimental distribution distri, he seeks to experiment with reducing the complexity of package management. He concludes that distributions like Fedora or Debian could also run faster given less complexity. That doesn't mean it's technically easy to implement, but it would be feasible.

For example, distri uses read-only SquashFS images as the package format instead of the usual TAR archives (Figure 2). In addition to increased speed, this has the advantage that applications cannot be modified, which protects them from accidental or malicious modifications.

Figure 2: Distri uses the SquashFS package format, available as images and in packaged formats.

Distri organizes all files provided by a package under the /ro/ mount point, each in its own directory. The usual data exchange between software packages, which takes place via the specified directories of the Filesystem Hierarchy Standard (FHS) in conventional distributions, is handled by the system via exchange directories, which are provided by FUSE.

For example, the exchange directory /ro/share/ provides the union of the share/ subdirectory of all packages in the package store. The global exchange directories map the FHS with sufficient accuracy to allow third-party software, such as Google Chrome or Spotify, to work. Using /ro/ also prevents conflicts when installing multiple versions of a package.
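To make the union idea concrete, the following minimal sketch builds a merged view of the share/ subdirectories of several package directories. It rests on loud assumptions: distri provides exchange directories through FUSE, whereas this illustration uses a plain symlink farm; the function build_exchange and the package example-amd64-1.0-1 are made up.

```shell
# build_exchange STORE SUBDIR TARGET: link every file found in STORE/<package>/SUBDIR
# into TARGET, so that TARGET shows the union across all packages.
build_exchange() {
    store=$(cd "$1" && pwd)   # absolute path keeps the symlinks valid
    subdir=$2
    target=$3
    for pkgdir in "$store"/*/; do
        src="${pkgdir}${subdir}"
        [ -d "$src" ] || continue
        ( cd "$src" && find . -type f ) | while IFS= read -r rel; do
            rel=${rel#./}
            mkdir -p "$target/$(dirname "$rel")"
            ln -sf "$src/$rel" "$target/$rel"
        done
    done
}

# Demo with two mock packages in a temporary package store.
demo=$(mktemp -d)
mkdir -p "$demo/ro/nano-amd64-4.9.5-2/share/man/man1"
mkdir -p "$demo/ro/example-amd64-1.0-1/share/man/man1"
echo "nano man page" > "$demo/ro/nano-amd64-4.9.5-2/share/man/man1/nano.1"
echo "example man page" > "$demo/ro/example-amd64-1.0-1/share/man/man1/example.1"

build_exchange "$demo/ro" share "$demo/exchange"
ls "$demo/exchange/man/man1"
```

Both man pages appear side by side in the merged directory, although each file still lives in its own package directory. If two packages ship the same relative path, the one processed later wins in this sketch.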

Distri also streamlines package building. Unlike conventional distributions' builders, the distri package builder does not install packages in the build environment. Instead, the system provides a filtered view of the package store in /ro/ in the build environment. Even with large dependency trees, setting up a build environment this way takes a fraction of a second.

Distri's website provides information about the various ways to use the distribution [9]. There is no installer yet, but the maintainer plans to add one. Distri can be started from a USB stick or in a Docker or LXD container, as well as in a virtual machine with VirtualBox or Qemu. Since it is in IMG format [10] and not an ISO, you first need to convert it to a Virtual Disk Image (VDI) for VirtualBox (Figure 3).

Figure 3: If using VirtualBox, you first need to convert the downloaded image into a VDI.
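One way to do the conversion is VirtualBox's VBoxManage convertfromraw subcommand, run after the image has been unpacked as described in the following steps. The sketch below is a hedged example: it assumes that VBoxManage is installed and that the unpacked image is ~/Downloads/distri-disk.img, and the invocation shown in Figure 3 may differ.

```shell
img="$HOME/Downloads/distri-disk.img"   # unpacked raw image (assumed location)
vdi="${img%.img}.vdi"                   # same name with a .vdi extension

# Convert only if the tool and the image are actually present.
if command -v VBoxManage >/dev/null 2>&1 && [ -f "$img" ]; then
    VBoxManage convertfromraw "$img" "$vdi" --format VDI
else
    echo "VBoxManage or $img not found; nothing converted" >&2
fi
```

The resulting VDI file can then be attached to a new virtual machine as an existing hard disk.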

First, you unpack the image. Since the developers have packaged it with the relatively new Zstandard (Zstd) compression algorithm, you will probably need to install zstd up front. On Debian, you can do this with:

apt install zstd

On Fedora, use:

dnf install zstd

Then extract the file using the call:

$ unzstd ~/Downloads/distri-disk.img.zst

In the process, the compressed file loses the .zst extension and grows to 8GB. An attempt to start distri on an 8GB USB stick failed as expected; you need a stick with at least 16GB capacity. After starting distri, a login prompt opens where you can enter the password, which is peace for root (Figure 4).

Figure 4: At login, distri mounts a basic set of essential applications. At runtime, it brings in more apps, depending on the usage.

You are greeted by a very simple Z shell prompt that lets you explore the system. At first, I thought the cd command had failed, because the prompt does not show the new location after the change – but pwd will help here. Entering cd /ro/ takes you to the directory with all installed packages; switching to /ro/share/ takes you to the exchange directory (Figure 5).

Figure 5: Besides the usual suspects like /etc or /usr, you will also find distri-specific directories like /ro and /roimg in the root directory.

For common distri commands, and their equivalents in Debian, see distri's documentation [11]. This is also where you can learn more about distri's package format and how to create your own packages. The distri update command (Figure 6) replaces

apt update && apt full-upgrade

and upgrades the entire distribution at an impressive pace. With a 1Gbps connection, the system downloads and installs around 275MB of data in less than four seconds.

Figure 6: The distri update command updates the system. In a test, this took just under 3.3 seconds.

Package installations are also completed in the blink of an eye (Figure 7). In the case of the Nano editor

distri install nano-amd64-4.9.5-2

it took just over a millisecond. You need to specify the package version because multiple versions can be installed in parallel. Available packages can be found in the distri repository [12].

Figure 7: The installation of individual packages is blazingly fast. The cmake package, weighing in at about 75MB, was installed in about 1.5 seconds on a fast Internet connection.
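The full package name used in the install command carries the version information. The following minimal sketch splits such a name into its parts, assuming the pattern name-architecture-version-revision suggested by the example above. The function parse_pkg is made up for illustration, and the pattern should be checked against distri's documentation [11].

```shell
# parse_pkg FULLNAME: split a name such as nano-amd64-4.9.5-2 from the right,
# so that hyphens inside the package name itself do not confuse the parsing.
parse_pkg() {
    full=$1
    revision=${full##*-}   # text after the last hyphen
    rest=${full%-*}
    version=${rest##*-}
    rest=${rest%-*}
    arch=${rest##*-}
    name=${rest%-*}
    printf 'name=%s arch=%s version=%s revision=%s\n' \
        "$name" "$arch" "$version" "$revision"
}

parse_pkg nano-amd64-4.9.5-2
# prints: name=nano arch=amd64 version=4.9.5 revision=2
```

Because every version lives in its own directory under /ro/, two full names that differ only in version or revision can coexist on the same system.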

Conclusions

Because distri is an experiment in package management, it is likely to interest only a limited user base. If you want to dig deeper, you should read all of Stapelberg's blog posts on the topic [13] and watch his keynote at the Arch Developer Conference 2020 [14]. While there is no official support for distri, Stapelberg will answer questions on the mailing list [15] and in the #distri chat room on the legacy.irc-robustirc IRC server. However, you may have to be patient.