File Compression for Modern Computing

Command Line – zstd

© Lead Image © modella, 123RF.com

Article from Issue 215/2018
Author(s):

In an effort to meet modern computing needs, zstd offers a greater degree of compression at a faster compression rate, with unique options to enhance performance.

Many standard Linux tools have been around so long that second-generation tools are being developed to meet modern needs. For instance, Neovim is an update of the Vim text editor, and apt is a rearrangement of the basic tools for apt-get, the Debian package manager. Similarly, Zstandard (zstd) [1] is a modern take on compression tools like gzip and bzip2, offering higher degrees of compression at a faster rate. Additionally, zstd includes several unique tools for enhanced performance, such as advanced compression features, compression levels and strategies, and dictionaries.

zstd was written by Facebook employee Yann Collet and released in August 2016. Briefly, it is a lossless compression algorithm based loosely on the earlier LZ77 algorithm [2]. The command's syntax is deliberately similar to that of gzip, down to variations on the basic command that are the equivalent of popular options. For example, zstdmt is the same as zstd -T0 (use the same number of threads as detected cores), whereas unzstd is the same as zstd -d (decompress), and zstdcat is the same as zstd -dcf (decompress, force write to standard output, and overwrite without prompt).

The Basics

Getting started with zstd is as simple as typing:

zstd FILE

Multiple files can be specified using a space-separated list. Unless you add --rm as an option, the original file is not deleted. A progress bar is displayed as a single file is compressed, and an error produces a short help page unless -q is added to the command. Unless otherwise specified, level 3 compression is used with a single thread (see below), and a checksum of the original data is stored in the compressed file so that its integrity can be verified on decompression. The result is a file with the same name as the original file, but ending in .zst (Figure 1).

Figure 1: The basic zstd command.

To decompress, type:

zstd -d FILE

and a decompressed file is created without the .zst extension. If you specify more than one .zst file, each is decompressed to its own file; to combine them into a single output, add -c and redirect standard output to a file. Another option is to run --test (-t) to check the integrity of compressed files without creating or deleting any files.
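
The compress, test, and decompress steps described above can be sketched as one short script. This is a minimal sketch, not a recommended workflow: the temporary directory, the sample file, and the function name roundtrip_demo are all made up for illustration, and the demo is skipped if zstd is not installed.

```shell
#!/bin/sh
# Round trip: compress a throwaway file, test it, and decompress it again.
# roundtrip_demo and the sample data are hypothetical, for illustration only.
roundtrip_demo() {
    workdir=$(mktemp -d) || return 1

    # 2,000 identical lines make a small, highly compressible sample.
    yes "zstd round-trip sample line" | head -n 2000 > "$workdir/sample.txt"
    cp "$workdir/sample.txt" "$workdir/reference.txt"

    zstd -q "$workdir/sample.txt"          # writes sample.txt.zst, keeps sample.txt
    zstd -q -t "$workdir/sample.txt.zst"   # integrity test only; writes nothing
    rm "$workdir/sample.txt"               # remove the original so -d can recreate it
    zstd -q -d "$workdir/sample.txt.zst"   # recreates sample.txt, keeps the .zst file

    if cmp -s "$workdir/sample.txt" "$workdir/reference.txt"; then
        roundtrip_result="identical"
    else
        roundtrip_result="different"
    fi
    echo "round trip: $roundtrip_result"
    rm -rf "$workdir"
}

if command -v zstd >/dev/null 2>&1; then
    roundtrip_demo
else
    echo "zstd not found; skipping demo" >&2
fi
```

The original is deleted by hand before decompression because zstd does not overwrite an existing file unless told to with -f.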

Wherever an option expects a size, you can add a suffix to the number to specify kilobytes (using KiB, Ki, K, or KB) or megabytes (using MiB, Mi, M, or MB).
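
As a rough illustration of what these suffixes mean, the sketch below expands a suffixed number into bytes. It assumes the binary multiples given in the zstd man page (all the kilobyte spellings mean 1,024 bytes, and all the megabyte spellings mean 1,048,576 bytes); size_to_bytes is a hypothetical helper and not part of zstd.

```shell
#!/bin/sh
# Expand a size such as 64KiB or 2MB into bytes.
# size_to_bytes is a hypothetical helper, not a zstd command.
# Assumption: every kilobyte suffix means 1024 bytes and every
# megabyte suffix means 1024 * 1024 bytes, as in the zstd man page.
size_to_bytes() {
    case $1 in
        *KiB)    number=${1%KiB}; factor=1024 ;;
        *MiB)    number=${1%MiB}; factor=1048576 ;;
        *Ki|*KB) number=${1%??};  factor=1024 ;;
        *Mi|*MB) number=${1%??};  factor=1048576 ;;
        *K)      number=${1%K};   factor=1024 ;;
        *M)      number=${1%M};   factor=1048576 ;;
        *)       number=$1;       factor=1 ;;    # plain number: already bytes
    esac
    echo $(( number * factor ))
}

size_to_bytes 64KiB
size_to_bytes 2MB
size_to_bytes 8K
```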

Basic Options

Most of zstd's basic behaviors can be modified by options. To start, zstd has both verbose (-v) and quiet (-q) modes for running the command. You can also use -o FILE to specify any file name you want for an output file, placing the option after the original file's name, instead of directly after the basic command with the rest of the options. Additionally, if you are aware that a compressed file of the same name as the output file already exists, you can add --force (-f) to overwrite any file of the same name without confirming the operation first.
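
A short sketch of -o and --force working together follows. The file names and the function name options_demo are invented for the example, and the demo is skipped if zstd is not installed.

```shell
#!/bin/sh
# Choose an output name with -o, then overwrite it with -f.
# options_demo and the file names are hypothetical, for illustration only.
options_demo() {
    workdir=$(mktemp -d) || return 1

    printf 'first version\n' > "$workdir/notes.txt"
    # -o follows the input file name and sets the output file name.
    zstd -q "$workdir/notes.txt" -o "$workdir/backup.zst"

    printf 'second version, somewhat longer\n' > "$workdir/notes.txt"
    # backup.zst already exists, so -f is needed to replace it without a prompt.
    zstd -q -f "$workdir/notes.txt" -o "$workdir/backup.zst"

    # Decompress to standard output to see which version the file now holds.
    restored=$(zstd -d -c "$workdir/backup.zst")
    echo "$restored"
    rm -rf "$workdir"
}

if command -v zstd >/dev/null 2>&1; then
    options_demo
else
    echo "zstd not found; skipping demo" >&2
fi
```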

Several options help speed up commands. You can save time by turning off the integrity check during compression with --no-check. The increased speed, of course, comes with the risk that a damaged compressed file will go undetected. A somewhat safer option is --sparse, which enables sparse-file support so that long runs of zeroes in the output file take up less space on disk; this can save a couple more percentage points when dealing with a text file. For a graphics file, however, --sparse saves so little that it hardly seems worth using unless you are determined to save every bit of hard drive space possible.

As a recently created utility, zstd can also be compiled to use multiple CPU cores to make compression faster. By default, only one core is used, but you can adjust the number with the option -TNUMBER (--threads=NUMBER), for example, -T4.

If the value is 0, then zstd will detect the number of cores and try to use all of them. Should the online help appear, you will know that the zstd version you are using was compiled without threading.
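
The thread setting changes how the work is scheduled, not what the file decompresses to. The sketch below compresses the same sample once with one thread and once with all detected cores, then checks that both results restore the original. It assumes a zstd build with threading support; threads_demo and the file names are invented for the example.

```shell
#!/bin/sh
# Compress with one thread and with all detected cores, then verify both.
# threads_demo and the file names are hypothetical, for illustration only.
threads_demo() {
    workdir=$(mktemp -d) || return 1
    awk 'BEGIN { for (i = 1; i <= 20000; i++) print i }' > "$workdir/numbers.txt"

    zstd -q -T1 "$workdir/numbers.txt" -o "$workdir/one-thread.zst"
    zstd -q -T0 "$workdir/numbers.txt" -o "$workdir/all-cores.zst"   # 0 = detect cores

    threads_result="match"
    for compressed in one-thread.zst all-cores.zst; do
        # Decompress to standard output and compare with the original.
        zstd -d -c "$workdir/$compressed" | cmp -s - "$workdir/numbers.txt" ||
            threads_result="mismatch"
    done
    echo "decompressed output: $threads_result"
    rm -rf "$workdir"
}

if command -v zstd >/dev/null 2>&1; then
    threads_demo
else
    echo "zstd not found; skipping demo" >&2
fi
```

On a sample this small, extra threads are unlikely to make a visible difference in speed; the benefit shows up with large files.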

Compression Levels, Strategies, and Advanced Options

zstd approaches compression in two different ways. The more conventional tactic is to specify a compression level by adding -LEVEL to the command (for example, -19). The default level 3 can be overridden with any number from 1 to 19, with 1 being the quickest and least compressed, and 19 the slowest and most compressed. To give a sense of the choices involved, a compression setting of 1 reduces the size of a 42MB .png file by five percent in about six seconds, whereas a setting of 19 compresses the same file by just under eight percent in about 20 seconds. With a plain text file of 4,600 bytes, level 1 compression produces a compressed file 55 percent smaller, whereas level 19 compression creates a file that is 59 percent smaller, both requiring only a few seconds. This difference between graphic and text files is typical.
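
To run this kind of comparison on your own data, a loop along the following lines may help. It is only a sketch: the generated sample text stands in for a real file, levels_demo is an invented name, and the sizes it prints will differ from the figures quoted above.

```shell
#!/bin/sh
# Compare compressed sizes at levels 1, 3, and 19 for one sample file.
# levels_demo and the sample text are hypothetical, for illustration only.
levels_demo() {
    workdir=$(mktemp -d) || return 1

    # Generate a text sample; replace this with a real file to test your own data.
    i=0
    while [ "$i" -lt 5000 ]; do
        echo "line $i: the quick brown fox jumps over the lazy dog"
        i=$((i + 1))
    done > "$workdir/sample.txt"

    original_size=$(wc -c < "$workdir/sample.txt" | tr -d ' ')
    for level in 1 3 19; do
        # -c writes to standard output, so nothing is left on disk to clean up.
        compressed_size=$(zstd -q -c "-$level" "$workdir/sample.txt" | wc -c | tr -d ' ')
        echo "level $level: $original_size -> $compressed_size bytes"
    done
    rm -rf "$workdir"
}

if command -v zstd >/dev/null 2>&1; then
    levels_demo
else
    echo "zstd not found; skipping demo" >&2
fi
```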

You also have the option of adding the --ultra option to enable the high, more memory intensive compression of levels 20-22. However, when used by themselves, the advanced compression levels are no more efficient than level 19 compression. To get the most from the ultra-compression levels, you need to experiment with the advanced options.

The advanced options for compression are defined in the option:

--zstd=OPTION=SETTING,OPTION=SETTING

The easiest to use is strategy= (strat=). This option can be completed with a number from 0 to 7, in which 0 is the fastest and 7 the most compressed. Each strategy contains a number of methods and searches the file being compressed for an opportunity to use them. This search greatly increases both the time and the memory required to compress the file. However, the use of 7 can double the compression for a file.

Other advanced options for compression can override any of the options used in zstd's compression algorithm. For instance, hashLog=BITS sets the maximum number of bits for a hash table, making compression faster. Unfortunately, the man page lists the algorithm options with only a brief explanation of what they do, so most users will have to experiment blindly or else find other sources of information to understand what is being adjusted. Any algorithm option not specifically altered will use its default settings.
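
One way to start experimenting is to compare a plain level 19 run against an ultra level combined with a couple of advanced options. The sketch below does that for a generated sample. The particular values (strategy=7, hashLog=20) are arbitrary picks for illustration, not recommendations, and the strategy numbering can differ between zstd versions, so check the man page for the version you have installed. advanced_demo and the file names are invented for the example.

```shell
#!/bin/sh
# Compare level 19 with an ultra level plus advanced options.
# advanced_demo, the file names, and the option values are hypothetical choices.
advanced_demo() {
    workdir=$(mktemp -d) || return 1
    awk 'BEGIN { for (i = 1; i <= 20000; i++) print "record", i, i % 97 }' \
        > "$workdir/records.txt"

    level19_size=$(zstd -q -c -19 "$workdir/records.txt" | wc -c | tr -d ' ')
    tuned_size=$(zstd -q -c --ultra -22 --zstd=strategy=7,hashLog=20 \
        "$workdir/records.txt" | wc -c | tr -d ' ')
    echo "level 19: $level19_size bytes"
    echo "ultra 22 with advanced options: $tuned_size bytes"

    # Whatever the settings, the data must still decompress to the original.
    if zstd -q -c --ultra -22 --zstd=strategy=7,hashLog=20 "$workdir/records.txt" |
        zstd -d -c | cmp -s - "$workdir/records.txt"; then
        advanced_result="match"
    else
        advanced_result="mismatch"
    fi
    echo "round trip: $advanced_result"
    rm -rf "$workdir"
}

if command -v zstd >/dev/null 2>&1; then
    advanced_demo
else
    echo "zstd not found; skipping demo" >&2
fi
```

Whether the tuned run comes out smaller depends on the data, which is why comparing the two sizes on your own files is more useful than relying on a fixed recipe.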
