Process structured text files with Miller
One by One
Miller offers a clever alternative for working with structured text files: use a single tool to replace the strings of commands built from conventional utilities like grep, cut, and sed.
Miller [1] is a helpful command-line tool for working with structured files. Instead of contending with long command chains strung together with pipes, you can achieve your goals with more compact constructs.
TIP
If no output appears, Miller is probably not recognizing the newline character that ends each line; this problem often occurs with CSV files. If you add --rs lf at the beginning of the command (as in mlr --csv --rs lf cat csv1.txt), processing should work.
Miller supports a variety of formats (Table 1), which it lists when called with mlr --usage-data-format-examples. We used version 3.1.2 for this article, freshly compiled from the sources.
Table 1
Data Structures
Type/Format specification | Features |
---|---|
dkvp | Key-value pairs, comma as field separator (key=value,key=value) |
nidx | Implicitly numbered fields without labels, space as default field separator (a b c) |
csv | Header line with field labels, values optionally in quotes, comma as field separator (a,b,c) |
pprint | Formatted output from Miller, produces space-aligned tables |
xtab | Outputs records vertically, one field label with its value in each line |
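To see how the formats in Table 1 differ, the following sketch sends the sample file csv3.txt (from the "Some Examples" box) through three output formats. It assumes an mlr binary on the PATH and uses the --rs lf workaround from the tip; it was written against current Miller syntax and not checked against version 3.1.2 specifically.

```shell
# Sample file csv3.txt from the "Some Examples" box
cat > csv3.txt <<'EOF'
Name,first_name,amount
Miller,Hans,12.34
Meier,Klaus,56.78
Bauer,Stefan,90.12
EOF

# Same data, three output formats
mlr --icsv --rs lf --onidx cat csv3.txt   # values only, separated by spaces
mlr --icsv --rs lf --odkvp cat csv3.txt   # key=value pairs, one record per line
mlr --icsv --rs lf --oxtab cat csv3.txt   # one field per line, blank line between records
```

The NIDX output drops the field labels entirely, which is handy when feeding the data to classic tools that split on whitespace.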
Miller is a single utility that lets you combine the effects of several classic Unix tools, like grep, cut, join, sort, tail, head, and sed. The syntax of mlr uses commands, each with its own options. Table 2 shows a selection of commands for mlr. The box called "Some Examples" lists the sample files used in the examples that follow.
Some Examples
### csv1.txt
first,second,third
a,b,c
d,e,f
### csv2.txt
first,second,third
1,2,3
4,5,6
### csv3.txt
Name,first_name,amount
Miller,Hans,12.34
Meier,Klaus,56.78
Bauer,Stefan,90.12
### csv4.txt
Name,first_name,amount
Schmidt,Johann,12.34
Meier,Klaus,56.78
Albert,Stefan,90.12
### dkvp1.txt
a=1,b=2,c=3
d=4,e=5,f=6
### dkvp2.txt
a=1,b=2,c=3
d=4,e=5
f=7,g=8,h=9
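The following sketch contrasts a classic pipeline with a single Miller call on csv3.txt: both sort the records by amount in descending order and keep only the name and the amount. It assumes mlr on the PATH and uses Miller's then keyword, which chains commands inside one mlr call; the article itself does not cover then, so treat it as an addition here.

```shell
# Sample file csv3.txt from the "Some Examples" box
cat > csv3.txt <<'EOF'
Name,first_name,amount
Miller,Hans,12.34
Meier,Klaus,56.78
Bauer,Stefan,90.12
EOF

# Classic tools: strip the header, sort numerically on field 3 (descending),
# then keep fields 1 and 3
tail -n +2 csv3.txt | sort -t, -k3,3nr | cut -d, -f1,3

# Miller: fields are addressed by name and the header is handled automatically
mlr --icsv --rs lf --ocsv sort -nr amount then cut -f Name,amount csv3.txt
```

The Miller version keeps a matching header line (Name,amount), while the pipeline has to drop the header first and refer to columns by position.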
Table 2
Miller: Command Overview
Command | Options | Function/Notes |
---|---|---|
cat | | Like the cat shell command |
| | -n | Adds a column on the left with ascending record numbers |
| | -N Name | Like -n, but with a name for the numbering column |
decimate | | Outputs every tenth record |
| | -n N | Outputs every Nth record |
cut | | Like the cut shell command |
| | -f Name,… | Outputs only the fields with these column names |
| | -o | Before -f: Outputs the fields in the order given with -f |
| | -x | Before -f: Outputs all fields except the specified ones |
filter | | Outputs the records that match the stated condition |
| | 'FNR == N' | Outputs only record number N of each file |
grep | | Like the grep shell command, but with a restricted feature set |
| | -v | Outputs non-matching records |
group-by | | Outputs records with identical values in the given fields as a group |
group-like | | Outputs records with identical field names as a group |
head | | Outputs the start of a file |
| | -n Lines | Number of lines, not counting the header (mandatory) |
join | | Joins two files via a shared column |
| | -u | Processes unsorted input |
| | -j Column,… | States the shared fields |
| | -f File | States the left-hand file |
rename Old,New | | Renames a field |
| | -r | States the old field name as a regular expression |
reorder | | Changes the column order |
| | -f Columns | States the order (mandatory) |
| | -e | Outputs the stated columns at the end of the record |
sample | | Outputs a random selection of records |
| | -k Lines | States the number of records, not counting the header |
sort | | Sorting |
| | -f Name,… | Ascending by the stated columns, lexical |
| | -r Name,… | Descending by the stated columns, lexical |
| | -nf Name,… | Ascending by the stated columns, numeric |
| | -nr Name,… | Descending by the stated columns, numeric |
stats1 | | Computations |
| | -a sum -f Column,… | Sum |
| | -a count -f Column,… | Record count |
| | -a mean -f Column,… | Average |
| | -a min -f Column,… | Minimum |
| | -a max -f Column,… | Maximum |
step | | Stepwise output of computed results |
| | -a rsum -f Column,… | Running sum, output for each record |
| | -a delta -f Column,… | Difference between two successive records |
| | -a ratio -f Column,… | Ratio between two successive records |
| | -a counter -f Column,… | Running count of records |
| | -a from-first -f Column,… | Difference from the first record |
tac | | Like the tac shell command (output in reverse order) |
tail | | Outputs the end of the file (counterpart to head) |
| | -n Lines | Number of lines, not counting the header |
top | | Outputs the records with the highest or lowest numeric values |
| | -f Column,… | States the columns whose numeric values are compared |
| | -a | Outputs all columns of the record |
| | --min | Outputs the lowest values instead of the highest |
| | -n Lines | Number of records to output |
uniq | | Outputs distinct records, grouped |
| | -g Column,… | States the columns to be evaluated |
| | -n | Outputs only the number of distinct groups |
| | -c | Shows the number of occurrences of each distinct group |
bar | | Outputs numeric values as ASCII bar charts |
| | -f Column | States the column with the numeric values |
| | -c Character | States the bar character (default: *) |
| | -x Character | States the character for values outside the display range (default: #) |
| | -b Character | States the padding character (default: .) |
| | -w Width | States the bar width (default: 40) |
| | --lo Value | Lower bound of the bar chart |
| | --hi Value | Upper bound of the bar chart |
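Two of the commands from Table 2, stats1 and join, can be tried directly on csv3.txt and csv4.txt from the "Some Examples" box. A minimal sketch, assuming mlr on the PATH; -u is passed to join as listed in the table so that the unsorted sample files are handled correctly.

```shell
# Sample files csv3.txt and csv4.txt from the "Some Examples" box
cat > csv3.txt <<'EOF'
Name,first_name,amount
Miller,Hans,12.34
Meier,Klaus,56.78
Bauer,Stefan,90.12
EOF
cat > csv4.txt <<'EOF'
Name,first_name,amount
Schmidt,Johann,12.34
Meier,Klaus,56.78
Albert,Stefan,90.12
EOF

# stats1: several accumulators at once, separated by commas
mlr --icsv --rs lf --opprint stats1 -a sum,count,mean -f amount csv3.txt

# join: csv3.txt is the left-hand file (-f), csv4.txt the right-hand one;
# only records whose Name appears in both files are output
mlr --icsv --rs lf --ocsv join -u -j Name -f csv3.txt csv4.txt
```

Miller names the stats1 results after the column and the accumulator (amount_sum, amount_count, amount_mean). Because the two files share only the Meier record, the join returns that record alone.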
Fields are usually separated by commas. Miller lets you define separators, like formats, for the input, the output, or both together. To set them separately, put a leading i in front of the option for the input or an o for the output (for example, --ifs or --ofs). Table 3 lists some important separator options.
Table 3
Separators
Task | Option | Notes |
---|---|---|
Record separator | --rs | e.g., lf or '\r\n' |
Field separator | --fs | e.g., ',' or ';' |
Pair separator | --ps | Separates key and value; only relevant for DKVP files |
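The i/o prefixes from Table 3 make it easy to change separators during output. The sketch below assumes a hypothetical semicolon-separated copy of csv3.txt named csv3-semicolon.txt and mlr on the PATH: --ifs applies only to the input, while --fs applies to both sides.

```shell
# Hypothetical semicolon-separated variant of csv3.txt
cat > csv3-semicolon.txt <<'EOF'
Name;first_name;amount
Miller;Hans;12.34
Meier;Klaus;56.78
Bauer;Stefan;90.12
EOF

# --ifs affects only the input; the output keeps the default comma
mlr --icsv --ifs ';' --rs lf --ocsv cat csv3-semicolon.txt

# --fs affects input and output, so the sorted result keeps its semicolons
mlr --csv --fs ';' --rs lf sort -f Name csv3-semicolon.txt
```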
Output
The cat command reads text files and outputs them, formatted as needed, to a pipe, a file, or the screen. The call in the first line of Listing 1 outputs the two specified files in succession, including their column headings (Figure 1). Because no format is specified, Miller treats the lines as DKVP data; since they contain no key=value pairs, it automatically assigns its own numerical identifiers to the fields.
Listing 1
Miller's cat
01 $ mlr cat csv1.txt csv2.txt
02 $ mlr --csv --rs lf cat csv1.txt csv2.txt
03 $ mlr --opprint cat csv1.txt csv2.txt
04 $ mlr --opprint --csv --rs lf cat csv1.txt csv2.txt
05 $ mlr --csv --rs lf --opprint cat csv1.txt csv2.txt
06 $ mlr --icsv --rs lf --odkvp cat csv1.txt > newdkvp.txt
07 $ mlr --idkvp --ocsv --rs lf cat dkvp1.txt > newcsv.txt
08 $ mlr --icsv --rs lf --oxtab cat csv3.txt > newxtab.txt
If you specify the file type (csv in Listing 1) and a newline (--rs lf) as the record separator, Miller does not number the fields (Listing 1, line 2). It also prints the identical column headings of both files only once (Figure 2).
The --opprint option gives you even clearer output (Listing 1, line 3), but with a catch: The program inserts its own column headings (Figure 3, first line). Because this call does not specify the input format, Miller again numbers the fields itself and lists the files' actual headings in the output like ordinary records.
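The order of options affects the results (Figure 4). While the --opprint option is apparently ignored by the call in line 4 of Listing 1, it works correctly in the opposite order (Listing 1, line 5). The reason is that --csv sets both the input and the output format, so a --csv that follows --opprint overrides it, whereas an --opprint that follows --csv replaces only the output format. The software then combines the identical headings and aligns the values under them.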
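One way to sidestep the ordering problem is to set the input and output formats separately, so that neither option can override the other. A minimal sketch with csv1.txt and csv2.txt from the "Some Examples" box, assuming mlr on the PATH; both calls should produce the same table with a single header line.

```shell
# Sample files csv1.txt and csv2.txt from the "Some Examples" box
cat > csv1.txt <<'EOF'
first,second,third
a,b,c
d,e,f
EOF
cat > csv2.txt <<'EOF'
first,second,third
1,2,3
4,5,6
EOF

# --icsv sets only the input format, --opprint only the output format,
# so the order of the two options no longer matters
mlr --icsv --rs lf --opprint cat csv1.txt csv2.txt
mlr --opprint --icsv --rs lf cat csv1.txt csv2.txt
```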
Miller Converts
Using cat, Miller converts between the formats listed in Table 1. Put an i in front of the name of the input format and an o in front of the output format, and Miller creates DKVP from a CSV file (Listing 1, line 6). The reverse direction works in the same way (Listing 1, line 7).
Converting to a line-by-line display (XTAB format) is useful, for example, when creating text-based applications such as an address lookup (Listing 1, line 8). You will find the processed examples in Figure 5.
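Because the conversion works in both directions, a CSV file can be taken to DKVP and back. The sketch below assumes mlr on the PATH and uses hypothetical output names (csv3.dkvp, csv3-roundtrip.txt); for a file with uniform columns like csv3.txt, the round trip should reproduce the original exactly.

```shell
# Sample file csv3.txt from the "Some Examples" box
cat > csv3.txt <<'EOF'
Name,first_name,amount
Miller,Hans,12.34
Meier,Klaus,56.78
Bauer,Stefan,90.12
EOF

# CSV -> DKVP: each record becomes a line of key=value pairs
mlr --icsv --rs lf --odkvp cat csv3.txt > csv3.dkvp

# DKVP -> CSV: the keys become the header line again
mlr --idkvp --rs lf --ocsv cat csv3.dkvp > csv3-roundtrip.txt

# Compare with the original file
diff csv3.txt csv3-roundtrip.txt && echo "round trip OK"
```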
Searching and Finding
For browsing structured text files, Miller has the grep and filter commands. filter offers a variety of options, particularly for numerical evaluations. The software always outputs the header. The first two lines of Listing 2 show how to search csv3.txt for the name "Meier". With the filter command, you specify the column; grep does not need one. The filter method is thus more precise, because the search term could appear in multiple columns.
Listing 2
Looking for a Name
$ mlr --csv --rs lf filter '($Name == "Meier")' csv3.txt
$ mlr --csv --rs lf grep 'Meier' csv3.txt
$ mlr --csv --rs lf filter '($amount > 20)' csv3.txt
The example in the last line of Listing 2 shows a numerical evaluation: Miller extracts all records with an amount greater than 20 euros from csv3.txt.
Figure 6 shows the three commands, as well as the resulting output.
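The conditions in a filter expression can also be combined, and grep -v from Table 2 inverts a text search. A minimal sketch on csv3.txt, assuming mlr on the PATH: the first call keeps amounts over 20 except Bauer's, and the second outputs every record that does not contain "Meier".

```shell
# Sample file csv3.txt from the "Some Examples" box
cat > csv3.txt <<'EOF'
Name,first_name,amount
Miller,Hans,12.34
Meier,Klaus,56.78
Bauer,Stefan,90.12
EOF

# Two conditions joined with && (logical AND)
mlr --icsv --rs lf --ocsv filter '$amount > 20 && $Name != "Bauer"' csv3.txt

# grep -v: output only the records that do NOT match
mlr --icsv --rs lf --ocsv grep -v Meier csv3.txt
```

As with the calls in Listing 2, both commands still output the header line.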