Process structured text files with Miller

One by One

© Lead Image © bahri altay,

Article from Issue 187/2016

Miller offers a clever alternative for working with structured text files: use a single tool to replace the strings of commands built from conventional utilities like grep, cut, and sed.

Miller [1] is a helpful command-line tool for working with structured files. Instead of contending with long command chains strung together with pipes, you can achieve your goals with more compact constructs.


If no output appears, Miller cannot find the newline character it expects as the record separator; this problem often occurs with CSV files. If you start the command with mlr --csv --rs lf, processing should work.

Miller supports a variety of formats (Table 1), which it lists when called with mlr --usage-data-format-examples. We used version 3.1.2 for this article, freshly compiled from the sources.

Table 1

Data Structures

| Type | Format specification |
|---|---|
| DKVP | Identifier with value assignments, comma as field separator (Variable=Value,) |
| NIDX | Implicit numeric field identifiers, values only, space as the default field separator (a b c) |
| CSV | No field labels within the records, text optionally in quotes, comma as field separator (a,b,c) |
| PPRINT | Formatted output from Miller, produces tables |
| XTAB | Outputs tables vertically, one field label with a value in each line |

Miller is a single utility that lets you combine the effects of several classic Unix tools, like grep, cut, join, sort, tail, head, and sed. The syntax of mlr uses commands with their own options. Table 2 shows a selection of commands for mlr. See the box called "Some Examples" for examples of mlr commands.

Some Examples

### csv1.txt
### csv2.txt
### csv3.txt
### csv4.txt
### dkvp1.txt
### dkvp2.txt

Table 2

Miller: Command Overview






| Command | Option | Description |
|---|---|---|
| cat | | Like the cat shell command |
| | -n | Adds another column with ascending enumeration on the left |
| | -N Name | Like -n, but with a name for the column with the enumeration |
| decimate | | Uses every tenth line of data |
| | -n N | Uses every Nth line of data |
| cut | | Like the cut shell command |
| | -f Name,… | Only outputs the fields with these column names |
| | -o | Before -f: additionally outputs the fields in the specified order |
| | -x | Before -f: does not output the specified fields |
| filter | | Outputs data lines with the stated features |
| | 'FNR == N' | Outputs only the Nth line |
| grep | | Like the grep shell command, but with a restricted feature set |
| | -v | Outputs non-matching lines |
| group-by | | Outputs lines with identical values as a group |
| group-like | | Outputs lines with identical identifiers as a group |
| head | | Outputs the start of a file |
| | -n Lines | Number of lines without the header (mandatory) |
| join | | Joins two files via a shared column |
| | -u | Processes unsorted input |
| | -j Column,… | States the shared fields |
| | -f File | States the file on the left |
| rename Old,New | | Renames the field identifier |
| | -r | States the old field name as a regular expression |
| reorder | | Changes the column order |
| | -f Columns | States the order (mandatory) |
| | -e | Outputs the stated columns at the end of the line |
| sample | | Outputs a number of lines picked from arbitrary positions |
| | -k Lines | States the line count, not including headers |
| sort | | Sorts lines by the stated columns |
| | -f Name,… | Ascending by stated columns, characters of all types |
| | -r Name,… | Descending by stated columns, characters of all types |
| | -nf Name,… | Ascending by stated columns, numeric |
| | -nr Name,… | Descending by stated columns, numeric |
| stats1 | | Computes statistics for the stated columns |
| | -a sum -f Column,… | Sum |
| | -a count -f Column,… | Record/line count |
| | -a mean -f Column,… | Mean |
| | -a min -f Column,… | Minimum |
| | -a max -f Column,… | Maximum |
| step | | Stepwise output of computational results |
| | -a rsum -f Column,… | Running total, output per line |
| | -a delta -f Column,… | Difference between two subsequent lines |
| | -a ratio -f Column,… | Ratio between two subsequent lines |
| | -a counter -f Column,… | Ongoing output of the number of records |
| | -a from-first -f Column,… | Difference from the first record |
| tac | | Like the tac shell command (output in reverse order) |
| tail | | Outputs the end of the file (counterpart to head) |
| | -n Lines | Number of lines without a header |
| top | | Outputs lines/records with the highest or lowest numeric value |
| | -f Column,… | States the columns with the numeric values to evaluate |
| | -a | Outputs all columns of a line |
| | --min | Outputs the smallest numeric value |
| | -n Lines | Number of lines to output |
| uniq | | Outputs identical records grouped |
| | -g Column,… | States the columns to be evaluated |
| | -n | Only determines the number of grouped records |
| | -c | States the number of occurrences for each grouped record |
| bar | | Outputs numeric values as ASCII bar charts |
| | -f Column | States the column with the numeric values |
| | -c Character | States the bar character (default: *) |
| | -x Character | States the character for values outside of the display range (default: #) |
| | -b Character | States the padding character (default: .) |
| | -w Bar width | States the bar width (default: 40) |
| | --lo Value | Initial value of the bar chart |
| | --hi Value | Final value of the bar chart |

Fields in the input are usually separated by commas. Miller lets you define separators for the input, the output, or both together. To set a separator or a file format for one side only, put an i for the input or an o for the output in front of the option name, as in --ifs or --ofs. Table 3 lists some important separator options.

Table 3





| Option | Meaning | Note |
|---|---|---|
| --rs | Record separator | e.g., lf or '\r\n' |
| --fs | Field separator | e.g., ',' or ';' |
| --ps | Pair separator | only relevant for DKVP files |


The cat command reads from text files and outputs them – appropriately formatted if necessary – to a pipe, a file, or the screen. The call in the first line of Listing 1 outputs the two specified files in succession with the column headings (Figure 1). In addition, Miller automatically adds its own numerical identifiers for the fields.

Listing 1

Miller's cat

01 $ mlr cat csv1.txt csv2.txt
02 $ mlr --csv --rs lf cat csv1.txt csv2.txt
03 $ mlr --opprint cat csv1.txt csv2.txt
04 $ mlr --opprint --csv --rs lf cat csv1.txt csv2.txt
05 $ mlr --csv --rs lf --opprint cat csv1.txt csv2.txt
06 $ mlr --icsv --rs lf --odkvp cat csv1.txt > newdkvp.txt
07 $ mlr --idkvp --ocsv --rs lf cat dkvp1.txt > newcsv.txt
08 $ mlr --icsv --rs lf --oxtab cat csv3.txt > newxtab.txt

Figure 1: The Miller cat command prints the contents of a file if no options are specified.

If you specify the file type (csv in Listing 1) and a newline (--rs lf) as the record separator, Miller no longer numbers the fields (Listing 1, line 2). It also groups identical column headings into a single heading (Figure 2).

Figure 2: Specify the file type and the separator for the data as parameters to improve the output.

The --opprint option gives you even clearer output (Listing 1, line 3), but with a minor flaw: The program inserts its own column headings (Figure 3, first line) and lists the headings from the input files as if they were records.

Figure 3: Visually enhanced output with column headings.

The order of options affects the results (Figure 4). The call in line 4 of Listing 1 apparently ignores the --opprint option, most likely because the --csv option that follows it sets both the input and the output format and thus overrides the earlier choice. In the opposite order (Listing 1, line 5), the option works correctly: The software combines the identical headings and displays the values in aligned columns below the header.

Figure 4: The order in which you specify options has an impact on the result.

Miller Converts

Using cat, Miller converts between the formats listed in Table 1. Put an i in front of the name of the input format and an o in front of the name of the output format, and Miller creates a DKVP file from a CSV file (Listing 1, line 6). The reverse approach works in the same way (Listing 1, line 7).

Converting to a line-by-line display (XTAB format) is useful, for example, when creating non-GUI applications, say, querying addresses (Listing 1, line 8). You will find the processed examples in Figure 5.

Figure 5: Miller easily converts data structures from one format to another.

Searching and Finding

For browsing structured text files, Miller has the grep and filter commands. filter has a variety of options, particularly with regard to numerical evaluations. The software always outputs the header. The example from the first two lines of Listing 2 shows how to browse csv3.txt for the name "Meier". With the filter command, you specify the column; grep does not need the column. The first method is thus more precise because the term could exist in multiple columns.

Listing 2

Looking for a Name

$ mlr --csv --rs lf filter '($Name == "Meier")'  csv3.txt
$ mlr --csv --rs lf grep 'Meier' csv3.txt
$ mlr --csv --rs lf filter '($amount > 20)'  csv3.txt

The example in the last line of Listing 2 shows the results of a numerical analysis. Miller extracts all amounts greater than 20 Euros from csv3.txt.

Figure 6 shows the three commands, as well as the resulting output.

Figure 6: The Miller commands filter and grep make it easy to extract specific data.
