Document conversion from the command line
Command Line – Pandoc

Lead Image © Dmitry Rukhlenko, 123RF.com
Pandoc lets you convert files from one markup format to another at the command line.
A strength of free software is that applications usually have everything users need for a specific purpose, a tendency that is especially strong in apps for KDE and the command line. Pandoc [1], a universal document converter, exemplifies this strength.
First released in 2006 by John MacFarlane, a philosophy professor at the University of California, Berkeley, Pandoc is a Haskell library for converting between text formats, especially those using a markup format (Table 1). In effect, it is an all-in-one replacement for the dozens of scripts that exist in many distributions for the same purpose.
Pandoc is not equipped to precisely convert complicated layout, such as margins and tables, in formats like PDF or OpenDocument Format (ODF), although templates can be created for different formats. Often, though, converting the content alone is far better than not converting at all. Moreover, Pandoc is adequate in many cases for simply structured documents, like articles or essays, especially those written in a markup language. It also has advanced features for slide shows, citations, and bibliographies.
By default, Pandoc writes a document fragment to standard output (Figure 1). A basic command names the input format (-f), the output format (-t), and the input file:
pandoc -f markdown -t latex pandoc.txt
The result is a fragment in the specified output format that can be pasted into another document. To save the output, specify a file using the --output (-o) option. If you want a complete file, rather than a fragment, add the --standalone (-s) option. As with many command-line tools, saving to a file produces no output on screen unless something goes wrong.
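The three variations just described can be sketched as follows. The file names notes.md, fragment.html, and full.html are hypothetical and created only for illustration; HTML is used as the output format because it needs no extra software.

```shell
# Create a small Markdown sample (notes.md is a hypothetical file name).
cat > notes.md <<'EOF'
# Sample heading

Some *emphasized* text.
EOF

# 1. Fragment written to standard output.
pandoc -f markdown -t html notes.md

# 2. The same fragment saved to a file with -o.
pandoc -f markdown -t html -o fragment.html notes.md

# 3. A complete document (with <head> and <body>) via --standalone.
#    -M sets a document title, which a complete HTML file expects.
pandoc -f markdown -t html --standalone -M title="Sample" -o full.html notes.md
```

Comparing fragment.html with full.html shows what --standalone adds around the converted content.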
If you do not specify the input and output formats, Pandoc will attempt to guess them from the file extensions. To control formatting, a template file can be specified (see the Templates section below). Use the --list-input-formats and --list-output-formats options to list the supported formats. If multiple input files are specified, they are concatenated, with a blank line between the contents of each, into a single output file.
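As a rough illustration, current releases print the supported formats with two dedicated options, and a command with several input files might look like the sketch below. The names ch1.md, ch2.md, and book.html are hypothetical.

```shell
# List the formats this Pandoc build can read and write.
pandoc --list-input-formats
pandoc --list-output-formats

# Two hypothetical chapter files.
printf '# Chapter 1\n\nFirst part.\n' > ch1.md
printf '# Chapter 2\n\nSecond part.\n' > ch2.md

# No -f or -t: the formats are inferred from the .md and .html extensions,
# and both inputs end up in one output file.
pandoc -o book.html ch1.md ch2.md
```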
Templates
Each supported format has a default template stored in /usr/share/pandoc/data/templates/. Most follow the naming structure default.FORMAT. Exceptions include ODT's template, which is named default.opendocument, and PDF, which shares the default.latex template. In addition, EPUB uses epub-page.html, epub-coverimage.html, and epub-titlepage.html. You can view the default template for a format using the command pandoc -D FORMAT (Figure 2).
You can write or download custom templates [2] or modify copies of existing templates [3] if the default template does not meet your needs. Templates consist of fixed text and may include variables that are replaced by elements of the source file, often automatically. For example, in <title>$title$</title>, the variable $title$ is replaced automatically by the source file's title. More advanced users can include conditional (if/else) statements and loops. For a full description of custom templates, see Pandoc's man page and user guide [4].
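A minimal sketch of a custom template follows. The file names custom.html, minimal.html, hello.md, and hello.html are hypothetical, and the template is deliberately tiny rather than a recommended layout. It uses the $title$ variable, an if/else block, and a loop over authors.

```shell
# Save a copy of the default HTML template as a starting point for edits.
pandoc -D html > custom.html

# A tiny hand-written template (minimal.html is a hypothetical name).
cat > minimal.html <<'EOF'
<html>
<head><title>$title$</title></head>
<body>
$if(author)$
<p class="author">$for(author)$$author$$sep$, $endfor$</p>
$else$
<p class="author">Anonymous</p>
$endif$
$body$
</body>
</html>
EOF

printf 'Hello, *world*.\n' > hello.md

# --template implies --standalone; -M supplies the value for $title$.
pandoc --template=minimal.html -M title="Greeting" -o hello.html hello.md
```

Because hello.md defines no author, the $else$ branch is used. Adding -M author="Some Name" to the command should switch the output to the $if$ branch.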
In the end, if content is more important than structure, you can generally use the default templates without tweaking them.
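Note that Pandoc does not generate PDF entirely on its own. It passes an intermediate format to an external PDF engine (pdflatex by default; others can be selected with --pdf-engine), so that engine must be installed. Online sources such as Wikipedia that list this requirement are still correct.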
Input/Output Options
Instead of templates, you can do some formatting using options. To eliminate any ambiguity in the command structure, you can specify the input format with --from FORMAT (-f FORMAT) or --read FORMAT (-r FORMAT), and the output format with --to FORMAT (-t FORMAT) or --write FORMAT (-w FORMAT). Similarly, although Pandoc looks for custom templates and other user data files in a default user data directory (~/.local/share/pandoc in current releases, or ~/.pandoc if that directory exists), you can specify another directory with --data-dir=DIRECTORY.
Other options affect the internal formatting. For instance, while the default is to replace tabs with spaces, --preserve-tabs (-p) overrides it. You can also use --tab-stop=NUMBER to change the default of four spaces per tab. Older releases offered --base-header-level=NUMBER to set the first heading level and --smart (-S) to produce typographic characters such as curly quotes and dashes (in place of runs of hyphens). Current releases use --shift-heading-level-by=NUMBER and the smart extension instead, which is enabled by default for Markdown input and can be switched off with -f markdown-smart.
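The effect of these settings is easiest to see side by side. The sketch below assumes Pandoc 2.8 or later, where typographic conversion is handled by the smart extension and headings are moved with --shift-heading-level-by. The file names are hypothetical.

```shell
# A sample with straight quotes and a double hyphen.
printf '# Title\n\n"Quoted" text -- with a dash.\n' > smart.md

# Smart typography is on by default for Markdown input:
# curly quotes and a typographic dash appear in the output.
pandoc -f markdown -t html -o smart.html smart.md

# Disabling the extension keeps the characters as typed.
pandoc -f markdown-smart -t html -o plain.html smart.md

# Shift every heading down one level, so "# Title" becomes an <h2>.
pandoc -f markdown -t html --shift-heading-level-by=1 -o shifted.html smart.md
```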
Individual formats also have their own formatting options. For instance, in HTML, --section-divs wraps sections in <div> tags (or <section> tags in HTML5), which can be formatted with CSS style sheets created outside Pandoc. LaTeX, ConTeXt, and DocBook output can use --top-level-division=chapter (--chapters in older releases) to convert the top-level headings into chapters. Older releases also had --no-tex-ligatures to suppress ligatures in LaTeX or ConTeXt output, which could be convenient with some OpenType features; since Pandoc 2.0, the smart extension covers this. More generally, several options control the generated source text and code blocks, such as --wrap=none (formerly --no-wrap), --columns=NUMBER, --no-highlight, and --highlight-style=STYLE (with styles including pygments, kate, monochrome, espresso, zenburn, haddock, and tango). Many of these options can reside in a single file that is specified with --defaults=FILE, eliminating the need to retype a detailed command each time.
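A defaults file is plain YAML. The sketch below assumes Pandoc 2.8 or later, which introduced --defaults, and writes a small hypothetical file named article.yaml from the shell. The particular settings are only examples.

```shell
# Collect frequently used options in one YAML file (hypothetical name).
cat > article.yaml <<'EOF'
from: markdown
to: html
standalone: true
wrap: none
tab-stop: 4
metadata:
  title: Sample
EOF

printf '# Intro\n\nText.\n' > intro.md

# One short option now stands in for the whole set above.
pandoc --defaults=article.yaml -o intro.html intro.md
```

Options given on the command line can still be combined with the file, so a defaults file works well as a per-project baseline.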
For many output formats, options cover most formatting needs, with the exception of spacing and layout. However, layout can be added to HTML output via a CSS style sheet linked with --css=URL. Other output formats take their styles from a reference file: older releases used --reference-odt=FILE (ODT), --reference-docx=FILE (DOCX), and --epub-stylesheet=FILE (EPUB), which Pandoc 2.0 and later replace with --reference-doc=FILE for ODT and DOCX and --css for EPUB. If you regularly convert to such formats, developing a style sheet may be worth the effort. You may even find a style sheet online that you can use with little or no modification.
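As a hedged illustration of both approaches, the sketch below links a CSS file to HTML output and applies a reference document to DOCX output. It assumes Pandoc 2.0 or later, where the reference option is named --reference-doc. All file names are hypothetical, and the style rule is only a placeholder.

```shell
# A one-rule style sheet (contents are only an example).
cat > style.css <<'EOF'
body { max-width: 40em; margin: auto; font-family: serif; }
EOF

printf '# Styled\n\nText.\n' > styled.md

# Link the style sheet from a complete HTML document.
pandoc -s --css=style.css -M title="Styled" -o styled.html styled.md

# Export Pandoc's built-in reference.docx, which can then be restyled
# in a word processor, and use it for DOCX output.
pandoc -o custom-reference.docx --print-default-data-file reference.docx
pandoc --reference-doc=custom-reference.docx -o styled.docx styled.md
```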
Special Uses
Besides routine format conversion, Pandoc has several special uses. For instance, Pandoc supports several slide show applications, including PowerPoint. However, to judge by the available options, its main emphasis is on Beamer, a LaTeX-based presentation application [5]. The markup for a Beamer slide is as simple as starting each one with a ## heading. To Beamer's own thorough array of features, Pandoc adds options of its own. While converting a file for use in Beamer, Pandoc can define a logo, title graphics, navigation symbols, Beamer theme, and the aspect ratio for slides. Supported features include slide backgrounds, transitions, and lists in which items are displayed one at a time. There is even an option to add Beamer options to the converted presentation. In addition, Pandoc can convert a Beamer presentation to an article. Pandoc's emphasis on Markdown provides a professional slide show application regardless of the office suite used.
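A small sketch of the Beamer workflow follows. The slide content, the file names, and the chosen theme (Madrid) and aspect ratio (16:9) are assumptions made for illustration. The command writes Beamer LaTeX source, so it runs without a LaTeX installation.

```shell
# Two slides, each introduced by a ## heading (hypothetical content).
cat > slides.md <<'EOF'
---
title: Sample Talk
author: A. Speaker
---

## First slide

- One point
- Another point

## Second slide

Closing remarks.
EOF

# Write a complete Beamer source file, setting theme and aspect ratio.
pandoc -t beamer -s -V theme:Madrid -V aspectratio:169 -o slides.tex slides.md

# With a LaTeX installation present, a PDF can be requested directly:
#   pandoc -t beamer -V theme:Madrid -o slides.pdf slides.md
```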
Pandoc also has extensive support for citations and bibliographies. Using the option --citeproc, Pandoc can generate citations from a source file and a bibliographic database specified with one --bibliography=FILE for each bibliography used. BibLaTeX (.bib), BibTeX (.bibtex), CSL JSON (.json), and CSL YAML (.yaml) are all supported formats. By default, Pandoc uses the Chicago Manual of Style author-date citation style, although other styles can be selected with --csl=FILE. There is even a --citation-abbreviations=FILE option that can define abbreviations for frequently cited titles. The citations and bibliography are kept separate from the Pandoc files, making it easy to update them and then generate a new file.
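The citation workflow can be sketched as below, assuming Pandoc 2.11 or later, where citation processing is built in as --citeproc. The bibliography entry is invented purely for illustration, and the file names are hypothetical.

```shell
# A one-entry bibliography; the book and its author are made up.
cat > refs.bib <<'EOF'
@book{doe2020,
  author    = {Doe, Jane},
  title     = {An Example Book},
  publisher = {Example Press},
  address   = {Springfield},
  year      = {2020}
}
EOF

# The source file cites the entry by its key.
cat > paper.md <<'EOF'
Pandoc resolves citation keys such as [@doe2020, p. 12].

# References
EOF

# Resolve the citation and append the bibliography (default style).
pandoc --citeproc --bibliography=refs.bib -t plain -o paper.txt paper.md
```

Updating refs.bib and rerunning the command regenerates both the in-text citation and the reference list.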