How to use pandoc to convert files on the Linux command line

A terminal window running on a Linux laptop with an Ubuntu-style desktop theme.
Fatmawati Achmad Zaenuri / Shutterstock

You can use pandoc on Linux to convert between more than 40 file formats. You can also use it to create a simple docs-as-code system by writing in Markdown, storing your files in Git, and publishing in any of pandoc's supported formats.

Document conversion and docs-as-code

If you have a document in any of pandoc's many supported file formats, turning it into any of the others is a piece of cake. It is a useful tool to have!

But the true power of pandoc becomes apparent when you use it as the basis for a simple docs-as-code system. The premise of docs-as-code is to take some of the techniques and principles of software development and apply them to writing documentation, especially for software development projects. However, you can apply it to the development of any type of documentation.

Software developers use their favorite editor or integrated development environment (IDE) to write their programs. The code they write is saved in text files. These contain the source code for the program.

They use a version control system, or VCS (Git is the most popular), to capture changes to the source code as it is developed and improved. This means that the programmer has a complete history of all versions of the source code files and can quickly access any previous version of a file. Git stores files in a repository. There is a local repository on each developer's computer and a central, shared, remote repository that is often hosted in the cloud.

When they are ready to produce a working version of the program, they use a compiler to read the source code and generate a binary executable.

By writing your documents in a lightweight, text-based markup language, you can use a VCS to version-control your writing. When you're ready to distribute or publish a document, you can use pandoc to generate as many different versions of your documentation as you need, including web-based (HTML), word-processed or typeset (LibreOffice, Microsoft Word, TeX), Portable Document Format (PDF), ebook (ePub), and so on.

You can do all of this from a set of lightweight version-controlled text files.
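As a rough illustration of the version-control half of this workflow, the sketch below tracks a Markdown draft in a throwaway Git repository and then shows its history. The temporary directory, the file name guide.md, and the author identity are all placeholders chosen for this example, not anything prescribed by Git or pandoc.

```shell
# Sketch: track a Markdown draft in a throwaway Git repository.
# repo_dir, guide.md and the author identity are placeholders.
repo_dir=$(mktemp -d)

if command -v git >/dev/null 2>&1; then
  git -C "$repo_dir" init -q
  printf '# Guide\n\nFirst draft.\n' > "$repo_dir/guide.md"
  git -C "$repo_dir" add guide.md
  # The -c options supply a demo identity for this commit only.
  git -C "$repo_dir" -c user.name="Example Author" -c user.email="author@example.com" \
    commit -q -m "Add first draft"

  # Revise the text and record the change as a second version.
  printf '# Guide\n\nSecond draft, with corrections.\n' > "$repo_dir/guide.md"
  git -C "$repo_dir" -c user.name="Example Author" -c user.email="author@example.com" \
    commit -q -a -m "Revise draft"

  # Show the history, then the difference between the two versions.
  git -C "$repo_dir" --no-pager log --oneline
  git -C "$repo_dir" --no-pager diff HEAD~1 HEAD
else
  echo "git is not installed" >&2
fi
```

Because the file is plain text, the diff shows exactly which line changed between the two drafts.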

Pandoc installation

To install pandoc on Ubuntu, use this command:

sudo apt-get install pandoc

In Fedora, the command you need is the following:

sudo dnf install pandoc

On Manjaro, you need to type:

sudo pacman -Syu pandoc

You can check which version you have installed using the --version option:

pandoc --version
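If you plan to call pandoc from scripts, it can help to confirm it is on the PATH first. The following is a minimal sketch of such a check; it assumes a pandoc build recent enough to support the --list-input-formats and --list-output-formats options, and the variable names are just for illustration.

```shell
# Confirm that pandoc is on the PATH before relying on it in scripts.
if command -v pandoc >/dev/null 2>&1; then
  # The first line of the --version output names the installed release.
  pandoc_version=$(pandoc --version | head -n 1)
  echo "Found: $pandoc_version"
  # Count the formats this build can read and write.
  input_count=$(pandoc --list-input-formats | wc -l)
  output_count=$(pandoc --list-output-formats | wc -l)
  echo "Input formats: $input_count, output formats: $output_count"
else
  pandoc_version=""
  echo "pandoc is not installed" >&2
fi
```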

Use pandoc without files

If you use pandoc without any command line options, it accepts typed input. Just press Ctrl+D to indicate that you have finished typing. pandoc expects you to type in Markdown format and generates HTML output.

Let’s see an example:

pandoc

We've typed a few lines of Markdown and we're about to hit Ctrl+D.

As soon as we do, pandoc generates the equivalent HTML output.

To do something useful with pandoc, however, we really need to use files.
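The same default behavior can be tried without typing interactively by piping text into pandoc. This is a small sketch; the Markdown text and variable names are made up for the example.

```shell
# With no options, pandoc reads Markdown from standard input and
# writes an HTML fragment to standard output.
markdown_input='# Sample heading

This has *emphasis* and **bold** text.'

if command -v pandoc >/dev/null 2>&1; then
  html_output=$(printf '%s\n' "$markdown_input" | pandoc)
  printf '%s\n' "$html_output"
else
  html_output=""
  echo "pandoc is not installed" >&2
fi
```

The output is an HTML fragment rather than a complete page: there is no surrounding html, head, or body element.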

Markdown basics

Markdown is a lightweight markup language in which certain characters are given special meaning. You can use a plain text editor to create a Markdown file.

Markdown is easy to read, as there are no visually cumbersome tags to distract from the text. The formatting in a Markdown document resembles the formatting it represents. Here are some of the basics:

  • To emphasize the text with italics, wrap it in asterisks. *This will be emphasized*
  • For bold text, use two asterisks. **This will be in bold**
  • Headings are marked with the number sign, or hash mark (#). The text is separated from the hash by a space. Use one hash for a top-level heading, two for a second-level heading, and so on.
  • To create a bulleted list, start each line in the list with an asterisk and insert a space before the text.
  • To create a numbered list, start each line with a digit followed by a period, and then insert a space before the text.
  • To create a hyperlink, enclose the site name in square brackets ([]) and the URL in parentheses (()), like so: [Link to How to Geek](https://www.howtogeek.com/).
  • To insert an image, type an exclamation point immediately before the square brackets (![]). Write any alternative text for the image in the brackets. Then enclose the path to the image in parentheses (()). Here is an example: ![The Geek](HTG.png).

We’ll cover more examples of all of these in the next section.
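To experiment with these constructs, you could write them to a file from the shell. The sketch below creates a small Markdown file that uses each item from the list above; the temporary directory and the file name sample.md are placeholders, and HTG.png is only referenced, not created.

```shell
# Write a small Markdown file that uses each basic construct.
# demo_dir and sample.md are placeholder names for this example.
demo_dir=$(mktemp -d)
cat > "$demo_dir/sample.md" <<'EOF'
# Top-level heading

## Second-level heading

Some *emphasized* text and some **bold** text.

* First bullet
* Second bullet

1. First step
2. Second step

[Link to How to Geek](https://www.howtogeek.com/)

![The Geek](HTG.png)
EOF

# Display the file to confirm what was written.
cat "$demo_dir/sample.md"
```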


File conversion

File conversions are straightforward. pandoc can usually work out which file formats you are working with from their filenames. Here, we are going to generate an HTML file from a Markdown file. The -o (output) option tells pandoc the name of the file we want to create:

pandoc -o sample.html sample.md

Our sample Markdown file, sample.md, contains the short Markdown section shown in the image below.

Markdown text in the sample.md file in a gedit editor window.

A file called sample.html is created. When we double-click on the file, our default browser will open it.

HTML representation of the markdown file sample.md, in a browser window.

Now, let's generate an OpenDocument Format text document that we can open in LibreOffice Writer:

pandoc -o sample.odt sample.md

The ODT file has the same content as the HTML file.

An ODT document rendered from markdown and opened in LibreOffice Writer.

A neat touch is that the image's alt text is also used to automatically generate a caption for the figure.

An automatically generated figure caption in LibreOffice Writer.
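The two conversions in this section can be combined in a short loop, since pandoc picks the output format from each file extension. This is a minimal sketch that uses a throwaway directory and a two-line stand-in for sample.md; both are assumptions made for the example.

```shell
# Convert one Markdown file to HTML and ODT, letting pandoc infer
# each output format from the file extension.
work_dir=$(mktemp -d)
printf '# Sample\n\nConverted by *pandoc*.\n' > "$work_dir/sample.md"

if command -v pandoc >/dev/null 2>&1; then
  for ext in html odt; do
    pandoc -o "$work_dir/sample.$ext" "$work_dir/sample.md"
  done
  ls -l "$work_dir"
else
  echo "pandoc is not installed" >&2
fi
```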

Specify file formats

The -f (from) and -t (to) options are used to tell pandoc which file formats you want to convert from and to. This can be useful if you are working with a file format that shares a file extension with other related formats. For instance, TeX and LaTeX both use the extension ".tex".

We are also using the -s (standalone) option so pandoc will generate all the LaTeX preamble necessary for a document to be a complete, self-contained, and well-formed LaTeX document. Without the -s (standalone) option, the output would still be well-formed LaTeX that could be inserted into another LaTeX document, but it would not parse properly as a standalone LaTeX document.

We write the following:

pandoc -f markdown -t latex -s -o sample.tex sample.md

If you open the file "sample.tex" in a text editor, you will see the generated LaTeX. If you have a LaTeX editor, you can open the TEX file to see a preview of how the LaTeX typesetting commands are interpreted. We narrowed the window to fit the image below, which makes the display look cramped, but it was actually fine.

A LaTeX file open in Texmaker, showing a preview of the typeset page.
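One way to see what the -s option adds is to convert the same text with and without it and compare the results. The sketch below does that using standard input rather than files; the sample text and variable names are made up for the example.

```shell
# Compare LaTeX output with and without the -s (standalone) option.
markdown_text='# Sample

Some *text*.'

if command -v pandoc >/dev/null 2>&1; then
  # Fragment: body only, suitable for inclusion in another LaTeX document.
  fragment=$(printf '%s\n' "$markdown_text" | pandoc -f markdown -t latex)
  # Standalone: adds the preamble and the document environment.
  standalone=$(printf '%s\n' "$markdown_text" | pandoc -f markdown -t latex -s)
  echo "Fragment lines:   $(printf '%s\n' "$fragment" | wc -l)"
  echo "Standalone lines: $(printf '%s\n' "$standalone" | wc -l)"
else
  fragment=""
  standalone=""
  echo "pandoc is not installed" >&2
fi
```

The standalone version is much longer because it includes the \documentclass line and the rest of the preamble, which the fragment lacks.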

We used a LaTeX editor called Texmaker. If you want to install it on Ubuntu, type the following:

sudo apt-get install texmaker

In Fedora, the command is:

sudo dnf install texmaker

In Manjaro, use:

sudo pacman -Syu texmaker

Convert files with templates

You are probably beginning to understand the flexibility that pandoc provides. You can write once and publish in almost any format. That's quite a feat, but the documents do look a bit vanilla.

With templates, you can dictate which styles pandoc uses when generating documents. For example, you can tell pandoc to use the styles defined in a Cascading Style Sheets (CSS) file with the --css option.

We have created a small CSS file that contains the text below. It changes the spacing above and below the level one heading style. It also changes the text color to white and the background color to a shade of blue:

h1 {
  color: #FFFFFF;
  background-color: #3C33FF;
  margin-top: 0px;
  margin-bottom: 1px;
}

The full command is below; note that we also use the standalone option (-s):

pandoc -o sample.html -s --css sample.css sample.md

pandoc uses the single style in our minimalist CSS file and applies it to the level one heading.

HTML rendered from markdown with a CSS style applied to the level one header, in a browser window
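The CSS example can be reproduced end to end with the sketch below, which writes the stylesheet and a two-line stand-in for sample.md into a throwaway directory. The directory and the sample text are placeholders. The --metadata pagetitle option is an addition not used in the command above; it sets the HTML page title so that pandoc does not warn about a missing title in standalone mode.

```shell
# Build a standalone HTML page that links to a small stylesheet.
site_dir=$(mktemp -d)
printf '# Sample heading\n\nBody text.\n' > "$site_dir/sample.md"
cat > "$site_dir/sample.css" <<'EOF'
h1 {
  color: #FFFFFF;
  background-color: #3C33FF;
  margin-top: 0px;
  margin-bottom: 1px;
}
EOF

if command -v pandoc >/dev/null 2>&1; then
  # Run inside the directory so the stylesheet link stays relative.
  (cd "$site_dir" && pandoc -s --metadata pagetitle="Sample" \
    --css sample.css -o sample.html sample.md)
  # Show the line that links the page to the stylesheet.
  grep 'stylesheet' "$site_dir/sample.html"
else
  echo "pandoc is not installed" >&2
fi
```

Note that --css adds a link to the stylesheet rather than copying its rules into the page, so sample.css has to be published alongside sample.html.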

Another fine-tuning option available to you when working with HTML files is to include HTML markup in your Markdown file. This will be passed through to the generated HTML file as standard HTML markup.

However, this technique should be reserved for when you are only generating HTML output. If you work with multiple file formats, pandoc will not interpret the HTML markup for non-HTML files; only the text is passed through to them.

We can also specify which styles are used when generating ODT files. Open a blank LibreOffice Writer document and adjust the heading and font styles to suit your needs. In our example, we also add a header and footer. Save your document as “odt-template.odt”.

Now we can use this as a template with the --reference-doc option:

pandoc -o sample.odt --reference-doc=odt-template.odt sample.md

Compare this to the ODT example above. This document uses a different font, has colored headings, and includes a header and footer. However, it was generated from exactly the same Markdown file, "sample.md".

An ODT file rendered from Markdown with a LibreOffice document acting as a stylesheet, in a LibreOffice Writer window.

Reference document templates can be used to indicate different stages in the production of a document. For example, you may have templates that have “Draft” or “For Review” watermarks. A template without a watermark would be used for a finished document.
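Instead of starting from a blank Writer document, you can ask pandoc for a copy of its built-in reference document and restyle that. The sketch below shows this alternative starting point; it assumes pandoc 2.0 or later, and the temporary directory and sample text are placeholders.

```shell
# Extract pandoc's default ODT reference document and use it as a template.
odt_dir=$(mktemp -d)
printf '# Sample heading\n\nBody text.\n' > "$odt_dir/sample.md"

if command -v pandoc >/dev/null 2>&1; then
  # Save a copy of the built-in reference.odt under our own name.
  pandoc -o "$odt_dir/odt-template.odt" --print-default-data-file reference.odt
  # At this point you would edit the styles in LibreOffice Writer.
  pandoc -o "$odt_dir/sample.odt" \
    --reference-doc="$odt_dir/odt-template.odt" "$odt_dir/sample.md"
  ls -l "$odt_dir"
else
  echo "pandoc is not installed" >&2
fi
```

Until the template is edited, the result looks the same as a conversion without --reference-doc, because the extracted file carries pandoc's default styles.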

Generating PDF files

By default, pandoc uses the LaTeX PDF engine to generate PDF files. The easiest way to ensure that the proper LaTeX dependencies are satisfied is to install a LaTeX editor, such as Texmaker.

However, that is quite a large installation: TeX and LaTeX are both pretty hefty. If your hard drive space is limited, or if you know that you will never use TeX or LaTeX, you may prefer to generate an ODT file. Then you can open it in LibreOffice Writer and save it as a PDF.
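A script can make the same choice automatically. The following sketch tries a PDF conversion only if pdflatex, the LaTeX program pandoc calls by default, is on the PATH, and otherwise falls back to an ODT file. The directory, sample text, and variable names are placeholders for this example.

```shell
# Generate a PDF when LaTeX is available; otherwise fall back to ODT.
pdf_dir=$(mktemp -d)
printf '# Sample heading\n\nBody text.\n' > "$pdf_dir/sample.md"

if ! command -v pandoc >/dev/null 2>&1; then
  echo "pandoc is not installed" >&2
  result="none"
elif command -v pdflatex >/dev/null 2>&1 &&
     pandoc -o "$pdf_dir/sample.pdf" "$pdf_dir/sample.md"; then
  # pandoc chose PDF output from the .pdf extension and ran LaTeX.
  result="pdf"
else
  # No usable LaTeX installation: produce an ODT file to convert by hand.
  pandoc -o "$pdf_dir/sample.odt" "$pdf_dir/sample.md"
  result="odt"
fi
echo "Produced: $result"
```

The fallback also covers the case where pdflatex exists but the conversion fails, for example because a required LaTeX package is missing.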

Documents as code

Using Markdown as a writing language has several benefits, including the following:

  • Working in plain text files is fast: They load faster than word processor files of similar size, and moving through the document tends to be faster, too. Many editors, including gedit, Vim, and Emacs, provide syntax highlighting for Markdown text.
  • You will have a timeline of all versions of your documents: If you store your documentation in a VCS, like Git, you can easily see the differences between two versions of the same file. However, this only really works when the files are plain text, as that is what a VCS expects to work with.
  • A VCS can record who made changes and when: This is especially useful if you often collaborate with others on large projects. It also provides a central repository for the documents themselves. Many cloud-hosted Git services, such as GitHub, GitLab, and Bitbucket, have free tiers in their pricing models.
  • You can generate your documents in multiple formats: With just a couple of simple shell scripts, you can pull in the styles from CSS and reference documents. If you store your documents in a VCS repository that integrates with continuous integration and continuous deployment (CI/CD) platforms, they can be generated automatically each time the software is built.
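A "simple shell script" of the kind mentioned in the last point might look like the sketch below. The function name build_docs, the directory layout, and the choice of HTML and ODT as output formats are all assumptions made for this example; a real project would add its own --css and --reference-doc options.

```shell
# build_docs SRC_DIR OUT_DIR
# Convert every Markdown file in SRC_DIR to HTML and ODT in OUT_DIR.
build_docs() {
  src_dir=$1
  out_dir=$2
  mkdir -p "$out_dir"
  for md_file in "$src_dir"/*.md; do
    [ -e "$md_file" ] || continue   # no Markdown files: nothing to do
    base_name=$(basename "$md_file" .md)
    for ext in html odt; do
      pandoc -s --metadata pagetitle="$base_name" \
        -o "$out_dir/$base_name.$ext" "$md_file"
    done
  done
}

# Demo run against a throwaway directory with one document.
docs_dir=$(mktemp -d)
printf '# Guide\n\nFirst draft.\n' > "$docs_dir/guide.md"

if command -v pandoc >/dev/null 2>&1; then
  build_docs "$docs_dir" "$docs_dir/build"
  ls "$docs_dir/build"
else
  echo "pandoc is not installed" >&2
fi
```

A CI/CD job could run a script like this on every commit so the published formats never drift from the Markdown source.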


Final thoughts

There are many more options and features in pandoc than what we have covered here. The conversion processes for most file types can be tweaked and fine-tuned. For more information, check out the excellent examples on the official (and extremely detailed) pandoc website.
