A Few Notes About Pandoc

You’re soaking in it.

January 8, 2016


I have now done two projects with Pandoc – this web site and the Ten Steps to Linux Survival book. During the process I’ve come to a few personal “best practices” the hard way. These are not hard-and-fast rules, but instead guidelines I’ve found helpful while developing content that will be processed by Pandoc.

Beware Pass-Through Code

You can “pass through” LaTeX and HTML in the Markdown, and Pandoc will write it out in the appropriate output formats. This is very powerful. It’s a must for creating an index in PDFs, for example. It’s also required for effectively using Bootstrap in HTML output, like with this site. But use this feature sparingly. It’s not a crutch to whip out just because you know how to do something in HTML but don’t want to take the time to figure it out in Pandoc (because then why are you using Pandoc?).

Pandoc-flavored Markdown (PFM) is incredibly powerful – use it fully. For example, you can set HTML ids, classes and any other attributes on headers and code blocks in PFM, and that goes a long way toward styling, linking and scripting. Pandoc also auto-generates ids for headers from the header text, which is useful for creating internal links in PFM that work in both HTML (and related formats like EPUB) and LaTeX (and related formats like PDF).
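As a small sketch of those features, the script below writes a Pandoc Markdown file that sets an id, a class and a custom attribute on a header, links to an auto-generated header id, and puts attributes on a code block. The file name attrs.md, the id install, the class lead and the data-level attribute are all made up for illustration.

```shell
# Write a sample Pandoc Markdown file (all names in it are hypothetical).
cat > attrs.md <<'EOF'
# Installing the Tools {#install .lead data-level="beginner"}

Continue with [the include files](#use-the-include-files),
or go back to [the install notes](#install).

# Use the Include Files

~~~ {#build-cmd .bash .numberLines}
pandoc attrs.md -o attrs.html
~~~
EOF

# Build both outputs from the same source, if pandoc is installed.
if command -v pandoc >/dev/null 2>&1; then
  pandoc --standalone --metadata title=Attributes -o attrs.html attrs.md
  pandoc --standalone --metadata title=Attributes -o attrs.tex attrs.md
else
  echo "pandoc not found; skipping the build" >&2
fi
```

The second header has no explicit id, so the first link relies on the auto-generated one, use-the-include-files.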

Anywhere that you have multiple output formats (HTML, LaTeX) and you are trying to figure out how to use native code to produce the same effect for each output type, you’re almost certainly doing it wrong. Producing every output from one source is the whole point of Pandoc. Really try to live within “the Pandoc way” as much as possible.

Use the Include Files

It doesn’t become clear until you’ve been using Pandoc for a while just how powerful the --include-in-header, --include-before-body and --include-after-body options are (all three work for both HTML and LaTeX output). They do exactly what they say: they let you include snippets (in this case of raw “code”) at those places in the document, and reuse them across multiple documents. This is where you can add JavaScript in HTML, or define new LaTeX macros, etc. Use them.
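A minimal sketch of the include options follows. Every file name and snippet in it is a placeholder, and the \term macro is invented for the example. It also relies on --include-in-header applying to LaTeX output, where the snippet lands in the preamble.

```shell
# Hypothetical snippet files; names and contents are placeholders.
cat > head.html <<'EOF'
<script src="site.js"></script>
EOF
cat > before.html <<'EOF'
<div class="container">
EOF
cat > after.html <<'EOF'
</div>
EOF
cat > macros.tex <<'EOF'
\newcommand{\term}[1]{\emph{#1}\index{#1}}
EOF
cat > page.md <<'EOF'
# Introduction

Some body text.
EOF

if command -v pandoc >/dev/null 2>&1; then
  # HTML: script in the head, wrapper div opened before and closed after the body.
  pandoc --standalone --metadata title="A Sample Page" \
    --include-in-header=head.html \
    --include-before-body=before.html \
    --include-after-body=after.html \
    -o page.html page.md
  # LaTeX: the macro definitions go into the preamble.
  pandoc --standalone --metadata title="A Sample Page" \
    --include-in-header=macros.tex \
    -o page.tex page.md
else
  echo "pandoc not found; skipping the build" >&2
fi
```

The same snippet files can be passed to every document in a project, which keeps the Markdown sources free of raw HTML and LaTeX.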

Note that with HTML output it is better to put CSS in separate files and have Pandoc generate the <link> elements in the <head> section by using the --css command line option, which you can repeat as many times as necessary.
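For instance, assuming two stylesheets named bootstrap.min.css and site.css (placeholder names; they only need to exist when the page is viewed), repeating the option looks like this:

```shell
# A throwaway source file so the example is self-contained.
cat > styled.md <<'EOF'
# Styled Page

Some body text.
EOF

if command -v pandoc >/dev/null 2>&1; then
  # Each --css adds one stylesheet link to the head, in the order given.
  pandoc --standalone --metadata title="Styled Page" \
    --css=bootstrap.min.css \
    --css=site.css \
    -o styled.html styled.md
else
  echo "pandoc not found; skipping the build" >&2
fi
```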

Understand Variables and Metadata

There are various variables that can be filled in either from metadata files (YAML, XML, etc.) or from the command line with --variable and --metadata. Read about them and understand them, because they control a lot of behavior. You can also define your own, and that can be very powerful (more on that in a bit).
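A short sketch of the two routes, using a YAML metadata block inside the document plus command line settings. The file name, title and author are invented. The fontsize and geometry variables are ones the default LaTeX template reads.

```shell
# Metadata can live in a YAML block at the top of the document.
cat > notes.md <<'EOF'
---
title: Sample Notes
author: A. Writer
---

Body text goes here.
EOF

if command -v pandoc >/dev/null 2>&1; then
  # --metadata adds document metadata; --variable sets template variables.
  pandoc --standalone \
    --metadata date="January 8, 2016" \
    --variable fontsize=12pt \
    --variable geometry:margin=1in \
    -o notes.tex notes.md
else
  echo "pandoc not found; skipping the build" >&2
fi
```

Values given with --metadata become part of the document’s metadata and also reach the template, while values given with --variable are seen only by the template.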

Override Templates With Caution

You can access the default templates used to generate each output format with pandoc -D <outputtype>. You can then save that off and pass it into Pandoc using --template. Obviously that means you can change the templates to your heart’s content, and there are times and places for that. However, I have come to realize this option should be used sparingly. It is like passing through raw HTML or LaTeX in the PFM – if you are spending more time in the template than in the doc, you’re doing it wrong.

I started out with a heavily hacked template for my book, but ultimately was able to move all that into before and after files and then use just the default template. Simpler, and easier to upgrade when Pandoc changes (and in fact, that’s what motivated me to get my act together, when I updated Pandoc and it broke something in my heavily-modified template).

You can then have different build targets in your Makefile or equivalent so that when you’re building the HTML output it gets different before and after include files that are raw HTML, the LaTeX build target gets raw LaTeX, etc. Different build targets can get different variables, too. See the Makefile for my book for an example.
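As a rough sketch of that idea, written here as a shell function rather than a Makefile, each target passes its own include files and variables. All file names (book.md, before.html, macros.tex and so on) are hypothetical, and the PANDOC override is an addition for dry runs.

```shell
# Build one target; set PANDOC=echo to print the command instead of running it.
build() {
  case "$1" in
    html)
      "${PANDOC:-pandoc}" --standalone --css=site.css \
        --include-before-body=before.html \
        --include-after-body=after.html \
        --variable lang=en \
        --output=book.html book.md
      ;;
    latex)
      "${PANDOC:-pandoc}" --standalone \
        --include-in-header=macros.tex \
        --include-before-body=before.tex \
        --include-after-body=after.tex \
        --variable documentclass=book \
        --output=book.tex book.md
      ;;
    *)
      echo "usage: build html|latex" >&2
      return 2
      ;;
  esac
}

# Run a target only when one is given on the command line.
if [ "$#" -gt 0 ]; then
  build "$1"
fi
```

Saved as a script, it would be called with html or latex as its argument. The Markdown source is identical for both targets, and only the raw snippets and variables differ.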

There can be times and places for a modified template, though. Pandoc’s own web site uses a heavily-modified template, but in fact I don’t think it needs to – perhaps it did when it was first built, but with the feature set as it now stands I think that’s less true than it was. I started using a modified template based on that one for this web site, but was able to revert to a default template, with everything custom moved into before and after include files, plus some inline HTML for better use of the Bootstrap grid system.

The reason to run pandoc -D <outputtype> then is not to save it off and start hacking it. It is to study the (relatively simple) template files that are used, and understand how the different variables are used and where, and where the header, before and after include files are placed. For example, without even understanding Pandoc’s variable substitution syntax completely, anyone familiar with HTML should be able to read and comprehend the default HTML template in less than five minutes. It seems too simple to be powerful, but if you grok all the above, you can see that the power is there.
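One way to do that studying, sketched with a made-up output file name, is to dump the default HTML template and list the lines where the include files and the body are placed:

```shell
if command -v pandoc >/dev/null 2>&1; then
  # Save the default HTML template for reading, not for editing.
  pandoc -D html > default-template.html
  # Show where header, before-body, body and after-body content is inserted.
  grep -nE 'header-includes|include-before|\$body\$|include-after' default-template.html
else
  echo "pandoc not found; nothing to inspect" >&2
fi
```

The same approach works for the LaTeX template with pandoc -D latex.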

Format LaTeX Passthrough Code

This is not directly related to Pandoc or Markdown per se, but I will add another “best practice” (for me, at least) for building indexes in LaTeX (and hence PDF) docs. When passing through native commands such as LaTeX macros, it is best to:

  1. Separate them out on their own lines.
  2. Put them at the end of the paragraph, not inline with the text.

Originally, while indexing I was trying to place the index markers next to the word or term they were indexing. This got unwieldy really quickly (in the following, \drcmd is a LaTeX macro I created, \index is a native LaTeX macro):
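A hypothetical illustration of that inline style, wrapped in a shell script so it can be run. The paragraph text is invented, and \drcmd is assumed here to take the command name as its single argument and to produce only an index entry.

```shell
# Index markers placed right next to the words they index (invented text).
cat > inline.md <<'EOF'
Use ls\drcmd{ls} to list the files\index{files!listing} in a
directory and cd\drcmd{cd} to move between
directories\index{directories!changing}.
EOF

# The raw LaTeX is passed through unchanged when the target is LaTeX.
if command -v pandoc >/dev/null 2>&1; then
  pandoc -f markdown -t latex inline.md
else
  echo "pandoc not found; skipping the conversion" >&2
fi
```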

Any time I wanted to reflow that text those commands were in the way. They also really broke up the “flow” of the paragraph text itself, making it harder to read as “source code.” So now I put all those commands at the end of the paragraph:
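A hypothetical illustration of the end-of-paragraph style, again with invented text and with \drcmd assumed to take one argument and produce only an index entry. The commands follow the text with no blank line in between, so they stay part of the same paragraph.

```shell
# The prose stays readable; the index commands trail the paragraph.
cat > trailing.md <<'EOF'
Use ls to list the files in a directory and cd to move between
directories.
\drcmd{ls}
\drcmd{cd}
\index{files!listing}
\index{directories!changing}
EOF

# The raw LaTeX is passed through unchanged when the target is LaTeX.
if command -v pandoc >/dev/null 2>&1; then
  pandoc -f markdown -t latex trailing.md
else
  echo "pandoc not found; skipping the conversion" >&2
fi
```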

I also break out each one on a line by itself, because it makes it easier when doing mass search-and-replace operations across files to see the effects.
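To make that concrete, here is a sketch of a search and a replace over one file with invented contents (\drcmd is assumed to take a single argument). The same commands could be pointed at several files at once.

```shell
# A sample paragraph with each command on its own line (invented text).
cat > chapter.md <<'EOF'
Use ls to list the files in a directory and cd to move between
directories.
\drcmd{ls}
\drcmd{cd}
\index{listing}
EOF

# A line-anchored search finds exactly the index commands.
grep -n '^\\index{' chapter.md

# Rename one index entry; write to a new file so the change can be reviewed.
sed 's/^\\index{listing}$/\\index{directories!listing}/' chapter.md > chapter.new.md

# Show the effect of the replacement (diff exits non-zero when files differ).
diff chapter.md chapter.new.md || true
```

Because every command occupies a whole line, the diff shows one changed line per replacement and nothing else.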

There is some small chance that if a paragraph flowed across a page break the index would be pointing at the end of the paragraph on the second page, which may or may not contain the word(s) indexed, but I am willing to take that chance in exchange for the ease of use and better reading of the source text (we’ve all had that experience of an index pointing to a page, only to find the entry on the previous or next page – now I think I know why).

Nothing would stop you from placing them at the beginning of the paragraph, or from putting the macros for words near the beginning of the paragraph at the front and those for words toward the end at the end, but I don’t want to work that hard.

For a brief period I tried a hybrid: embedding the commands in the paragraph, but on their own separate lines. That was the worst of both worlds, as you can see:
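A hypothetical illustration of the hybrid style, with invented text and with \drcmd assumed to take one argument and produce only an index entry:

```shell
# Commands on their own lines, but interleaved with the prose.
cat > hybrid.md <<'EOF'
Use ls
\drcmd{ls}
to list the files
\index{files!listing}
in a directory and cd
\drcmd{cd}
to move between directories.
\index{directories!changing}
EOF

# The raw LaTeX is passed through unchanged when the target is LaTeX.
if command -v pandoc >/dev/null 2>&1; then
  pandoc -f markdown -t latex hybrid.md
else
  echo "pandoc not found; skipping the conversion" >&2
fi
```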

Embedding native HTML is a bit different, because sometimes you’re wrapping whole sections in a <div> (especially if using Bootstrap), or applying a <span> with a style or some such, so those obviously have to be applied directly to the text involved. But that also goes back to my whole “use native markup sparingly” approach.


In some sense it’s natural to go down the rat hole of heavy customization first – the documentation talks about the include files and variables and all, but their power and use isn’t really clear until you’ve studied the templates and how they interact with the various metadata, variables and command line switches. You usually don’t get to that point until after you’ve banged your head for a while doing things the hard way with raw LaTeX/HTML in the PFM and heavily hacked templates and then thought, “This is hard, I must be doing something wrong.”

Pandoc is supposed to make your life easier. If it is getting hard, chances are you’re doing it wrong. Stop, re-read the docs and the templates for your chosen output formats, and figure out the easy way.