Presentations in (R)markdown

There are many ways to turn a markdown or Rmarkdown document into a presentation. Way too many, and none of them is perfect. I made my first presentation with knitr / Rmarkdown for the tmod package.

After trying various options in knitr, I decided on an approach in which the Rmarkdown document is oblivious of the presentation system and the job of turning it into a presentation is taken up by pandoc. There were several bumps along the way, so here is a step-by-step guide.

1. Input file

Let’s start with an example Rmd. In the following, I assume it has been saved under “test.Rmd”.

---
title: "Example presentation"
author: January Weiner 
date: "`r Sys.Date()`"
---

# First part
## Slide 1
Code:

```{r plot1}
plot(1:10, 1:10)
```

## Slide 2
Some maths: $\sum_{i=1}^{N}$

# Second part
## Slide 3
... contents ...

2. From Rmarkdown to markdown

I use knitr only to create a markdown file.

Rscript -e 'knitr::knit("test.Rmd")'

This produces the file test.md. With that, knitr’s job is finished; we will not need it anymore.

3. Download reveal.js

I decided on reveal.js. It was easy to work with and adapt to my needs, and it has elegant default themes, a low footprint and keyboard shortcuts. It also has the “2D” layout, meaning that sections (level one headers) are arranged horizontally, while slides within one section are arranged vertically. Pressing “Esc” in a presentation shows the slide overview:

[Figure: reveal.js slide overview (reveal_example)]

Anyway, download reveal.js and unpack it in the same directory as test.md.

Making the presentation

Use pandoc to create the reveal.js presentation. Note that this is not the final command line; in the following points I will discuss the problems which will influence the final version.

pandoc -s -S -t revealjs --mathjax -o test.html test.md

4. MathJax

On slide 2, we have a bit of maths. The maths is written in a LaTeX-like notation, and there are many ways to turn it into an elegant mathematical equation on the final presentation. I have tried many options with pandoc, and found that only MathJax works properly and without a major hassle. This is why on the previous command line I used the option --mathjax.

However, if you run the above command line, you will notice that on “Slide 2” the maths doesn’t work, despite the --mathjax option. It would work, though, if we put the file on a server. The reason is that pandoc writes the URL to MathJax in the protocol-relative form src="//cdn.mathjax…", which inherits the scheme of whatever URL we opened the file from. If we opened it from a server, using http or https, this would have worked. If we open it directly in a browser, the reference becomes “file://cdn.mathjax…”, which is obviously not on our file system. There are two ways around this: an external MathJax (4.1) or a local one (4.2).
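To make the scheme problem concrete, here is a minimal shell sketch of how a browser completes a protocol-relative reference: it borrows the scheme of the page that contains it. The function name and the host cdn.example.org are placeholders of mine, not the URL pandoc actually writes.

```shell
# Resolve a protocol-relative reference ("//host/path") against the URL
# of the page that contains it: the reference inherits the page's scheme.
resolve_protocol_relative() {
  page_url=$1
  ref=$2
  scheme=${page_url%%://*}   # everything before the first "://"
  printf '%s:%s\n' "$scheme" "$ref"
}

# Served over https, the reference points at a real host.
resolve_protocol_relative "https://example.org/talks/test.html" "//cdn.example.org/MathJax.js"
# prints https://cdn.example.org/MathJax.js

# Opened from disk, the same reference turns into a file:// URL.
resolve_protocol_relative "file:///home/me/talks/test.html" "//cdn.example.org/MathJax.js"
# prints file://cdn.example.org/MathJax.js
```

Giving --mathjax an explicit http:// URL or a relative local path, as in the two options that follow, sidesteps this lookup.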

4.1 External MathJax

Use the command line

pandoc -s -S -t revealjs --mathjax="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" -o test.html test.md

This works as long as we have Internet access. If we show the presentation in another institute, where our laptop cannot connect to the Internet, we are screwed.

4.2 Local MathJax

Alternatively, you can download the whole MathJax:

wget https://github.com/mathjax/MathJax/archive/v2.5-latest.zip
unzip v2.5-latest.zip
mv MathJax-2.5-latest/ MathJax

and specify the local installation with the following command line:

pandoc -s -S -t revealjs --mathjax="MathJax/MathJax.js?config=TeX-AMS-MML_HTMLorMML" -o test.html test.md

This works, but our presentation suddenly weighs over 170 megabytes. Which sucks.

5. 2D layout and section headers

I mentioned previously that reveal.js allows a neat 2D layout, in which slides from one section are arranged vertically, and sections are put next to each other. However, sections with only a title and no contents might be a bit boring, so let us modify the .md file changing the second section as follows:

# Second part

This is the second part, even more interesting.

## Slide 3
... contents ...

You run pandoc again, and…

[Figure: the resulting presentation (reveal2)]

Huh, where has the 2D layout gone? Why are all slides next to each other? Why are all slides from one section on one single slide?

Pandoc automatically guesses which level header denotes boundaries between slides. It defines “slide level” as “the highest level followed immediately by non-header contents”. After our modification, the top level header (starting with a single #) became the level at which slides are separated. OK, so maybe we try specifying the slide level manually?

pandoc -s -S -t revealjs --slide-level=2 --mathjax="MathJax/MathJax.js?config=TeX-AMS-MML_HTMLorMML" -o test.html test.md

OK, this works, but… the content under the first level header (“This is the second part…”) is gone! This is because, to quote the pandoc manual, “Headers above the slide level in the hierarchy create ‘title slides,’ which just contain the section title and help to break the slide show into sections.”

Turns out that there is no way we can have both: 2D with slides divided neatly into sections, and section slides which contain more than just a title. Not if we use pandoc, that is.
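Pandoc's guess can be approximated in a few lines of shell and awk, which makes it easy to check a file before converting it. This is only a sketch of the rule quoted above (the smallest header depth directly followed by non-header content), not pandoc's actual implementation; the function name is mine, and it only understands #-style headers and simple code fences.

```shell
# Print the slide level guessed for the markdown read from standard input:
# the smallest header depth that is directly followed by non-header content.
guess_slide_level() {
  awk '
    # Called whenever a content line follows the most recent header.
    function content_seen() {
      if (pending && (best == 0 || pending < best)) best = pending
      pending = 0
    }
    /^[`~][`~][`~]/ { content_seen(); in_fence = !in_fence; next }  # code fence
    in_fence        { next }                   # ignore "#" lines inside code
    /^#+ /          { match($0, /^#+/); pending = RLENGTH; next }
    /^[ \t]*$/      { next }                   # blank lines do not count
                    { content_seen() }
    END             { print best + 0 }
  '
}

# Original layout: only the level two headers have content below them.
printf '%s\n' '# First part' '## Slide 1' 'Code:' \
              '# Second part' '## Slide 3' '... contents ...' | guess_slide_level
# prints 2

# After adding text below "# Second part", level one becomes the slide level.
printf '%s\n' '# Second part' '' 'This is the second part, even more interesting.' \
              '' '## Slide 3' '... contents ...' | guess_slide_level
# prints 1
```

Running it as guess_slide_level < test.md shows which level pandoc is likely to pick when no slide level is given.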

6. Modifying the layout

6.1 reveal.js theme

This is the easiest part: pick one of the existing reveal.js themes (I omit the mathjax option for simplicity's sake; do remember to put it back in):

pandoc -s -S -t revealjs -o test.html test.md -V theme=blood

Note that the themes listed on the reveal.js website start with a capital letter, but you must specify a lowercase letter in the above command line.

6.2 Fine tuning the theme

I did not like the sans-serif, capitalized and decorated fonts of the blood theme (shadows on titles, I beg you). Ugly. However, if you know a little CSS (and you’d better learn it!), you can easily adapt it to your needs.

Look up the file reveal.js/css/theme/blood.css for hints and create your own CSS file (let us call it test.css) in the same directory as test.md. In the file below, I reset all the ugly decorations and set two fonts for headers and body, respectively: EB Garamond for headers, and Quattrocento Sans for body, using the Google Fonts service:

@import url('http://fonts.googleapis.com/css?family=EB+Garamond');
@import url('http://fonts.googleapis.com/css?family=Quattrocento+Sans');

.reveal {
  font-size: 32px;
  /* generic families such as sans-serif are keywords and must not be quoted */
  font-family: 'Quattrocento Sans', sans-serif; }

.reveal h1, .reveal h2, .reveal h3, .reveal h4, .reveal h5, .reveal h6 {
  font-family: 'EB Garamond', serif;
  font-weight:normal;
  text-transform: none;
  text-shadow: none; }

.reveal h1 { font-size: 2em; }
.reveal h2 { font-size: 1.7em; }
.reveal h3 { font-size: 1.4em; }
.reveal h4 { font-size: 1em; }

Also, as you might notice, I prefer smaller fonts here. We integrate our test.css file with the following option:

pandoc -s -S -t revealjs -o test.html test.md -V theme=blood --css test.css

6.3 Adding a logo

You can add a logo (or whatever other background for your slides) by modifying the CSS file test.css. If logo.png is the name of your logo, adding this to your CSS will put it on all your slides in the top left corner:

body {
  background-image: url(logo.png);
  background-repeat: no-repeat;
  background-position:20px 20px;
}

6.4 Better syntax highlighting

Pandoc’s syntax highlighting doesn’t look good on a dark background. You can add the following to the “test.css” file to reproduce the Solarized theme.

.reveal pre code { color: #839496; 
  background-color: #2B2B2B; } /* use #FDF6E3 for light background */

.sourceCode .kw { color: #268BD2; }
.sourceCode .dt { color: #268BD2; }
.sourceCode .dv, .sourceCode .bn, .sourceCode .fl { color: #D33682; }
.sourceCode .ch { color: #DC322F; }
.sourceCode .st { color: #2AA198; }
.sourceCode .co { color: #93A1A1; }
.sourceCode .ot { color: #A57800; }
.sourceCode .al { color: #CB4B16; font-weight: bold; }
.sourceCode .fu { color: #268BD2; }
.sourceCode .re { }
.sourceCode .er { color: #D30102; font-weight: bold; }

7. Creating a PDF of your presentation

Of course you need a PDF for printing and as a backup.

There are two ways of producing a PDF from reveal.js. Each one is imperfect.

7.1 Creating PDF using pandoc

Since the test.md file is generic markdown, we can turn it into a simple PDF:

pandoc -s -S -o test.pdf test.md

Or even a beamer presentation:

pandoc -s -S -t beamer -o test.pdf test.md

Unfortunately, this is not as nice as our presentation, and it completely ignores whatever we have put in the CSS.

7.2 Using the reveal.js printing facility and Google Chrome

The second way is interactive only (you cannot create the PDF with a command line). Open the file in Google Chrome and add ?print-pdf to the file URL, such that the end of the URL reads test.html?print-pdf.

The output looks garbled: the slides overlap. Don’t worry, it’s OK. Open the print dialog (press Ctrl-P), and you will see that now the output is correct. You can save it as PDF or send it to a printer.

8. The final command line

pandoc -s -S -t revealjs --mathjax="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"  -V theme=blood --css test.css -o test.html test.md
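The knitr step and the final pandoc call can be wrapped in a small script so that they always run together. This is only a sketch: the function names and the DRY_RUN switch are mine, the pandoc flags are copied unchanged from the command line above, and the stylesheet is assumed to be named after the presentation (test.css for test.Rmd).

```shell
#!/bin/sh
# Knit NAME.Rmd to NAME.md, then turn NAME.md into a reveal.js presentation.
# With DRY_RUN=1 the commands are only printed, which is handy for checking
# the flags before anything is overwritten.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"     # printed form drops the shell quoting
  else
    "$@"
  fi
}

build_slides() {
  name=$1                 # base name without extension, e.g. "test"
  theme=${2:-blood}       # reveal.js theme, in lowercase
  mathjax="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"

  # Step 1: Rmarkdown to markdown; stop here if knitting fails.
  run Rscript -e "knitr::knit(\"$name.Rmd\")" || return 1
  # Step 2: markdown to reveal.js, same flags as the final command line.
  run pandoc -s -S -t revealjs --mathjax="$mathjax" \
      -V theme="$theme" --css "$name.css" -o "$name.html" "$name.md"
}

# Preview the two commands for test.Rmd without running them.
DRY_RUN=1 build_slides test
```

Calling build_slides test without DRY_RUN runs both steps for real; a second argument selects another theme.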

Kneat tricks

So I have finally switched to knitr for doing my vignettes. The result is satisfactory, but the process was not entirely painless.

  • The command to run instead of “R CMD Sweave foo.Rnw” is

    Rscript -e 'rmarkdown::render("foo.rmd")'

  • I think that the concept of writing a package whose main purpose is to generate documentation through literate programming, without providing the mandatory documentation (such as a list of options) within the package itself and referring instead to online resources, is beautifully subversive.

  • Knitr in the current R version requires pandoc X.Y.Z, while Ubuntu has X.Y.(Z-1). It was necessary to download the deb package from the pandoc site and install it manually.

  • To use knitr in vignettes, you need to add `VignetteBuilder: knitr` to your `DESCRIPTION` file.

  • I was confused at first as to what to do with the old vignette header (the lines that start with “%\Vignette…”). The markdown header is different. Turns out you have to include these lines in the markdown header (kill me, but I have no idea why there is a “>” after “vignette:” or a “|” after “abstract:”. Knitr produces neat results, but it is one of the most confusing packages I have ever encountered):

                    ---
                    title: "FOO: the fooly of foology"
                    author: "January Weiner"
                    date: "`r Sys.Date()`"
                    output: 
                      pdf_document:
                    vignette: >
                      %\VignetteIndexEntry{Foo}
                      %\VignetteKeyword{foo}
                      %\VignetteKeyword{foology}
                      %\VignetteEngine{knitr::rmarkdown}
                      %\SweaveUTF8
                      \usepackage[utf8]{inputenc}
                    abstract: |
                      Foo foo foo foo. Foo foo, foo foo foo, foo.
                    toc: yes
                    bibliography: bibliography.bib
                    ---
     

  • The Sweave chunk header <<label, options>>= becomes ```{r label, fig.width=5, fig.height=5}. Also, any character argument to options must be in quotes.

  • I have no idea why fig.width=5 works, but opts_chunk$set(fig.width=5) doesn’t, and at this point I don’t care to ask.

  • I had a nightmarish forensic experience trying to figure out why my figures don’t get updated, where the cache is, and some other things. Turns out that if you give knitr a symbolic link to an rmd file, it will change to the directory where the original file is. Which is not how Sweave behaves.

  • It turns out that some options are valid for HTML, but not PDF, and vice versa, and you don’t get a warning. Also, it’s not mentioned in the documentation. Why? Because f— you, that’s why. For example, I spent half an hour trying to change the theme of a PDF vignette, after which it turned out that the theme option is not valid for PDFs. There was a table somewhere showing which options can be used when, but I lost the link and can’t find it in the documentation.

  • I haven’t found out how to change the font size if generating pdf_document (my favorite). Update: I have found out that it is not possible.

  • Also, no idea how to prevent small code chunks from being broken across pages, which really, really should not happen.

  • At first I specified the vignette engine to be knitr::knitr, but apparently this produces only a (botched) HTML vignette (botched: no title, no author, no references). To generate a neat, honest-to-Knuth PDF via pandoc and LaTeX, one should use knitr::rmarkdown, although that is not documented anywhere.

    %\VignetteEngine{knitr::rmarkdown}

pandoc, markdown and pander

Pandoc + markdown seem to be a great way of documenting my work.

Markdown syntax is very simple and lets you add basic formatting and figures to an otherwise simple text document, without obfuscating the actual text.

Then I simply compile the document using the pandoc command:

pandoc -o document.docx document.md
pandoc -o document.pdf  document.md

There are some more tricks, of course, and plenty of output formats are possible. One thing I was struggling with was that images in docx files were much too large. It turns out that the PNG graphics I generate from PDFs (which, in turn, come from R) lacked the information about density units. I was using the convert program from ImageMagick, and it turns out it is necessary to add the option -units PixelsPerInch:

convert -density 300 -units PixelsPerInch image.pdf image.png
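The arithmetic behind the oversized images, as a rough sketch: the pixel count stays the same, but a program that finds no resolution recorded in the PNG has to assume one. The 7-inch figure width and the 96 dpi fallback below are assumptions chosen for illustration, not values taken from pandoc or Word.

```shell
# Physical width, in inches, at which a bitmap is laid out for a given resolution.
inches_at() {
  awk -v px="$1" -v dpi="$2" 'BEGIN { printf "%.1f\n", px / dpi }'
}

px=$((7 * 300))       # a 7-inch-wide PDF page rendered with -density 300
inches_at "$px" 300   # resolution recorded thanks to -units PixelsPerInch
# prints 7.0
inches_at "$px" 96    # hypothetical fallback when no unit is recorded
# prints 21.9
```

The same 2100 pixels are laid out three times wider when the reader has to guess the resolution.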

Another thing that I found useful was the pander package. Of course, there is this whole science of generating dynamic documents and reports from R using Sweave or knitr, but at the moment I prefer to produce two files: a commented R pipeline and, separately, a report in markdown format.

(The reason for not using knitr is that, given that I work with some very large data sets that sometimes take ages to compute, I would have to work out the details of caching and handling code that takes a while to execute. Also, I want to have a document with all commands, for me, and a report without any R code for everyone else).

Pander lets you create nice tables in R that can be directly copied and pasted into a markdown document (of course, pander is so much more, but this is my main use at the moment):

library( pander )
# foo is the data frame (or matrix) to print as a markdown table
pander(foo, 
       emphasize.strong.cols=1, justify="left", 
       style="simple", digits=2, split.tables=Inf)

I was astonished how nice the resulting Word file is. The PDFs, which are produced by TeX/LaTeX, I think, are actually more trouble: LaTeX disregards my order of figures and tables, since they are all floating objects, and there is no easy way to change this from within the document.

riverplot

Prompted by this Cross Validated discussion, I have created the riverplot package. Here is a minimal gallery of the graphics produced by the package:

[Figure: gallery of riverplot examples]

Here is an example which recreates the famous Minard plot:

[Figure: the Minard plot recreated with riverplot]

So, how do you make these figures?

First, you need to create a specific riverplot object that can be directly plotted. (Use riverplot.example to generate an example object). Here, I show how to recreate the Minard plot using the provided data. makeRiver, the function that will create the object necessary for plotting, will use data frames to input information about nodes and edges, but we must use specific naming of the columns:

library( riverplot )
data( minard )
nodes <- minard$nodes
edges <- minard$edges
colnames( nodes ) <- c( "ID", "x", "y" )
colnames( edges ) <- c( "N1", "N2", "Value", "direction" )

Now we can add some style information to the “edge” columns, to mimic the original Minard plot:

# color the edges by troop movement direction
edges$col <- c( "#e5cbaa", "black" )[ factor( edges$direction ) ]
# color edges by their color rather than by gradient between the nodes
edges$edgecol <- "col"

# generate the riverplot object
river <- makeRiver( nodes, edges )

The makeRiver function reads any columns that match the style information (like colors of the nodes) and uses them to create the river object. The river object is just a simple list; you can easily view and manipulate it, or create it with your own functions. The point of makeRiver is to make sure that the data is consistent.

Once you have created a riverplot object with one of the above methods (or manually), you can plot it either with plot(x) or riverplot(x). I enforce the use of lines, and I also tell the plotting function to use a particular style. The default edges look curvy.

style <- list( edgestyle= "straight", nodestyle= "invisible" )
# plot the generated object
plot( river, lty= 1, default.style= style )
# Add cities
with( minard$cities, points( Longitude, Latitude, pch= 19 ) )
with( minard$cities, text( Longitude, Latitude, Name, adj= c( 0, 0 ) ) )

roxygen2: documenting R functions

From what I gathered on the roxygen2 package, it was perfect for me: documentation in Rd format generated automatically from the comments in the source files.

However, it took me a while to figure out how exactly that works. The roxygen manual is rudimentary and technical; there is no vignette for roxygen2 and the vignette for roxygen is not applicable. I was lost and confused. So much for literate programming!

Still, once I understood how to proceed, it is truly a huge improvement in package creation. WRONG: not package creation; that was my first mistake. Roxygen does not help to maintain the package structure, and although it does update some of the package files apart from the manual pages, notably the NAMESPACE file, in general it is just for the sole purpose of maintaining the man pages. Which is a lot.

Creating the package

So, I had this source code file with several functions, and I wanted to build a package around it. First, I generated the package with the standard R package.skeleton function. I changed into the package directory, deleted the “Read-me-and-delete-me” file and modified the DESCRIPTION file (roxygen will not do it).

Second: Roxygen looks up the directory packagename/R and searches for the source files which are there. So, if you want to create a man page for something, you need to have a corresponding source file. Pretty straightforward for functions that are already in the R directory. However, to create a man page for the package itself (i.e., the name-package.Rd file), one needs to create a dummy file called “name-package.R” in the R directory of the package. This file only contains the documentation in the commented, roxygen-specific format, and a single line of code containing a NULL. (That was actually one of the things which got me stuck: in the roxygen vignette, an example is given with NA instead of NULL, which does not work, at least not in the recommended roxygen2. Also, command line examples don’t work.)

To generate the manuals, start R in the parent directory of the package and run roxygenize( "packagedir" ). Presto, the manuals are there.

Using the roxygen tags to document the functions

Of course, you still need to edit the source files (including name-package.R) using the roxygen style formatting. This, fortunately, is straightforward. A block of roxygen code precedes the function to document; each line starts with #' (comment char + single quote). Several tags are necessary (notably the @export tag that indicates that the function should be exported in NAMESPACE, which I tend to forget), but this was simple to figure out.

The most important parts of the document — title, short description and details — can be included at the start of the roxygen block without any additional tags:

#' Hello world
#' 
#' The most boring program in the world
#'
#' This is so annoyingly boring that I don't even
#' know why I write it.
hello.world <- function( s="stuff it" ) 
    print( sprintf( "Go and %s, world", s ) )

So far, so good. However, this will not generate the manual: we need to mark the function for exporting. We can also add more tags: describe what the parameters do, name the author, add an example.

#' Hello world
#' 
#' The most boring program in the world
#'
#' This is so annoyingly boring that I don't even
#' know why I write it.
#' @param s What the world should do and how
#' @export
#' @author God <bog@@niebo.org>
#' @examples
#' hello.world( "bugger yourself" )
hello.world <- function( s="stuff it" ) 
    print( sprintf( "Go and %s, world", s ) )

Note that @ has a special meaning for roxygen, and so it must be escaped (with another @, to be inconsistent, but practical).

Minimal example

To test the minimal example above, one has to do at least the following:

  • Create a valid DESCRIPTION file
  • Create the R directory (mkdir R)
  • Save the above code in a file in the R directory
  • Start R (in the directory where the DESCRIPTION and R reside)
  • Enter library( roxygen2 ) ; roxygenize( "." )
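The steps above can be scripted. The sketch below sets up the package skeleton in a temporary directory and runs roxygenize only if Rscript is available; the package name hello and the DESCRIPTION values are placeholders of mine, and the function is a shortened version of the example above.

```shell
#!/bin/sh
# Scaffold the minimal roxygen2 example in a fresh temporary directory.
pkg_dir=$(mktemp -d)/hello
mkdir -p "$pkg_dir/R"

# A DESCRIPTION with placeholder values; adjust them for a real package.
cat > "$pkg_dir/DESCRIPTION" <<'EOF'
Package: hello
Title: Hello World
Version: 0.1
Description: The most boring package in the world.
Author: A. N. Author
Maintainer: A. N. Author <author@example.org>
License: GPL-2
EOF

# The documented function, saved in the R directory.
cat > "$pkg_dir/R/hello.world.R" <<'EOF'
#' Hello world
#'
#' The most boring program in the world
#'
#' @param s What the world should do and how
#' @export
hello.world <- function( s="stuff it" )
    print( sprintf( "Go and %s, world", s ) )
EOF

# Generate man/hello.world.Rd, but only if R is actually installed.
if command -v Rscript >/dev/null 2>&1; then
  (cd "$pkg_dir" && Rscript -e 'library( roxygen2 ); roxygenize( "." )') ||
    echo "roxygenize failed; is roxygen2 installed?"
else
  echo "Rscript not found; package skeleton left in $pkg_dir"
fi
```

The quoted heredoc delimiters ('EOF') keep the shell from touching the #' comment lines and the % characters in the R code.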

Roxygen will now produce a man directory in the current directory, and there, save a file called “hello.world.Rd”. This is the contents of the file:

\name{hello.world}
\alias{hello.world}
\title{Hello world}
\usage{
hello.world(s = "stuff it")
}
\arguments{
  \item{s}{What the world should do and how}
}
\description{
The most boring program in the world
}
\details{
This is so annoyingly boring that I don't even know why I
write it.
}
\examples{
hello.world( "bugger yourself" )
}
\author{
God <bog@niebo.org>
}

More good stuff

First, man pages without corresponding code. This comes in handy for all the data files and for a general package description.

You simply create an R file in the R/ directory in the source package, and call it, for example, “hello-package.R”. In this file, you write the regular roxygenize contents, but end it with a line containing NULL only:

#' Hello World Package
#' 
#' The most boring package in the world
#'
#' This is so annoyingly boring that I don't even
#' know why I write it.
NULL

This will create the manual page “hello-package.Rd” in the “man/” directory.

Links

Hadley Wickham, one of the authors of roxygen, published Advanced R Programming, an online book that includes, among other things, a tutorial for Roxygen; incomplete at best, but useful. Unfortunately, this is not easy to find, and I still think that to merit a “literate programming badge” one should document the roxygen package much better than it is documented at present.

pca3d

This is a little package that I have been using for a long time to visually explore results of PCA on grouped data. The main purpose was to have one simple command that would visualise a result of a PCA in R in 3D and color the data points by group and type.

For example, take the data set provided with the package, called “metabo” (it stems from my paper on metabolic profiling in tuberculosis):

library( pca3d )
data( metabo )

# top left bit of the metabo data frame
head( metabo )[,1:10] 

This last command shows the following output:

  group   X1   X2   X3   X4   X5   X6   X7   X8   X9
1   POS 0.78 1.10 1.26 0.87 0.68 0.65 0.72 0.77 0.88
2   POS 0.68 0.51 0.30 0.21 1.64 2.42 1.19 1.19 1.58
3   POS 1.00 1.31 1.68 1.08 2.46 1.19 1.02 1.82 1.60
4   POS 1.08 0.75 0.65 2.33 0.81 0.72 0.94 0.93 0.31
5    TB 0.87 0.81 0.99 0.85 0.92 0.69 1.12 1.50 0.70
6    TB 1.29 0.89 0.46 0.49 0.50 1.03 1.10 0.48 0.31

Each row corresponds to one serum sample either from TB patients or healthy controls. The first column of the data frame metabo contains the group assignments; the remaining 423 columns correspond to relative levels of different small molecules (like sugars or amino acids) in the given serum sample. Running a PCA is straightforward:

pca <- prcomp( metabo[,-1], scale.= TRUE )

And visualisation with pca3d is straightforward as well:

pca3d( pca, group= metabo[,1] )

A 3D output (using the rgl package) is produced — you can interactively turn, zoom and change the perspective of the plot. Also, with the rgl.snapshot( filename ) command you can export the graphics as a PNG file.

Visualisation of the metabo PCA using pca3d.

You can very clearly see that the blue balls stand apart from the rest in the first two components. What are they? It is not easy to create a reasonable legend directly on an RGL canvas, but pca3d produces a text-only legend in the main text interface:

Legend:
----------------------------------------
       group:        color,        shape
----------------------------------------
         NEG:          red, tetrahaedron
         POS:       green3,         cube
          TB:         blue,       sphere

Oh, so the TB patients are really different from the rest! Neat. The really elegant thing about the PCA is that it does not use any information about the group classification. Therefore, whatever groups we see, they are real: the visualisation corresponds to an independent validation on the whole data set. This is very much unlike PLS, where the score plots always show a clear separation; PLS is eager to please, as one author put it.

Unfortunately, the 3D plot can only be saved as a PNG. However, for a publication, a 2D PDF might be more suitable. Another command in this package, pca2d, takes exactly the same options as the pca3d command and produces a graphic on the standard R device:

2D version of the previous plot.

There are plenty of other options to pca3d, for example show.labels can take a character vector as an argument and show a little text floating above every data point.

Furthermore, it is possible to create biplots. Unlike the normal biplot function, by default only a few variables are selected from each component (by their absolute loadings in that component) — if there are too many variables visualised, the figure is cluttered and useless.

The red arrows show selected variables

In the above figure, several variables with high loadings can be seen.

Another plot, in which the cluster centroids are shown for all three groups of samples:

The large symbols indicate cluster centroids. Each sample is connected to the corresponding centroid.

pca3d on CRAN: http://cran.r-project.org/web/packages/pca3d/