Colorful tables in a terminal

It all started when I wanted significant p-values to be shown in red on the terminal. The R terminal can show colors, simple formatting (like italics or bold) and Unicode characters, thanks to the actual terminal that does the job of displaying R output, whether it is the RStudio console or a terminal window. You can see that when you use tibbles from the tidyverse: they use some very limited formatting (like showing “NA” in red).
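To see what the terminal itself can do, here is a minimal base R sketch that wraps significant p-values in ANSI escape codes for red text. The helper name color_pvals and the 0.05 threshold are assumptions made for illustration; this is not how colorDF is implemented, and the color only shows in a terminal that understands ANSI codes.

```r
# Hypothetical helper (not part of any package): format p-values and
# wrap the significant ones in ANSI escape codes for red text.
color_pvals <- function(p, threshold = 0.05, digits = 4) {
  txt <- formatC(p, format = "f", digits = digits)
  sig <- !is.na(p) & p < threshold
  # "\033[31m" switches to red, "\033[0m" resets the formatting
  txt[sig] <- paste0("\033[31m", txt[sig], "\033[0m")
  txt
}

p <- c(0.0083, 0.0680, 0.0012)
cat(color_pvals(p), sep = "\n")
```

The function returns plain character strings, so the result can be pasted into any other text before it is sent to the terminal with cat().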

I ended up writing a new package, colorDF. The package defines a new class of data frames, but it really does not change their behavior – just the way they are shown (specifically, it modifies some attributes and introduces a print.colorDF function for printing). If you change a tibble to a colorDF, it will still behave exactly like a tibble, but it will be shown in color:

# Color data frame 6 x 87:
# (Showing rows 1 - 20 out of 87)
  │name                 │height│mass│birth_year│gender       │probability
 1 Luke Skywalker           172   77         19 male          0.0083
 2 C-3PO                    167   75        112 NA            0.0680
 3 R2-D2                     96   32         33 NA            0.0596
 4 Darth Vader              202  136         42 male          0.0182
 5 Leia Organa              150   49         19 female        0.0138
 6 Owen Lars                178  120         52 male          0.0115
 7 Beru Whitesun lars       165   75         47 female        0.0489
 8 R5-D4                     97   32         NA NA            0.0040
 9 Biggs Darklighter        183   84         24 male          0.0954
10 Obi-Wan Kenobi           182   77         57 male          0.0242
11 Anakin Skywalker         188   84         42 male          0.0066
12 Wilhuff Tarkin           180   NA         64 male          0.0605
13 Chewbacca                228  112        200 male          0.0587
14 Han Solo                 180   80         29 male          0.0519
15 Greedo                   173   74         44 male          0.0204
16 Jabba Desilijic Tiure    175 1358        600 hermaphrodite 0.0929
17 Wedge Antilles           170   77         21 male          0.0457
18 Jek Tono Porkins         180  110         NA male          0.0331
19 Yoda                      66   17        896 male          0.0931
20 Palpatine                170   75         82 male          0.0012

Yes, it looks like that in the terminal window!

You can read all about it in the package vignette (please use the package from GitHub; the CRAN version is lagging behind). Apart from the print function, I also implemented a summary function which is more informative than the default summary function for data frames.

starwars %>% as.colorDF %>% summary
# Color data frame 5 x 13:
  │Col       │Class│NAs│unique│Summary
 1 name       <chr>   0     87 All values unique
 2 height     <int>   6     45 66 [167 <180> 191] 264
 3 mass       <dbl>  28     38 15.0 [ 55.6 < 79.0> 84.5] 1358.0
 4 hair_color <chr>   5     12 none: 37, brown: 18, black: 13, white: 4, blond: 3, auburn: 1, …
 5 skin_color <chr>   0     31 fair: 17, light: 11, dark: 6, green: 6, grey: 6, pale: 5, brown…
 6 eye_color  <chr>   0     15 brown: 21, blue: 19, yellow: 11, black: 10, orange: 8, red: 5, …
 7 birth_year <dbl>  44     36 8 [ 35 < 52> 72] 896
 8 gender     <chr>   3      4 male: 62, female: 19, none: 2, hermaphrodite: 1
 9 homeworld  <chr>  10     48 Naboo: 11, Tatooine: 10, Alderaan: 3, Coruscant: 3, Kamino: 3, …
10 species    <chr>   5     37 Human: 35, Droid: 5, Gungan: 3, Kaminoan: 2, Mirialan: 2, Twi'l…
11 films      <lst>   0     24 Attack of the Clones: 40, Revenge of the Sith: 34, The Phantom …
12 vehicles   <lst>   0     11 Imperial Speeder Bike: 2, Snowspeeder: 2, Tribubble bongo: 2, A…
13 starships  <lst>   0     17 Millennium Falcon: 4, X-wing: 4, Imperial shuttle: 3, Naboo fig…
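The numeric rows in the output above appear to follow the pattern "minimum [lower quartile <median> upper quartile] maximum". A rough base R sketch of such a one-line summary could look as follows; num_summary is a hypothetical name, and the actual colorDF code may compute and format these values differently.

```r
# Hypothetical helper: one-line numeric summary in the form
# "min [Q1 <median> Q3] max", ignoring missing values.
num_summary <- function(x) {
  q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1),
                na.rm = TRUE, names = FALSE)
  sprintf("%s [%s <%s> %s] %s", q[1], q[2], q[3], q[4], q[5])
}

num_summary(iris$Sepal.Length)
num_summary(c(1, 5, NA, 10))
```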

For numeric vectors, by default the function shows the minimum, quartiles and median, but it can also produce a boxplot-like graphical summary. Since the function works also on lists, implementing a text terminal based boxplot function was super easy:

term_boxplot(Sepal.Length ~ Species, data=iris, width=90)
# Color data frame 5 x 4:
  │Col       │Class│NAs│unique│Summary
 1 setosa     <dbl>   0     15 ╾──────┤ + ├────────╼
 2 versicolor <dbl>   0     21  ╾─────────┤ + ├──────────╼
 3 virginica  <dbl>   0     21  ╾──────────────────┤ + ├──────────────╼
 4 Range      <chr>   0      1 Only one value: Range: 4.3 - 7.9

Cool, isn’t it?

rmd syntax highlighting in vim

Quick and dirty Rmarkdown syntax highlighting for vim, when you don’t have time to install bundles, plugins or dependencies: go to this GitHub repository, download the file rmd.vim from the syntax/ directory and copy it to ~/.vim/syntax/. This should work in 99.99% of the cases.

Custom comparison function for sorting data in R

Many languages allow you to use a custom comparison function in sorting. R is not an exception, but it is not entirely straightforward – it requires you to define a new class and overload certain operators. Here is how to do it.

Consider the following example. You have a certain number of paired values, for example

v <- list(a=c(2,1), b=c(1,3), d=c(1,1), e=c(2,3))

The job is to order these pairs in the following way. Given two pairs, p1=(x1, y1) and p2=(x2, y2), p1 < p2 iff one of the following conditions is fulfilled: either x1 < x2 and y1 <= y2, or x1 <= x2 and y1 < y2. The point is that if we draw lines, where one end of the line is at the height x1, and the other end is at the height y1, we want to sort these lines only if they do not cross — at most, only if one of their ends overlaps (but not both, because then the lines would be identical):

On the figure above, left panel, p1 < p2, because one of the ends is below the end of the other line (x1 < x2 and y1=y2). Of course, if y1 < y2 the inequality still holds. On the other hand, the right panel shows a case where we cannot resolve the comparison; the lines cross, so we should treat them as equal.

If now we have a list of such pairs and want to order it, we will have a problem. Here is the thing: the desired order is {d, a, b, e}. The element d=(1,1) is clearly smaller (as defined above) than all the others. However, b=(1,3) is not smaller than a=(2,1), and a is not smaller than b; that means, that a is equal to b, and their order should not be modified.
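The comparison rule defined above can be written down directly as a small function. This is only a sketch of the definition; pair_less is a name chosen here for illustration.

```r
# p1 < p2 iff (x1 < x2 and y1 <= y2) or (x1 <= x2 and y1 < y2)
pair_less <- function(p1, p2) {
  (p1[1] < p2[1] && p1[2] <= p2[2]) ||
    (p1[1] <= p2[1] && p1[2] < p2[2])
}

v <- list(a=c(2,1), b=c(1,3), d=c(1,1), e=c(2,3))

# d is smaller than every other element
sapply(v[c("a", "b", "e")], function(p) pair_less(v$d, p))

# a and b cross: neither is smaller than the other
pair_less(v$a, v$b)
pair_less(v$b, v$a)
```

Because neither a < b nor b < a holds, the two have to be treated as equal, which is exactly the case that a plain call to order cannot express.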

There is no way to do that with regular tools such as order, especially since x and y may not only be on different scales; they might even be completely different data types! One might be a numeric vector, the other a character string. Or, possibly, a type of prop from Monty Python (with a defined relation stating that a banana is less than a gun). We must use a custom comparator.

For this, we need to notice that the R functions sort and order rely on the function xtfrm. This in turn relies on the methods ==, > and [, defined for a given class. For numeric vectors, for example, these give what you would expect.

Our v vector is a list with elements which are pairs of numbers. For this type of data, there is no comparison defined; and comparing two pairs of numbers results in a vector of two logical values, which is not what we want.

> v[1] < v[2]
Error in v[1] < v[2] : comparison of these types is not implemented
> v[[1]] < v[[2]]
[1] FALSE  TRUE

R, however, is an object oriented language (even if it does not always feel like that). Comparisons (<, >, ==) are generic functions and it is possible to define (or redefine) them for any class of objects. So here is the plan: we invent a new class for the object v, and define custom comparisons for the elements of this class of objects. Remember that if we define a function whose name consists of a generic (like "plot" or "["), a dot, and the name of the class, we are defining the method for the given class:

## make v an object of class "foo"
class(v) <- "foo"

## to use the "extract" ([) method, 
## we need to momentarily change the class of x, because 
## otherwise we will end up in an endless loop
'[.foo' <- function(x, i) {
    class(x) <- "list"
    x <- x[i]
    class(x) <- "foo"
    x
}

## define ">" as stated above
## the weird syntax results from the fact that a and b
## are lists with one element, this element being a vector 
## of a pair of numbers
'>.foo' <- function(a,b) {
    a <- a[[1]]
    b <- b[[1]]
    ifelse( (a[1] > b[1] && a[2] >= b[2]) ||
            (a[1] >= b[1] && a[2] > b[2]), TRUE, FALSE)
}

## if we can't find a difference, then there is no difference
'==.foo' <- function(a, b) 
    ifelse(a > b || b > a, FALSE, TRUE)

## we don't need that, but for the sake of completeness...
'<.foo' <- function(a, b) b > a

This will now do exactly what we want:

> v["a"] == v["b"]
[1] TRUE
> v["a"] > v["d"]
[1] TRUE
> sort(v)
$d
[1] 1 1

$a
[1] 2 1

$b
[1] 1 3

$e
[1] 2 3

attr(,"class")
[1] "foo"

Adding figure labels (A, B, C, …) in the top left corner of the plotting region

I decided to submit a manuscript using only R with knitr, pandoc and make. Actually, it went quite well. Certainly, revisions of manuscript with complex figures did not require much of manual work once the R code for the figures has been created. The manuscript ended up as a Word file (for the sake of co-authors), looking no different than any other manuscript. However, you can look up precisely how all the figures have been generated and, with a single command, re-create the manuscript (with all figures and supplementary data) after you changed a parameter.

One of the small problems I faced was adding labels to pictures. You know, like A, B, C… in the top left corner of each panel of a composite figure. Here is the output I was striving for:

Doing it proved to be more tedious than I thought at first. By default, you can only plot things in the plotting region, everything else gets clipped — you cannot put arbitrary text anywhere outside the rectangle containing the actual plot:

text(-20, 0, "one two three four", cex=2)

This is because the plotting area is the red rectangle on the figure below, and everything outside will not be shown by default:

One can use the function mtext to put text on the margins. However, there is no simple way to say “put the text in the top left corner of the figure”, and the results I was able to get were never perfect. Anyway, to push the label really to the very left of the figure region using mtext, you first need to have the user coordinate of that region (to be able to use option ‘at’). However, if you know these coordinates, it is much easier to achieve the desired effect using text.

First, we need to figure out a few things. To avoid clipping of the region, one needs to change the parameter xpd:

par(xpd=NA)


Then, we need to know where to draw the label. We can get the coordinates of the device (in inches), and then we can translate these to user coordinates with appropriate functions:

di <- dev.size("in")
x <- grconvertX(c(0, di[1]), from="in", to="user")
y <- grconvertY(c(0, di[2]), from="in", to="user")

x[1] and y[2] are the coordinates of the top left corner of the device… but not of the figure. Since we might have manipulated the layout, for example, by calling par(mfrow=...) or layout to put multiple plots on the device, and we would like to always label the current plot only (i.e. put the label in the corner of the current figure, not of the whole device), we have to take this into account as well:

fig <- par("fig")
x <- x[1] + (x[2] - x[1]) * fig[1:2]
y <- y[1] + (y[2] - y[1]) * fig[3:4]

Before plotting, we have to adjust this position by half of the text string width and height, respectively:

txt <- "A"
x <- x[1] + strwidth(txt, cex=3) / 2
y <- y[2] - strheight(txt, cex=3) / 2
text(x, y, txt, cex=3)

Looks good! That is exactly what I wanted:

Below you will find an R function that draws a label in one of the three regions — figure (default), plot or device. You specify the position of the label using the labels also used by legend: “topleft”, “bottomright” etc.

First, a few examples how to use it:

Basic use:

sapply(LETTERS[1:4], function(x) { 
    plot(rnorm(100))  # placeholder plot (assumed), so there is something to label
    fig_label(x, cex=2) 
})


Plotting at different positions and in different regions:

plot(rnorm(100))  # placeholder plot (assumed)
for(i in c("topleft", "topright", "top", 
           "left", "center", "right", 
           "bottomleft", "bottom", "bottomright")) {
    fig_label(i, pos=i, cex=2, col="blue")
    fig_label(i, pos=i, cex=1.5, col="red", region="plot")
}


All the different regions:

par(mfrow=c(2, 2))  # four panels (assumed layout)
sapply(LETTERS[1:4], function(x) { 
    plot(rnorm(100))  # placeholder plot (assumed)
    fig_label("figure region", cex=2, col="red") 
    fig_label("plot region", region="plot", cex=2, col="blue")
})
fig_label("device region", cex=2, pos="bottomright", 
  col="darkgreen", region="device")


And here is the function:

fig_label <- function(text, region="figure", pos="topleft", cex=NULL, ...) {

  region <- match.arg(region, c("figure", "plot", "device"))
  pos <- match.arg(pos, c("topleft", "top", "topright", 
                          "left", "center", "right", 
                          "bottomleft", "bottom", "bottomright"))

  if(region %in% c("figure", "device")) {
    ds <- dev.size("in")
    # xy coordinates of device corners in user coordinates
    x <- grconvertX(c(0, ds[1]), from="in", to="user")
    y <- grconvertY(c(0, ds[2]), from="in", to="user")

    # fragment of the device we use to plot
    if(region == "figure") {
      # account for the fragment of the device that 
      # the figure is using
      fig <- par("fig")
      dx <- (x[2] - x[1])
      dy <- (y[2] - y[1])
      x <- x[1] + dx * fig[1:2]
      y <- y[1] + dy * fig[3:4]
    }
  }

  # much simpler if in plotting region
  if(region == "plot") {
    u <- par("usr")
    x <- u[1:2]
    y <- u[3:4]
  }

  # offset from the corner: a bit more than half of the label size
  sw <- strwidth(text, cex=cex) * 60/100
  sh <- strheight(text, cex=cex) * 60/100

  x1 <- switch(pos,
    topleft     =x[1] + sw, 
    left        =x[1] + sw,
    bottomleft  =x[1] + sw,
    top         =(x[1] + x[2])/2,
    center      =(x[1] + x[2])/2,
    bottom      =(x[1] + x[2])/2,
    topright    =x[2] - sw,
    right       =x[2] - sw,
    bottomright =x[2] - sw)

  y1 <- switch(pos,
    topleft     =y[2] - sh,
    top         =y[2] - sh,
    topright    =y[2] - sh,
    left        =(y[1] + y[2])/2,
    center      =(y[1] + y[2])/2,
    right       =(y[1] + y[2])/2,
    bottomleft  =y[1] + sh,
    bottom      =y[1] + sh,
    bottomright =y[1] + sh)

  # switch off clipping, draw the label, then restore the old setting
  old.par <- par(xpd=NA)

  text(x1, y1, text, cex=cex, ...)
  par(old.par)
}

Using external data from within another package

If you make the error which I did, you will try to use the data (say, “pckgdata”) from another package (say, “pckg”) naively like this:

someFunc <- function() {
  foo <- pckgdata$whatever
}

This will make R CMD check complain:

someFunc: no visible binding for global variable ‘pckgdata’
someFunc : <anonymous>: no visible binding for global variable
Undefined global functions or variables:

Here is the solution (thanks to the comments from stackexchange):

.myenv <- new.env(parent=emptyenv())

someFunc <- function() {
  ## envir must be the environment itself, not its name as a string
  data("pckgdata", package="pckg", envir=.myenv)
  foo <- .myenv$pckgdata$whatever
}

Actually, let us load the object as soon as our package is loaded:

.myenv <- new.env(parent=emptyenv())

.onLoad <- function(libname, pkgname){
  data("pckgdata", package="pckg", envir=.myenv) 
}

someFunc <- function() {
  foo <- .myenv$pckgdata$whatever
}

Now any of the functions in our package can use the pckgdata, whenever. Note that we want to use .onLoad(), and not .onAttach() — the latter one is for such things as startup messages when the package is manually attached by the user.
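The same pattern can be tried outside of a package. The sketch below uses the iris data set from the datasets package as a stand-in for “pckgdata” (an assumption made so that the example runs anywhere); mean_sepal is a hypothetical function name.

```r
# private environment that will hold the data
.myenv <- new.env(parent = emptyenv())

# load the data set into the environment instead of the global workspace
data("iris", package = "datasets", envir = .myenv)

# the object now lives in .myenv
ls(.myenv)

# any function can reach it through the environment
mean_sepal <- function() mean(.myenv$iris$Sepal.Length)
mean_sepal()
```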

Alternatively, you can create your environment within the function itself:

someFunc <- function() {
  myenv <- new.env(parent=emptyenv())
  data("pckgdata", package="pckg", envir=myenv)
  foo <- myenv$pckgdata$whatever
}

R-devel in parallel to regular R installation

Unfortunately, you need both: R-devel (development version of R) if you want to submit your packages to CRAN, and regular R for your research (you don’t want the unstable release for that).

Fortunately, installing R-devel in parallel is less trouble than one might think.

Say, we want to install R-devel into a directory called ~/R-devel/, and we will download the sources to ~/src/. We will first set up two environment variables to hold these two directories:

export RSOURCES=~/src
export RDEVEL=~/R-devel

Then we get the sources with SVN. In Ubuntu, you need package subversion for that:

mkdir -p $RSOURCES
svn co R-devel

Then, we compile R-devel. R might complain about missing developer packages with header files; in such a case the necessary package name must be guessed and the package installed (e.g. libcurl4-openssl-dev for Ubuntu when configure is complaining about missing curl):

mkdir -p $RDEVEL
cd $RDEVEL
$RSOURCES/R-devel/configure && make -j

That's it. Now we just need to set up a script to launch the development version of R:

#!/bin/bash
export PATH="$RDEVEL/bin/:$PATH"
export R_LIBS=$RDEVEL/library
R "$@"

You need to save the script in an executable file somewhere in your $PATH, e.g. ~/bin might be a good idea.

Here are the commands that create this script automatically in ~/bin/Rdev:

cat <<EOF > ~/bin/Rdev
#!/bin/bash

export R_LIBS=$RDEVEL/library
export PATH="$RDEVEL/bin/:\$PATH"
R "\$@"
EOF
chmod a+x ~/bin/Rdev

One last thing remaining is to populate the library with packages necessary for the R-devel to run and check the packages, in my case c("knitr", "devtools", "ellipse", "Rcpp", "extrafont", "RColorBrewer", "beeswarm", "testthat", "XML", "rmarkdown", "roxygen2" ) and others (I keep expanding this list while checking my packages). Also, bioconductor packages limma and, which I need for a package which I build.
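A small sketch of how the library could be populated from within the development version of R (started with the Rdev script): it only lists which of the packages named above are still missing, and leaves the actual installation as a commented-out line, since that needs a CRAN mirror and network access. The variable names are chosen here for illustration.

```r
# packages needed for checking, as listed above
pkgs <- c("knitr", "devtools", "ellipse", "Rcpp", "extrafont",
          "RColorBrewer", "beeswarm", "testthat", "XML",
          "rmarkdown", "roxygen2")

# which of them are not yet in the library of this R installation?
to_install <- setdiff(pkgs, rownames(installed.packages()))
print(to_install)

# Uncomment to install the missing ones:
# install.packages(to_install)
```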

Now I can check my packages with Rdev CMD build xyz / Rdev CMD check xyz_xyz.tar.gz

Presentations in (R)markdown

There are many ways to turn a markdown or Rmarkdown document into a presentation. Way too many, and none of them is perfect. I made my first presentation with knitr / Rmarkdown for the tmod package.

After trying various options in knitr, I decided on an approach in which the Rmarkdown document is oblivious of the presentation system and the job of turning it into a presentation is taken up by pandoc. There were several bumps and problems, and I will now give a step-by-step guide.

1. Input file

Let’s start with an example Rmd. In the following, I assume it has been saved under “test.Rmd”.

---
title: "Example presentation"
author: January Weiner 
date: "`r Sys.Date()`"
---

# First part
## Slide 1

```{r plot1}
plot(1:10, 1:10)
```

## Slide 2
Some maths: $\sum_{i=1}^{N}$

# Second part
## Slide 3
... contents ...

2. From Rmarkdown to markdown

I use knitr only to create a markdown file.

Rscript -e 'knitr::knit("test.Rmd")'

This produces the file test.md. With that, knitr’s job is finished; we will not need it anymore.

3. Download reveal.js

I decided on reveal.js. It was easy to work with and to adapt to my needs, and it has elegant default themes, a low footprint and shortcuts. It also has the “2D” layout, meaning that sections (level one headers) are arranged horizontally, while slides within one section are arranged vertically. Pressing “Esc” in a presentation shows the slide overview:


Anyway, download reveal.js and unpack it in the same directory as test.md.

Making the presentation

Use pandoc to create the reveal.js presentation. Note that this is not the final command line; in the following points I will discuss the problems which will influence the final version.

pandoc -s -S -t revealjs --mathjax -o test.html test.md

4. MathJax

On slide 2, we have a bit of maths. The maths is written in a LaTeX-like notation, and there are many ways to turn it into an elegant mathematical equation on the final presentation. I have tried many options with pandoc, and found that only MathJax works properly and without a major hassle. This is why on the previous command line I used the option --mathjax.

However, if you run the above command line, you will notice that on “Slide 2” the maths doesn’t work, despite the --mathjax option. It would work, though, if we put the file on a server. The reason is that pandoc writes the MathJax URL in the protocol-relative form ‘src="//cdn.mathjax…"’, which inherits the protocol from the way the page was opened. If we open the page from a server, using http or https, this works. If we open the file directly in a browser, the URL becomes “file://cdn.mathjax…”, which is obviously not on our file system. We have two options.

4.1 External MathJax

Use the command line

pandoc -s -S -t revealjs --mathjax="" -o test.html test.md

This works as long as we have Internet access. If we show our presentation in another institute, where our laptop cannot connect to the Internet, we are screwed.

4.2 Local MathJax

Alternatively, you can download the whole MathJax:

mv MathJax-2.5-latest/ MathJax

and specify the local installation with the following command line:

pandoc -s -S -t revealjs --mathjax="MathJax/MathJax.js?config=TeX-AMS-MML_HTMLorMML" -o test.html test.md

This works, but our presentation suddenly takes up over 170 megabytes. Which sucks.

5. 2D layout and section headers

I mentioned previously that reveal.js allows a neat 2D layout, in which slides from one section are arranged vertically, and sections are put next to each other. However, sections with only a title and no contents might be a bit boring, so let us modify the .md file changing the second section as follows:

# Second part

This is the second part, even more interesting.

## Slide 3
... contents ...

You run pandoc again, and…


Huh, where is the 2D layout gone? Why are all slides next to each other? Why are all slides from one section all on one single slide?

Pandoc automatically guesses which level header denotes boundaries between slides. It defines “slide level” as “the highest level followed immediately by non-header contents”. After our modification, the top level header (starting with a single #) became the level at which slides are separated. OK, so maybe we try specifying the slide level manually?

pandoc -s -S -t revealjs --mathjax="MathJax/MathJax.js?config=TeX-AMS-MML_HTMLorMML" --slide-level 2 -o test.html test.md

OK, this works, but… the content under the first level header (“This is the second part…”) is gone! This is because “Headers above the slide level in the hierarchy create ‘title slides,’ which just contain the section title and help to break the slide show into sections.”

Turns out that there is no way we can have both: 2D with slides divided neatly into sections, and section slides which contain more than just a title. Not if we use pandoc, that is.
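To make the slide level rule quoted above more tangible, here is a rough R sketch of it: the slide level is taken to be the highest header level (smallest number of # signs) that is immediately followed by non-header content. This is not pandoc's code, guess_slide_level is a hypothetical name, and the sketch ignores fenced code blocks (so an R comment starting with "# " inside a chunk would be mistaken for a header).

```r
# Sketch of pandoc's slide level guess for a character vector of
# markdown lines; returns NA if no header is followed by content.
guess_slide_level <- function(lines) {
  lines <- lines[nzchar(trimws(lines))]            # drop blank lines
  is_header <- grepl("^#+ ", lines)
  level <- ifelse(is_header,
                  nchar(sub("^(#+) .*$", "\\1", lines)), NA)
  # is the next non-blank line something other than a header?
  followed_by_content <- c(!is_header[-1], FALSE)
  candidates <- level[is_header & followed_by_content]
  if (length(candidates) == 0) NA else min(candidates)
}

original <- c("# First part", "## Slide 1", "", "some content", "",
              "## Slide 2", "Some maths", "",
              "# Second part", "## Slide 3", "... contents ...")
modified <- append(original,
                   c("", "This is the second part, even more interesting.", ""),
                   after = 9)

guess_slide_level(original)   # slides are separated at level 2 headers
guess_slide_level(modified)   # now level 1: the 2D layout is gone
```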

6. Modifying the layout

6.1 reveal.js theme

This is the easiest part: pick one of the existing reveal.js themes (I omit the mathjax option for simplicity's sake; do remember to put it back in):

pandoc -s -S -t revealjs -o test.html -V theme=blood test.md

Note that the themes listed on the reveal.js website start with a capital letter, but you must specify a lowercase letter in the above command line.

6.2 Fine tuning the theme

I did not like the sans-serif, capitalized and decorated fonts of the blood theme (shadows on titles, I beg you). Ugly. However, if you know a little CSS (and you’d better learn it!), you can easily adapt it to your needs.

Look up the file reveal.js/css/theme/blood.css for hints and create your own CSS file (let us call it test.css) in the same directory as test.md. In the file below, I reset all the ugly decorations and set two fonts for headers and body, respectively: Garamond for headers, and Quattrocento Sans for body, using the Google Fonts service:

@import url('');
@import url('');

.reveal {
  font-size: 32px;
  font-family: 'Quattrocento Sans', 'sans-serif'; }

.reveal h1, .reveal h2, .reveal h3, .reveal h4, .reveal h5, .reveal h6 {
  font-family: 'EB Garamond', 'serif';
  text-transform: none;
  text-shadow: none; }

.reveal h1 { font-size: 2em; }
.reveal h2 { font-size: 1.7em; }
.reveal h3 { font-size: 1.4em; }
.reveal h4 { font-size: 1em; }

Also, as you might notice, I prefer smaller fonts here. We integrate our test.css file with the following option:

pandoc -s -S -t revealjs -o test.html -V theme=blood --css test.css test.md

6.3 Adding a logo

You can add a logo (or whatever other background for your slides) by modifying the CSS file test.css. If logo.png is the name of your logo, adding this to your CSS will put it on all your slides in the top left corner:

body {
  background-image: url(logo.png);
  background-repeat: no-repeat;
  background-position: 20px 20px;
}

6.4 Better syntax highlighting

Pandoc’s syntax highlighting doesn’t look good on a dark background. You can add the following to the “test.css” file to reproduce the Solarized theme.

.reveal pre code { color: #839496; 
  background-color: #2B2B2B; } /* use #FDF6E3 for light background */

.sourceCode .kw { color: #268BD2; }
.sourceCode .dt { color: #268BD2; }
.sourceCode .dv, .sourceCode .bn, .sourceCode .fl { color: #D33682; }
.sourceCode .ch { color: #DC322F; }
.sourceCode .st { color: #2AA198; }
.sourceCode .co { color: #93A1A1; }
.sourceCode .ot { color: #A57800; }
.sourceCode .al { color: #CB4B16; font-weight: bold; }
.sourceCode .fu { color: #268BD2; }
.sourceCode .re { }
.sourceCode .er { color: #D30102; font-weight: bold; }

7. Creating a PDF of your presentation

Of course you need a PDF for printing and as a backup.

There are two ways for producing PDF from reveal.js. Each one is imperfect. 

7.1 Creating PDF using pandoc

Since the test.md file is generic markup, we can turn it into a simple PDF:

pandoc -s -S -o test.pdf test.md

Or even a beamer presentation:

pandoc -s -S -t beamer -o test.pdf test.md

Unfortunately, this is not so nice as our presentation, and completely ignores whatever we have put in the CSS.

7.2 Using the reveal.js printing facility and Google Chrome

The second way is interactive only (you cannot create the PDF with a command line). Open the file in Google Chrome and add ?print-pdf to the file URL, such that the end of the URL reads test.html?print-pdf.

The output looks garbled: the slides overlap. Don’t worry, it’s OK. Open the print dialog (press Ctrl-P), and you will see that now the output is correct. You can save it as PDF or send it to a printer.

8. The final command line

pandoc -s -S -t revealjs --mathjax=""  -V theme=blood --css test.css -o test.html test.md

Kneat tricks

So I have finally switched to knitr for doing my vignettes. The result is satisfactory, but the process was not entirely painless.

  • The command to run instead of “R CMD Sweave foo.Rnw” is

    Rscript -e 'rmarkdown::render("foo.rmd")'

  • I think that the concept of writing a package which has the main purpose to generate documentation in literate programming without providing mandatory documentation (such as list of options) within the package itself, referring instead to the online resources is beautifully subversive.

  • Knitr in the current R version requires pandoc X.Y.Z, while Ubuntu has X.Y.(Z-1). It was necessary to download the deb package from the pandoc site and install it manually.

  • To use knitr in vignettes, you need to add `VignetteBuilder:knitr` to your `DESCRIPTION` file.

  • I was confused at first as to what to do with the old vignette header (the lines that start with “%\Vignette…”). The markdown header is different. Turns out you have to include these lines in the markdown header (Kill me, but I have no idea why there is a “>” behind “vignette:” or “|” behind “abstract:”. Knitr produces neat results, but it is one of the most confusing packages I have ever encountered.):

                    title: "FOO: the fooly of foology"
                    author: "January Weiner"
                    date: "`r Sys.Date()`"
                    vignette: >
                    abstract: |
                      Foo foo foo foo. Foo foo, foo foo foo, foo.
                    toc: yes
                    bibliography: bibliography.bib

  • The Sweave chunk header <<…>>= becomes ```{r label, fig.width=5, fig.height=5}. Also, any character argument to options must be in quotes.

  • I have no idea why fig.width=5 works, but opt.chunk$set(fig.width=5) doesn’t and at this point I don’t care to ask.

  • I had a nightmarish forensic experience trying to figure out why my figures don’t get updated, where the cache is and some other things. Turns out that if you provide knitr with a symbolic link to an rmd file, it will change to the directory where the original is. Which is not the same behavior as in the case of Sweave.

  • It turns out that some options are valid for HTML, but not PDF, and vice versa, and you don’t get a warning. Also, it’s not mentioned in the documentation. Why? Because f— you, that’s why. For example, I spent half an hour trying to change the theme of a PDF vignette, after which it turned out that the theme option is not valid for PDFs. There was a table somewhere showing which options can be used when, but I lost the link and can’t find it in the documentation.

  • I haven’t found out how to change the font size if generating pdf_document (my favorite). Update: I have found out that it is not possible.

  • Also, no idea how to prevent small code chunks from being broken across pages, which really, really should not happen.

  • At first I specified the vignette engine to be knitr::knitr, but apparently this produces only (botched) HTML vignette (botched: no title, no author, no references). To generate neat, honest-to-Knuth PDF via pandoc and LaTeX, one should use knitr::rmarkdown, although that is not documented anywhere.


pandoc, markdown and pander

Pandoc + markdown seem to be a great way of documenting my work.

Markdown syntax is very simple and allows you to add basic formatting and figures to an otherwise simple text document, without obfuscating the actual text.

Then I simply compile the document using the pandoc command:

pandoc -o document.docx
pandoc -o document.pdf

There are some more tricks, of course, and plenty of output formats are possible. One thing I was struggling with was that images in docx files were much too large. It turns out that the PNG graphics I generate from PDFs (which, in turn, come from R) lacked the information about density units. I was using the convert program from ImageMagick, and it turns out it is necessary to add the option -units PixelsPerInch:

convert -density 300 -units PixelsPerInch image.pdf image.png

Another thing that I found useful was the pander package. Of course, there is this whole science of generating dynamic documents and reports from R using Sweave or knitr, but at the moment I prefer to produce two files: a commented R pipeline and, separately, a report in markdown format.

(The reason for not using knitr is that, given that I work with some very large data sets that sometimes take ages to compute, I would have to work out the details of caching and handling code that takes a while to execute. Also, I want to have a document with all commands, for me, and a report without any R code for everyone else.)

Pander allows you to create nice tables in R that can be directly copied and pasted to a markdown document (of course, pander is so much more, but this is my main use at the moment):

pandoc.table(df,  # df is a placeholder for your data frame
       emphasize.strong.cols=1, justify="left", 
       style="simple", digits=2, split.tables=Inf)

I was astonished how nice the resulting word file is. The PDFs, which are produced by TeX/LaTeX, I think, are actually more trouble, for example because LaTeX disregards my order of figures and tables, they are all floating objects and there is no easy way to change this from within the document.

Creating a graph with variable edge width in Rgraphviz

This was waaaaay more complicated than necessary. Figuring it out took me almost a whole day.

In essence, there is the graph package in R, which provides graph objects and methods, and there is the Rgraphviz package, which allows you to plot the graphs on the screen. They work well.

nodes <- c( LETTERS[1:3] )
edgesL <- list( A=c("B", "C"), B=c("A", "C"), C=c("B", "A" ) )
graph <- new( "graphNEL", nodes= nodes, edgemode="undirected", edgeL=edgesL )
rag <- agopen( graph, "" )

Here is the output:


So far, so good. If I wanted to color the edge from A to C red, here is what I could do:

eAttrs <- list( color=c( "A~C"="red" ) )
rag <- agopen( graph, "", edgeAttrs=eAttrs )


The attribute “color” works well. The man page for AgEdge gives us other attributes, specifically, “lwd” which specifies the width of the edge. However, it is not possible to set the edge widths using the above method. I found that the following code works for me:

setEdgeAttr <- function( graph, attribute, value, ID1, ID2 ) {

  # edge IDs are the two node names, sorted and joined with "~"
  idfunc <- function(x) paste0( sort(x), collapse="~" )
  all.ids <- sapply( AgEdge(graph), 
    function(e) idfunc( c( attr(e, "head"), attr( e, "tail" ))))

  sel.ids <- apply(cbind( ID1, ID2 ), 1, idfunc )

  if(!all(sel.ids %in% all.ids)) stop( "only existing edges, please" )
  sel <- match( sel.ids, all.ids )

  for(i in 1:length(sel)) {
    attr( attr( graph, "AgEdge" )[[ sel[i] ]], attribute ) <- value[i]
  }

  # return the modified graph so that the caller can assign it
  return(graph)
}



rag <- agopen( graph, "" )
rag <- setEdgeAttr( rag, "lwd", c(5, 20), c("B", "B"), c( "A", "C" ) )
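The edge matching inside setEdgeAttr can be tried without Rgraphviz. In the sketch below, the vector all.ids is a hypothetical stand-in for the IDs that the function derives from AgEdge(graph); the rest mirrors the function's own logic.

```r
# an edge ID is the pair of node names, sorted and joined with "~",
# so that "B","A" and "A","B" refer to the same undirected edge
idfunc <- function(x) paste0(sort(x), collapse = "~")

# stand-in for the IDs of the edges present in the graph (assumed)
all.ids <- c("A~B", "A~C", "B~C")

# edges requested by the user: B-A and B-C
sel.ids <- apply(cbind(c("B", "B"), c("A", "C")), 1, idfunc)
sel.ids

# positions of the requested edges among the existing ones
match(sel.ids, all.ids)
```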


What a colossal waste of my time. However, I need a visualization with graphs; and it needs to take a custom node drawing function as an argument, so there.