rmd syntax highlighting in vim

Quick and dirty Rmarkdown syntax highlighting for vim, when you don’t have time to install bundles, plugins or dependencies: go to this GitHub repository, download the file rmd.vim from the syntax/ directory and copy it to ~/.vim/syntax/. This should work in 99.99% of cases.

Add a file to scan when completing in vim

OK, so this was an easy thing to find out, but enormously useful.

In vim, Ctrl-P and Ctrl-N let you complete a word in insert mode. By default (see the complete option) vim scans the current buffer, buffers in other windows, included files etc. However, to add a specific file (in my case, a bibliography which I like to keep open in another terminal) you need to add a k to the complete option, plus the location of the file:

:set complete+=k./bibliography.bib

Using caret

A caret call I frequently use. Given that x is the training data and y the response:

library(caret)
library(doMC)
registerDoMC(cores=6)

tc <- trainControl(method="repeatedcv", number=10, repeats=1, 
  returnData=TRUE, savePredictions="all", verboseIter=TRUE, classProbs=TRUE)
mod <- train(x=x, y=y, trControl=tc, method="rf",
  tuneGrid=data.frame(mtry=500))
  • library(doMC) and registerDoMC allow me to use more than one processor
  • repeatedcv: if more than one repeat of k-fold crossvalidation is requested, the repeats= parameter should be modified. repeatedcv must be used instead of cv
  • savePredictions: if we want to evaluate predictions on our own
  • verboseIter: to see the progress
  • classProbs: to report class probabilities, so we can use them to calculate ROC post factum
  • tuneGrid: if not specified, caret will tune parameters. Normally, we don’t want that
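
For the post factum ROC evaluation mentioned above, here is a minimal sketch in base R. The data frame is a made-up stand-in for mod$pred (which, with savePredictions and classProbs set, holds the observed class in obs and one probability column per class level). The function name auc_rank and the class labels "yes" and "no" are only illustrative and not part of caret.

```r
# Toy stand-in for mod$pred: observed class plus the predicted
# probability of the (hypothetical) class "yes"
pred <- data.frame(
  obs = factor(c("yes", "yes", "yes", "no", "no", "no")),
  yes = c(0.9, 0.8, 0.4, 0.6, 0.3, 0.1)
)

# Area under the ROC curve via the rank-sum (Mann-Whitney) formula
auc_rank <- function(obs, prob, positive) {
  r     <- rank(prob)              # ties get average ranks
  n_pos <- sum(obs == positive)
  n_neg <- sum(obs != positive)
  (sum(r[obs == positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc <- auc_rank(pred$obs, pred$yes, positive = "yes")
```

With real output one would pass mod$pred$obs and the matching probability column instead, possibly split by the Resample column to get one value per fold.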

R, shiny and source()

This one cost me more time to figure out than it should have. The reason being, it turns out that I never properly understood what the source() function does.

So here is the story: I was setting up a shiny server for a student based on her code. She was running the shiny app from within RDesktop, and so before starting the app with runApp() she would load all necessary objects and source() a file called helpers.R with some common calculations.

In order to put the app on a server, I moved these pre-runApp() initializations into ui.R and server.R. Suddenly, weird errors appeared. The functions in helpers.R no longer seemed to be able to find anything in the parent environment: object X not found! This happened even though I called source() immediately after loading the necessary objects into the environment:

# file server.R
load("myobjects.rda")
source("helpers.R")

The solution was, as usual, to read documentation. Specifically, documentation on source():

local   TRUE, FALSE or an environment, determining where the 
        parsed expressions are evaluated. FALSE (the default) 
        corresponds to the user's workspace (the global 
        environment) and TRUE to the environment from which 
        source is called.

The objects which I have load()-ed before were not in the global environment, but instead in another environment created by shiny. However, the expressions from helpers.R were evaluated in the global environment. Thus, a new function defined in helpers.R could be seen from inside server.R, but an object loaded from server.R could not be seen by helpers.R.

It is the first time that I have noticed this. Normally, I would use a file such as helpers.R only to define helper functions, and actually call them from server.R or ui.R. However, I was thinking that source() is something like #include in C, simply calling the commands in the given file as if they were inserted at this position into the code — or called from the environment from which source() was called.

This is not so.
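
A minimal sketch of the difference, using a throwaway file in place of helpers.R. The names helper, my_object, run_default and run_local are made up for illustration, and the sketch assumes that no object called my_object exists in the global environment.

```r
# A temporary file standing in for helpers.R; it needs an object
# called my_object to exist wherever it is evaluated
helper <- tempfile(fileext = ".R")
writeLines("result <- my_object * 2", helper)

run_default <- function() {
  my_object <- 21   # lives only in this function's environment
  # local = FALSE (the default): evaluated in the global environment,
  # where my_object does not exist
  tryCatch({ source(helper); "sourced fine" },
           error = function(e) "object not found")
}

run_local <- function() {
  my_object <- 21
  # local = TRUE: evaluated in the environment that called source()
  source(helper, local = TRUE)
  result
}
```

Here run_default() ends up in the error branch, while run_local() sees my_object and returns the doubled value. In the shiny case, calling source("helpers.R", local=TRUE) from server.R should behave like the second function.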

Adding figure labels (A, B, C, …) in the top left corner of the plotting region

I decided to submit a manuscript using only R with knitr, pandoc and make. Actually, it went quite well. Certainly, revisions of the manuscript with complex figures did not require much manual work once the R code for the figures had been created. The manuscript ended up as a Word file (for the sake of co-authors), looking no different than any other manuscript. However, you can look up precisely how all the figures have been generated and, with a single command, re-create the manuscript (with all figures and supplementary data) after you changed a parameter.

One of the small problems I faced was adding labels to figures. You know, like A, B, C… in the top left corner of each panel of a composite figure. Here is the output I was striving for:

Doing it proved to be more tedious than I thought at first. By default, you can only plot things in the plotting region, everything else gets clipped — you cannot put arbitrary text anywhere outside the rectangle containing the actual plot:

plot(rnorm(100))
text(-20, 0, "one two three four", cex=2)

This is because the plotting area is the red rectangle on the figure below, and everything outside will not be shown by default:

One can use the function mtext to put text on the margins. However, there is no simple way to say “put the text in the top left corner of the figure”, and the results I was able to get were never perfect. Anyway, to push the label really to the very left of the figure region using mtext, you first need to have the user coordinate of that region (to be able to use option ‘at’). However, if you know these coordinates, it is much easier to achieve the desired effect using text.
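
For comparison, this is roughly what the mtext variant looks like. It is only a sketch: it asks grconvertX for the left edge of the figure region via the "nfc" (normalized figure coordinates) unit, and the values line=2 and cex=2 are arbitrary choices of mine. The pdf(NULL) device is there only so that the snippet runs without a screen.

```r
pdf(NULL)                 # off-screen device, nothing is written to disk
plot(rnorm(100))

# left edge of the current figure region, expressed in user coordinates
left <- grconvertX(0, from = "nfc", to = "user")
usr  <- par("usr")        # limits of the plot region, for comparison

# side=3 is the top margin; adj=0 left-aligns the label at 'at'
mtext("A", side = 3, line = 2, at = left, adj = 0, cex = 2)
invisible(dev.off())
```

The horizontal position is easy this way, but the vertical position still has to be tuned through line=, which is why I found text easier in the end.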

First, we need to figure out a few things. To avoid clipping to the plot region, one needs to change the parameter xpd:

par(xpd=NA)

Then, we need to know where to draw the label. We can get the coordinates of the device (in inches), and then we can translate these to user coordinates with appropriate functions:

plot(rnorm(100))
di <- dev.size("in")
x <- grconvertX(c(0, di[1]), from="in", to="user")
y <- grconvertY(c(0, di[2]), from="in", to="user")

x[1] and y[2] are the coordinates of the top left corner of the device… but not of the figure. Since we might have manipulated the layout, for example, by calling par(mfrow=...) or layout to put multiple plots on the device, and we would like to always label the current plot only (i.e. put the label in the corner of the current figure, not of the whole device), we have to take this into account as well:

fig <- par("fig")
x <- x[1] + (x[2] - x[1]) * fig[1:2]
y <- y[1] + (y[2] - y[1]) * fig[3:4]

Before plotting, we have to adjust this position by half of the text string width and height, respectively:

txt <- "A"
x <- x[1] + strwidth(txt, cex=3) / 2
y <- y[2] - strheight(txt, cex=3) / 2
text(x, y, txt, cex=3)

Looks good! That is exactly what I wanted:

Below you will find an R function that draws a label in one of three regions: figure (default), plot or device. You specify the position of the label using the keywords also used by legend: “topleft”, “bottomright” etc.

First, a few examples of how to use it:

Basic use:

par(mfrow=c(2,2))
sapply(LETTERS[1:4], function(x) { 
    plot(rnorm(100))
    fig_label(x, cex=2) 
})

Result:

Plotting at different positions and in different regions:

plot(rnorm(100))
for(i in c("topleft", "topright", "top", 
           "left", "center", "right", 
           "bottomleft", "bottom", "bottomright")) {
    fig_label(i, pos=i, cex=2, col="blue")
    fig_label(i, pos=i, cex=1.5, col="red", region="plot")
}

Result:

All the different regions:

par(mfrow=c(2,2))
sapply(LETTERS[1:4], function(x) { 
    plot(rnorm(100))
    fig_label("figure region", cex=2, col="red") 
    fig_label("plot region", region="plot", cex=2, col="blue")
})
fig_label("device region", cex=2, pos="bottomright", 
  col="darkgreen", region="device")

Result:

And here is the function:

fig_label <- function(text, region="figure", pos="topleft", cex=NULL, ...) {

  region <- match.arg(region, c("figure", "plot", "device"))
  pos <- match.arg(pos, c("topleft", "top", "topright", 
                          "left", "center", "right", 
                          "bottomleft", "bottom", "bottomright"))

  if(region %in% c("figure", "device")) {
    ds <- dev.size("in")
    # xy coordinates of device corners in user coordinates
    x <- grconvertX(c(0, ds[1]), from="in", to="user")
    y <- grconvertY(c(0, ds[2]), from="in", to="user")

    # fragment of the device we use to plot
    if(region == "figure") {
      # account for the fragment of the device that 
      # the figure is using
      fig <- par("fig")
      dx <- (x[2] - x[1])
      dy <- (y[2] - y[1])
      x <- x[1] + dx * fig[1:2]
      y <- y[1] + dy * fig[3:4]
    } 
  }

  # much simpler if in plotting region
  if(region == "plot") {
    u <- par("usr")
    x <- u[1:2]
    y <- u[3:4]
  }

  sw <- strwidth(text, cex=cex) * 60/100
  sh <- strheight(text, cex=cex) * 60/100

  x1 <- switch(pos,
    topleft     =x[1] + sw, 
    left        =x[1] + sw,
    bottomleft  =x[1] + sw,
    top         =(x[1] + x[2])/2,
    center      =(x[1] + x[2])/2,
    bottom      =(x[1] + x[2])/2,
    topright    =x[2] - sw,
    right       =x[2] - sw,
    bottomright =x[2] - sw)

  y1 <- switch(pos,
    topleft     =y[2] - sh,
    top         =y[2] - sh,
    topright    =y[2] - sh,
    left        =(y[1] + y[2])/2,
    center      =(y[1] + y[2])/2,
    right       =(y[1] + y[2])/2,
    bottomleft  =y[1] + sh,
    bottom      =y[1] + sh,
    bottomright =y[1] + sh)

  old.par <- par(xpd=NA)
  on.exit(par(old.par))

  text(x1, y1, text, cex=cex, ...)
  return(invisible(c(x,y)))
}

More on reveal.js and pandoc

One of the problems I had with reveal.js was the interactive PDF exporting mode: not only do you need google-chrome for that, there is also no easy way of automating the task.

It turns out that decktape.js is a good command line solution. The only drawback is that it actually creates screenshots from a browser, so the slides do not contain any text; they are just a bunch of screenshots! This makes the PDF huge and not searchable. Moreover, you really want the script to wait between the screenshots (by default one second, which makes the whole process slow), otherwise it creates screenshots of the transitions, and the result does not look good.

On the up side, it looks exactly like the presentation.

There were two issues with installing it on Ubuntu 14.04, though. First, it was necessary to install the libjpeg62 package, and second, it was necessary to install the gcc 4.9 compiler, which I did by using the toolchain ppa:

sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-4.9 g++-4.9

Everything else went smoothly.

Then I put phantomjs into ~/bin/, the decktape/ directory into ~/.local/share/, and wrote a little bash script to be able to call it easily from anywhere:

#!/bin/bash

PHANTOMJS=~/bin/phantomjs
DECKTAPE=~/.local/share/decktape/decktape.js
FILE=$1;shift
PDF=$1;shift

if [ -z "${FILE}" ] ; then
    cat <<EOF

Usage:
    ${0##*/} <input file> [output file [options]]

decktape options:
EOF
  $PHANTOMJS $DECKTAPE -h
  exit 0
fi

if [ -z "$PDF" ] ; then PDF=${FILE%.*}.pdf ; fi

$PHANTOMJS $DECKTAPE "$@" "$FILE" "$PDF"

R-devel in parallel to regular R installation

Unfortunately, you need both: R-devel (development version of R) if you want to submit your packages to CRAN, and regular R for your research (you don’t want the unstable release for that).

Fortunately, installing R-devel in parallel is less trouble than one might think.

Say, we want to install R-devel into a directory called ~/R-devel/, and we will download the sources to ~/src/. We will first set up two environment variables to hold these two directories:

export RSOURCES=~/src
export RDEVEL=~/R-devel

Then we get the sources with SVN. In Ubuntu, you need package subversion for that:

mkdir -p $RSOURCES
cd $RSOURCES
svn co https://svn.r-project.org/R/trunk R-devel
R-devel/tools/rsync-recommended

Then, we compile R-devel. The configure script might complain about missing developer packages with header files. In such a case the necessary package name must be guessed and the package installed (e.g. libcurl4-openssl-dev for Ubuntu when configure is complaining about missing curl):

mkdir -p $RDEVEL
cd $RDEVEL
$RSOURCES/R-devel/configure && make -j

That's it. Now we just need to set up a script to launch the development version of R:

#!/bin/bash
# RDEVEL must be set in the environment, or replaced here by the full path
export PATH="$RDEVEL/bin/:$PATH"
export R_LIBS=$RDEVEL/library
R "$@"

You need to save the script in an executable file somewhere in your $PATH, e.g. ~/bin might be a good idea.

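Here are the commands that create this script automatically as ~/bin/Rdev. Note that $RDEVEL is expanded while the script is being written, whereas the escaped \$PATH and \$@ are left to be expanded when the script runs: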

cat <<EOF>~/bin/Rdev;
#!/bin/bash

export R_LIBS=$RDEVEL/library
export PATH="$RDEVEL/bin/:\$PATH"
R "\$@"
EOF
chmod a+x ~/bin/Rdev

One last thing remaining is to populate the library with the packages that R-devel needs in order to build and check my packages, in my case c("knitr", "devtools", "ellipse", "Rcpp", "extrafont", "RColorBrewer", "beeswarm", "testthat", "XML", "rmarkdown", "roxygen2" ) and others (I keep expanding this list while checking my packages). Also, the Bioconductor packages limma and org.Hs.eg.db, which I need for a package which I build.
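
A small helper along these lines can do the populating from within an Rdev session. This is just a sketch: install_missing is a name I made up, and it only covers CRAN packages (Bioconductor packages such as limma would need BiocManager::install instead).

```r
# Install only those packages that are not yet present in any library
# on the search path; returns (invisibly) the names it tried to install
install_missing <- function(pkgs, lib = .libPaths()[1]) {
  missing <- setdiff(pkgs, rownames(installed.packages()))
  if (length(missing) > 0) {
    install.packages(missing, lib = lib)
  }
  invisible(missing)
}

# Example call (commented out, since it would download packages):
# install_missing(c("knitr", "devtools", "Rcpp", "testthat", "roxygen2"))
```

Since R_LIBS points to $RDEVEL/library in the launcher script, the first entry of .libPaths() should be the R-devel library, so the packages do not end up in the library of the regular R installation.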

Now I can check my packages with Rdev CMD build xyz / Rdev CMD check xyz_xyz.tar.gz