rmarkdown and terminal colors

Posted on July 20, 2020 by January

The R output in rmarkdown strips all terminal control sequences, that is, colors and formatting (e.g., bold or italics). However, it is relatively easy to restore them, at least in HTML output. For this, one needs to install the fansi package (the hook also uses htmltools) and include the following chunk in the rmarkdown document, hooking a custom function to the output:

    ```{r echo=FALSE}
    options(crayon.enabled = TRUE)
    knitr::knit_hooks$set(output = function(x, options){
      paste0(
        '<pre class="r-output"><code>',
        fansi::sgr_to_html(x = htmltools::htmlEscape(x), warn = FALSE),
        '</code></pre>'
      )
    })
    ```

If you are using the crayon package, however, you might run into the following problem: in some situations crayon “thinks” that the terminal has only a limited capability of displaying colors and will use only the 16 base colors, even if more colors are available. One such situation is when colored output is included in a vignette that is processed automatically: there is no way to convince the num_colors function from crayon that it should report 256 colors.

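Therefore, we need to replace the num_colors function with a simpler version that always reports 256 colors:

library(crayon)
## stand-in for crayon's num_colors() that always reports 256 colors
num_colors <- function(forget=FALSE) 256
## overwrite the original function inside the crayon namespace
assignInNamespace("num_colors", num_colors, ns="crayon")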
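
To see what the difference between the two modes amounts to, here is a minimal base-R sketch of the raw ANSI escape sequences involved. The helper names fg16 and fg256 are hypothetical and not part of crayon: the 16 base colors use SGR codes 30-37 (and 90-97 for the bright variants), while the 256-color mode uses the sequence ESC[38;5;nm, with n between 0 and 255.

```r
# Hypothetical helpers (not part of crayon) that wrap text in raw ANSI
# SGR escape sequences; "\033[39m" resets the foreground color afterwards
fg16 <- function(txt, code) {
  # 16-color mode: code is 30-37 (normal) or 90-97 (bright)
  paste0("\033[", code, "m", txt, "\033[39m")
}
fg256 <- function(txt, n) {
  # 256-color mode: ESC[38;5;<n>m with n between 0 and 255
  paste0("\033[38;5;", n, "m", txt, "\033[39m")
}

cat(fg16("one of the 16 base colors", 31), "\n")
cat(fg256("color 208 of the 256-color palette", 208), "\n")
```

When crayon decides that only 16 colors are available, it falls back to the first kind of sequence.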

Colorful tables in a terminal

Posted on July 20, 2020 by January

It all started when I wanted significant p-values shown in red on the terminal. The R terminal is capable of showing colors, simple formatting (like italics or bold) and Unicode characters, thanks to the actual terminal that does the job of displaying R output, whether it is the RStudio console or a terminal window. You can see that when you use tibbles from the tidyverse: they use some very limited formatting (like showing “NA” in red).
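
The mechanism behind it is simple: colors are switched on and off by ANSI escape sequences embedded in the printed text. Here is a minimal base-R sketch of the original motivation, coloring significant p-values red; the function name color_pvals and the 0.05 cut-off are illustrative assumptions and not part of colorDF:

```r
# Wrap p-values below alpha in the ANSI SGR code for red (31);
# "\033[39m" restores the default foreground color afterwards.
# color_pvals is a hypothetical helper, 0.05 an arbitrary default.
color_pvals <- function(p, alpha = 0.05) {
  s <- format(p, digits = 2)   # common formatting keeps the column aligned
  ifelse(p < alpha, paste0("\033[31m", s, "\033[39m"), s)
}

pv <- c(0.001, 0.2, 0.04)
cat(color_pvals(pv), sep = "\n")
```

Whether the color actually shows depends on the terminal; a console that does not interpret the escape sequences will print them verbatim.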

I ended up writing a new package, colorDF. The package defines a new class of data frames, but it does not really change their behavior, only the way they are shown (specifically, it modifies some attributes and introduces a print.colorDF function for printing). If you change a tibble to a colorDF, it will still behave exactly like a tibble, but it will be shown in color:

# Color data frame 6 x 87:
# (Showing rows 1 - 20 out of 87)
  │name                 │height│mass │birth_year│gender       │probability
 1│       Luke Skywalker│   172│   77│        19│male         │0.0083     
 2│                C-3PO│   167│   75│       112│NA           │0.0680     
 3│                R2-D2│    96│   32│        33│NA           │0.0596     
 4│          Darth Vader│   202│  136│        42│male         │0.0182     
 5│          Leia Organa│   150│   49│        19│female       │0.0138     
 6│            Owen Lars│   178│  120│        52│male         │0.0115     
 7│   Beru Whitesun lars│   165│   75│        47│female       │0.0489     
 8│                R5-D4│    97│   32│        NA│NA           │0.0040     
 9│    Biggs Darklighter│   183│   84│        24│male         │0.0954     
10│       Obi-Wan Kenobi│   182│   77│        57│male         │0.0242     
11│     Anakin Skywalker│   188│   84│        42│male         │0.0066     
12│       Wilhuff Tarkin│   180│   NA│        64│male         │0.0605     
13│            Chewbacca│   228│  112│       200│male         │0.0587     
14│             Han Solo│   180│   80│        29│male         │0.0519     
15│               Greedo│   173│   74│        44│male         │0.0204     
16│Jabba Desilijic Tiure│   175│ 1358│       600│hermaphrodite│0.0929     
17│       Wedge Antilles│   170│   77│        21│male         │0.0457     
18│     Jek Tono Porkins│   180│  110│        NA│male         │0.0331     
19│                 Yoda│    66│   17│       896│male         │0.0931     
20│            Palpatine│   170│   75│        82│male         │0.0012

Yes, it looks like that in the terminal window!

You can read all about it in the package vignette (please use the package from GitHub; the CRAN version is lagging behind). Apart from the print function, I also implemented a summary function, which is more informative than the default summary function for data frames.

starwars %>% as.colorDF %>% summary

# Color data frame 5 x 13:
  │Col       │Class│NAs  │unique│Summary                                                         
 1│name      │<chr>│    0│    87│All values unique                                               
 2│height    │<int>│    6│    45│ 66 [167 <180> 191] 264                                         
 3│mass      │<dbl>│   28│    38│  15.0 [  55.6 <  79.0>   84.5] 1358.0                          
 4│hair_color│<chr>│    5│    12│none: 37, brown: 18, black: 13, white: 4, blond: 3, auburn: 1, …
 5│skin_color│<chr>│    0│    31│fair: 17, light: 11, dark: 6, green: 6, grey: 6, pale: 5, brown…
 6│eye_color │<chr>│    0│    15│brown: 21, blue: 19, yellow: 11, black: 10, orange: 8, red: 5, …
 7│birth_year│<dbl>│   44│    36│  8 [ 35 < 52>  72] 896                                         
 8│gender    │<chr>│    3│     4│male: 62, female: 19, none: 2, hermaphrodite: 1                 
 9│homeworld │<chr>│   10│    48│Naboo: 11, Tatooine: 10, Alderaan: 3, Coruscant: 3, Kamino: 3, …
10│species   │<chr>│    5│    37│Human: 35, Droid: 5, Gungan: 3, Kaminoan: 2, Mirialan: 2, Twi'l…
11│films     │<lst>│    0│    24│Attack of the Clones: 40, Revenge of the Sith: 34, The Phantom …
12│vehicles  │<lst>│    0│    11│Imperial Speeder Bike: 2, Snowspeeder: 2, Tribubble bongo: 2, A…
13│starships │<lst>│    0│    17│Millennium Falcon: 4, X-wing: 4, Imperial shuttle: 3, Naboo fig…

For numeric vectors, the function by default shows the minimum, the first quartile, the median, the third quartile and the maximum, but it can also produce a boxplot-like graphical summary. Since the function also works on lists, implementing a boxplot function for the text terminal was super easy:

term_boxplot(Sepal.Length ~ Species, data=iris, width=90)

# Color data frame 5 x 4:
 │Col       │Class│NAs  │unique│Summary                                               
1│setosa    │<dbl>│    0│    15│╾──────┤  +  ├────────╼                               
2│versicolor│<dbl>│    0│    21│         ╾─────────┤    +    ├──────────╼             
3│virginica │<dbl>│    0│    21│         ╾──────────────────┤   +     ├──────────────╼
4│Range     │<chr>│    0│     1│Only one value: Range: 4.3 - 7.9

Cool, isn’t it?
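
The idea behind such a text boxplot fits in a few lines of base R. The sketch below is not how colorDF implements term_boxplot; text_boxplot is a hypothetical helper that maps the five-number summary returned by fivenum() (minimum, lower hinge, median, upper hinge, maximum; the hinges are close to, but not always identical with, the quartiles) onto character positions. It assumes that x contains no NAs and that lim spans a non-zero range.

```r
# Hypothetical helper: a one-line text boxplot of x, `width` characters wide,
# scaled to the interval lim so that several boxplots share the same axis
text_boxplot <- function(x, lim = range(x), width = 40) {
  q   <- fivenum(x)   # min, lower hinge, median, upper hinge, max
  pos <- round((q - lim[1]) / diff(lim) * (width - 1)) + 1
  out <- rep(" ", width)
  out[pos[1]:pos[5]]     <- "-"   # whiskers
  out[pos[2]:pos[4]]     <- "="   # box
  out[c(pos[1], pos[5])] <- "|"   # extremes
  out[pos[3]]            <- "+"   # median
  paste(out, collapse = "")
}

# one line per species, all on the common range of Sepal.Length
lim   <- range(iris$Sepal.Length)
boxes <- tapply(iris$Sepal.Length, iris$Species, text_boxplot,
                lim = lim, width = 60)
cat(sprintf("%-10s %s", names(boxes), boxes), sep = "\n")
```

Using a shared lim is what makes the boxes comparable across groups, just like the common range reported in the last row of the term_boxplot output above.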

R, rmarkdown, cache and objects

Posted on October 31, 2019 by January

If your rmarkdown document takes hours to generate and you want to be able to produce different output types on the fly, using the output_format argument of rmarkdown::render is extremely annoying: every time you change the output format, the cache is reset, so you need to wait hours to get the other format.

I found no clean solution to this problem, but here is an ugly hack: we create a copy of the document and render the copy. The first time, this will take hours, but afterwards the copy has its own cache, separate from that of your original document:

file.copy("test.rmd", "test_html.rmd", overwrite=TRUE)
rmarkdown::render("test_html.rmd")

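Of course, doing this by hand is annoying, so we can wrap these two commands in a function. But beware! By default, rmarkdown::render evaluates the code in its parent environment (the envir argument defaults to parent.frame()), which, inside a wrapper function, is the local environment of that function. To make sure the document is evaluated in the global environment, you need to set envir=globalenv(). Here is a wrapper function which, by default, also opens the resulting document in google-chrome: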

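myrender <- function(fn, open=TRUE) {
  ## render a copy with the "_html" suffix, so that it gets its own cache
  fb  <- gsub("\\.rmd$", "", fn, ignore.case=TRUE)
  fn2 <- paste0(fb, "_html.rmd")
  file.copy(fn, fn2, overwrite=TRUE)
  ## evaluate the chunks in the global environment, not in the
  ## local environment of this function
  res <- rmarkdown::render(fn2, output_format="html_document",
                           envir=globalenv())
  ## open the result only if requested; do not wait for the browser
  if(open) system2("google-chrome", shQuote(res), wait=FALSE)
  res
}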
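
Why envir matters can be shown without rmarkdown at all. In this sketch, run_chunk is a hypothetical stand-in for render(): like render(), it evaluates code in envir, which defaults to parent.frame(). When it is called from inside a wrapper function, that default is the local environment of the wrapper, so anything created there vanishes when the wrapper returns.

```r
# run_chunk() is a hypothetical stand-in for rmarkdown::render(): it
# evaluates code in `envir`, which, as in render(), defaults to parent.frame()
run_chunk <- function(code, envir = parent.frame()) {
  eval(parse(text = code), envir = envir)
}

# default: evaluated in the local environment of wrapper_default()
wrapper_default <- function() run_chunk("made_by_default <- 1")
# explicit: evaluated in the global environment
wrapper_global  <- function() run_chunk("made_globally <- 1", envir = globalenv())

wrapper_default()
wrapper_global()
exists("made_by_default", envir = globalenv(), inherits = FALSE)  # FALSE
exists("made_globally",   envir = globalenv(), inherits = FALSE)  # TRUE
```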

Accessing object from another package by variable

Posted on September 20, 2019 by January

Say the package name is stored in the variable x, and the name of the object you would like to access from that package (without attaching the package with library()) is stored in y. Of course, x::y will not work, because :: takes its arguments literally and would look for a package called “x”. Fortunately, :: is just a thin wrapper around the function getExportedValue, so the following will work:

getExportedValue(x, y)
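
A quick check, using the median function from the stats package as an arbitrary example. Note that getExportedValue, like ::, loads the package namespace if necessary, but does not attach the package to the search path.

```r
x <- "stats"    # package name, stored in a variable
y <- "median"   # name of the object, stored in a variable

# x::y would look for a package literally named "x";
# getExportedValue() takes the names as character strings instead
f <- getExportedValue(x, y)
identical(f, stats::median)   # TRUE
f(c(1, 5, 3))                 # 3
```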

rmd syntax highlighting in vim

Posted on September 19, 2019 by January

Quick and dirty Rmarkdown syntax highlighting for vim, when you don’t have time to install bundles, plugins or dependencies: go to this GitHub repository, download the file rmd.vim from the syntax/ directory and copy it to ~/.vim/syntax/. This should work in 99.99% of the cases.

Add a file to scan when completing in vim

Posted on May 29, 2018 by January

OK, so this was an easy thing to find out, but enormously useful.

In vim, Ctrl-P and Ctrl-N let you complete a word in insert mode. By default (see the complete option), vim scans the current buffer, buffers in other windows, included files etc. However, to add a specific file (in my case, a bibliography that I like to keep open in another terminal), you need to add a k to the complete option, followed by the location of the file:

:set complete+=k./bibliography.bib

Invert a list / map

Posted on March 16, 2018 by January

Often we use lists to map keywords onto values, for example:

foo <- list(a=c("quark", "fark"), 
            b=c("quark", "foo", "bark"), 
            c=c("fark", "bark"))

To invert this list (such that “fark”, “bark” etc. become keywords, and “a”, “b” and “c” the values), do

foo.rev <- split(rep(names(foo), lengths(foo)), unlist(foo))

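split() divides a vector or data frame into groups defined by a factor. In this case, we use rep and lengths to repeat each name of foo as many times as there are values in the corresponding element, so that we get two vectors of the same length, as can be seen with the following command: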

cbind(rep(names(foo), lengths(foo)), unlist(foo))

with the result

   [,1] [,2]   
a1 "a"  "quark"
a2 "a"  "fark" 
b1 "b"  "quark"
b2 "b"  "foo"  
b3 "b"  "bark" 
c1 "c"  "fark" 
c2 "c"  "bark" 

When we apply split() to the first vector, using the second one to guide the split, we get:

$bark
[1] "b" "c"

$fark
[1] "a" "c"

$foo
[1] "b"

$quark
[1] "a" "b"
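
If you need this more than once, the one-liner can be wrapped in a small function. invert_list is a hypothetical name; the body is the same split()/rep() construct as above, with use.names=FALSE only to skip creating the a1, a2 (and so on) names, which split() does not use anyway. Inverting twice recovers the original grouping, with the values in sorted order.

```r
# Hypothetical helper wrapping the one-liner above: turn a list mapping
# keys -> values into a list mapping values -> keys
invert_list <- function(l) {
  split(rep(names(l), lengths(l)), unlist(l, use.names = FALSE))
}

foo <- list(a = c("quark", "fark"),
            b = c("quark", "foo", "bark"),
            c = c("fark", "bark"))
foo.rev <- invert_list(foo)
foo.rev$bark               # "b" "c"
invert_list(foo.rev)$b     # "bark" "foo" "quark"
```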

Using caret

Posted on July 11, 2017 by January

Here is a caret call I use frequently. Given that x holds the training data and y the response:

library(caret)
library(doMC)
registerDoMC(cores=6)

tc <- trainControl(method="repeatedcv", number=10, repeats=1, 
  returnData=TRUE, savePredictions="all", verboseIter=TRUE, classProbs=TRUE)
mod <- train(x=x, y=y, trControl=tc, method="rf",
  tuneGrid=data.frame(mtry=500))

library(doMC) and registerDoMC: allow me to use more than one processor core
repeatedcv: if more than one repeat of k-fold cross-validation is requested, the repeats= parameter should be modified, and repeatedcv must be used instead of cv
savePredictions: needed if we want to evaluate the predictions on our own
verboseIter: to see the progress
classProbs: to report class probabilities, so we can use them to calculate ROC curves post factum
tuneGrid: if not specified, caret will tune the parameters. Normally, we don’t want that
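
Since classProbs=TRUE and savePredictions="all" keep the held-out class probabilities, the area under the ROC curve can indeed be computed afterwards. A minimal base-R sketch using the rank (Mann-Whitney) formulation; the helper auc is hypothetical, and the usage line in the comment assumes that mod$pred contains an obs column and a probability column named after the positive class (here called "yes"). With several repeats or tuning values, you would first subset mod$pred accordingly.

```r
# Hypothetical helper: area under the ROC curve from class probabilities,
# i.e. the probability that a randomly chosen positive case gets a higher
# score than a randomly chosen negative one (ties count as one half)
auc <- function(prob, is_pos) {
  r  <- rank(prob)            # ties get average ranks
  n1 <- sum(is_pos)
  n0 <- sum(!is_pos)
  (sum(r[is_pos]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc(c(0.9, 0.4, 0.6, 0.1), c(TRUE, TRUE, FALSE, FALSE))   # 0.75

## assumed usage with the caret model above:
## auc(mod$pred$yes, mod$pred$obs == "yes")
```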

Custom comparison function for sorting data in R

Posted on April 11, 2017 by January

Many languages allow you to use a custom comparison function in sorting. R is no exception, but it is not entirely straightforward: it requires you to define a new class and overload certain operators. Here is how to do it.

Consider the following example. You have a certain number of paired values, for example:

v <- list(a=c(2,1), b=c(1,3), d=c(1,1), e=c(2,3))

The job is to order these pairs in the following way. Given two pairs, p1=(x1, y1) and p2=(x2, y2), p1 < p2 iff one of the following conditions is fulfilled: either x1 < x2 and y1 <= y2, or x1 <= x2 and y1 < y2. The point is that if we draw lines, where one end of the line is at the height x1 and the other end at the height y1, we want to order these lines only if they do not cross; at most one of their ends may coincide (but not both, because then the lines would be identical).

In the left panel of the figure above, p1 < p2, because one end of the first line is below the corresponding end of the other line (x1 < x2 and y1 = y2). Of course, if y1 < y2 the inequality still holds. On the other hand, the right panel shows a case where we cannot resolve the comparison: the lines cross, so we should treat them as equal.

If we now have a list of such pairs and want to order it, we have a problem. Here is the thing: the desired order is {d, a, b, e}. The element d=(1,1) is clearly smaller (as defined above) than all the others. However, b=(1,3) is not smaller than a=(2,1), and a is not smaller than b; that means that a is equal to b, and their order should not be modified.

There is no way to do that with regular tools such as order, especially since x and y may not only be on different scales; they might even be completely different data types! One might be a numeric vector, the other a character string. Or, possibly, a type of prop from Monty Python (with a defined relation stating that a banana is less than a gun). We must use a custom comparator.

For this, we need to notice that the R functions sort and order rely on the function xtfrm. This, in turn, relies on the methods ==, > and [ defined for a given class. For numeric vectors, for example, these give what you would expect.

Our v vector is a list whose elements are pairs of numbers. For this type of data, no comparison is defined, and comparing two pairs of numbers results in a vector of two logical values, which is not what we want:

> v[1] < v[2]
Error in v[1] < v[2] : comparison of these types is not implemented
> v[[1]] < v[[2]]
[1] FALSE  TRUE

R, however, is an object-oriented language (even if it does not always feel like that). Comparisons (<, >, ==) are generic functions, and it is possible to define (or redefine) them for any class of objects. So here is the plan: we invent a new class for the object v and define custom comparisons for the elements of this class of objects. Remember that if we define a function whose name consists of a generic (like "plot" or "["), a dot and the name of a class, we are defining the method of that generic for the given class:

## make v an object of class "foo"
class(v) <- "foo"

## to use the "extract" ([) method, 
## we need to momentarily change the class of x, because 
## otherwise we will end up in an endless loop
'[.foo' <- function(x, i) {
    class(x) <- "list"
    x <- x[i]
    class(x) <- "foo"
    x
}

## define ">" as stated above
## the weird syntax results from the fact that a and b
## are lists with one element, this element being a vector 
## of a pair of numbers
'>.foo' <- function(a, b) {
    a <- a[[1]]
    b <- b[[1]]
    (a[1] > b[1] && a[2] >= b[2]) ||
        (a[1] >= b[1] && a[2] > b[2])
}

## if we can't find a difference, then there is no difference
'==.foo' <- function(a, b) 
    !(a > b || b > a)

## we don't need that, but for the sake of completeness...
'<.foo' <- function(a, b) b > a

This will now do exactly what we want:

> v["a"] == v["b"]
[1] TRUE
> v["a"] > v["d"]
[1] TRUE
> sort(v)
$d
[1] 1 1

$a
[1] 2 1

$b
[1] 1 3

$e
[1] 2 3

attr(,"class")
[1] "foo"
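
The same mechanism works with any comparison that can be expressed with ==, > and [. As a smaller, hypothetical example, here is a class bylen whose elements (character strings) are compared only by their length. No sort or order methods are defined; sort() still picks up the custom comparison through xtfrm():

```r
# Hypothetical class "bylen": strings compared by their length only.
# The extraction method keeps the class, so that x[i] can be compared.
"[.bylen"  <- function(x, i) structure(unclass(x)[i], class = "bylen")
"==.bylen" <- function(e1, e2) nchar(unclass(e1)) == nchar(unclass(e2))
">.bylen"  <- function(e1, e2) nchar(unclass(e1)) > nchar(unclass(e2))

w <- structure(c("ccc", "a", "bb"), class = "bylen")
xtfrm(w)            # ranks used by order(): 3 1 2
unclass(sort(w))    # "a" "bb" "ccc"
```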

R, shiny and source()

Posted on March 22, 2017 by January

This one cost me more time to figure out than it should have. The reason: it turns out that I had never properly understood what the source() function does.

So here is the story: I was setting up a shiny server for a student, based on her code. She was running the shiny app from within RDesktop, so before starting the app with runApp() she would load all the necessary objects and source() a file called helpers.R with some common calculations.

In order to put the app on a server, I moved these pre-runApp() initializations into ui.R and server.R. Suddenly, weird errors appeared. The functions in helpers.R no longer seemed to be able to find anything in the parent environment: object X not found! This happened even though I called source() immediately after loading the necessary objects into the environment:

# file server.R
load("myobjects.rda")
source("helpers.R")

The solution was, as usual, to read documentation. Specifically, documentation on source():

local   TRUE, FALSE or an environment, determining where the 
        parsed expressions are evaluated. FALSE (the default) 
        corresponds to the user's workspace (the global 
        environment) and TRUE to the environment from which 
        source is called.

The objects which I had load()-ed before were not in the global environment, but in another environment created by shiny. However, the expressions from helpers.R were evaluated in the global environment. Thus, a new function defined in helpers.R could be seen from inside server.R, but an object loaded in server.R could not be seen by helpers.R.

It was the first time that I noticed this. Normally, I would use a file such as helpers.R only to define helper functions and actually call them from server.R or ui.R. However, I had assumed that source() is something like #include in C, simply running the commands in the given file as if they were inserted at this position into the code, that is, evaluated in the environment from which source() was called.

This is not so.
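
The difference is easy to reproduce outside of shiny. In this sketch, a temporary file plays the role of helpers.R, and the hypothetical function server_like plays the role of server.R: values and threshold live in the local environment of the function, just like the objects load()-ed in server.R lived in an environment created by shiny.

```r
# a stand-in for helpers.R: it uses objects it does not define itself
helpers <- tempfile(fileext = ".R")
writeLines("n_big <- sum(values > threshold)", helpers)

# a stand-in for server.R: the objects exist only in this function's frame
server_like <- function(local) {
  values    <- c(1, 5, 10)
  threshold <- 4
  source(helpers, local = local)
  n_big
}

# local = TRUE: evaluated where source() was called, so `values` is found
server_like(TRUE)   # 2

# local = FALSE (the default): evaluated in the global environment,
# where `values` does not exist, so this fails with "object not found"
tryCatch(server_like(FALSE), error = function(e) conditionMessage(e))
```

So in server.R, source("helpers.R", local=TRUE) gives the #include-like behavior I had expected.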

log Fold Change

bits and pieces from my work