New editor in WordPress and how to circumvent it

I have problems with the new editor, like everyone else; my main issue, however, is editing existing posts.

1. Why I don’t want to use the new editor, in order of importance:

– it doesn’t take up all of the screen real estate, only a narrow column in the middle of one of the two 24″ monitors I use to work with text.
– it meddles with my HTML code. I have many “tl;dr” (very long) posts (not here, on another blog) which I like to format with empty lines in order to facilitate editing.
– it is buggy (read: non-functional) in my browser of choice (experimental Opera running on Ubuntu). Now, I don’t ask WordPress to adapt to my needs; but if everything else works fine with that browser, why would I want to switch to something I don’t like just because one beta feature on one site doesn’t work?

The first two problems make the editor unusable; the third can be circumvented (I could work with WordPress and WordPress only in Firefox, for example).

2. How to circumvent the new editor

Of course, clicking on the little pencil icon on your blog post takes you to the new editor. To use the old editor, the only way is to go to “Dashboard” -> “Posts”, search for the post you would like to edit, and click on “Edit”.

3. A better solution

Another, better solution would be to rewrite the links automatically (for example, with a plugin or a bookmarklet). The link to the old-style editor looks like this: https://yourblogname.wordpress.com/wp-admin/post.php?post=1234&action=edit

Since the link to the new editor looks like this:

https://wordpress.com/post/BlogIDNumber/1234

I think that a regular expression rewriting your links (for example, a simple bookmarklet) should do the trick. Essentially, you would match the regular expression “https://wordpress.com/post/BlogIDNumber/([0-9]*)” and replace it with “https://yourblogname.wordpress.com/wp-admin/post.php?post=$1&action=edit”.

The code below is untested; use it at your own risk.

There is actually quite an easy piece of JavaScript code that you can add as a bookmark. You click the bookmark, and all the new-editor links on the page are converted into old-editor links. You may start with the following code:

javascript:(function(){
  // match the new-editor link and capture the post number
  var m = /wordpress\.com\/post\/YourBlogID\/([0-9]*)/;
  // old-style editor link; $1 is the captured post number
  var r = 'yourblogname.wordpress.com/wp-admin/post.php?post=$1&action=edit';
  var links = window.document.getElementsByTagName('a');
  for(var i = 0, l; l = links[i]; i++) {
    if (l.href)
      l.href = l.href.replace(m, r);
  }
})();

Replace “yourblogname” with your blog name (e.g. “logfc” in the case of my blog) and “YourBlogID” with your blog ID. Add a bookmark that contains the above code as its address. Clicking on the bookmark should then replace all “new style” links with “old style” links.

Here is an example screenshot from Google Chrome. In the field where you would normally put the URL (http…), you enter the above code (with “YourBlogID” replaced by the actual ID and “yourblogname” by the actual name):

[Screenshot: the bookmark dialog in Google Chrome, with the bookmarklet code entered in the URL field]

The downside is that you will need a separate bookmark for each blog that you write on.

Sloppy Science

Last week, Science published a paper by Rodriguez and Laio on a density-based clustering algorithm. As a non-expert, I found the results actually quite good compared to the standard tools that I use in my everyday work. I even implemented the algorithm as an R package (soon to be published on CRAN; look out for “fsf”).

However, there are problems with the paper. More than one.

1. The authors claim that the density of each sample is determined with a simple formula: essentially, the number of other samples within a certain cutoff distance. This does not add up, because the density would then always be a whole number, and it is obvious from the figures that this is not the case. When you look at the original Matlab code in the supplementary material, you see that the authors actually use a Gaussian kernel function to calculate the density.
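To make the distinction concrete, here is a minimal R sketch of the two density definitions as I understand them; the function names, the full distance matrix d (e.g. as.matrix(dist(x))) and the cutoff dc are my own, not taken from the paper or its supplement.

# density as described in the paper: the number of other samples closer
# than the cutoff dc -- by construction always a whole number
rho.count <- function( d, dc ) rowSums( d < dc ) - 1

# Gaussian kernel density, my reading of the supplementary Matlab code --
# this version is continuous, which matches the published figures
rho.gauss <- function( d, dc ) rowSums( exp( -( d / dc )^2 ) ) - 1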

2. If you use the simple density count as described in the paper, the algorithm will not and cannot work. Imagine a relatively simple case with two distinct clusters. Imagine that in one cluster there is a sample A with density 25, and in the other cluster there are two samples, B and C, with identical densities of 24. This is actually quite likely to happen. The algorithm now determines, for each sample, \delta: the distance to the nearest sample with a higher density. The whole idea of the algorithm is that for putative cluster centres this distance will be very high, because it will point to the centre of another cluster.

However, with ties we have the following problem. If we choose the approach described by the authors (a strict inequality), then both B and C (which have identical density 24) will be assigned a large \delta value and will become cluster centre candidates. If we use a weak inequality instead, then B will point to C and C to B, and both of them will have a small \delta.

Therefore, we either get both B and C as equivalent cluster centre candidates, or neither of them. No wonder that the authors never used this approach!
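Here is a toy R sketch of the \delta computation with a strict inequality, just to illustrate the tie problem; d is a full distance matrix, rho a vector of densities, and the helper and its name are mine, not the authors’.

delta.strict <- function( d, rho ) {
  sapply( seq_along( rho ), function( i ) {
    higher <- which( rho > rho[i] )   # samples with strictly higher density
    # the globally densest sample gets the maximum distance by convention
    if( length( higher ) == 0 ) max( d[i,] ) else min( d[i, higher] )
  } )
}
# With rho[B] == rho[C] == 24, neither B nor C sees the other as "higher",
# so both get a large delta and both look like cluster centre candidates.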

3. The authors explicitly claim that their algorithm can “automatically find the correct number of clusters.” This does not seem to be true; at least, there is nothing in the original paper that warrants this statement. If you study their Matlab code, you will find that the selection of cluster centers is done manually, by the user selecting a rectangle on the screen. Frankly, I cannot even comment on that; it is outrageous.

I think that Science might have done the authors a great disservice: everyone will hate them for getting a sloppy, half-baked paper published in Science when others would have had it rejected from PLoS ONE. I know I do :-)

Copy text from images

All the elements were there: algorithms freely available; implementations ready to download, for free. It just took one clever person to FINALLY make it happen: a seamless way to copy text from images. At least in Chrome, for now, but since it is open source, I guess it is just a matter of time before we see it as a built-in feature in Firefox and other browsers. For now, use the Project Naptha Chrome browser extension. The main project page is also an interesting read.

Words on the web exist in two forms: there’s the text of articles, emails, tweets, chats and blogs— which can be copied, searched, translated, edited and selected— and then there’s the text which is shackled to images, found in comics, document scans, photographs, posters, charts, diagrams, screenshots and memes. Interaction with this second type of text has always been a second class experience, the only way to search or copy a sentence from an image would be to do as the ancient monks did, manually transcribing regions of interest.

(from the Project Naptha web site)

It works really well, assuming that you select the “English tesseract” option from the “Languages” sub-menu; the offline JavaScript implementation of OCRad is not very effective, in contrast to the cloud-based Tesseract service.

Rotating movies with RGL

Learned another thing today: it is very simple to create animated GIFs with the rgl package, using the built-in functions spin3d, play3d and movie3d.

library( pca3d )                        # also provides the metabo example data
data( metabo )
pca <- prcomp( metabo[,-1], scale.= TRUE )
pca3d( pca, group= metabo[,1] )         # draw the 3D PCA plot in an rgl window
rot <- spin3d( axis= c( 0, 1, 0 ) )     # rotation around the y axis
movie3d( rot, duration= 12 )            # record 12 seconds and build the GIF
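If I remember the rgl defaults correctly, movie3d writes its frames and the resulting movie.gif into tempdir(); a call along the lines below should put the file into the current directory instead (treat the dir and movie argument names as my assumption and check them against the rgl manual):

movie3d( rot, duration= 12, dir= ".", movie= "pca_rotation" )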

Here is the result:

[Animated GIF: the rotating 3D PCA plot]

Check credit card numbers using the Luhn algorithm

You can find a nice infographic explaining how to read a credit card number here.

Credit card numbers include, as the last digit, a simple checksum calculated using the Luhn algorithm. In essence, we add up the digits of the card number in a certain way; if the resulting sum is not divisible by 10 (i.e., the remainder of dividing by 10 is not zero), then the number is invalid. The reverse is not true: a number that passes the check can still be invalid.

This can be easily computed in R, although I suspect that there might be an easier way:

checkLuhn <- function( ccnum ) {

  # helper function: the sum of the digits of each element of x
  sumdigits <- function( x )
    sapply( x, function( xx )
      sum( as.numeric( unlist( strsplit( as.character( xx ), "" ) ) ) ) )

  # split the number into single digits, last digit first
  ccnum <- as.character( ccnum )
  v <- as.numeric( unlist( strsplit( ccnum, "" ) ) )
  v <- rev( v )

  # double every second digit and replace any result > 9 by its digit sum
  i2 <- seq( 2, length( v ), by= 2 )
  v[i2] <- v[i2] * 2
  v[ v > 9 ] <- sumdigits( v[ v > 9 ] )

  # the number passes the check if the total is divisible by 10
  sum( v ) %% 10 == 0
}
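For example, this is how I would call it on a textbook test number (4111 1111 1111 1111 is the well-known Visa test number, not a real card); note that the number is passed as a string, because R would render a 16-digit numeric in scientific notation:

checkLuhn( "4111111111111111" )   # TRUE:  the digit sum is 30
checkLuhn( "4111111111111112" )   # FALSE: the last digit has been changed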


Sample size / power calculations for Kaplan-Meier survival curves

The problem is simple: we have two groups of animals, treated and controls. Around 20% of the untreated animals will die during the course of the experiment, and we would like to be able to detect an effect such that, instead of 20%, 80% of the animals die in the treated group, with power 0.8 and alpha = 0.05. Group sizes are equal and no other parameters are given.

What is the necessary group size?

I used the `ssizeCT.default` function from the `powerSurvEpi` R package. Based on the explanation in the package manual, this calculates (in my simple case) the required sample size in a group as follows:

n = \frac{m}{p_E + p_C}

where p_E and p_C are, respectively, the probabilities of failure in the E(xperimental) and C(ontrol) groups. I assume that in my case I should use 0.8 and 0.2, respectively, so n = m. The formulas here are simplified in comparison with the manual page of ssizeCT.default, simply because the group sizes are identical.

m is calculated as

m=\big(\frac{RR+1}{RR-1}\big)^2(z_{1-\alpha/2}+z_{1-\beta})^2

RR is the minimal effect size that we would like to be able to observe with power 0.8 at alpha = 0.05. That means that if the real effect size is RR or greater, we have an 80% chance of getting a p-value smaller than 0.05 when the group sizes are equal to m. To calculate RR, I first calculate \theta, the log hazard ratio, using the same approximate, expected mortality rates (20% and 80%):

\theta = \log(\frac{\log(0.8)}{\log(0.2)}) = -1.98

Since RR = exp(\theta) = 0.139, we get m = 18.3. This seems reasonable (based on previous experience).
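For reference, this is roughly how the call to the package function looks; I am quoting the argument names from my reading of the powerSurvEpi manual, so double-check them against the current documentation before relying on them:

library( powerSurvEpi )
# power 0.8, equal group sizes (k = 1), expected failure probabilities of
# 0.8 (treated) and 0.2 (controls), postulated hazard ratio RR, alpha 0.05
ssizeCT.default( power= 0.8, k= 1, pE= 0.8, pC= 0.2, RR= 0.139, alpha= 0.05 )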

Copy-paste and R interactive sessions

I wanted to have a way of easily copying R commands from my interactive sessions. I often work interactively, while building a pipeline or a lab-book-like readme in a separate file.

Of course, you can just copy anything that is on the screen and paste it. However, I want my readme file to be directly executable, so generally I want to copy only the commands. The solution was to write a small function, called xclip, which copies an arbitrary number of lines from the history into the X window system primary selection. Naturally, it is not compatible with Windows or MacOS. However, I can now copy a fragment of recent history with one command and one mouse click (pasting from the primary selection requires only a middle click, no keyboard shortcut).

The function is actually only a wrapper around the little xclip command-line utility, available for Linux.

xclip <- function (n = 5) 
{
    # save the current command history to a temporary file
    t <- tempfile()
    savehistory(file = t)
    # pipe the last n lines of the history into the xclip utility
    system(paste0("tail -n ", n, " ", t, " | xclip"))
}
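For example, after finishing a few commands in an interactive session I can run the line below and then paste the result into my readme with a middle click (the number of lines is of course arbitrary):

xclip( 10 )    # copy the last 10 lines of the history into the primary selection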