Fold-change bar plots with “0” on y axis

I see it more and more frequently: bar plots which are supposed to illustrate the regulation of a gene in terms of “fold change”, which include a “0” on the y axis.

It is subtle, but it irks me a lot. Also, the last time I tried to argue with my experimentally working colleagues, I heard that “everybody does it like this” and that I am nit-picking.

What is the fold change? Suppose that you have a before and after measurements, a_0 and a_1. Now, the fold change is

F=\frac{a_1}{a_0}

Could you replace a_0 by a_1 and vice versa? Yes, you could define it as \frac{a_0}{a_1}, right? Fold change decrease (how many times smaller) rather than fold change increase (how many times larger).

OK, so what does that mean if the fold change is equal to 0?

First, think what it means that the fold change is equal to 0.5. That means that a_1 is half of a_0, or that a_0 is two times that of a_1.

What about 0.1? That means that a_1 is ten times smaller than a_0.

0.01? Hundred times.

0.001? Thousand times.

You see where this is going. As we approach zero, the relation \frac{a_0}{a_1} approaches infinity; you could say (incorrectly) that when fold change is equal to zero, a_1 is infinitely smaller than a_0.

Of course, this is outside of regular statistics. In other words, a fold change of 0 is meaningless and cannot be computed. If you measured a_1 and it was zero, you cannot meaningfully compute the fold change. Putting a zero on the y axis is therefore as meaningfull as putting “infinity”.

For that and other reasons, in many applications one calculates the log-fold change rather than fold change:

log_2{FC} = \log_2\frac{a_1}{a_0} = \log_2{a_1} - \log_2{a_0}

That makes the measure nice and symmetric around 0. If a_1 is twice higher than a_0, then log_2{FC}=1. If it is half of a_0, then log_2{FC}=-1. Also, it follows that a_0 and a_1 cannot be equal to 0 — because you cannot logarithmize zero.

Moreover, in most applications, logFC is (more or less) normally distributed. Fold change not only isn’t, it is not even possible for it to be. That means that not only putting a zero on the y axis is meaningless; but calculating parametric statistics such as mean and standard deviation of fold change is equally misleading. You simply shouldn’t do that.

But people nonetheless do, and they are happy with that. That is why we cannot have nice things.

Advertisements

Testing variance before ANOVA

400px-GeorgeEPBox

β€œTo make the preliminary test on variances [before running a t-test or ANOVA] is rather like putting to sea in a rowing boat to find out whether conditions are sufficiently calm for an ocean liner to leave port!”

  • George Box, Biometrika 1953;40:318–35.