Fold-change bar plots with “0” on y axis

I see it more and more frequently: bar plots which are supposed to illustrate the regulation of a gene in terms of “fold change”, which include a “0” on the y axis.

It is subtle, but it irks me a lot. Also, the last time I tried to argue with my experimentally working colleagues, I heard that “everybody does it like this” and that I am nit-picking.

What is the fold change? Suppose that you have a before and after measurements, a_0 and a_1. Now, the fold change is


Could you replace a_0 by a_1 and vice versa? Yes, you could define it as \frac{a_0}{a_1}, right? Fold change decrease (how many times smaller) rather than fold change increase (how many times larger).

OK, so what does that mean if the fold change is equal to 0?

First, think what it means that the fold change is equal to 0.5. That means that a_1 is half of a_0, or that a_0 is two times that of a_1.

What about 0.1? That means that a_1 is ten times smaller than a_0.

0.01? Hundred times.

0.001? Thousand times.

You see where this is going. As we approach zero, the relation \frac{a_0}{a_1} approaches infinity; you could say (incorrectly) that when fold change is equal to zero, a_1 is infinitely smaller than a_0.

Of course, this is outside of regular statistics. In other words, a fold change of 0 is meaningless and cannot be computed. If you measured a_1 and it was zero, you cannot meaningfully compute the fold change. Putting a zero on the y axis is therefore as meaningfull as putting “infinity”.

For that and other reasons, in many applications one calculates the log-fold change rather than fold change:

log_2{FC} = \log_2\frac{a_1}{a_0} = \log_2{a_1} - \log_2{a_0}

That makes the measure nice and symmetric around 0. If a_1 is twice higher than a_0, then log_2{FC}=1. If it is half of a_0, then log_2{FC}=-1. Also, it follows that a_0 and a_1 cannot be equal to 0 — because you cannot logarithmize zero.

Moreover, in most applications, logFC is (more or less) normally distributed. Fold change not only isn’t, it is not even possible for it to be. That means that not only putting a zero on the y axis is meaningless; but calculating parametric statistics such as mean and standard deviation of fold change is equally misleading. You simply shouldn’t do that.

But people nonetheless do, and they are happy with that. That is why we cannot have nice things.

Two bar plots

What is the difference between the two bar plots below?


I am sitting on a conference and these type of plots are relatively frequent in the presentations. Complete with a log-scale.

The answer is, of course, that there is no difference between these two — the data is exactly the same, the only thing different is the vertical scale. These two plots explain why you should never, ever use a bar plot to represent log-scaled data: the position of the y axis is completely arbitrary, yet it influences greatly our perception of which plot shows a larger difference.

(See also “Kick the bar chart habit”)