# Fold-change bar plots with “0” on y axis

I see it more and more frequently: bar plots which are supposed to illustrate the regulation of a gene in terms of “fold change”, which include a “0” on the y axis.

It is subtle, but it irks me a lot. Also, the last time I tried to argue about it with my colleagues who work experimentally, I heard that “everybody does it like this” and that I am nit-picking.

What is the fold change? Suppose that you have a before and an after measurement, $a_0$ and $a_1$. The fold change is then $F=\frac{a_1}{a_0}$.

Could you swap $a_0$ and $a_1$? Yes, you could just as well define it as $\frac{a_0}{a_1}$, right? That would be a fold decrease (how many times smaller) rather than a fold increase (how many times larger).

OK, so what does that mean if the fold change is equal to 0?

First, think about what it means for the fold change to be equal to 0.5. It means that $a_1$ is half of $a_0$, or that $a_0$ is twice $a_1$.

What about 0.1? That means that $a_1$ is ten times smaller than $a_0$.

0.01? Hundred times.

0.001? Thousand times.

You see where this is going. As the fold change approaches zero, the ratio $\frac{a_0}{a_1}$ approaches infinity; you could say (incorrectly) that when the fold change is equal to zero, $a_1$ is infinitely smaller than $a_0$.

Of course, this is outside of regular statistics. In other words, a fold change of 0 is meaningless and cannot be computed. If you measured $a_1$ and it was zero, you cannot meaningfully compute the fold change. Putting a zero on the y axis is therefore as meaningful as putting “infinity” there.
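The progression above is easy to check numerically. Here is a minimal sketch in Python; the function names and the value of `a0` are my own illustrative choices, not part of any library.

```python
def fold_change(a1, a0):
    """Fold change F = a1 / a0 (after divided by before)."""
    return a1 / a0


def times_smaller(a1, a0):
    """The reciprocal a0 / a1: how many times smaller a1 is than a0."""
    return a0 / a1


a0 = 1.0  # hypothetical "before" measurement
for a1 in [0.5, 0.1, 0.01, 0.001]:
    print(f"F = {fold_change(a1, a0):g}: "
          f"a1 is {times_smaller(a1, a0):g} times smaller than a0")

# With a1 = 0 the reciprocal is no longer defined at all.
try:
    times_smaller(0.0, a0)
except ZeroDivisionError:
    print("F = 0: 'how many times smaller' is undefined")
```

The smaller the fold change gets, the faster the reciprocal grows, and at exactly zero there is nothing left to compute.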

For that and other reasons, in many applications one calculates the log-fold change rather than the fold change: $\log_2{FC} = \log_2\frac{a_1}{a_0} = \log_2{a_1} - \log_2{a_0}$.

That makes the measure nice and symmetric around 0. If $a_1$ is twice as high as $a_0$, then $\log_2{FC}=1$. If it is half of $a_0$, then $\log_2{FC}=-1$. Also, it follows that neither $a_0$ nor $a_1$ can be equal to 0, because you cannot take the logarithm of zero.
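To make the symmetry concrete, here is a small sketch in Python using only the standard library. The helper `log2_fold_change` and the example measurements are hypothetical, chosen just for illustration.

```python
import math


def log2_fold_change(a1, a0):
    """log2(a1 / a0); both measurements must be strictly positive."""
    if a0 <= 0 or a1 <= 0:
        raise ValueError("log fold change needs strictly positive measurements")
    return math.log2(a1 / a0)


doubled = log2_fold_change(20.0, 10.0)  # a1 is twice a0
halved = log2_fold_change(5.0, 10.0)    # a1 is half of a0
print(doubled, halved)

# Doubling and halving have the same magnitude and opposite signs.
print(math.isclose(doubled, -halved))
```

On the plain fold change scale the same two cases sit at 2 and 0.5, which are not at the same distance from "no change" (1).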

Moreover, in most applications, logFC is (more or less) normally distributed. Fold change not only isn’t, it cannot be: it is always positive, while a normal distribution extends to negative values. That means that not only is putting a zero on the y axis meaningless; calculating parametric statistics such as the mean and standard deviation of fold changes is equally misleading. You simply shouldn’t do that.
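A toy example shows how the mean of fold changes misleads. This is a sketch with made-up numbers (one gene doubling in one replicate and halving in another), not real data.

```python
import math
import statistics

# Hypothetical fold changes: one doubling and one halving.
fold_changes = [2.0, 0.5]

# Arithmetic mean on the fold change scale suggests a net increase.
mean_fc = statistics.mean(fold_changes)

# Averaging on the log2 scale and transforming back gives "no change".
mean_log2_fc = statistics.mean(math.log2(f) for f in fold_changes)
back_transformed = 2 ** mean_log2_fc

print(f"mean of fold changes:      {mean_fc}")
print(f"mean of log2 fold changes: {mean_log2_fc}")
print(f"back-transformed:          {back_transformed}")
```

A doubling and a halving should cancel out. They do on the log scale, while the arithmetic mean of the raw fold changes comes out above 1.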

But people nonetheless do, and they are happy with that. That is why we cannot have nice things.

# Two bar plots

What is the difference between the two bar plots below? I am sitting at a conference and this type of plot is relatively frequent in the presentations. Complete with a log scale.

The answer is, of course, that there is no difference between these two: the data is exactly the same, and the only thing that differs is the vertical scale. These two plots explain why you should never, ever use a bar plot to represent log-scaled data: the point where the y axis (and with it every bar) starts is completely arbitrary, yet it greatly influences our perception of which plot shows a larger difference.
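You do not even need to draw anything to see the effect. The sketch below computes how long two bars would be drawn on a log10 axis for different starting points of that axis; the values and the baselines are invented for illustration, and `bar_lengths` is my own helper.

```python
import math


def bar_lengths(values, baseline):
    """Drawn bar lengths on a log10 axis whose bars start at `baseline`."""
    return [math.log10(v) - math.log10(baseline) for v in values]


values = [100.0, 1000.0]  # hypothetical data: a tenfold difference

for baseline in [1.0, 10.0, 90.0]:
    shorter, longer = bar_lengths(values, baseline)
    print(f"axis starts at {baseline:g}: "
          f"second bar is {longer / shorter:.1f} times as long as the first")
```

The data never change, but the apparent ratio of the two bars depends entirely on where the axis happens to start.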