This is a little package that I have been using for a long time to visually explore the results of PCA on grouped data. The main purpose was to have one simple command that would visualise the result of a PCA in R in 3D and color the data points by group and type.
For example, take the data set provided with the package, called “metabo” (it stems from my paper on metabolic profiling in tuberculosis):
library( pca3d )
data( metabo )
# top left bit of the metabo data frame
head( metabo )[,1:10]
This last command shows the following output:
group X1 X2 X3 X4 X5 X6 X7 X8 X9
1 POS 0.78 1.10 1.26 0.87 0.68 0.65 0.72 0.77 0.88
2 POS 0.68 0.51 0.30 0.21 1.64 2.42 1.19 1.19 1.58
3 POS 1.00 1.31 1.68 1.08 2.46 1.19 1.02 1.82 1.60
4 POS 1.08 0.75 0.65 2.33 0.81 0.72 0.94 0.93 0.31
5 TB 0.87 0.81 0.99 0.85 0.92 0.69 1.12 1.50 0.70
6 TB 1.29 0.89 0.46 0.49 0.50 1.03 1.10 0.48 0.31
Each row corresponds to one serum sample, either from a TB patient or from a healthy control. The first column of the data frame metabo contains the group assignments; the remaining 423 columns correspond to relative levels of different small molecules (like sugars or amino acids) in the given serum sample. Running a PCA is straightforward:
pca <- prcomp( metabo[,-1], scale.= TRUE )
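Before plotting, it can be worth checking how much of the variance the first three components capture, since that is all a 3D plot can show. Here is a minimal sketch using only base R. It uses the built-in iris data as a stand-in so that it runs without the package, and the names pca_demo, var_explained and cum3 are just illustrative:

```r
# Stand-in data: the built-in iris set (group in column 5), not metabo
pca_demo <- prcomp(iris[, -5], scale. = TRUE)

# Proportion of the total variance explained by each component
var_explained <- pca_demo$sdev^2 / sum(pca_demo$sdev^2)

# Cumulative share captured by the first three components
cum3 <- sum(var_explained[1:3])

print(round(var_explained, 3))
print(round(cum3, 3))
```

The same two lines should work on the metabo PCA object; summary( pca ) reports the same proportions in tabular form.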
And visualisation with pca3d is straightforward as well:
pca3d( pca, group= metabo[,1] )
A 3D output (using the rgl package) is produced; you can interactively rotate, zoom and change the perspective of the plot. Also, with the rgl.snapshot( filename ) command you can export the graphics as a PNG file.
Visualisation of the metabo PCA using pca3d.
You can very clearly see that the blue balls stand apart from the rest in the first two components. What are they? It is not easy to create a reasonable legend directly on an RGL canvas, but pca3d produces a text-only legend in the main text interface:
group: color, shape
NEG: red, tetrahaedron
POS: green3, cube
TB: blue, sphere
Oh, so the TB patients are really different from the rest! Neat. The really elegant thing about PCA is that it does not use any information about the group classification. Therefore, whatever groups we see, they are real: the visualisation amounts to an independent validation on the whole data set. This is very much unlike PLS, where the score plots always show a clear separation; PLS is "eager to please", as one author put it.
Unfortunately, the 3D plot can only be saved as a PNG. For a publication, however, a 2D PDF might be more suitable. Another command in this package, pca2d, takes exactly the same options as the pca3d command and produces a plot on the standard R graphics device:
2D version of the previous plot.
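Since pca2d draws on the standard R device, getting a PDF should just be a matter of opening a pdf() device first. A minimal sketch, assuming the pca3d package is installed; the file name and page size are my own choices for illustration:

```r
library(pca3d)
data(metabo)

# Same PCA as above
pca <- prcomp(metabo[, -1], scale. = TRUE)

# Open a PDF device (file name and size are illustrative),
# draw the 2D plot on it, then close the device to write the file
pdf("metabo_pca2d.pdf", width = 6, height = 6)
pca2d(pca, group = metabo[, 1])
dev.off()
```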
There are plenty of other options to pca3d; for example, show.labels can take a character vector as an argument and show a little text label floating above every data point.
Furthermore, it is possible to create biplots. Unlike the normal biplot function, by default only a few variables are selected from each component (by their absolute loadings in that component); if too many variables are visualised, the figure becomes cluttered and useless.
The red arrows show selected variables.
In the above figure, several variables with high loadings can be seen.
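To give an idea of what "selected by absolute loadings" means, here is a rough sketch in base R. This is not the code that pca3d uses internally; top_loadings is a hypothetical helper, and the built-in mtcars data stands in for metabo so that the example runs on its own:

```r
# Stand-in data: the built-in mtcars set, not metabo
pca_demo <- prcomp(mtcars, scale. = TRUE)

# Hypothetical helper: names of the n variables with the largest
# absolute loadings in the given component
top_loadings <- function(pca, component, n = 3) {
  load <- pca$rotation[, component]
  names(sort(abs(load), decreasing = TRUE))[seq_len(n)]
}

# Union of the top variables from the first three components
selected <- unique(unlist(lapply(1:3, top_loadings, pca = pca_demo)))
print(selected)
```

Only the variables in selected would then be drawn as arrows, which keeps the figure readable even with hundreds of variables.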
Another plot shows the cluster centroids for all three groups of samples:
The large symbols indicate cluster centroids. Each sample is connected to the corresponding centroid.
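The centroids themselves are nothing more than the per-group means of the PCA scores. A small sketch in base R of how one could compute them by hand; again, this is an illustration with the built-in iris data as a stand-in, not the package's own code:

```r
# Stand-in data: the built-in iris set (group in column 5), not metabo
pca_demo <- prcomp(iris[, -5], scale. = TRUE)
scores <- pca_demo$x[, 1:3]
groups <- iris[, 5]

# Centroid of each group: per-group sum of the scores divided by
# the group size (rows follow the factor level order in both terms)
centroids <- rowsum(scores, groups) / as.vector(table(groups))

# Distance of each sample to the centroid of its own group
own_centroid <- centroids[as.character(groups), ]
dist_to_centroid <- sqrt(rowSums((scores - own_centroid)^2))

print(round(centroids, 2))
```

The distances in dist_to_centroid correspond to the lengths of the lines connecting each sample to its centroid in the plot.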
pca3d on CRAN: http://cran.r-project.org/web/packages/pca3d/