In this post, we will take a very short tour of PCA, mainly of how it is computed. The scikit-learn user guide has more advanced details on the actual usage of PCA.

Basic idea

The goal of principal component analysis is to identify an orthogonal basis along which data points are spread out the most. For example, imagine a rugby ball mid-air. It’s tilted a little along some axis. Given a bag of 3-dimensional points (aka point-cloud) that comprise the shape of the rugby ball at this very mid-air position and orientation, PCA will identify the three orthogonal vectors that best represent the spread of the points. If we’re being formal, we would say that it will output the basis vectors (the “axes”) along which the variance of points is maximum.
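To make the “maximum spread” idea concrete, here is a minimal sketch (assuming NumPy and scikit-learn are installed; the rugby-ball-like point cloud and the comparison direction are just illustrative choices): the variance of the points projected onto the first principal direction is at least as large as along any other direction.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# A "rugby ball"-like 3D point cloud: stretched along x, then tilted by a random orthogonal matrix.
points = rng.normal(size=(1000, 3)) * [5.0, 2.0, 1.0]
tilt, _ = np.linalg.qr(rng.normal(size=(3, 3)))
points = points @ tilt.T

pca = PCA(n_components=3).fit(points)
first_axis = pca.components_[0]          # unit vector along which the spread is largest

random_dir = rng.normal(size=3)
random_dir /= np.linalg.norm(random_dir)

centered = points - points.mean(axis=0)
print(np.var(centered @ first_axis))     # the maximum projected variance
print(np.var(centered @ random_dir))     # smaller (or equal) for any other unit direction
```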

Computing the principal axes using eigendecomposition

For generality, instead of 3D points, we will let the number of dimensions be $m$.

Given the set of $n$ points $P = \{\mathbf{P_1}, \mathbf{P_2}, \cdots, \mathbf{P_n}\}$, the “center” point is given by the component-wise mean of the points, $\mathbf{C} = \frac{1}{n}\sum_i\mathbf{P_i}$. We will subtract $\mathbf{C}$ from each $\mathbf{P_i}$ to “center the points at the origin” and arrange them all in an $m × n$ matrix $X$, which we will call the data matrix. 1
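A minimal sketch of this centering step, assuming NumPy and a hypothetical `points` array with one point per row:

```python
import numpy as np

# Hypothetical input: n = 100 points in m = 3 dimensions, one point per row.
points = np.random.default_rng(1).normal(size=(100, 3))

C = points.mean(axis=0)    # the "center" point: component-wise mean
X = (points - C).T         # subtract C from every point, arrange as an m x n data matrix
```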

We compute the covariance matrix $\text{cov}(X) = XX^⊺$ (ignoring the constant $\frac{1}{n}$ factor, which only rescales the eigenvalues). The covariance matrix is a real symmetric matrix of shape $m × m$ and therefore has $m$ real eigenvalues and $m$ orthonormal eigenvectors. In fact, these eigenvectors, scaled by their corresponding eigenvalues, are the principal axes.

In short, an eigendecomposition will give us two matrices $R$ (orthogonal) and $S$ (diagonal) such that

$$ \text{cov}(X) = XX^⊺ = R S R^⊺ $$

and the principal axes are the columns of $R$, scaled by the elements of the diagonal matrix $S$ 2. scikit-learn has a PCA transformer that can do this for us. However, it turns out that the popular approach to computing the principal axes is not an eigendecomposition of the covariance matrix but a singular value decomposition (SVD) of the data matrix itself.
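As a sketch of the eigendecomposition route (NumPy assumed; `np.linalg.eigh` is the eigensolver for symmetric matrices and returns eigenvalues in ascending order, so we re-sort them):

```python
import numpy as np

# Rebuild the centered m x n data matrix X from the previous sketch.
points = np.random.default_rng(1).normal(size=(100, 3))
X = (points - points.mean(axis=0)).T

cov = X @ X.T                        # m x m, real and symmetric
eigvals, R = np.linalg.eigh(cov)     # cov is (approximately) R @ np.diag(eigvals) @ R.T

# eigh returns ascending eigenvalues; re-order so the largest-variance axis comes first.
order = np.argsort(eigvals)[::-1]
eigvals, R = eigvals[order], R[:, order]

principal_axes = R * eigvals         # each column: a unit eigenvector scaled by its eigenvalue
```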

Computation of the principal axes using SVD

The output of applying SVD to a matrix is very similar to the output of an eigendecomposition, but SVD has the advantage of being applicable to any rectangular matrix, not just square ones. Given a matrix $X$, the SVD amounts to finding the eigendecompositions of $X^⊺X$ and $XX^⊺$, which are both symmetric matrices and therefore each have a full orthonormal basis of eigenvectors. So any matrix $X$ can be decomposed like so

$$ X = V S U^⊺ $$

where

  1. $V$ is an orthogonal matrix where each column vector is a unit eigenvector of $XX^⊺$
  2. $U$ is an orthogonal matrix where each column vector is a unit eigenvector of $X^⊺X$
  3. $S$ is a diagonal matrix whose diagonal entries are $\sqrt{\lambda_i}$, where $\lambda_i$ is the $i$-th eigenvalue of $X^⊺X$.

We see from points 2 and 3 that the SVD computes the eigenvectors and (square roots of the) eigenvalues of $X^⊺X$. But we need the eigenvectors and eigenvalues of $\text{cov}(X) = XX^⊺$. Simple enough: just take the SVD of $Y = X^⊺$!
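A sketch of the SVD route under the same assumptions, using NumPy's `np.linalg.svd` on $Y = X^⊺$; up to possible sign flips of individual axes it reproduces the eigendecomposition result:

```python
import numpy as np

# Rebuild the centered m x n data matrix X from the earlier sketch.
points = np.random.default_rng(1).normal(size=(100, 3))
X = (points - points.mean(axis=0)).T

Y = X.T                                          # n x m: one centered point per row
_, singular_values, Vt = np.linalg.svd(Y, full_matrices=False)

directions = Vt.T                                # columns: unit eigenvectors of cov(X), largest first
eigenvalues = singular_values ** 2               # eigenvalues of cov(X) = X X^T

# Same principal axes as the eigendecomposition route, up to a sign flip of each column.
principal_axes_svd = directions * eigenvalues
```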

SVD is quite a fascinating operation and I recommend a more detailed study to anyone interested. Good references are Gilbert Strang’s Linear Algebra For Everyone 3 and a paper by Jonathon Shlens illustrating the relationship between PCA and SVD 4.

Example

For an example, I generated a point-cloud of a 3D ellipsoid, rotated it around some arbitrary axis by some arbitrary angle, then applied scikit-learn’s PCA to it with n_components=2. It projects each point onto the two of the three principal axes along which the points are most spread out, so we can plot a projected 2D ellipse from the points returned by fit_transform(). We also plot the principal axes of the ellipsoid to show the extents along them.
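Something along these lines (NumPy, scikit-learn and matplotlib assumed; the ellipsoid’s semi-axes, the rotation, and the plotting details are my own stand-in choices, not the exact ones used for the figures):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)

# Points on a 3D ellipsoid with semi-axes (5, 3, 1), rotated by a random orthogonal matrix.
u = rng.uniform(0, 2 * np.pi, 2000)
v = rng.uniform(0, np.pi, 2000)
ellipsoid = np.column_stack([5 * np.cos(u) * np.sin(v),
                             3 * np.sin(u) * np.sin(v),
                             1 * np.cos(v)])
rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))
points = ellipsoid @ rotation.T

# Project onto the two principal axes with the largest spread and plot the 2D ellipse.
pca = PCA(n_components=2)
projected = pca.fit_transform(points)            # n x 2

plt.scatter(projected[:, 0], projected[:, 1], s=2)
plt.axis("equal")
plt.show()
```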

Using PCA

PCA is rather versatile. Once you’ve found that orthogonal basis, you can use it to construct a bounding shape for the point cloud: a bounding box, a bounding ellipse, or a bounding sphere. Eric Lengyel’s game-engine book has detailed maths for each of these constructions.
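As a rough sketch of the bounding-box case (NumPy and scikit-learn assumed; this is just the basic idea, not Lengyel’s exact construction): express the points in the PCA basis and take the min/max extent along each principal direction.

```python
import numpy as np
from sklearn.decomposition import PCA

# A stand-in point cloud, elongated so the box orientation is non-trivial.
points = np.random.default_rng(7).normal(size=(500, 3)) * [4.0, 2.0, 0.5]

pca = PCA(n_components=3).fit(points)
axes = pca.components_                        # rows: orthonormal principal directions
local = (points - pca.mean_) @ axes.T         # coordinates of each point in the PCA basis

lo, hi = local.min(axis=0), local.max(axis=0)
half_extents = (hi - lo) / 2                  # box half-sizes along each principal direction
center = pca.mean_ + ((lo + hi) / 2) @ axes   # box center, back in the original coordinates
```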

For data analysis, which is what scikit-learn’s implementation is geared towards, you can use PCA to project higher-dimensional data points (say, 500-dimensional) onto the 2 axes along which the points are most spread out and plot them in a 2D graph. As an exploratory tool, it’s valuable.
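A minimal sketch of that exploratory use (scikit-learn and matplotlib assumed; the random 500-dimensional data is only a stand-in for a real dataset):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Stand-in data: 300 points in 500 dimensions.
high_dim = np.random.default_rng(3).normal(size=(300, 500))

# Keep only the 2 directions along which the points are most spread out.
embedded = PCA(n_components=2).fit_transform(high_dim)    # 300 x 2

plt.scatter(embedded[:, 0], embedded[:, 1], s=5)
plt.xlabel("1st principal axis")
plt.ylabel("2nd principal axis")
plt.show()
```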


  1. This “centering of the points” is needed because only then does $XX^⊺$ (up to the $\frac{1}{n}$ factor) evaluate to the covariance matrix, which can also be written as the sum $\text{cov}(X) = \frac{1}{n}\sum_{i=1}^n (\mathbf{P_i} - \mathbf{C})(\mathbf{P_i} - \mathbf{C})^⊺$ ↩︎

  2. I like to use the variable names $R$ and $S$ because a symmetric matrix encodes a scaling done relative to a rotated basis. The eigendecomposition basically reveals that rotated basis and the scaling. $R^⊺$ is therefore the change-of-basis matrix from the original basis to the rotated one, and $S$ is the scaling applied with respect to this basis. $R$ then maps back to the original basis. ↩︎

  3. https://math.mit.edu/~gs/everyone/ ↩︎

  4. https://arxiv.org/pdf/1404.1100.pdf ↩︎