Dimensionality Reduction — Principal Component Analysis (The Easy Way)

Rahul
Jun 26, 2018

There are as many dimensions as there are variables, so in a data set with 8 variables the sample space is 8-dimensional. Does that make your head hurt? It's tough to visualise.

Dimensionality Reduction attempts to distill higher-dimensional data down to a smaller number of dimensions while preserving as much of the variance in the data as possible.

Remember K-Means Clustering? It can be viewed as a kind of dimensionality reduction: it reduces each data point down to a single value, the ID of one of the K clusters it belongs to.
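
For instance, here is a minimal sketch (with made-up 2-D points and scikit-learn's KMeans; the data is purely illustrative) of how clustering collapses each point down to a single cluster ID:

from sklearn.cluster import KMeans
import numpy as np

# Six 2-D points that form two obvious groups
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Each 2-D point is now represented by one number: its cluster ID
print(kmeans.labels_)  # e.g. [1 1 1 0 0 0]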

Principal Component Analysis:

The main linear technique for dimensionality reduction is principal component analysis (PCA), which performs a linear mapping of the data to a lower-dimensional space in such a way that the variance of the data in the low-dimensional representation is maximised.

To do this, the covariance (or sometimes the correlation) matrix of the data is constructed and the eigenvectors of this matrix are computed.

The data then gets projected onto the hyperplane spanned by the top eigenvectors, which represents the lower-dimensional space you want.

The greatest variance of the data set comes to lie on the first axis, the second greatest variance on the second axis, and so on. This process allows us to reduce the number of variables used in an analysis.
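
To make the mechanics concrete, here is a minimal from-scratch sketch (using NumPy on a small random data matrix, which stands in for real data) that builds the covariance matrix, takes its eigenvectors, and projects 8-dimensional data down to 2:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))           # 100 samples, 8 variables

X_centered = X - X.mean(axis=0)         # centre each variable
cov = np.cov(X_centered, rowvar=False)  # 8 x 8 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh works for symmetric matrices

# Sort the eigenvectors by descending eigenvalue (variance explained)
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:2]]      # the top 2 principal axes

X_reduced = X_centered @ components     # project 8-D data onto 2-D
print(X_reduced.shape)                  # (100, 2)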

PCA IN ACTION

PCA is really useful for things like image compression and facial recognition. The most challenging part of PCA, however, is interpreting the components.

Fortunately, Scikit Learn makes PCA very easy to use: you can do it in just 3 lines of code.
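
A minimal sketch of what those lines might look like (assuming a data matrix X, here just random numbers, and keeping 2 components):

from sklearn.decomposition import PCA
import numpy as np

X = np.random.rand(100, 8)        # stand-in data: 100 samples, 8 variables

pca = PCA(n_components=2)         # keep the 2 directions with the most variance
X_reduced = pca.fit_transform(X)  # fit the model and project the data

# Fraction of the total variance each component preserves
print(pca.explained_variance_ratio_)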
