PCA (principal component analysis) is a common preprocessing technique used with many machine learning algorithms. PCA combines your predictors (x variables) into a smaller set of new variables, called principal components, each of which is a linear combination of the originals. This makes the final model harder to interpret, but has the benefit of reducing the number of variables that need to be fit. In this article, we will see how to use PCA in Sklearn.
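To use PCA, we create a PCA instance using the class from the decomposition module. Then, we call the fit_transform method and pass in our X matrix. This returns a new matrix whose columns are linear combinations of our original variables. By default, PCA keeps all components, so to actually reduce the number of variables you pass n_components when creating the instance.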
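from sklearn import datasets
from sklearn import decomposition

# Load the iris data: 150 samples, 4 features
iris = datasets.load_iris()
X = iris.data
y = iris.target  # labels are not used by PCA

# With no arguments, PCA keeps all 4 components
pca = decomposition.PCA()
xPca = pca.fit_transform(X)

# Each row is a sample expressed in the principal components
print(xPca)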
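As a follow-up sketch, here is one way to actually shrink the data and check what PCA did. The choice of 2 components is an assumption for illustration, not a recommendation, and the names X_reduced and manual are our own. The example prints how much variance each kept component explains, then confirms that each new column is a linear combination of the centered original columns, using the weights stored in components_.

```python
import numpy as np
from sklearn import datasets, decomposition

X = datasets.load_iris().data  # 150 samples, 4 features

# Assumption: keep 2 components; the right number depends on your data.
pca = decomposition.PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # (150, 2)

# Share of the total variance captured by each kept component.
print(pca.explained_variance_ratio_)

# Each new column is a linear combination of the centered original columns;
# the weights are the rows of pca.components_.
manual = (X - pca.mean_) @ pca.components_.T
print(np.allclose(manual, X_reduced))
```

If the explained variance ratios of the kept components sum to a value close to 1, little information was lost by dropping the rest.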