How to use Pipelines in Sklearn

2021-02-10

Intro

Pipelines let you chain data-processing steps and a final model into a single object in Sklearn. For example, you could create a pipeline that scales the data and then trains a model. Whenever you call fit or predict on that pipeline, the scaling step runs automatically, so you don't have to remember to scale the data first. In this article, we will learn how to use pipelines in Sklearn.

Creating a Pipeline

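To build a pipeline, we pass the Pipeline class a list of (name, step) tuples. The name is a string key for the step. Every step except the last must be a transformer (such as a scaler), and the last step is usually the model. We can then call fit on our data, just as we would with any other model.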
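The names in those tuples are not just labels. Below is a minimal sketch of how they can be used to look up individual steps and to set a step's parameters with the `<name>__<parameter>` syntax. The step names 'scaler' and 'svc' and the value C=10.0 are arbitrary choices for illustration.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Each tuple is (name, step); the names here are arbitrary, illustrative keys
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('svc', SVC()),
])

# The names act as keys for looking up individual steps
scaler = pipe.named_steps['scaler']
model = pipe['svc']  # indexing the pipeline by name also works

# They also act as prefixes when setting a step's parameters: <name>__<param>
pipe.set_params(svc__C=10.0)
print(pipe['svc'].C)  # 10.0
```

The same `svc__C` style of name is what you would use to tune a step's parameters through the pipeline as a whole.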

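from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


# Create a toy classification dataset and split it into train and test sets
X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale the features, then fit a support vector classifier
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('svc', SVC())
])

# fit() fits the scaler on X_train and trains the SVC on the scaled data
pipe.fit(X_train, y_train)
# score() applies the fitted scaler to X_test, then returns the SVC's accuracy
print(pipe.score(X_test, y_test))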
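To see what the pipeline saves us from, here is a sketch comparing it against doing the same steps by hand. It reuses the dataset setup from the example above and checks that both approaches produce the same predictions on the test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Manual version: fit the scaler on the training data and remember to
# apply the same fitted scaler to the test data before predicting
scaler = StandardScaler().fit(X_train)
svc = SVC().fit(scaler.transform(X_train), y_train)
manual_pred = svc.predict(scaler.transform(X_test))

# Pipeline version: fit() fits the scaler and the SVC in order,
# and predict() transforms X_test with the fitted scaler before predicting
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)

print(np.array_equal(manual_pred, pipe_pred))  # True
```

The results match, but with the pipeline the scaling is tied to the model, so there is no way to forget it when predicting on new data.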