Sometimes data has highly correlated variables or a lot of variance. These are serious problems for simple linear regression and can lead to poor performance. To address them, we can use regularization models, which add a shrinkage penalty to the linear regression model. In this article, we will learn how to use regularization models with Sklearn.
To add regularization, we can use the Ridge model (there are others, such as Lasso and Elastic Net). Ridge is a common choice for handling correlated variables and high variance. To use this model, we create an instance and pass our data to the fit method, the same as we do with other models. Note that we also apply standardization preprocessing, which helps with many models and is practically required for Ridge.
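Since the scaler and the model always travel together, one way to keep them in sync is a scikit-learn Pipeline, which standardizes the features before fitting Ridge on both fit and predict. A minimal sketch (the diabetes dataset here is just a stand-in example):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A built-in regression dataset, used only for illustration
features, target = load_diabetes(return_X_y=True)

# The pipeline standardizes the features, then fits Ridge on the scaled data
pipeline = make_pipeline(StandardScaler(), Ridge(alpha=0.5))
pipeline.fit(features, target)

# R^2 on the training data
print(pipeline.score(features, target))
```

This keeps us from accidentally fitting the scaler on one dataset and the model on another.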
We also specify an alpha value, which tells Sklearn "how much" penalty to apply. In practice, we try multiple alpha values and select the model with the best performance.
from sklearn.linear_model import Ridge
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler

# load_boston has been removed from scikit-learn, so we use
# the diabetes dataset instead
features, target = load_diabetes(return_X_y=True)

# Standardize the features before fitting Ridge
scaler = StandardScaler()
scaledFeats = scaler.fit_transform(features)

## Build the model
regression = Ridge(alpha = 0.5)
model = regression.fit(scaledFeats, target)

# score() needs data to evaluate on; it returns R^2
print(model.score(scaledFeats, target))
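As noted above, in practice we try several alpha values and keep the best one. Scikit-learn's RidgeCV does this for us by cross-validating over a list of candidate alphas; the grid of values below is just an illustration:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import StandardScaler

# Same setup as before, with the diabetes dataset as a stand-in
features, target = load_diabetes(return_X_y=True)
scaledFeats = StandardScaler().fit_transform(features)

# Cross-validate over several candidate penalties and keep the best one
regression = RidgeCV(alphas=[0.1, 0.5, 1.0, 10.0])
model = regression.fit(scaledFeats, target)

# The alpha that performed best in cross-validation
print(model.alpha_)
```

After fitting, `model` behaves like an ordinary Ridge model trained with the winning alpha, so we can call predict and score on it directly.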