Sometimes data has highly correlated variables or a lot of variance. Both are serious problems for ordinary linear regression: coefficient estimates become unstable and the model can perform poorly on new data. To address this, we can use regularized models, which add a shrinkage penalty to the linear regression model. In this article, we will learn how to use regularized models with Sklearn.
To build a regularized regression, we can use the Ridge model (there are others, such as Lasso and ElasticNet). Ridge is a common choice for handling correlated variables and high variance. To use it, we create an instance and pass our data to the fit method, the same as we do with other models. Note that we also standardize the features first. Standardization helps with many models and is pretty much required for Ridge, because the penalty acts on the size of the coefficients, which depends on the scale of each feature.
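To illustrate the claim about correlated variables, here is a minimal sketch on synthetic data. The data, the seed, and the names x1, x2, ols and ridge are all assumptions made up for this illustration, not part of the original example. It fits ordinary least squares and Ridge on two nearly identical features and prints the coefficients so they can be compared.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (an assumption for illustration): x2 is almost a copy of x1
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])

# The target depends only on x1, plus noise
y = 3 * x1 + rng.normal(scale=0.5, size=n)

# Fit both models on the same data
ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# With near-duplicate features, plain least squares can split the effect
# between them in an unstable way; the Ridge penalty pulls the
# coefficient vector toward a smaller, more stable solution
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```

The Ridge coefficient vector is never longer than the least squares one, and with features this similar it tends to share the effect roughly evenly between them.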
We also specify an alpha value, which tells Sklearn how much penalty to apply. In practice, we try multiple alpha values and select the one that gives the best performance on held-out data.
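To make the role of alpha concrete, the objective that Ridge minimizes can be written as follows. The symbols are notation chosen here: X is the feature matrix, y the target, and w the coefficient vector.

```latex
\min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \, \lVert w \rVert_2^2
```

The first term is the usual squared error and the second is the shrinkage penalty. With alpha at 0 this reduces to ordinary least squares, and larger values of alpha push the coefficients closer to zero.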
from sklearn.linear_model import Ridge
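from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler
# load_boston was removed in scikit-learn 1.2, so load_diabetes (another
# bundled regression dataset) is used here as a stand-in
data = load_diabetes()
features = data.data
target = data.target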
# Standardize each feature to zero mean and unit variance
scaler = StandardScaler()
scaledFeats = scaler.fit_transform(features)
## Build the model
regression = Ridge(alpha=0.5)
model = regression.fit(scaledFeats, target)
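# score() requires the features and target; it returns R^2 on the data passed in
print(model.score(scaledFeats, target))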
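The step of trying several alpha values can be sketched with RidgeCV, which evaluates a list of candidates by cross-validation and keeps the best one. The candidate list below is arbitrary, and load_diabetes is used only because it ships with scikit-learn. Both are assumptions for this illustration, not recommendations.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Bundled regression dataset, used here only as a convenient example
X, y = load_diabetes(return_X_y=True)

# Candidate penalty strengths (an arbitrary grid chosen for illustration)
alphas = [0.01, 0.1, 0.5, 1.0, 10.0, 100.0]

# Chain standardization and RidgeCV so both run with a single fit call
pipeline = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas))
pipeline.fit(X, y)

# make_pipeline names each step after its lowercased class name
best_alpha = pipeline.named_steps["ridgecv"].alpha_
print("Selected alpha:", best_alpha)
```

By default RidgeCV uses an efficient form of leave-one-out cross-validation. Passing a cv argument switches it to k-fold instead.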