Regularization with Logistic Regression to Reduce Variance

12.14.2020

Intro

One of the main issues when fitting a machine learning model is overfitting: the model learns parameters that match the training data too closely and fail to generalize to new data. Often, the culprit is high variance. To counter this, we can use regularization techniques (which help with other issues too). Let's see how to regularize a Logistic Regression model using sklearn.

Example

To add regularization to Logistic Regression, we can use the LogisticRegressionCV class. We pass in two parameters: penalty and Cs. The penalty parameter specifies the type of regularization: 'l1', 'l2' or 'elasticnet', which correspond to the Lasso, Ridge and Elastic Net penalties respectively. The Cs parameter, when given an integer, generates that many candidate values of C, the inverse of the regularization strength (so a smaller C means a stronger penalty). The class cross-validates over the candidates and refits the model with the best one.

from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler
from sklearn import datasets

iris = datasets.load_iris()
features = iris.data
target = iris.target

# Regularization penalizes large coefficients, so the features
# should be put on a comparable scale first
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create a Logistic Regression model with an L2 (Ridge) penalty,
# cross-validating over 10 candidate values of C
logistic_regression = LogisticRegressionCV(
    penalty='l2',
    Cs=10)

# Train model
model = logistic_regression.fit(features_standardized, target)
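Once trained, the cross-validated model exposes which regularization strength won the search (via its C_ attribute) and predicts like any other sklearn estimator. A minimal, self-contained sketch of inspecting and using the fitted model:

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
features_standardized = StandardScaler().fit_transform(iris.data)
target = iris.target

# Cross-validate over 10 candidate values of C with an L2 penalty
model = LogisticRegressionCV(penalty='l2', Cs=10, max_iter=1000).fit(
    features_standardized, target)

# C_ holds the regularization strength selected by cross-validation
# (one entry per class)
print(model.C_)

# The refit model scores and predicts like any sklearn estimator
print(model.score(features_standardized, target))
print(model.predict(features_standardized[:3]))
```

This lets you sanity-check how much penalty the search actually applied: if the selected C sits at an edge of the candidate range, it can be worth widening the Cs grid.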

If you want more theory, we will have separate articles on the details, or you can look up the Lasso, Ridge and Elastic Net models to get started.