When building models, we often have a large amount of data to work with. When training a model, scikit-learn lets us choose among several solvers, each of which uses a different mathematical optimization technique, and some handle large data sets better than others. In this article, we will see how to choose a solver for a Logistic Regression model.
To specify a different solver for our model, we can use the solver parameter. Here we use the sag (stochastic average gradient) solver, which is good at handling large data sets.
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn import datasets

# Load the iris data set
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Standardize the features; sag converges much faster on scaled data
features_standardized = StandardScaler().fit_transform(features)

# Train a logistic regression using the sag solver
logistic_regression = LogisticRegression(solver="sag")
model = logistic_regression.fit(features_standardized, target)

# score() needs data to evaluate against
print(model.score(features_standardized, target))
Scikit-learn offers multiple solvers suited to different data sets. For Logistic Regression, it offers ‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’, and ‘saga’. Here is a summary from the documentation of when to use each solver:
For small datasets, ‘liblinear’ is a good choice, whereas ‘sag’ and ‘saga’ are faster for large ones.
For multi-class problems, only ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’ handle multinomial loss; ‘liblinear’ is limited to one-versus-rest schemes.
‘newton-cg’, ‘lbfgs’, ‘sag’ and ‘saga’ handle L2 or no penalty.
‘liblinear’ and ‘saga’ also handle L1 penalty.
‘saga’ also supports ‘elasticnet’ penalty.
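To illustrate the last point, here is a minimal sketch of training with the saga solver and the elasticnet penalty on the same iris data. The l1_ratio value of 0.5 and max_iter of 5000 are arbitrary choices for this example, not recommendations; elasticnet requires l1_ratio to be set, and saga typically needs more iterations to converge.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn import datasets

iris = datasets.load_iris()
# Standardize first; saga, like sag, converges faster on scaled features
X = StandardScaler().fit_transform(iris.data)
y = iris.target

# saga is the only solver that supports the elasticnet penalty;
# l1_ratio blends L1 and L2 regularization (0.5 is an arbitrary example value)
model = LogisticRegression(
    solver="saga", penalty="elasticnet", l1_ratio=0.5, max_iter=5000
)
model.fit(X, y)
print(model.score(X, y))
```

Passing penalty="elasticnet" to any other solver raises an error, which is an easy way to see these compatibility rules enforced in practice.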
To learn more about how these solvers work, you can look up linear programming, nonlinear programming, convex optimization, and mathematical optimization to get started. This field is rich and deep.