When building machine learning models, you often want to compare multiple algorithms on the same data. The standard way to do this is cross validation, which splits your data into multiple training and test sets, scores each model on every split, and lets you pick the best overall result. In this article, we will look at one way to use cross validation in Sklearn.
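As a quick illustration of the splitting idea, here is a minimal sketch using sklearn's KFold (which this article does not cover in detail): cross validation divides one data set into several train/test folds, each holding out a different slice as the test set.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 toy samples, 2 features

# 5 folds: each fold holds out a different 20% of the rows as the test set
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    print(f"fold {fold}: train size={len(train_idx)}, test size={len(test_idx)}")
```

Each of the five folds trains on 8 rows and tests on the remaining 2, so every sample is used for testing exactly once.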
We will do cross validation manually by splitting our data twice, running our algorithms on each split, and comparing the results. Below is an example of testing Logistic Regression and SVM on the iris data set. We train both models on the inner training set, score them on both held-out test sets, and then take the best of all the results.
from sklearn import datasets
## Load the validation split and scoring helpers
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
## Load the models to compare
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

## Load the iris data set and only use the first two features
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

## Create a first split of our data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

## Create a second split from the first training set
X_train_2, X_test_2, y_train_2, y_test_2 = train_test_split(X_train, y_train, test_size=0.25)

## Train both models on the second split
svm = SVC(kernel='linear')
svmMod = svm.fit(X_train_2, y_train_2)
lr = LogisticRegression()
lrMod = lr.fit(X_train_2, y_train_2)

## Run predictions on the second test set
svmPred = svmMod.predict(X_test_2)
lrPred = lrMod.predict(X_test_2)

## Print the scores on our second set
print(accuracy_score(y_test_2, svmPred))
print(accuracy_score(y_test_2, lrPred))

## Run predictions on the first test set
svmPred = svmMod.predict(X_test)
lrPred = lrMod.predict(X_test)

## Print the scores on our first set
print(accuracy_score(y_test, svmPred))
print(accuracy_score(y_test, lrPred))
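For comparison, sklearn can automate the splitting, training, and scoring for you with `cross_val_score`. The sketch below scores the same two models on the same iris features with 5-fold cross validation; the `max_iter` bump for LogisticRegression is our own addition to avoid convergence warnings, not something from the manual example above.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

## Same data as the manual example: iris, first two features only
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

## cross_val_score splits the data into 5 folds, trains on each
## training portion, and returns one accuracy score per fold
svm_scores = cross_val_score(SVC(kernel='linear'), X, y, cv=5)
lr_scores = cross_val_score(LogisticRegression(max_iter=200), X, y, cv=5)

print("SVM mean accuracy:", svm_scores.mean())
print("LogisticRegression mean accuracy:", lr_scores.mean())
```

This replaces the two manual splits with five folds, so every sample is used for testing exactly once, and the mean score gives a more stable comparison than any single split.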