When testing algorithm performance, you want to split your data into a training set and a test set. This lets you train on one subset of your data and evaluate performance on a held-out subset the model has never seen. One common problem this catches is overfitting: the model fits the training set so precisely that it does not generalize to other data. In this article, we will see how to split your data into training and test sets using scikit-learn (sklearn).
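To split our data using sklearn, we use the train_test_split function from the model_selection module. This function shuffles the rows and splits our X and y into training and test sets, holding out 25% of the data for testing by default. It does not stratify automatically; to keep the class proportions of a categorical target the same in both sets, pass the labels through the stratify argument (for example, stratify=y).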
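To illustrate how stratification is controlled in train_test_split, here is a minimal sketch. The stratify argument defaults to None, so class proportions are only preserved when you pass the labels explicitly. The choice of the iris dataset, test_size=0.2, and random_state=0 are assumptions made for this illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Iris has 150 samples, 50 in each of 3 classes (illustrative dataset choice).
X, y = load_iris(return_X_y=True)

# Default behaviour: rows are shuffled, but the split is not stratified,
# so the per-class counts in the test set can be uneven.
_, _, _, y_test_plain = train_test_split(X, y, test_size=0.2, random_state=0)

# Passing stratify=y keeps the class proportions equal in both sets.
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

print("plain:     ", np.bincount(y_test_plain, minlength=3))
print("stratified:", np.bincount(y_test_strat, minlength=3))
```

With stratify=y, the 30-sample test set holds 10 samples of each class, while the counts for the plain split depend on the shuffle.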
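from sklearn.model_selection import train_test_split
from sklearn import datasets

# load_boston was removed in scikit-learn 1.2; load_diabetes is used here
# as a replacement regression dataset.
data = datasets.load_diabetes()
X = data.data
y = data.target

# Defaults: shuffle the rows and hold out 25% of them as the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y)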
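Once the data is split, the overfitting check described at the start of the article amounts to comparing the score on the training set with the score on the test set. The sketch below is one way to do that, not the only one. The dataset (load_diabetes), the model (an unconstrained DecisionTreeRegressor, chosen because it overfits easily), and the test_size and random_state values are all assumptions made for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Illustrative regression dataset (an assumption, not from the article).
X, y = load_diabetes(return_X_y=True)

# Fix random_state so the same rows land in each set on every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# A tree with no depth limit can memorise the training data.
model = DecisionTreeRegressor(random_state=42)
model.fit(X_train, y_train)

# score() returns R^2 for regressors; compare seen vs. unseen data.
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)

print(f"train R^2: {train_r2:.3f}")
print(f"test  R^2: {test_r2:.3f}")
```

A large gap between the two scores, with the training score much higher, is the usual sign that the model has overfit the training set.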