When building models, you often need to train many candidate models and compare different parameter values. For K-Nearest Neighbors, the parameter to tune is K. For example, we might fit KNN models with several values of K and pick the one that performs best. In this article, we will learn how to use GridSearchCV to find the best K for a K-Nearest Neighbors model.
To use GridSearchCV for KNN, we need a few things. First, we build a standardizer using the StandardScaler class. This will standardize our features before training a model.
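As a quick illustration, here is a minimal sketch of what the standardizer does on its own (the toy feature matrix is hypothetical, just for demonstration):
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix with two very differently scaled columns
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

# Rescale each column to zero mean and unit variance
scaler = StandardScaler()
print(scaler.fit_transform(X))
Standardization matters for KNN in particular, because distances between samples would otherwise be dominated by the features with the largest scales.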
Next, we build a basic KNN model with KNeighborsClassifier. We then create a Pipeline that runs the standardizer above followed by the KNN model. This lets us reuse the same flow, standardizing the features and then fitting a model on them, once for each value of K we try.
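For example, a minimal sketch of such a pipeline, using the step names "standardizer" and "knn" that we will refer to again below:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# Chain the two steps so every fit standardizes first, then trains KNN
pipe = Pipeline([("standardizer", StandardScaler()),
                 ("knn", KNeighborsClassifier())])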
Finally, we build a list of different K's to try with our model and pass it to a GridSearchCV object. We also specify the cv parameter, which tells the search how many cross-validation folds to use when evaluating each K. There are more details on cross-validation in an article that will be covered on our blog.
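To make the cv parameter concrete, here is a minimal sketch of 5-fold cross-validation on its own, using scikit-learn's cross_val_score helper and assuming the pipe, features, and target defined in the full listing below:
from sklearn.model_selection import cross_val_score

# Split the samples into 5 folds and score the pipeline on each held-out
# fold; GridSearchCV does this internally for every candidate K
scores = cross_val_score(pipe, features, target, cv=5)
print(scores.mean())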
You will be left with the best-fitting model, which can be used like any other fitted scikit-learn model; see the usage example after the full listing below.
# Load libraries
from sklearn.neighbors import KNeighborsClassifier
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

# Load the iris dataset
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Create a standardizer
standardizer = StandardScaler()

# Create a KNN classifier
knn = KNeighborsClassifier(n_neighbors=5)

# Create a pipeline: standardize the features, then run KNN
pipe = Pipeline([("standardizer", standardizer), ("knn", knn)])

# Candidate values of K to search over
search_space = [{"knn__n_neighbors": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}]

# Grid search over K with 5-fold cross-validation
grid_search = GridSearchCV(pipe, search_space, cv=5, verbose=0)

# Fit the grid search; the pipeline standardizes within each fold
model = grid_search.fit(features, target)

# Best K found by the search and its mean cross-validated score
print(model.best_estimator_.get_params()["knn__n_neighbors"])
print(model.best_score_)
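Because GridSearchCV refits the best model on the full dataset by default, the fitted search can be used directly for prediction. As a usage sketch (the measurements below are hypothetical):
# Predict the class of a new observation using the best model found
new_observation = [[5.1, 3.5, 1.4, 0.2]]
print(model.predict(new_observation))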