How to Use GridSearchCV to Find K for K Nearest Neighbors

03.07.2021

Intro

When building models, you often need to train many candidate models and compare different parameters. For K nearest neighbors, the parameter to find is K. For example, we may want to build KNN models with several values of K and pick the one that performs best. In this article, we will learn how to use GridSearchCV to find K for the K Nearest Neighbors model.

Finding K for KNN

To use GridSearchCV for KNN, we need a few things. First, we build a standardizer using the StandardScaler class. This will be used to standardize our features before training a model.
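To see what the standardizer does on its own, here is a minimal sketch on a small made-up array (the toy data is an assumption, not part of the iris example below):

```python
from sklearn.preprocessing import StandardScaler
import numpy as np

# Hypothetical toy data: two features on very different scales
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])

standardizer = StandardScaler()
X_std = standardizer.fit_transform(X)

# After fit_transform, each column has mean 0 and unit variance
print(X_std.mean(axis=0))  # approximately [0. 0.]
print(X_std.std(axis=0))   # approximately [1. 1.]
```

Standardizing matters for KNN in particular, because distances are dominated by whichever feature has the largest raw scale.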

Next, we build a basic KNN model with KNeighborsClassifier. We then create a Pipeline that chains the standardizer above with the KNN model. This lets us reuse the same flow, standardize the features, then fit a model on them, for every candidate value of K.
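Before adding the search, the pipeline can be fit and used on its own. A minimal sketch on the iris data (the same dataset used in the full example below):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# fit() standardizes the features, then trains KNN on the scaled values;
# predict() applies the same two steps in order.
pipe = Pipeline([("standardizer", StandardScaler()),
                 ("knn", KNeighborsClassifier(n_neighbors=5))])
pipe.fit(X, y)
print(pipe.predict(X[:3]))
```

Because scaling and classification live in one object, the scaler is refit on each training split during cross-validation, which avoids leaking information from the validation fold.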

Finally, we build a list of different K values to try with our model and pass them to a GridSearchCV object. We also specify the cv parameter to tell the search how many cross-validation folds to use for each K. There are more details on cross-validation in an upcoming article on our blog.

You will be left with the best-fitting model, which can be used like any other.

from sklearn.neighbors import KNeighborsClassifier
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
features = iris.data
target = iris.target

standardizer = StandardScaler()

knn = KNeighborsClassifier(n_neighbors=5)

pipe = Pipeline([("standardizer", standardizer), ("knn", knn)])

search_space = [{"knn__n_neighbors": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}]

grid_search = GridSearchCV(
    pipe,
    search_space,
    cv=5,
    verbose=0)
model = grid_search.fit(features, target)

print(grid_search.best_params_["knn__n_neighbors"])
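Putting it together, here is a self-contained sketch of the whole search that also shows using the fitted search object for prediction (after fitting, GridSearchCV refits the best pipeline on all the data, so it predicts like any estimator):

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

features, target = datasets.load_iris(return_X_y=True)

pipe = Pipeline([("standardizer", StandardScaler()),
                 ("knn", KNeighborsClassifier())])

# Try K = 1..10 with 5-fold cross-validation
grid_search = GridSearchCV(
    pipe,
    [{"knn__n_neighbors": list(range(1, 11))}],
    cv=5)
grid_search.fit(features, target)

# The K selected by the search
best_k = grid_search.best_params_["knn__n_neighbors"]
print("Best K:", best_k)

# The fitted search predicts with the refit best pipeline
preds = grid_search.predict(features[:3])
print(preds)
```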