How to Use K-Means Clustering in Sklearn

03.14.2021

Intro

K-means is a widely used clustering method that groups observations into k clusters, assigning each observation to the cluster whose mean (centroid) is nearest. The value of k must be chosen beforehand, usually based on domain knowledge or a selection technique. In this article, we will learn how to build a K-means clustering model in Sklearn.
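To make the idea concrete, here is a minimal from-scratch sketch of the standard iterative procedure (Lloyd's algorithm) in NumPy. It is an illustration only, not Sklearn's implementation. The function names `kmeans` and `assign_clusters`, the purely random initialization, and the `n_iter`/`seed` parameters are assumptions chosen for brevity. Sklearn's `KMeans` uses the smarter k-means++ initialization by default.

```python
import numpy as np


def assign_clusters(X, centroids):
    """Return the index of the nearest centroid for every row of X."""
    # Squared Euclidean distance from each point (rows) to each centroid (columns)
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's-algorithm K-means (illustrative); returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct observations chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        labels = assign_clusters(X, centroids)
        # Update step: each centroid moves to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment so the labels match the returned centroids
    return assign_clusters(X, centroids), centroids


# Toy data: two well-separated groups of two points each
X_toy = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centroids = kmeans(X_toy, k=2)
print(labels)
print(centroids)
```

On this toy data, the first two points end up in one cluster and the last two in the other, with centroids at the mean of each pair. Which group is numbered 0 depends on the random initialization.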

Creating a K-Means Clustering Model

To build a K-means clustering model, use the KMeans class from Sklearn's cluster module. Because K-means assigns observations by Euclidean distance, features measured on larger scales would dominate the result, so we first standardize the data with StandardScaler. Then we create a KMeans instance and set n_clusters to 3, because we know ahead of time that the iris dataset contains 3 species. In a future article, we will learn how to find this value when it isn't known.

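from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Load the iris data: 150 observations with 4 numeric features
iris = datasets.load_iris()
features = iris.data

# Standardize each feature to zero mean and unit variance
scaler = StandardScaler()
features_std = scaler.fit_transform(features)

# K-means with 3 clusters; n_init and random_state make results reproducible
cluster = KMeans(n_clusters=3, n_init=10, random_state=0)

# Fit the model to the standardized features
model = cluster.fit(features_std)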
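Once fitted, the model exposes its results through attributes such as labels_, cluster_centers_ and inertia_, and can assign new observations with predict. The sketch below repeats the setup above (adding n_init=10 and random_state=0 so runs are reproducible), checks what StandardScaler actually computes, and inspects the fitted model. The new_flower measurements are hypothetical values chosen for illustration; they happen to equal the first iris sample.

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
features = iris.data

scaler = StandardScaler()
features_std = scaler.fit_transform(features)

# StandardScaler computes z = (x - mean) / std for each column,
# using the population standard deviation (ddof=0)
manual_std = (features - features.mean(axis=0)) / features.std(axis=0)
print(np.allclose(manual_std, features_std))

model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features_std)

# Cluster index (0, 1 or 2) assigned to each of the 150 observations
print(model.labels_[:10])

# One centroid per cluster, in standardized units: shape (3, 4)
print(model.cluster_centers_.shape)

# Inertia: sum of squared distances from each observation to its nearest centroid
print(model.inertia_)

# New data must go through the *same* fitted scaler before predicting.
# Hypothetical measurements: sepal length, sepal width, petal length, petal width (cm)
new_flower = np.array([[5.1, 3.5, 1.4, 0.2]])
prediction = model.predict(scaler.transform(new_flower))
print(prediction)
```

Cluster indices are arbitrary: cluster 0 does not necessarily correspond to iris class 0 (setosa), so compare clusters with the true labels only after matching them up.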