K-means is a widely used clustering method that partitions observations into k groups. The value of k is chosen beforehand, usually from domain knowledge or with a selection technique. In this article, we will learn how to build a k-means clustering model in scikit-learn (sklearn).
To build a k-means clustering model, use the KMeans class from the cluster module. Because k-means relies on distances between observations, the features should be on a comparable scale, so we also use StandardScaler to standardize the data first. Then we create an instance of KMeans and specify n_clusters. We use 3 because we know ahead of time that the iris dataset contains three species, so we expect three clusters. In the future, we will learn how to find this value when it isn't known.
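from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Load the iris data and keep only the feature matrix
iris = datasets.load_iris()
features = iris.data

# Standardize each feature to zero mean and unit variance
scaler = StandardScaler()
features_std = scaler.fit_transform(features)

# Create a k-means object with three clusters and fit it
cluster = KMeans(n_clusters=3)
model = cluster.fit(features_std)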
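Once the model is fitted, a natural next step is to look at what it learned. The following is a minimal sketch rather than part of the original recipe: it repeats the fitting steps so that it runs on its own, then reads the cluster assignments and cluster centers and assigns a new observation to a cluster. The n_init=10 and random_state=0 arguments and the example measurement are assumptions added here for reproducibility and illustration.

```python
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Same preparation as above: load iris and standardize the features
iris = datasets.load_iris()
scaler = StandardScaler()
features_std = scaler.fit_transform(iris.data)

# Assumption: n_init and random_state are fixed only to make runs repeatable
cluster = KMeans(n_clusters=3, n_init=10, random_state=0)
model = cluster.fit(features_std)

# Cluster index (0, 1, or 2) assigned to each of the 150 observations
labels = model.labels_

# Coordinates of the three cluster centers in standardized feature space
centers = model.cluster_centers_

# Illustrative new flower (sepal length, sepal width, petal length, petal width
# in cm). It must be scaled with the same fitted scaler before predicting.
new_observation = scaler.transform([[5.1, 3.5, 1.4, 0.2]])
predicted = model.predict(new_observation)

print(labels[:10])
print(centers.shape)
print(predicted)
```

Note that the cluster indices are arbitrary: they identify groups but do not correspond to the species labels in iris.target, and they can change between runs if random_state is not fixed.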