K-means is a widely used clustering method that groups observations into k clusters. The value of k is chosen beforehand, usually from domain knowledge or with a selection technique. In this article, we will learn how to fit a k-means clustering model in scikit-learn.
To build a k-means clustering model, use the KMeans class from the sklearn.cluster module. K-means relies on distances between observations, so features measured on different scales should be standardized first; we use StandardScaler to prepare the data. Then we create an instance of KMeans and set n_clusters to 3, because we know ahead of time that the iris data set contains 3 species. In the future, we will learn how to choose this value when it isn't known.
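from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Load the iris feature matrix (150 observations, 4 features)
iris = datasets.load_iris()
features = iris.data

# Standardize each feature to zero mean and unit variance
scaler = StandardScaler()
features_std = scaler.fit_transform(features)

# Create a k-means estimator with 3 clusters
cluster = KMeans(n_clusters=3)

# Fit the estimator to the standardized features
model = cluster.fit(features_std)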
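Once the model is fitted, a natural next step is to look at what it learned. The sketch below is one possible way to do that, not part of the original walkthrough: it reads the labels_ and cluster_centers_ attributes and calls predict on a new observation. The random_state and n_init arguments are added here only to make the run repeatable, and the new_observation values are hypothetical measurements chosen for illustration.

```python
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Rebuild the fitted model so this block runs on its own
iris = datasets.load_iris()
scaler = StandardScaler()
features_std = scaler.fit_transform(iris.data)

# Assumption: random_state and n_init are set only for repeatable results
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features_std)

# Cluster index (0, 1, or 2) assigned to each of the 150 observations
labels = model.labels_

# Coordinates of the 3 cluster centers in standardized feature space
centers = model.cluster_centers_

# Assign a new observation to its nearest center. It must be scaled
# with the same scaler that was fitted on the training data.
new_observation = [[0.8, 0.8, 0.8, 0.8]]  # hypothetical values
new_label = model.predict(scaler.transform(new_observation))

print(labels[:10])
print(centers.shape)
print(new_label)
```

Note that the cluster indices are arbitrary: cluster 0 does not necessarily correspond to the species coded 0 in iris.target, so the labels should be compared by grouping rather than by number.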