DBSCAN is an algorithm that uses density of observations to build clusters. It improves upon the generic K-means by not requiring the k to be specified, and also offers an alternative algorithm to Meanshift. In this article, we will learn how to cluster by density using DBSCAN in Sklearn.
To use the DBSCAN, we use the
DBSCAN class from the
cluster module. We create an instance of this class, then pass our data to the
fit method as usually. For clustering, we also use the
StandardScaler to ensure our features are standardized.
from sklearn import datasets from sklearn.preprocessing import StandardScaler from sklearn.cluster import DBSCAN iris = datasets.load_iris() features = iris.data scaler = StandardScaler() features_std = scaler.fit_transform(features) dbscan = DBSCAN() model = dbscan.fit(features_std)