DBSCAN is an algorithm that uses density of observations to build clusters. It improves upon the generic K-means by not requiring the k to be specified, and also offers an alternative algorithm to Meanshift. In this article, we will learn how to cluster by density using DBSCAN in Sklearn.
To use the DBSCAN, we use the DBSCAN class from the cluster module. We create an instance of this class, then pass our data to the fit method as usually. For clustering, we also use the StandardScaler to ensure our features are standardized.
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN
iris = datasets.load_iris()
features = iris.data
scaler = StandardScaler()
features_std = scaler.fit_transform(features)
dbscan = DBSCAN()
model = dbscan.fit(features_std)