DBSCAN is an algorithm that uses density of observations to build clusters. It improves upon the generic K-means by not requiring the k to be specified, and also offers an alternative algorithm to Meanshift. In this article, we will learn how to cluster by density using DBSCAN in Sklearn.
To use the DBSCAN, we use the DBSCAN
class from the cluster
module. We create an instance of this class, then pass our data to the fit
method as usually. For clustering, we also use the StandardScaler
to ensure our features are standardized.
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN
iris = datasets.load_iris()
features = iris.data
scaler = StandardScaler()
features_std = scaler.fit_transform(features)
dbscan = DBSCAN()
model = dbscan.fit(features_std)