When working on classification problems, we often have samples with imbalanced classes. For example, suppose we want to classify houses as mansion or not mansion. There are likely to be more non-mansions than mansions in the world, so our dataset might reflect this. In this article, we will learn how to handle imbalanced classes with the RandomForestClassifier in scikit-learn.
To handle imbalanced classes with a RandomForestClassifier, we fit the data just as normal. The only difference is that we set the class_weight parameter to "balanced". This tells the classifier to weight each class inversely proportional to its frequency in the training data, using the formula n_samples / (n_classes * np.bincount(y)), so that mistakes on the minority class are penalized more heavily during training.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets

# Load the iris data
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Drop the first 30 observations so the classes are imbalanced
features = features[30:, :]
target = target[30:]

# Weight classes inversely proportional to their frequencies
randomforest = RandomForestClassifier(class_weight="balanced")
model = randomforest.fit(features, target)

# score() requires data to evaluate on
print(model.score(features, target))
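If you want to see the exact weights that "balanced" produces, scikit-learn exposes the same computation through compute_class_weight. Here is a small sketch using hypothetical toy labels (10 samples of class 0, 40 of class 1), not the iris data above:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy imbalanced labels: 10 of class 0, 40 of class 1
target = np.array([0] * 10 + [1] * 40)

# Each class gets weight n_samples / (n_classes * count_of_class)
weights = compute_class_weight("balanced", classes=np.unique(target), y=target)
print(weights)  # [2.5 0.625] -- the minority class gets the larger weight
```

These are the per-class weights the RandomForestClassifier applies internally when class_weight="balanced" is set.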