How to Handle Imbalanced Classes with the Random Forest Classifier in Sklearn

03.02.2021

Intro

When working on classification problems, we often have samples with imbalanced classes. For example, let's say we want to classify houses as mansion or not mansion. There are likely to be far more non-mansions than mansions in the world, and our data set will probably reflect this. In this article, we will learn how to handle imbalanced classes with the Random Forest Classifier in Sklearn.

Handling Imbalanced Classes

To handle imbalanced classes with a RandomForestClassifier, we fit the data just as we normally would. The only difference is that we pass the class_weight parameter with the value "balanced". This tells the classifier to weight each class inversely proportional to its frequency in the data, so mistakes on the minority class count for more when the trees are built.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets

iris = datasets.load_iris()
features = iris.data
target = iris.target

# Drop most of the first class so the classes are imbalanced
features = features[30:,:]
target = target[30:]

# Create a random forest that weights classes inversely to their frequency
randomforest = RandomForestClassifier(class_weight="balanced")

# Fit the model and print its accuracy on the training data
model = randomforest.fit(features, target)
print(model.score(features, target))
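
Under the hood, the "balanced" setting computes each class weight as n_samples / (n_classes * np.bincount(y)), so the rarer a class is, the larger its weight. If you want to inspect those weights, or pass your own, scikit-learn exposes the same calculation through sklearn.utils.class_weight.compute_class_weight. The snippet below is a small sketch that reuses the truncated iris data from above; the classes and weights variable names are just for illustration.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight
from sklearn import datasets

# Recreate the imbalanced iris data from above
iris = datasets.load_iris()
features = iris.data[30:, :]
target = iris.target[30:]

# Compute the same weights that class_weight="balanced" would use:
# n_samples / (n_classes * np.bincount(y))
classes = np.unique(target)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=target)
print(dict(zip(classes, weights)))
# Class 0 has only 20 observations here, so it receives a larger weight
# than classes 1 and 2, which still have 50 observations each.

# The same weights can also be passed explicitly as a dictionary
randomforest = RandomForestClassifier(class_weight=dict(zip(classes, weights)))
model = randomforest.fit(features, target)

Passing an explicit dictionary is useful when you want to tune the weights yourself rather than rely on the automatic "balanced" calculation.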