How to Find the Most Important Features in a Random Forest Model Using Sklearn

02.27.2021

Intro

Once you have built a model, if the model is easily interpretable, it is often interesting to learn which of the features are most important. This helps guide your intuition about which values affect the target or the prediction. For example, if you are looking at churn data, it would be useful to see which features of your churned customers (low usage, number of complaints) point toward the root cause. In this article, we will learn how to find the most important features in a Random Forest model.

Viewing the Important Features

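To view the most important features in a model, we use the feature_importances_ attribute of the fitted model. This returns an array of importance scores, one per feature, in the same order as the columns of the training data. What the score measures depends on the model. For a Random Forest in Sklearn, it is the impurity-based importance: the average decrease in impurity that the feature contributes across the trees, normalized so that the scores sum to 1. In general, the higher the value, the more important the feature is.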

import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets

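# Load the iris data set: 150 samples, 4 features, 3 classes
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Create the classifier with default settings
randomforest = RandomForestClassifier()

# fit() returns the fitted estimator itself
model = randomforest.fit(features, target)

# One score per feature, in the same order as the columns of features
importances = model.feature_importances_
print(importances)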
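
The printed array contains only numbers, so it is not obvious which score belongs to which feature. The sketch below is one possible way to pair each score with its name and sort from most to least important. It assumes the names come from iris.feature_names, and it fixes random_state only so that repeated runs give the same scores. The variable names order and ranked are made up for this example.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()

# random_state is an assumption added here so that results are repeatable
model = RandomForestClassifier(random_state=0).fit(iris.data, iris.target)

importances = model.feature_importances_

# Feature indices ordered from the highest score to the lowest
order = np.argsort(importances)[::-1]

# Pair each feature name with its score, most important first
ranked = [(iris.feature_names[i], float(importances[i])) for i in order]

for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

The exact scores vary with the random seed and the Sklearn version, so treat the ranking as a guide and not as a fixed result.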