In a previous article, we learned how to find the most important features of a Random Forest model. In practice, it is often useful to simplify a model so that it may generalize better and is easier to interpret. One way to do this is to fit a model using only the important features. In this article, we will learn how to fit a Random Forest model using only the important features in scikit-learn.
To build a random forest model with only the important features, we use the SelectFromModel class from the sklearn.feature_selection module. We create an instance of SelectFromModel and pass it a random forest estimator (in this example, a classifier). We also specify a threshold for how important a feature must be: any feature whose importance is below 0.2 will not be used.
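To see how the 0.2 threshold plays out on the iris data, one can fit the selector on its own and compare the forest's importance scores with the mask of kept features. This is a minimal sketch; random_state=0 is an assumption added only so the output is repeatable, and it is not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

iris = load_iris()
X, y = iris.data, iris.target

# random_state=0 is an illustrative assumption for repeatable importances
selector = SelectFromModel(RandomForestClassifier(random_state=0), threshold=0.2)
selector.fit(X, y)

# The selector fits its own copy of the forest, stored in selector.estimator_
importances = selector.estimator_.feature_importances_

# get_support() is True where the importance is >= the threshold
kept_mask = selector.get_support()

for name, importance, kept in zip(iris.feature_names, importances, kept_mask):
    print(f"{name:20s} {importance:.3f} {'kept' if kept else 'dropped'}")
```

Because a random forest's importances sum to 1, a threshold of 0.2 over four features always keeps at least one of them.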
Next, we call fit_transform on our features. This fits the random forest inside the selector, computes each feature's importance, and returns only the columns that meet the threshold. Finally, we fit a random forest model as usual using the important features.
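```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Load the iris data
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Create the random forest classifier
randomforest = RandomForestClassifier()

# Keep only features whose importance is at least 0.2
selector = SelectFromModel(randomforest, threshold=0.2)

# Fit the selector (it trains an internal copy of the forest) and
# reduce the feature matrix to the important columns
importantFeatures = selector.fit_transform(features, target)

# Train the random forest on the important features only
model = randomforest.fit(importantFeatures, target)
```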
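The model above is trained on the reduced feature matrix, so any new samples must pass through the same fitted selector before prediction. One way to keep the two steps together is a scikit-learn Pipeline. The sketch below is an illustration under assumptions not in the original: a 70/30 train/test split and random_state values chosen only for repeatability.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

iris = load_iris()

# Hold out 30% of the data for evaluation (an assumption for illustration)
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# Step 1 keeps features with importance >= 0.2; step 2 is the final classifier
pipeline = Pipeline([
    ("select", SelectFromModel(RandomForestClassifier(random_state=0), threshold=0.2)),
    ("classify", RandomForestClassifier(random_state=0)),
])
pipeline.fit(X_train, y_train)

# predict() applies the fitted selector's transform to X_test before classifying
predictions = pipeline.predict(X_test)
n_kept = pipeline.named_steps["select"].get_support().sum()
accuracy = pipeline.score(X_test, y_test)
print(f"kept {n_kept} of {X_train.shape[1]} features, test accuracy {accuracy:.2f}")
```

Keeping selection inside the pipeline also means that tools such as cross_val_score refit the selector on each training fold, so feature importances are never computed from the data used for evaluation.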