How to Impute Data with Sklearn

2021-02-09

Intro

When loading data, you often find missing pieces. There are various ways to handle missing data. A common way is to impute the data, that is, to fill in the missing values with estimates. In this article, we will see how to impute data with sklearn.

Imputing Data

To impute data, we use the SimpleImputer class from sklearn.impute. Once we have an instance of this class, we can call the fit_transform method on data with missing values, and sklearn will return the data with the gaps filled in. By default, each missing value is replaced with the mean of its column.

import numpy as np
from sklearn import datasets
from sklearn.impute import SimpleImputer

## Load the data
iris = datasets.load_iris()
X = iris.data

## Mark rows 1 through 24 as missing
X[1:25] = np.nan

## Impute the missing data (the default strategy fills with the column mean)
imputer = SimpleImputer()
X_imputed = imputer.fit_transform(X)
print(X_imputed)
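
The column mean is only one possible fill value. As a minimal sketch, the example below uses SimpleImputer with strategy="median" and then reuses the fitted imputer on new data through transform. The small arrays X_train and X_new are made up for illustration and are not part of the iris example above.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data; missing entries are marked with np.nan
X_train = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [np.nan, 30.0],
    [100.0, 40.0],
])
X_new = np.array([[np.nan, np.nan]])

# The median is less sensitive to outliers (like 100.0) than the mean
imputer = SimpleImputer(strategy="median")
X_train_filled = imputer.fit_transform(X_train)

# Fill unseen data with the statistics learned from X_train
X_new_filled = imputer.transform(X_new)

print(imputer.statistics_)  # per-column medians learned during fit
print(X_train_filled)
print(X_new_filled)
```

Fitting once and calling transform on later data keeps the fill values consistent, which is usually what you want when the same preprocessing has to be applied to both training and test sets.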