It is common for data sets to contain categorical data. For example, say we have a size column whose values are small, medium, and large. Often, we want to transform these variables into numbers so that we can apply mathematical algorithms to them. In this article, we will see how to encode categorical variables in sklearn.
To encode categorical variables, we can use the OneHotEncoder class and run fit_transform on the data. In the example below, we transform the iris.target data.
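from sklearn import datasets
from sklearn import preprocessing

# Load the iris data set; iris.target holds the class labels 0, 1, and 2
iris = datasets.load_iris()
X = iris.data
y = iris.target

# OneHotEncoder expects a 2D array, so reshape the labels into a single column
cat_encoder = preprocessing.OneHotEncoder()
# fit_transform returns a sparse matrix; toarray() converts it to a dense array
encoded = cat_encoder.fit_transform(y.reshape(-1, 1)).toarray()
print(encoded)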
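The same approach should also work for string categories, such as the size column mentioned at the start. The following is a minimal sketch rather than a definitive recipe: the sizes array and its values are made up for illustration, and it assumes the encoder's default settings, under which categories are inferred from the data and sorted.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical "size" column; these values are invented for illustration
sizes = np.array(["small", "large", "medium", "small"]).reshape(-1, 1)

encoder = OneHotEncoder()
encoded_sizes = encoder.fit_transform(sizes).toarray()

# With default settings the categories are sorted, so the columns
# correspond to large, medium, small (in that order)
print(encoder.categories_)
print(encoded_sizes)

# inverse_transform maps the one-hot rows back to the original labels
print(encoder.inverse_transform(encoded_sizes).ravel())
```

Checking categories_ after fitting is a simple way to see which output column corresponds to which category, since the column order follows the sorted categories and not the order in which values appear in the data.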