Ordinal Encoding is similar to Label Encoding in that we take a list of categories and convert them into integers. However, unlike Label Encoding, Ordinal Encoding preserves an order. For example, if we are encoding rankings such as 1st place, 2nd place, and so on, there is an inherent order. In this article, we will learn how to use Ordinal Encoding in Python.
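To make the difference concrete, here is a minimal plain-Python sketch that uses no encoding library. It assumes label encoding assigns integers in sorted (alphabetical) order of the categories, which is how scikit-learn's `LabelEncoder` behaves. The names `label_codes` and `ordinal_codes` are illustrative only.

```python
# Shirt sizes with a natural order: small < medium < large
sizes = ["small", "medium", "large", "small"]

# Label-style encoding (assumption: codes follow sorted category names),
# which ignores the real size order
label_codes = {cat: i for i, cat in enumerate(sorted(set(sizes)))}
label_encoded = [label_codes[s] for s in sizes]

# Ordinal encoding: codes follow an order we state explicitly
size_order = ["small", "medium", "large"]
ordinal_codes = {cat: i for i, cat in enumerate(size_order)}
ordinal_encoded = [ordinal_codes[s] for s in sizes]

print(label_encoded)    # [2, 1, 0, 2]
print(ordinal_encoded)  # [0, 1, 2, 0]
```

With label encoding, "large" gets the smallest number; with ordinal encoding, the numbers grow with the size.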
Let's create a small data frame with shirt sizes and their costs. The data here is fake, but the process will work on any data frame. The shirts column holds shirt sizes, a categorical variable that we want to encode.
```python
import pandas as pd

df = pd.DataFrame({
    "shirts": ['small', 'medium', 'large', 'small'],
    "costs": [10, 20, 30, 40]
})
df.head()
```
| | shirts | costs |
|---|---|---|
| 0 | small | 10 |
| 1 | medium | 20 |
| 2 | large | 30 |
| 3 | small | 40 |
To encode our shirt sizes (that is, turn them into numbers), we will use the `OrdinalEncoder` class from the `category_encoders` package. We first create an instance of the class and pass the columns it should encode with `cols=['shirts']`. We can also pass a `mapping`, which tells the encoder the order of our categories. The mapping is optional, but without it the encoder chooses the integers itself, so they may not follow the order we want. Then we use the `fit_transform` method to encode our variables.
```python
import category_encoders

# Explicit order for the shirt sizes: small < medium < large
mapping = [
    {
        'col': 'shirts',
        'mapping': {
            'small': 0,
            'medium': 1,
            'large': 2,
        }
    }
]

encoder = category_encoders.OrdinalEncoder(
    cols=['shirts'],
    return_df=True,
    mapping=mapping
)
encoder.fit_transform(df['shirts'])
```
| | shirts |
|---|---|
| 0 | 0 |
| 1 | 1 |
| 2 | 2 |
| 3 | 0 |
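If you would rather not add a dependency, pandas can produce the same ordinal codes on its own. This is a sketch of an alternative using pandas' ordered categorical dtype, not the `category_encoders` approach; the column name `shirts_encoded` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "shirts": ["small", "medium", "large", "small"],
    "costs": [10, 20, 30, 40],
})

# Declare the category order explicitly; codes follow this list's order
size_type = pd.CategoricalDtype(categories=["small", "medium", "large"], ordered=True)

# .cat.codes gives each value's position in the declared category list
df["shirts_encoded"] = df["shirts"].astype(size_type).cat.codes

print(df)
```

One difference to keep in mind: a value that is not in the declared categories becomes missing and gets the code -1, so check for -1 if your data may contain unexpected sizes.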
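Because a fitted encoder remembers the mapping it learned, we can also reverse the encoding with `inverse_transform`. Here we fit and transform in separate steps. This time we pass no `cols` or mapping, so the encoder encodes all string columns and chooses the integers itself: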
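```python
encoder = category_encoders.OrdinalEncoder()
encoder = encoder.fit(df)              # learn a code for each shirt size
encoded = encoder.transform(df)        # replace sizes with their codes
encoder.inverse_transform(encoded)     # map the codes back to sizes
```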
| | shirts | costs |
|---|---|---|
| 0 | small | 10 |
| 1 | medium | 20 |
| 2 | large | 30 |
| 3 | small | 40 |
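The reversal works because each size maps to exactly one code, so the mapping can be flipped. Here is a minimal pandas-only sketch of the same round trip, assuming you keep the mapping as a plain dict; `size_to_code` and `code_to_size` are hypothetical names, not part of `category_encoders`.

```python
import pandas as pd

df = pd.DataFrame({
    "shirts": ["small", "medium", "large", "small"],
    "costs": [10, 20, 30, 40],
})

# Forward mapping: same order as the mapping passed to OrdinalEncoder earlier
size_to_code = {"small": 0, "medium": 1, "large": 2}

# Inverse mapping: swap keys and values so each code points back to its size
code_to_size = {code: size for size, code in size_to_code.items()}

# assign() returns a new frame with the shirts column replaced
encoded = df.assign(shirts=df["shirts"].map(size_to_code))
decoded = encoded.assign(shirts=encoded["shirts"].map(code_to_size))

print(decoded.equals(df))  # True
```

This only round-trips cleanly when the mapping is one-to-one; if two sizes shared a code, the inverse dict would keep just one of them.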