Ordinal Encoding in Python

07.08.2021

Intro

Ordinal Encoding is similar to Label Encoding: we take a list of categories and convert them into integers. However, unlike Label Encoding, we preserve an order. For example, if we are encoding rankings such as 1st place, 2nd place, and so on, there is an inherent order. In this article, we will learn how to use Ordinal Encoding in Python.
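A minimal sketch of the idea in plain Python, assuming the categories are already listed from lowest to highest: each category's position in that ordered list becomes its integer code, so comparisons between codes follow the original ranking. The names `places`, `place_codes`, and `results` are illustrative only.

```python
# Categories listed from first to last (illustrative data)
places = ["1st place", "2nd place", "3rd place"]

# Each category's position in the ordered list becomes its integer code
place_codes = {place: code for code, place in enumerate(places)}

results = ["2nd place", "1st place", "3rd place", "1st place"]
encoded = [place_codes[r] for r in results]
print(encoded)  # [1, 0, 2, 0]

# Because the codes follow the ranking, integer comparisons match it too
print(place_codes["1st place"] < place_codes["3rd place"])  # True
```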

The Data

Let's create a small data frame with shirts and their costs. The data here is fake, but the process will work on any data frame. The shirt sizes are a categorical variable that we want to encode.

import pandas as pd

df = pd.DataFrame({
    "shirts": ['small', 'medium', 'large', 'small'],
    "costs": [10, 20, 30, 40]
})

df.head()
shirts costs
0 small 10
1 medium 20
2 large 30
3 small 40

Using an Ordinal Encoder in Python

To encode our shirt sizes, that is, turn them into numbers, we will use the OrdinalEncoder class from the category_encoders package. We first create an instance of the class. We pass the columns to encode with cols = ['shirts'], and we can also pass a mapping that tells the encoder the order of our categories. The mapping is optional, but it lets us control the order. Then we use the fit_transform method to encode the column.

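import category_encoders

# Explicit order for the 'shirts' column: small < medium < large
mapping = [
    {
        'col': 'shirts',
        'mapping': {
            'small': 0,
            'medium': 1,
            'large': 2,
        }
    }
]

# Encode only the 'shirts' column and return a pandas DataFrame
encoder = category_encoders.OrdinalEncoder(
    cols=['shirts'],
    return_df=True,
    mapping=mapping
)

encoder.fit_transform(df['shirts'])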
shirts
0 0
1 1
2 2
3 0
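The explicit mapping matters because nothing in the strings themselves says that small < medium < large; without one, an encoder has to pick an order somehow. As a rough plain-Python illustration (not category_encoders itself), the sketch below compares three ways of assigning codes to the same shirt sizes: sorting the labels alphabetically, numbering them by first appearance, and using the hand-written order from the mapping above.

```python
sizes = ["small", "medium", "large", "small"]

# 1. Alphabetical order: large=0, medium=1, small=2 (reverses the real ranking)
alphabetical = {label: code for code, label in enumerate(sorted(set(sizes)))}

# 2. Order of first appearance: correct here only because the rows
#    happen to list small, medium, large in that order
first_seen = {}
for label in sizes:
    first_seen.setdefault(label, len(first_seen))

# 3. Explicit order, mirroring the mapping passed to the encoder above
explicit = {"small": 0, "medium": 1, "large": 2}

for name, codes in [("alphabetical", alphabetical),
                    ("first seen", first_seen),
                    ("explicit", explicit)]:
    print(name, [codes[s] for s in sizes])
# alphabetical [2, 1, 0, 2]
# first seen [0, 1, 2, 0]
# explicit [0, 1, 2, 0]
```

Only the explicit mapping is guaranteed to keep the intended order if the rows arrive in a different sequence.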

If we separate the fit and transform steps, we can also use the encoder to reverse the encoding. We can do the following:

  • Create an OrdinalEncoder
  • Fit the categories
  • Use transform to encode
  • Use inverse_transform to decode
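encoder = category_encoders.OrdinalEncoder()

# Learn the category-to-integer mapping; with no cols given, all string columns are encoded
encoder = encoder.fit(df)

# Replace each category with its integer code
encoded = encoder.transform(df)

# Map the integer codes back to the original categories
encoder.inverse_transform(encoded)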
shirts costs
0 small 10
1 medium 20
2 large 30
3 small 40
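To see why separating fit from transform makes decoding possible, here is a minimal pure-Python sketch of an encoder with the same three steps. The class `SimpleOrdinalEncoder` is hypothetical and only mimics the shape of the fit/transform/inverse_transform workflow; it handles a single list of values and does not deal with unknown or missing categories. The key idea is that fit stores the category-to-integer mapping, and inverse_transform reads that stored mapping backwards.

```python
class SimpleOrdinalEncoder:
    """Hypothetical, minimal single-column ordinal encoder (illustration only)."""

    def __init__(self, order=None):
        # Optional explicit order, e.g. ["small", "medium", "large"]
        self.order = order
        self.mapping_ = None
        self.inverse_mapping_ = None

    def fit(self, values):
        # Use the explicit order if given, otherwise order of first appearance
        if self.order is not None:
            categories = list(self.order)
        else:
            categories = list(dict.fromkeys(values))
        self.mapping_ = {cat: code for code, cat in enumerate(categories)}
        # Store the reverse lookup so the encoding can be undone later
        self.inverse_mapping_ = {code: cat for cat, code in self.mapping_.items()}
        return self

    def transform(self, values):
        # Raises KeyError for categories not seen during fit
        return [self.mapping_[v] for v in values]

    def inverse_transform(self, codes):
        return [self.inverse_mapping_[c] for c in codes]


shirts = ["small", "medium", "large", "small"]
encoder = SimpleOrdinalEncoder(order=["small", "medium", "large"]).fit(shirts)
encoded = encoder.transform(shirts)
print(encoded)                             # [0, 1, 2, 0]
print(encoder.inverse_transform(encoded))  # ['small', 'medium', 'large', 'small']
```

Because the mapping lives on the fitted object, the same encoder can later decode any codes it produced; an encoder that is created and used in a single throwaway call leaves you nothing to reverse with.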