Label Encoding in Python

Intro

Label encoding is one of many techniques for converting categorical variables into numerical variables, which many machine learning algorithms require. Use label encoding when your categories have no inherent order. If your data is ordered, like small, medium, large, you should use ordinal encoding instead. In this article, we will learn how to use label encoding in Python.
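
To make the difference concrete, here is a minimal sketch comparing the two encoders on ordered data. It assumes scikit-learn is installed; the `sizes` list is made up for illustration. LabelEncoder numbers categories in sorted order, so the natural ordering is lost, while OrdinalEncoder lets you state the order explicitly.

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Made-up ordered categories, for illustration only
sizes = ["small", "medium", "large", "medium"]

# LabelEncoder numbers the categories in sorted (alphabetical) order,
# so the small < medium < large ordering is not preserved.
label_codes = LabelEncoder().fit_transform(sizes)
print(label_codes)  # [2 1 0 1]

# OrdinalEncoder accepts an explicit category order (one list per column)
# and expects 2D input, hence the nested lists.
oe = OrdinalEncoder(categories=[["small", "medium", "large"]])
ordinal_codes = oe.fit_transform([[s] for s in sizes]).ravel()
print(ordinal_codes)  # [0. 1. 2. 1.]
```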

The Data

Let's create a small data frame with cities and their populations. The data here is fake, but the process will work on any data frame. The list of cities is a categorical variable that we want to encode.

import pandas as pd

df = pd.DataFrame({
    "city": ['Dallas', 'Austin', 'Denver', 'Boulder'],
    "pop": [1000, 2000, 3000, 4000]
})

df.head()

	city	pop
0	Dallas	1000
1	Austin	2000
2	Denver	3000
3	Boulder	4000

Using a Label Encoder in Python

To encode our cities, that is, to turn them into numbers, we will use the LabelEncoder class from the sklearn.preprocessing module. We first create an instance of the class, then call its fit_transform method, which learns the categories and encodes them in one step.

from sklearn import preprocessing

# Create the encoder, then fit it and encode the column in one step
le = preprocessing.LabelEncoder()

le.fit_transform(df['city'])

array([2, 0, 3, 1])
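
The integers are not arbitrary: LabelEncoder sorts the labels and uses each label's position as its code, which is why Austin becomes 0 and Dallas becomes 2. The sketch below inspects the fitted encoder's classes_ attribute to show this. It rebuilds the data frame so it runs on its own, and `mapping` is just an illustrative lookup, not part of scikit-learn.

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({"city": ["Dallas", "Austin", "Denver", "Boulder"]})

le = preprocessing.LabelEncoder()
codes = le.fit_transform(df["city"])

# classes_ holds the learned categories in sorted order;
# each category's position in this array is its integer code.
classes = le.classes_.tolist()
print(classes)  # ['Austin', 'Boulder', 'Dallas', 'Denver']

# Build a readable category -> code lookup (illustrative, not an sklearn feature)
mapping = {city: code for code, city in enumerate(classes)}
print(mapping)  # {'Austin': 0, 'Boulder': 1, 'Dallas': 2, 'Denver': 3}
```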

The fitted encoder remembers the categories, so it can also reverse the encoding. Separating the fit and transform steps makes this explicit. We can do the following:

Create a LabelEncoder
Fit the categories
Use transform to encode
Use inverse_transform to decode

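le = preprocessing.LabelEncoder()

# Learn the categories (fit returns the encoder itself)
le.fit(df['city'])

# Encode the cities as integers
encoded = le.transform(df['city'])

# Decode the integers back into city names
list(le.inverse_transform(encoded))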

['Dallas', 'Austin', 'Denver', 'Boulder']
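
Keeping the fitted encoder around also lets you encode new rows consistently. The following is a minimal sketch under a few assumptions: the column name `city_encoded` and the extra city "Houston" are made up for illustration. It stores the codes in the data frame, encodes new values, and shows that a label the encoder never saw during fit raises a ValueError.

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({
    "city": ["Dallas", "Austin", "Denver", "Boulder"],
    "pop": [1000, 2000, 3000, 4000],
})

le = preprocessing.LabelEncoder()
le.fit(df["city"])

# Store the codes next to the original column ("city_encoded" is a name chosen here)
df["city_encoded"] = le.transform(df["city"])

# The fitted encoder can encode new rows that use known cities...
new_codes = le.transform(["Denver", "Austin"])
print(new_codes)  # [3 0]

# ...but a label it never saw during fit raises a ValueError.
try:
    le.transform(["Houston"])
    unseen_error = None
except ValueError as err:
    unseen_error = err
    print("Cannot encode:", err)
```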

07.06.2021
