Label Encoding in R

07.02.2021

Intro

Label encoding is one of several techniques for converting categorical variables into numerical variables, which many machine learning algorithms require. It is used when the categories have no inherent order. If your data is ordered, like small, medium, large, you should use ordinal encoding instead. In this article, we will learn how to use label encoding in R.
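To make the contrast with ordinal encoding concrete, here is a minimal base R sketch. It is not part of the superml workflow used in this article, and the `sizes` vector and its level order are made up for illustration. The point is that, for ordered data, you choose the order yourself instead of letting the encoder pick one.

```r
# Hypothetical ordered data (not from the article's data frame)
sizes <- c("small", "large", "medium", "small")

# Declare the order explicitly: small < medium < large
size_factor <- factor(sizes,
                      levels = c("small", "medium", "large"),
                      ordered = TRUE)

# as.integer() returns each value's position in the level order (1-based)
size_code <- as.integer(size_factor)
size_code
## [1] 1 3 2 1
```

With label encoding, by contrast, the numbers are only identifiers and their order carries no meaning.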

The Data

Let’s create a small data frame with cities and their populations. The data here is fake, but the process will work on any data frame. The city column is a categorical variable that we want to encode.

df <- data.frame(
  pop = c(1000, 2000, 3000, 4000),
  city = c('Dallas', 'Austin', 'Denver', 'Boulder')
)
df
##    pop    city
## 1 1000  Dallas
## 2 2000  Austin
## 3 3000  Denver
## 4 4000 Boulder

Using a Label Encoder in R

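To encode our cities, that is, to turn them into numbers, we will use the LabelEncoder class from the superml package. We first create an instance of the class with the new method. Then we call the fit_transform method, which learns the mapping from the city names and returns the encoded values. In the output below, each city is replaced by an integer code starting at 0.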

library(superml)
## Warning: package 'superml' was built under R version 4.0.5

## Loading required package: R6
lbl <- LabelEncoder$new()
df$city <- lbl$fit_transform(df$city)
df
##    pop city
## 1 1000    0
## 2 2000    1
## 3 3000    2
## 4 4000    3
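If you want to see what this kind of mapping involves, or cannot install superml, the following base R sketch produces the same codes as the output above for this data. It is an illustration under stated assumptions, not superml's implementation: it assumes codes are assigned in order of first appearance and start at 0, which matches the output shown here. The names `city_levels` and `city_code` are introduced only for this example.

```r
# Rebuild the example data (stringsAsFactors = FALSE keeps city as character
# on R versions before 4.0 as well)
df <- data.frame(
  pop = c(1000, 2000, 3000, 4000),
  city = c("Dallas", "Austin", "Denver", "Boulder"),
  stringsAsFactors = FALSE
)

# unique() keeps the labels in order of first appearance
city_levels <- unique(df$city)

# match() gives each label's 1-based position in city_levels;
# subtracting 1 yields 0-based codes (an assumption made to mirror the output above)
df$city_code <- match(df$city, city_levels) - 1L
df$city_code
## [1] 0 1 2 3
```

Keeping `city_levels` around matters: it is the lookup table you need to encode new data consistently or to map codes back to labels.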

Reversing the Encoding in R

Now that we have converted the variable, we can reverse the encoding to recover the original labels. To do this, we use the same LabelEncoder instance and call its inverse_transform method.

lbl$inverse_transform(df$city)
## [1] "Dallas"  "Austin"  "Denver"  "Boulder"
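Conceptually, reversing the encoding is just a lookup in the stored label table. The base R sketch below shows the idea under the same assumptions as the output above (0-based codes in order of first appearance); it is not how superml implements inverse_transform, and `city_levels`, `codes`, and `decoded` are names made up for this example.

```r
# Original labels and the lookup table built from them
cities <- c("Dallas", "Austin", "Denver", "Boulder")
city_levels <- unique(cities)

# Encode: 0-based position of each label in the lookup table
codes <- match(cities, city_levels) - 1L

# Decode: add 1 to return to R's 1-based indexing, then index the table
decoded <- city_levels[codes + 1L]
decoded
## [1] "Dallas"  "Austin"  "Denver"  "Boulder"
```

This is also why the same encoder instance must be reused: the codes are meaningless without the mapping that produced them.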