Label encoding is one of many techniques for converting categorical variables into numerical variables, which many machine learning algorithms require. Use label encoding when your categories have no inherent order. If your data is ordered, like small, medium, large, you should use ordinal encoding instead. In this article, we will learn how to use label encoding in R.
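For contrast, here is a minimal sketch of the ordered case in base R. The `sizes` vector is hypothetical example data, not part of this article's data set, and an ordered factor is only one possible way to do ordinal encoding.

```r
# Hypothetical example data: sizes with a natural order.
sizes <- c("medium", "small", "large", "small")

# Declare the order explicitly; without `levels`, R sorts them alphabetically.
size_factor <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)

# Integer codes follow the declared order: small = 1, medium = 2, large = 3.
size_codes <- as.integer(size_factor)
size_codes
## [1] 2 1 3 1
```

Here the numbers carry meaning (1 < 2 < 3), which is exactly what we do not want to imply for unordered categories such as city names.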
Let’s create a small data frame with cities and their population. The data here is fake, but the process will work on any data frame. We have a list of cities, which is a categorical variable, that we want to encode.
# Toy data: population figures and city names (all values are made up)
df <- data.frame(
  pop = c(1000, 2000, 3000, 4000),
  city = c('Dallas', 'Austin', 'Denver', 'Boulder')
)
df
## pop city
## 1 1000 Dallas
## 2 2000 Austin
## 3 3000 Denver
## 4 4000 Boulder
To encode our cities, that is, to turn them into numbers, we will use the LabelEncoder class from the superml package. We first create an instance of the class using the new method. Then, we use the fit_transform method to encode our variable.
library(superml)
## Warning: package 'superml' was built under R version 4.0.5
## Loading required package: R6
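# Create an encoder instance
lbl <- LabelEncoder$new()
# Learn the city-to-number mapping and replace the column with the codes
df$city <- lbl$fit_transform(df$city)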
df
## pop city
## 1 1000 0
## 2 2000 1
## 3 3000 2
## 4 4000 3
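To make the idea behind the encoding concrete, here is a rough base R sketch that reproduces the same codes without any package. It numbers labels in order of first appearance, which is an assumption chosen to match the output above and not a statement about how superml works internally. The names `labels` and `codes` are only illustrative.

```r
# Same toy city values as in the article.
city <- c("Dallas", "Austin", "Denver", "Boulder")

# Distinct labels in order of first appearance (an assumption made
# to mirror the output above, not taken from superml's source).
labels <- unique(city)

# match() returns 1-based positions; subtract 1 to get zero-based codes.
codes <- match(city, labels) - 1L
codes
## [1] 0 1 2 3
```

Each distinct city gets one integer, and repeated cities would get the same integer each time they appear.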
Now that we have converted the variable, we can reverse the encoding to recover the label names. To do this, we use the same instance of the LabelEncoder and call its inverse_transform method.
lbl$inverse_transform(df$city)
## [1] "Dallas" "Austin" "Denver" "Boulder"
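Decoding only works because the encoder instance remembers which label belongs to which number. As a small illustration of that idea, the base R sketch below keeps the mapping in a factor's levels and uses it for the round trip. This is an analogy under the assumption of first-appearance ordering, not superml's implementation, and the variable names are hypothetical.

```r
# Same toy city values as in the article.
city <- c("Dallas", "Austin", "Denver", "Boulder")

# The factor's levels play the role of the stored mapping.
f <- factor(city, levels = unique(city))

# Encode: factor codes are 1-based, so shift to zero-based.
codes <- as.integer(f) - 1L

# Decode: look each code up in the stored levels.
decoded <- levels(f)[codes + 1L]

# The round trip should give back the original labels.
identical(decoded, city)
## [1] TRUE
```

If the levels were lost, the numbers alone could not be turned back into city names, which is why the same encoder instance has to be reused.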