Ordinal Encoding is similar to Label Encoding: we take a list of categories and convert them into integers. However, unlike Label Encoding, we preserve an order. For example, if we are encoding rankings of 1st place, 2nd place, etc., there is an inherent order. In this article, we will learn how to use Ordinal Encoding in R.
Let’s create a small data frame with rankings and their prize money. The data here is fake, but the process will work on any data frame. We have a list of ranks, a categorical variable, that we want to encode.
df <- data.frame(
money = c(3000, 1000, 2000, 2000),
rank = as.factor(c('1st', '2nd', '3rd', '1st'))
)
df
## money rank
## 1 3000 1st
## 2 1000 2nd
## 3 2000 3rd
## 4 2000 1st
To encode our ranks, that is, to turn them into numbers, we will use the encode_ordinal function from the cleandata package. To use this function, we simply pass our data frame of categories to encode_ordinal. Note that the columns we pass must be of type factor. We also pass a vector giving the order of the levels, from lowest to highest code.
library(cleandata)
order.list = c('1st', '2nd', '3rd')
## Create a data frame of all categories. We only have one here
cat.df = df[, c("rank"), drop = FALSE]
encoded = encode_ordinal(cat.df, order = order.list)
## rank
## 1st:2
## 2nd:1
## 3rd:1
## coded 1 cols 3 levels
## rank
## 1:2
## 2:1
## 3:1
encoded
## rank
## 1 1
## 2 2
## 3 3
## 4 1
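As a cross-check, the same mapping can be sketched in base R without any package. This is only an illustration of the idea, not how cleandata implements it; the variable names ranks.ordered and codes are made up for this example.

```r
## Base R sketch of ordinal encoding (not cleandata's internals)
ranks <- c('1st', '2nd', '3rd', '1st')
order.list <- c('1st', '2nd', '3rd')

## An ordered factor stores each value as its position in `levels`
ranks.ordered <- factor(ranks, levels = order.list, ordered = TRUE)

## as.integer() returns those positions
codes <- as.integer(ranks.ordered)
codes
## [1] 1 2 3 1
```

The codes follow the order we supplied, so '1st' maps to 1 and '3rd' maps to 3, matching the encoded column above.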
Let’s update our data frame to have t-shirt sizes along with the ranks. We will also add ranks for a second tournament. Then, let’s see how we can encode multiple categories at the same time.
df <- data.frame(
money = c(3000, 1000, 2000, 2000),
rank = as.factor(c('1st', '2nd', '3rd', '1st')),
rank2 = as.factor(c('2nd', '3rd', '3rd', '1st')),
shirt = as.factor(c('sm', 'sm', 'med', 'lrg'))
)
df
## money rank rank2 shirt
## 1 3000 1st 2nd sm
## 2 1000 2nd 3rd sm
## 3 2000 3rd 3rd med
## 4 2000 1st 1st lrg
Now, we can create a data frame consisting of multiple categories and pass it to our encode_ordinal function. All columns passed in one call share the same order.
library(cleandata)
order.list = c('1st', '2nd', '3rd')
## Create a data frame of the categories that share an order: both rank columns
cat.df = df[, c("rank", "rank2"), drop = FALSE]
## Encode both ranks
encoded = encode_ordinal(cat.df, order = order.list)
## rank rank2
## 1st:2 1st:1
## 2nd:1 2nd:1
## 3rd:1 3rd:2
## coded 2 cols 3 levels
## rank rank2
## 1:2 1:1
## 2:1 2:1
## 3:1 3:2
encoded
## rank rank2
## 1 1 2
## 2 2 3
## 3 3 3
## 4 1 1
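Once the categories are encoded, we usually want them as plain integers next to the other columns. The sketch below assumes the encoded columns come back as factors whose labels are the digits "1", "2", "3"; that is an assumption about the output, so check it with str(encoded) on your own data. The stand-in data frame is built by hand so the example runs without cleandata.

```r
## Stand-in for the encoded output above; assumed to be factors whose
## labels are the digits "1", "2", "3"
encoded <- data.frame(
  rank  = factor(c('1', '2', '3', '1')),
  rank2 = factor(c('2', '3', '3', '1'))
)
money <- c(3000, 1000, 2000, 2000)

## Go through as.character() so we read the labels, not the internal codes
encoded.int <- as.data.frame(
  lapply(encoded, function(col) as.integer(as.character(col)))
)

## Attach the integer columns next to the untouched numeric column
result <- cbind(money = money, encoded.int)
str(result)
```

Converting through as.character() matters when the factor's internal level order differs from the labels; calling as.integer() directly on such a factor would return the internal positions instead.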
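## Encode shirts
## Caution: order.list still holds the rank labels ('1st', '2nd', '3rd'),
## which match none of the shirt levels, so the output below is unchanged.
## A shirt-specific order such as c('sm', 'med', 'lrg') is needed instead
cat.df = df[, c("shirt"), drop = FALSE]
encoded = encode_ordinal(cat.df, order = order.list)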
## shirt
## lrg:1
## med:1
## sm :2
## coded 1 cols 3 levels
## shirt
## lrg:1
## med:1
## sm :2
encoded
## shirt
## 1 sm
## 2 sm
## 3 med
## 4 lrg
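The shirt column comes back unchanged because the order we passed contains rank labels, not shirt sizes. Each group of categories needs its own order. Here is a minimal base R sketch of that idea; the orders lookup list is hypothetical and only serves this example. With cleandata, the equivalent fix would presumably be a separate encode_ordinal call with order = c('sm', 'med', 'lrg').

```r
df <- data.frame(
  rank  = c('1st', '2nd', '3rd', '1st'),
  shirt = c('sm', 'sm', 'med', 'lrg')
)

## Hypothetical lookup: one order per column
orders <- list(
  rank  = c('1st', '2nd', '3rd'),
  shirt = c('sm', 'med', 'lrg')
)

encoded <- df
for (col in names(orders)) {
  ## Labels missing from the order become NA, which makes a mismatch visible
  encoded[[col]] <- as.integer(
    factor(df[[col]], levels = orders[[col]], ordered = TRUE)
  )
}
encoded
```

In this sketch 'sm' maps to 1, 'med' to 2, and 'lrg' to 3, and a label that is missing from its order shows up as NA rather than passing through silently.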