Box Cox in R

06.29.2021

Intro

A Box-Cox transformation is a preprocessing technique that reshapes a skewed, strictly positive variable so that it is closer to a normal distribution. Approximate normality is often assumed, for example of the residuals in linear regression. The Box-Cox transformation doesn’t guarantee that your data will be normally distributed afterwards, so you will always need to check. In this article, we will learn how to conduct a Box-Cox transformation in R.
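To make the idea concrete, here is a minimal base R sketch of the one-parameter Box-Cox formula. The helper name box_cox is our own and is not part of any package; it assumes strictly positive input and treats lambda values very close to zero as the log case.

```r
# Hand-rolled Box-Cox transform (illustrative helper, not a library function).
# Assumes every value of y is strictly positive.
box_cox <- function(y, lambda) {
  stopifnot(is.numeric(y), all(y > 0))
  if (abs(lambda) < 1e-8) {
    log(y)                   # limiting case as lambda approaches 0
  } else {
    (y^lambda - 1) / lambda  # general case
  }
}

box_cox(c(1, 2, 4), lambda = 0)  # identical to log(c(1, 2, 4))
box_cox(4, lambda = 0.5)         # (sqrt(4) - 1) / 0.5 = 2
```

A lambda of 1 only shifts the data, while values below 1 pull in a long right tail, which is why the choice of lambda matters.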

Example Box Cox in R

We begin by creating some mock data. We will generate samples from the exponential distribution. Notice from the histogram that the data is heavily right-skewed and clearly not normal.

set.seed(123)  # fix the seed so the example is reproducible
x <- rexp(10000, rate = 5)

hist(x)

[Figure: histogram of x]

There are quite a few ways to apply Box-Cox in R, but many of them require a fitted model. The forecast package provides a function called BoxCox that transforms a plain vector directly. We pass in our x vector with lambda = "auto" so that the transformation parameter is chosen for us, and the transformed data is returned.

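# BoxCox() comes from the forecast package
library(forecast)
# lambda = "auto" lets the function pick the transformation parameter
x.transformed <- BoxCox(x, lambda = "auto")
hist(x.transformed)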

[Figure: histogram of x.transformed]
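If you want to see which lambda is being used, or undo the transformation later, one option is to estimate lambda explicitly. The sketch below assumes the forecast package's BoxCox.lambda and InvBoxCox helpers; the method = "loglik" setting and the seed are our own choices for illustration, so the lambda it reports may differ from the one picked by lambda = "auto".

```r
library(forecast)

set.seed(123)                     # assumed seed, only for reproducibility
x <- rexp(10000, rate = 5)

# Estimate lambda by maximising the profile log-likelihood
lam <- BoxCox.lambda(x, method = "loglik")
lam

# Transform with the explicit lambda, then invert to recover the original data
x.bc   <- BoxCox(x, lambda = lam)
x.back <- InvBoxCox(x.bc, lambda = lam)

# The largest round-trip error should be close to zero
max(abs(x - as.numeric(x.back)))
```

Keeping lambda in a variable is handy when results need to be reported on the original scale.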

The histogram of the transformed data looks closer to normal, but not perfectly normal. This is why you need to check. After this visual check, it is a good idea to run a formal normality test, such as Shapiro-Wilk, for further evidence.
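As a rough sketch of that follow-up check, we can combine a Q-Q plot with R's built-in shapiro.test. Note that shapiro.test only accepts between 3 and 5000 observations, so with 10000 values we test a random subsample; the seed and the subsample size are our own assumptions.

```r
library(forecast)

set.seed(123)                     # assumed seed, only for reproducibility
x <- rexp(10000, rate = 5)
x.transformed <- BoxCox(x, lambda = "auto")

# Visual check: points should follow the reference line if the data is normal
qqnorm(x.transformed)
qqline(x.transformed)

# shapiro.test() is limited to 5000 values, so draw a random subsample
idx <- sample(length(x.transformed), 5000)
sw  <- shapiro.test(as.numeric(x.transformed)[idx])
sw
```

With samples this large, the test can flag even small departures from normality, so it is worth reading the p-value alongside the plots and not in isolation.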