When working with time series, we often need to check for autocorrelation. The Ljung-Box test is a statistical test for exactly that purpose. In this article, we will learn how to perform a Ljung-Box test in R.
The Ljung-Box test is a hypothesis test that checks whether a time series is autocorrelated. The null hypothesis H0 is that the data (often the residuals of a fitted model) are independently distributed, that is, they show no autocorrelation up to the chosen lag. The alternative hypothesis is that the data are not independently distributed and exhibit serial correlation.
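For reference, the test statistic is usually written as below. The symbols are notation introduced here for illustration: n is the number of observations, h is the number of lags tested, and \hat{\rho}_k is the sample autocorrelation at lag k.

$$
Q = n\,(n + 2) \sum_{k=1}^{h} \frac{\hat{\rho}_k^{2}}{n - k}
$$

Under the null hypothesis, Q approximately follows a chi-squared distribution with h degrees of freedom, so a large Q (and a small p-value) is evidence of autocorrelation.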
Let’s load a data set of monthly milk production from the URL below. Each row contains a month and the kilograms of milk produced per cow in that month.
df <- read.csv('https://raw.githubusercontent.com/ourcodingclub/CC-time-series/master/monthly_milk.csv')
df$month = as.Date(df$month)
head(df)
##        month milk_prod_per_cow_kg
## 1 1962-01-01               265.05
## 2 1962-02-01               252.45
## 3 1962-03-01               288.00
## 4 1962-04-01               295.20
## 5 1962-05-01               327.15
## 6 1962-06-01               313.65
Now we convert the data to a time series object and then to a zoo object, which gives us access to zoo's indexing methods.
library(zoo)
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
df.ts = ts(df[, -1], frequency = 12, start = c(1962, 1))
df.ts = as.zoo(df.ts)
head(df.ts)
## Jan 1962 Feb 1962 Mar 1962 Apr 1962 May 1962 Jun 1962
##   265.05   252.45   288.00   295.20   327.15   313.65
To conduct a Ljung-Box test, we can use the Box.test function from the built-in stats package. We pass our time series, a lag, and the type, which will be "Ljung" (R matches it to the full option name "Ljung-Box").
We start with a lag of 1, which tests whether each observation is correlated with the one immediately before it.
Box.test(df.ts, lag = 1, type = "Ljung")
##
## Box-Ljung test
##
## data: df.ts
## X-squared = 135.94, df = 1, p-value < 2.2e-16
Here the p-value is far below 0.01, so we reject the null hypothesis of independence and conclude that the series is autocorrelated at lag 1.
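To see where the X-squared value comes from, the statistic can be rebuilt by hand from the sample autocorrelations. The sketch below is only an illustration: it uses a simulated AR(1) series as a stand-in for the milk data (an assumption made so the snippet runs without a download), and the names x, h, q_manual, and q_builtin are hypothetical.

```r
# Manual Ljung-Box statistic, checked against stats::Box.test.
# The simulated AR(1) series is a stand-in (assumption), not the milk data.
set.seed(1)
x <- arima.sim(model = list(ar = 0.6), n = 200)

h <- 5           # number of lags to test (illustrative choice)
n <- length(x)   # number of observations

# acf() returns lag 0 first, so drop it to keep lags 1..h
rho <- acf(x, lag.max = h, plot = FALSE)$acf[-1]

# Q = n (n + 2) * sum over k = 1..h of rho_k^2 / (n - k)
q_manual <- n * (n + 2) * sum(rho^2 / (n - seq_len(h)))

# Upper-tail chi-squared probability with h degrees of freedom
p_manual <- pchisq(q_manual, df = h, lower.tail = FALSE)

# Same test via the built-in function
builtin <- Box.test(x, lag = h, type = "Ljung-Box")
q_builtin <- unname(builtin$statistic)

print(c(manual = q_manual, builtin = q_builtin))
print(all.equal(q_manual, q_builtin))
```

If the two values agree, the built-in function is doing nothing more than this sum followed by a chi-squared lookup.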
Next, we run the test with a lag of 12, because the monthly series appears to have a yearly seasonal pattern. With lag = 12, the test jointly checks the autocorrelations at lags 1 through 12.
Box.test(df.ts, lag = 12, type = "Ljung")
##
## Box-Ljung test
##
## data: df.ts
## X-squared = 852.41, df = 12, p-value < 2.2e-16
Again, the p-value is far below 0.01, so we reject the null hypothesis and conclude that the series is autocorrelated within the first 12 lags.
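In practice the test is often applied to the residuals of a fitted model rather than to the raw series, to check whether the model has captured the serial dependence. The sketch below shows one way this might look. It assumes a simulated AR(1) series and an AR(1) fit (both illustrative choices, not part of the milk example) and uses the fitdf argument of Box.test to subtract the number of fitted ARMA parameters from the degrees of freedom.

```r
# Ljung-Box test on model residuals (illustrative sketch).
# Assumption: a simulated AR(1) series stands in for real data.
set.seed(42)
x <- arima.sim(model = list(ar = 0.7), n = 300)

# Raw series: autocorrelation is expected here
raw_test <- Box.test(x, lag = 10, type = "Ljung-Box")

# Fit an AR(1) model and extract its residuals
fit <- arima(x, order = c(1, 0, 0))
res <- residuals(fit)

# fitdf = 1 accounts for the single estimated AR coefficient,
# so the test uses lag - fitdf = 9 degrees of freedom
res_test <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 1)

print(raw_test)
print(res_test)
```

A small p-value for the raw series alongside a large one for the residuals would suggest that the model accounts for most of the autocorrelation.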