Autocorrelation comes up often when working with time series. The Ljung-Box test is a statistical test that checks whether a time series is autocorrelated. In this article, we will learn how to perform a Ljung-Box test in Python.
The Ljung-Box test is a hypothesis test that checks whether a time series is autocorrelated. It is most often applied to the residuals of a fitted model. The null hypothesis H0 is that the residuals are independently distributed, so every autocorrelation up to the chosen lag is zero. The alternative hypothesis is that the residuals are not independently distributed and exhibit serial correlation.
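To make the hypothesis test concrete, here is a minimal sketch of the statistic behind it, Q = n(n + 2) * sum over k = 1..h of r_k^2 / (n - k), where n is the number of observations, h is the largest lag tested, and r_k is the sample autocorrelation at lag k. Under H0, Q approximately follows a chi-square distribution with h degrees of freedom. The helper name `ljung_box_q` and the toy series are assumptions made for illustration; in practice the statsmodels function used later in this article does this work.

```python
def ljung_box_q(x, h):
    """Ljung-Box Q statistic over lags 1..h (hypothetical helper, for illustration)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]           # deviations from the sample mean
    c0 = sum(d * d for d in dev)          # lag-0 sum of squares
    total = 0.0
    for k in range(1, h + 1):
        # Lag-k sum of cross products, then the sample autocorrelation r_k.
        ck = sum(dev[t] * dev[t - k] for t in range(k, n))
        r_k = ck / c0
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


# Toy series that flips sign at every step, so it is strongly autocorrelated.
series = [1, -1] * 4
print(ljung_box_q(series, 1))  # 8.75
print(ljung_box_q(series, 2))  # 16.25
```

A large Q relative to the chi-square distribution gives a small p-value, which is the evidence against H0.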
Let's load a data set of monthly milk production from the URL below. Each row holds a month and the kilograms of milk produced per cow in that month.
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/ourcodingclub/CC-time-series/master/monthly_milk.csv')
df.month = pd.to_datetime(df.month)
df = df.set_index('month')
df.head()
month | milk_prod_per_cow_kg |
---|---|
1962-01-01 | 265.05 |
1962-02-01 | 252.45 |
1962-03-01 | 288.00 |
1962-04-01 | 295.20 |
1962-05-01 | 327.15 |
To conduct a Ljung-Box test, we can use the acorr_ljungbox function from the statsmodels package. We pass our time series and a list of lags.
We start with a lag of 1, which tests whether each observation is correlated with the one immediately before it.
from statsmodels.stats.diagnostic import acorr_ljungbox
acorr_ljungbox(df, lags=[1], return_df=True)
lag | lb_stat | lb_pvalue |
---|---|---|
1 | 135.942829 | 2.053590e-31 |
Here the p-value is far below .01, so we reject the null hypothesis: the time series is autocorrelated at lag 1.
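To see where that p-value comes from, the sketch below recomputes it from the reported statistic. It relies on the fact that, with a single lag, the statistic is compared against a chi-square distribution with 1 degree of freedom, whose upper-tail probability equals erfc(sqrt(Q / 2)). The variable names are chosen for illustration, and the statistic is copied from the output above rather than recomputed from the data.

```python
import math

q_stat = 135.942829  # lb_stat reported for lag 1 in the output above

# Upper-tail probability of a chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(q_stat / 2))
print(p_value)  # roughly 2.05e-31, in line with the reported lb_pvalue
```

This shortcut only holds for one degree of freedom; for more lags, the general chi-square distribution is needed.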
Next, we run the test with a lag of 12, because the monthly series appears to have yearly seasonality. With lags=[12], the statistic combines the autocorrelations at lags 1 through 12.
from statsmodels.stats.diagnostic import acorr_ljungbox
acorr_ljungbox(df, lags=[12], return_df=True)
lb_stat | lb_pvalue | |
---|---|---|
12 | 852.413094 | 9.438013e-175 |
Again, the p-value is far below .01, so we reject the null hypothesis: the time series is autocorrelated within the first 12 lags.