Before modeling a time series, we often want to check whether the data is stationary. Many models assume a stationary series, and if that assumption is violated, the forecast will not be reliable. In this article, we will learn how to check the stationarity of time series data in R.
Let's load a data set of monthly milk production from the URL below. Each row holds a month and the kilograms of milk produced per cow.
df <- read.csv('https://raw.githubusercontent.com/ourcodingclub/CC-time-series/master/monthly_milk.csv')
df$month = as.Date(df$month)
head(df)
## month milk_prod_per_cow_kg
## 1 1962-01-01 265.05
## 2 1962-02-01 252.45
## 3 1962-03-01 288.00
## 4 1962-04-01 295.20
## 5 1962-05-01 327.15
## 6 1962-06-01 313.65
Now, we convert our data to a time series object using R's ts function.
df.ts = ts(df[, -1], frequency = 12, start = c(1962, 1))
head(df.ts)
## [1] 265.05 252.45 288.00 295.20 327.15 313.65
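To confirm that the conversion picked up the intended calendar, it can help to inspect the attributes of the series. The sketch below uses a short simulated stand-in (the name x and its values are assumptions, not the milk data); the same calls should apply to df.ts.

```r
# Hypothetical stand-in for the milk series: 24 monthly values starting January 1962
set.seed(1)
x <- ts(rnorm(24, mean = 300, sd = 20), frequency = 12, start = c(1962, 1))

series_start <- start(x)      # first observation as c(year, period)
series_end   <- end(x)        # last observation as c(year, period)
series_freq  <- frequency(x)  # observations per year

print(series_start)
print(series_end)
print(series_freq)
```

If the start or frequency is not what you expect, the ts call is the place to correct it before running any test.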
One way to check if the data is stationary is to plot it. A plot should always be used in combination with other methods, but some data clearly show trend and seasonality. For example, in the plot below we can see an upward trend and a definite seasonal pattern.
plot(df.ts)
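When the trend and seasonal pattern are harder to see by eye, a classical decomposition can separate them. The following is a minimal sketch on a simulated monthly series (the series x and its parameters are assumptions chosen for illustration); with the milk data you would pass df.ts to decompose instead.

```r
# Synthetic monthly series (an assumption, not the milk data):
# linear trend + yearly cycle + noise
set.seed(42)
n <- 120
t_idx <- seq_len(n)
x <- ts(300 + 0.5 * t_idx + 20 * sin(2 * pi * t_idx / 12) + rnorm(n, sd = 5),
        frequency = 12, start = c(1962, 1))

# Classical additive decomposition into trend, seasonal, and random components
parts <- decompose(x)
plot(parts)
```

A clearly sloped trend panel or a regular seasonal panel is a visual hint that the series is not stationary as it stands.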
Another way to check if the data is stationary is to use the Augmented Dickey-Fuller (ADF) test. This test checks for a unit root. If there is a unit root, the data is not stationary. The ADF test is a hypothesis test whose null hypothesis is that there is a unit root (non-stationary) and whose alternative is that there is no unit root (stationary). We can use the adf.test function from the tseries package to run it.
library(tseries)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
adf.test(df.ts)
## Warning in adf.test(df.ts): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: df.ts
## Dickey-Fuller = -9.9714, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
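The small p-value above can look at odds with the plot, which shows a trend and seasonality. One likely reason is that adf.test in tseries fits a regression that includes a constant and a linear trend, so rejecting the null points to the absence of a unit root around a trend rather than to a series free of trend or seasonal structure. The sketch below contrasts two simulated series (both are assumptions made up for illustration): noise around a deterministic trend, and a random walk.

```r
library(tseries)

set.seed(123)
n <- 200

# Assumed example 1: stationary noise around a deterministic linear trend
trend_series <- 0.5 * seq_len(n) + rnorm(n)

# Assumed example 2: a random walk, which has a unit root by construction
random_walk <- cumsum(rnorm(n))

# adf.test may warn that the true p-value is smaller than the printed one
trend_result <- adf.test(trend_series)
walk_result  <- adf.test(random_walk)

print(trend_result$p.value)
print(walk_result$p.value)
```

One would expect a small p-value for the trended series even though its mean changes over time, and typically a larger one for the random walk. This is why the ADF result is best read together with the plot.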
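Another test we can use is the Ljung-Box test. This test checks the data for autocorrelation. Its null hypothesis is that the observations are independently distributed, meaning there is no autocorrelation up to the chosen lag. If we get a low p-value, we can reject the null hypothesis and conclude that the series is serially correlated. Strong autocorrelation is what we expect from a series with trend or seasonality, so a low p-value is consistent with non-stationarity, although autocorrelation alone does not prove it.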
Box.test(df.ts, lag=12, type="Ljung-Box")
##
## Box-Ljung test
##
## data: df.ts
## X-squared = 852.41, df = 12, p-value < 2.2e-16
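Autocorrelation and non-stationarity are related but not the same thing, so it is worth seeing how the Ljung-Box test behaves on series whose properties are known. This sketch compares simulated white noise with a simulated stationary AR(1) process (both series and the coefficient 0.8 are assumptions for illustration).

```r
set.seed(7)
n <- 300

# Independent draws: no autocorrelation by construction
white_noise <- rnorm(n)

# Stationary AR(1) process: autocorrelated, but with constant mean and variance
ar1 <- arima.sim(model = list(ar = 0.8), n = n)

p_noise <- Box.test(white_noise, lag = 12, type = "Ljung-Box")$p.value
p_ar1   <- Box.test(ar1, lag = 12, type = "Ljung-Box")$p.value

print(p_noise)
print(p_ar1)
```

The AR(1) series should be rejected by the test even though it is stationary, which suggests treating a low Ljung-Box p-value as evidence of serial dependence and confirming non-stationarity with the plot and a unit root test.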