 KoalaTea

# How to Check Stationarity of Time Series Data in R

## Intro

Before modeling a time series, we often want to check whether the data is stationary. Many models assume a stationary time series, and if this assumption is violated, our forecasts will not be reliable. In this article, we will learn how to check the stationarity of time series data in R.

## Data

Let’s load a data set of monthly milk production from the URL below. The data has two columns: the month and the kilograms of milk produced per cow.

```r
df <- read.csv('https://raw.githubusercontent.com/ourcodingclub/CC-time-series/master/monthly_milk.csv')
df$month = as.Date(df$month)
head(df)
##        month milk_prod_per_cow_kg
## 1 1962-01-01               265.05
## 2 1962-02-01               252.45
## 3 1962-03-01               288.00
## 4 1962-04-01               295.20
## 5 1962-05-01               327.15
## 6 1962-06-01               313.65
```

Now, we convert our data to a time series object using R's `ts()` function, with `frequency = 12` because the data is monthly.

```r
# Drop the month column; start is c(year, period), so c(1962, 1) means January 1962
df.ts = ts(df[, -1], frequency = 12, start = c(1962, 1))
head(df.ts)
##  265.05 252.45 288.00 295.20 327.15 313.65
```
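To confirm that `ts()` picked up the intended calendar, we can query the object's time attributes. The sketch below uses a small made-up series (`toy`, holding the values 1 to 24) instead of the milk data so that it runs offline; the same calls should work on `df.ts`.

```r
# Hypothetical toy series: 24 monthly values starting January 1962
toy <- ts(1:24, frequency = 12, start = c(1962, 1))

start(toy)      # first time point as c(year, period)
end(toy)        # last time point as c(year, period)
frequency(toy)  # observations per unit of time (12 = monthly)
time(toy)[1:3]  # decimal-year time stamps of the first three observations
```

If `start()` or `end()` shows an unexpected year or month, the `start` or `frequency` argument passed to `ts()` is the first thing to check.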

## Visually Checking

One way to check whether the data is stationary is to plot it. A plot should always be combined with other methods, but some data show trends and seasonality clearly. For example, in the plot below we can see an upward trend and a clear seasonal pattern.

```r
plot(df.ts)
```
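A plot can be backed up with a few numbers. One rough check, sketched below, is to compare yearly means: for a stationary series they should stay around a constant level, while a steadily rising sequence points to a trend. The sketch uses a simulated series (`sim`, an assumption standing in for the milk data) with a built-in upward trend and a 12-month seasonal cycle, together with base R's `aggregate()` and `decompose()`.

```r
set.seed(1)
n <- 120                                   # 10 years of monthly data
idx <- seq_len(n)
# Simulated series (an assumption standing in for the milk data):
# linear trend + 12-month seasonal cycle + Gaussian noise
sim <- ts(0.5 * idx + 10 * sin(2 * pi * idx / 12) + rnorm(n),
          frequency = 12, start = c(1962, 1))

# Mean of each calendar year (aggregate() on a ts defaults to nfrequency = 1)
yearly_means <- aggregate(sim, FUN = mean)
print(round(yearly_means, 1))

# Classical decomposition into trend, seasonal and random components
parts <- decompose(sim)
print(round(parts$figure, 1))              # estimated seasonal effect for each month
```

Because the simulated trend rises by 0.5 per month, the yearly means climb by roughly 6 per year, the same kind of steady drift that makes the milk plot look non-stationary.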

## Using the ADF test

Another way to check whether the data is stationary is to use the Augmented Dickey-Fuller (ADF) test, which checks for a unit root. If there is a unit root, the data is not stationary. The ADF test is a hypothesis test: the null hypothesis is that there is a unit root (non-stationary), and the alternative is that there is no unit root (stationary). We can run it with the `adf.test` function from the tseries library.
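To see what the test measures, here is a simplified Dickey-Fuller regression in base R. It is a sketch, not how `adf.test` is implemented: it leaves out the lagged-difference terms that make the test "augmented" (so it corresponds to a lag order of 0), and the helper `df_stat` and the simulated series are hypothetical. It regresses the first difference of the series on a constant, a linear trend and the lagged level; a strongly negative t-statistic on the lagged level is evidence against a unit root. That statistic does not follow the usual t distribution, so it is compared with a Dickey-Fuller critical value, about -3.41 at the 5% level (asymptotic, constant-plus-trend case).

```r
# Hypothetical helper: simplified Dickey-Fuller statistic (no augmentation lags).
# Regresses diff(y) on a constant, a linear trend and y[t-1], and returns the
# t-value of the y[t-1] coefficient.
df_stat <- function(y) {
  y <- as.numeric(y)
  dy <- diff(y)              # y[t] - y[t-1]
  y_lag <- y[-length(y)]     # y[t-1]
  trend <- seq_along(dy)     # linear time trend
  fit <- lm(dy ~ trend + y_lag)
  summary(fit)$coefficients["y_lag", "t value"]
}

set.seed(42)
white_noise <- rnorm(200)          # simulated stationary series
random_walk <- cumsum(rnorm(200))  # simulated series with a unit root

crit_5pct <- -3.41  # approximate asymptotic 5% Dickey-Fuller critical value (constant + trend)

stat_wn <- df_stat(white_noise)
stat_rw <- df_stat(random_walk)
cat("white noise:", round(stat_wn, 2), " reject unit root:", stat_wn < crit_5pct, "\n")
cat("random walk:", round(stat_rw, 2), " reject unit root:", stat_rw < crit_5pct, "\n")
```

The white-noise statistic lands far below the critical value, while a random walk typically does not, although any single simulated random walk can occasionally cross the threshold.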

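```r
library(tseries)
adf.test(df.ts)
## Warning in adf.test(df.ts): p-value smaller than printed p-value

##
##  Augmented Dickey-Fuller Test
##
## data:  df.ts
## Dickey-Fuller = -9.9714, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
```

The printed p-value is 0.01, and the warning tells us the true p-value is even smaller, so we reject the null hypothesis of a unit root at the 5% level. Note that `adf.test` includes a constant and a linear trend in its test regression, so rejecting the null points to trend-stationarity: the series can still contain a deterministic upward trend, as the plot shows.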
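The `Lag order = 5` in the output is the number of lagged differences `adf.test` added to the regression. When `k` is not supplied, the tseries documentation gives the default as `trunc((length(x)-1)^(1/3))`. The hypothetical helper below simply evaluates that formula; to use a different lag order, pass it explicitly, for example `adf.test(df.ts, k = 12)`.

```r
# Hypothetical helper: the default lag order tseries::adf.test uses when k
# is not given, trunc((n - 1)^(1/3)) for a series of length n
default_adf_lag <- function(n) trunc((n - 1)^(1/3))

default_adf_lag(100)  # 99^(1/3) is about 4.63, so 4
default_adf_lag(168)  # 167^(1/3) is about 5.51, so 5 (14 years of monthly data)
```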

## Using the Ljung-Box test

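Another test we can use is the Ljung-Box test, which checks whether the autocorrelations of the series up to a chosen lag are all zero. The null hypothesis is that the data are independently distributed (no autocorrelation). If we get a low p-value, we reject the null hypothesis and conclude that the series is autocorrelated. Strong autocorrelation across many lags is typical of trending or seasonal, non-stationary data, but autocorrelation alone does not prove non-stationarity, since stationary processes such as AR(1) are also autocorrelated. This test therefore works best as a supporting check alongside the plot and the ADF test.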
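The Ljung-Box statistic is Q = n(n+2) * sum over k = 1..h of r_k^2 / (n - k), where n is the series length, r_k is the sample autocorrelation at lag k and h is the number of lags tested. Under the null hypothesis, Q is approximately chi-squared with h degrees of freedom. The sketch below computes Q by hand and checks it against base R's `Box.test`; the helper `ljung_box_q` and the simulated trending, seasonal series are hypothetical.

```r
# Hypothetical helper: Ljung-Box Q statistic and p-value for lags 1..h
ljung_box_q <- function(x, h) {
  n <- length(x)
  # Sample autocorrelations at lags 1..h (element 1 of $acf is lag 0)
  r <- acf(x, lag.max = h, plot = FALSE)$acf[2:(h + 1)]
  q <- n * (n + 2) * sum(r^2 / (n - seq_len(h)))
  p <- pchisq(q, df = h, lower.tail = FALSE)  # upper tail of chi-squared(h)
  c(Q = q, p.value = p)
}

set.seed(1)
idx <- 1:168
# Simulated series with a trend and a 12-month cycle (strongly autocorrelated)
x <- ts(0.5 * idx + 10 * sin(2 * pi * idx / 12) + rnorm(168), frequency = 12)

manual <- ljung_box_q(x, 12)
builtin <- Box.test(x, lag = 12, type = "Ljung-Box")
print(manual)
print(builtin)
```

The hand-computed Q should match the `X-squared` value reported by `Box.test`, which makes it easier to see why a series with a strong trend produces a very large statistic.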

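```r
Box.test(df.ts, lag=12, type="Ljung-Box")
##
##  Box-Ljung test
##
## data:  df.ts
## X-squared = 852.41, df = 12, p-value < 2.2e-16
```

With `lag=12`, the test looks at the first 12 autocorrelations, one year of monthly data. The p-value is far below 0.05, so we reject the null hypothesis of independence: the milk series is strongly autocorrelated, which is consistent with the trend and seasonality visible in the plot.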