The autoregression model, or AR model, predicts the value at a particular time from the values at previous times (the lags). The model relies on the correlations between lags, called autocorrelations since they are correlations of the series with itself. In this article, we will learn how to build an autoregression model in R.
Let’s take a brief look at the mathematical definition. The predicted value of an AR(1) model follows this equation:

y_t = c + ϕ1 * y_{t-1} + e_t

where y_t is the value at time t, c is a constant, ϕ1 is the coefficient on the previous value y_{t-1}, and e_t is the error term.
If we want to expand to AR(2) and beyond, we add another lag (the value one more step back in time) and another coefficient. For example:

y_t = c + ϕ1 * y_{t-1} + ϕ2 * y_{t-2} + e_t
Let’s say we have an AR(1) model and two months of sales, with the second month’s sales being y_2 = 300. We can then predict y_3 for the third month as follows:

y_3 = c + ϕ1 * 300 + e_3

where we find c and ϕ1 by performing linear regression on the data set, and the error term e_3 is assumed to be sampled from a normal distribution.
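To make that calculation concrete, here is a minimal sketch in R. The values chosen for c and ϕ1 are made up purely for illustration; in practice they come from fitting the model to the data.

# hypothetical parameter values, for illustration only
c.const <- 50       # constant term c
phi1    <- 0.6      # AR(1) coefficient
y2      <- 300      # sales in the second month

e3 <- rnorm(1, mean = 0, sd = 10)  # error term sampled from a normal distribution
y3 <- c.const + phi1 * y2 + e3     # predicted sales for the third month
y3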
To test out ARIMA-style models, we can use the arima.sim function. This will simulate a model for us, so that we can test and verify our technique. Let’s start with an AR(1) model using the code below. To create an AR model, we set the model parameters to have order c(1, 0, 0). This order represents 1 AR term, 0 difference terms, and 0 MA terms. We also pass the value of the AR parameter, which is .7.
# simulate 50 observations from an AR(1) process with coefficient 0.7
model.params <- list(order = c(1, 0, 0), ar = .7)
ar1.data <- arima.sim(
  model = model.params,
  n = 50
)
plot(ar1.data)
When trying to identify ARMA models, we usually use the ACF or PACF. It is worth noting that before this, you will want to have removed any trend and seasonality. We will have a full article on ARIMA modeling later on. For AR models, the PACF will help us determine the order, but we also need to confirm that the ACF tails off gradually rather than cutting off.
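Our simulated series is already stationary, so no detrending is needed here, but as a rough sketch of how you might handle a trend on real data, one common approach is to difference the series before examining the ACF/PACF. The artificial trend below is added only to illustrate the idea.

# illustration only: add an artificial linear trend to the simulated data,
# then remove it by first differencing before looking at the ACF
trended.data <- ar1.data + 0.1 * seq_along(ar1.data)
detrended.data <- diff(trended.data)
plot(detrended.data)
acf(detrended.data)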
We start with the ACF plot. We can see there is no drastic drop-off; the correlations simply decay gradually over time.
acf(ar1.data)
We now look at the PACF plot. We can see a steep drop-off after the first lag, suggesting an AR(1) model, because the data is highly correlated with the first lag (the value at the previous time). We expect this since we simulated the data that way.
pacf(ar1.data)
We can use the built-in arima function to model our data. We pass in our data set and the order we want to fit.
ar1 <- arima(ar1.data, order = c(1, 0, 0))
ar1
##
## Call:
## arima(x = ar1.data, order = c(1, 0, 0))
##
## Coefficients:
## ar1 intercept
## 0.7169 -0.0386
## s.e. 0.0952 0.4483
##
## sigma^2 estimated as 0.8855: log likelihood = -68.27, aic = 142.53
We can see the ar1 coefficient was estimated at about .72, which is very close to the .7 we simulated. Another value to check here is the AIC, 142.53, which we would use to compare this model against others.
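As a quick sketch of that kind of comparison, we could fit a competing model on the same data and compare AIC values, where a lower AIC is preferred. The AR(2) alternative here is chosen only for illustration.

# fit a competing AR(2) model on the same data and compare AIC values
ar1.alt <- arima(ar1.data, order = c(2, 0, 0))
AIC(ar1)      # AIC of the AR(1) fit above
AIC(ar1.alt)  # AIC of the alternative AR(2) fit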
We can now forecast with our model using the predict function. Here we predict 20 time values ahead.
ar.pred <- predict(ar1, n.ahead = 20)$pred
ar.pred
## Time Series:
## Start = 51
## End = 70
## Frequency = 1
## [1] 0.185379356 0.121963426 0.076501833 0.043911342 0.020547880
## [6] 0.003799090 -0.008207778 -0.016815258 -0.022985785 -0.027409310
## [11] -0.030580446 -0.032853769 -0.034483468 -0.035651766 -0.036489295
## [16] -0.037089702 -0.037520122 -0.037828682 -0.038049882 -0.038208455
Now, let’s plot the predicted data with our previous data.
plot(ar1.data, xlim = c(0, 100))
lines(ar.pred, col = 'red')
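The predict function also returns the forecast standard errors in its se component. As a rough sketch, we can use these to draw approximate 95% bands around the forecast (the point forecast plus or minus about two standard errors).

# re-run the forecast, keeping both the point forecasts and their standard errors
ar.fc <- predict(ar1, n.ahead = 20)
plot(ar1.data, xlim = c(0, 100))
lines(ar.fc$pred, col = 'red')
lines(ar.fc$pred + 2 * ar.fc$se, col = 'red', lty = 2)  # upper band
lines(ar.fc$pred - 2 * ar.fc$se, col = 'red', lty = 2)  # lower band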
Let’s finish with one more example. We will move a bit more quickly and reuse all the steps above.
# simulate 50 observations from an AR(2) process with coefficients 0.7 and -0.4
model.params <- list(order = c(2, 0, 0), ar = c(.7, -.4))
ar2.data <- arima.sim(
  model = model.params,
  n = 50
)
plot(ar2.data)
acf(ar2.data)
pacf(ar2.data)
ar2 <- arima(ar2.data, order = c(2, 0, 0))
ar2
##
## Call:
## arima(x = ar2.data, order = c(2, 0, 0))
##
## Coefficients:
## ar1 ar2 intercept
## 0.4193 -0.2624 -0.2421
## s.e. 0.1363 0.1357 0.1869
##
## sigma^2 estimated as 1.232: log likelihood = -76.29, aic = 160.58
ar2.pred <- predict(ar2, n.ahead = 20)$pred
ar2.pred
## Time Series:
## Start = 51
## End = 70
## Frequency = 1
## [1] -0.61931219 -0.08955086 -0.07917860 -0.21381676 -0.27299307 -0.26248280
## [7] -0.24255035 -0.23694993 -0.23983107 -0.24250848 -0.24287525 -0.24232660
## [13] -0.24200032 -0.24200745 -0.24209604 -0.24213132 -0.24212287 -0.24211007
## [19] -0.24210692 -0.24210896
plot(ar2.data, xlim = c(0, 100))
lines(ar2.pred, col = 'red')