R Resample Time Series

07.14.2021

Intro

Resampling is a common task when working with time series data. It goes in two directions: upsampling and downsampling. Upsampling moves from a lower frequency to a higher one, e.g. hours to minutes. Downsampling is the reverse, e.g. minutes to hours. In this article, we will learn how to resample in R.

Loading the Data

Let’s load a data set of monthly milk production from the URL below. Each row holds a month and the kilograms of milk produced per cow in that month.

require(zoo) # For time series
## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
require(data.table) # High performance data frame
## Loading required package: data.table
require(curl) # To load from url for data.table's fread
## Loading required package: curl

## Using libcurl 7.64.1 with Schannel
df.ts <- fread('https://raw.githubusercontent.com/ourcodingclub/CC-time-series/master/monthly_milk.csv')
df.ts[, month := as.Date(month)]

# Key the table by the date column since this is a time series
setkey(df.ts, month)

Downsampling in R

The first type of resampling we will do is downsampling. We can do this by dropping values. We will use the seq.int function to create the index of the first row of each year, i.e. every 12th row.

ts.indexes = seq.int(from = 1, to = nrow(df.ts), by = 12)
ts.indexes
##  [1]   1  13  25  37  49  61  73  85  97 109 121 133 145 157

We then pass these indexes to our data table, which keeps only those rows and drops the rest.

df.ts[ts.indexes]
##          month milk_prod_per_cow_kg
##  1: 1962-01-01               265.05
##  2: 1963-01-01               270.00
##  3: 1964-01-01               282.60
##  4: 1965-01-01               296.10
##  5: 1966-01-01               304.65
##  6: 1967-01-01               320.85
##  7: 1968-01-01               322.65
##  8: 1969-01-01               330.30
##  9: 1970-01-01               337.50
## 10: 1971-01-01               361.80
## 11: 1972-01-01               371.70
## 12: 1973-01-01               369.45
## 13: 1974-01-01               372.60
## 14: 1975-01-01               375.30
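Selecting every 12th row works here because the series starts in January and has no gaps. As a minimal sketch of a variant that does not depend on row position, the example below filters on the month part of the date instead. The `toy` table and its `value` column are hypothetical stand-ins, not the milk data.

```r
library(data.table)

# Hypothetical toy series: 36 monthly values that start in July, not January
toy <- data.table(
  month = seq(as.Date("2000-07-01"), by = "month", length.out = 36),
  value = 1:36
)

# Keep only the January rows, wherever they fall in the table
jan.only <- toy[format(month, "%m") == "01"]
print(jan.only)
```

With position-based indexes starting at row 1, this toy series would return the July rows instead, so filtering on the date itself is the safer choice when the start month is not guaranteed.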

A more common task when downsampling is to aggregate the data rather than drop it. Here we will take the mean of each year. We pass the mean function in the j argument and group by a formatted version of the date. This is a bit of a trick: we extract the year from each date, then group by that value.

df.ts[, mean(milk_prod_per_cow_kg), by = format(month, "%Y")]
##     format       V1
##  1:   1962 277.0875
##  2:   1963 283.5000
##  3:   1964 296.4375
##  4:   1965 302.8875
##  5:   1966 318.9375
##  6:   1967 329.8125
##  7:   1968 336.9750
##  8:   1969 343.7625
##  9:   1970 351.9000
## 10:   1971 375.3375
## 11:   1972 384.3750
## 12:   1973 379.1625
## 13:   1974 386.2875
## 14:   1975 388.2000
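The grouped result above gets the default column names `format` and `V1`. As a sketch of how the same aggregation could return named columns, the example below uses data.table's `year()` helper and a named list in j. The `toy` table and the output names `mean_value` and `n_months` are illustrative assumptions.

```r
library(data.table)

# Hypothetical toy series: two full years of monthly values
toy <- data.table(
  month = seq(as.Date("2000-01-01"), by = "month", length.out = 24),
  value = c(rep(10, 12), rep(20, 12))
)

# Group by calendar year and name both the grouping and the result columns
yearly <- toy[, .(mean_value = mean(value), n_months = .N),
              by = .(year = year(month))]
print(yearly)
```

Including the row count per group makes it easy to spot partial years, whose mean would otherwise be compared against full years without any warning.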

Upsampling in R

Upsampling is the opposite of downsampling, as the name implies. We can run upsampling in a similar way; however, we need to fill in missing data, because the new, finer time steps have no observed values. If we go from yearly data to monthly, we have no idea what really happened each month, so we need a fill strategy.

For example, our data set has monthly data. If we want to upsample to daily data, we can start by creating indexes for each day.

start = df.ts$month[1]
end = tail(df.ts$month, 1)

date.indexes = seq(
  from = start,
  to = end,
  by = "days"
)
head(date.indexes)
## [1] "1962-01-01" "1962-01-02" "1962-01-03" "1962-01-04" "1962-01-05"
## [6] "1962-01-06"

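We will now use a forward fill strategy, which carries the value at the start of each month forward through the days that are missing data. Setting roll = 31 lets a value roll forward by at most 31 days, which is enough to cover the longest month.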

df.ts[J(date.indexes), roll = 31]
##            month milk_prod_per_cow_kg
##    1: 1962-01-01               265.05
##    2: 1962-01-02               265.05
##    3: 1962-01-03               265.05
##    4: 1962-01-04               265.05
##    5: 1962-01-05               265.05
##   ---                                
## 5079: 1975-11-27               358.65
## 5080: 1975-11-28               358.65
## 5081: 1975-11-29               358.65
## 5082: 1975-11-30               358.65
## 5083: 1975-12-01               379.35
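Forward fill is only one possible fill strategy. As a rough sketch, the example below compares it with linear interpolation from base R's `approx` on a small made-up series. The `monthly` table and its values are hypothetical, and interpolation is shown as an alternative for comparison, not as the method used above.

```r
library(data.table)

# Hypothetical toy series: three monthly observations
monthly <- data.table(
  month = as.Date(c("2000-01-01", "2000-02-01", "2000-03-01")),
  value = c(10, 20, 30)
)
setkey(monthly, month)

# Daily target index spanning the observed range
days <- seq(from = as.Date("2000-01-01"), to = as.Date("2000-03-01"), by = "day")

# Strategy 1: forward fill with a rolling join (no limit on the roll distance)
daily.ffill <- monthly[J(days), roll = TRUE]

# Strategy 2: linear interpolation between consecutive observations
daily.linear <- data.table(
  month = days,
  value = approx(
    x = as.numeric(monthly$month),
    y = monthly$value,
    xout = as.numeric(days)
  )$y
)

# Compare the two strategies on the last day of January
print(daily.ffill[month == as.Date("2000-01-31")])
print(daily.linear[month == as.Date("2000-01-31")])
```

Forward fill produces a step function that jumps on the first of each month, while interpolation produces a straight line between observations. Which one is appropriate depends on what the measured value represents.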