Subsetting a Time Series in R

07.09.2021

Intro

When working with time series, we often want to access a subset of the data based on a range of dates. Data frames already offer many ways to index and subset, and the R ecosystem (the xts package in particular) adds several more that are designed for time series. In this article, we will learn how to subset a time series in R.

Data

Let’s load a data set of monthly milk production from the URL below. Each row contains a month and the kilograms of milk produced per cow in that month.

df <- read.csv('https://raw.githubusercontent.com/ourcodingclub/CC-time-series/master/monthly_milk.csv')
df$month = as.Date(df$month)
head(df)
##        month milk_prod_per_cow_kg
## 1 1962-01-01               265.05
## 2 1962-02-01               252.45
## 3 1962-03-01               288.00
## 4 1962-04-01               295.20
## 5 1962-05-01               327.15
## 6 1962-06-01               313.65

Now we convert the data to a ts object and then to an xts object, which gives us access to the indexing methods explored below.

library(xts)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
# Monthly series (frequency = 12) starting in January 1962
df.ts = ts(df[, -1], frequency = 12, start = c(1962, 1))
df.ts = as.xts(df.ts)
head(df.ts)
##            [,1]
## Jan 1962 265.05
## Feb 1962 252.45
## Mar 1962 288.00
## Apr 1962 295.20
## May 1962 327.15
## Jun 1962 313.65
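
As an alternative to going through ts, an xts object can also be built directly from a vector of values and a vector of dates. The following is only a minimal sketch: it uses the first six values printed by head(df) above rather than the downloaded file, and the names milk, months, and milk.xts are illustrative. Note that the index here is a Date rather than the month-year index produced by the ts conversion.

```r
library(xts)

# First six monthly values, copied from the head(df) output above
milk <- c(265.05, 252.45, 288.00, 295.20, 327.15, 313.65)
months <- seq(as.Date("1962-01-01"), by = "month", length.out = 6)

# order.by supplies the time index; here it is a Date vector
milk.xts <- xts(milk, order.by = months)

class(index(milk.xts))  # the index class is Date
milk.xts["1962-03"]     # the row for March 1962
```

Keeping a Date index can be convenient when the series later needs to be merged with daily data, although the printed labels will then show full dates.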

Subsetting with Row Numbers

The most basic way to subset a time series is to pass the row numbers we would like to access. In the example below, we pass rows 1 to 4 to retrieve the first four observations.

df.ts[1:4]
##            [,1]
## Jan 1962 265.05
## Feb 1962 252.45
## Mar 1962 288.00
## Apr 1962 295.20
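
Positional subsetting is not limited to a contiguous range. The sketch below, which assumes a small Date-indexed series built from the first six values of the data set (the names milk.xts, picked, first_two, and last_two are illustrative), selects arbitrary rows and uses the first() and last() helpers from xts.

```r
library(xts)

# Small illustrative series: the first six values of the milk data
milk.xts <- xts(
  c(265.05, 252.45, 288.00, 295.20, 327.15, 313.65),
  order.by = seq(as.Date("1962-01-01"), by = "month", length.out = 6)
)

picked    <- milk.xts[c(1, 3, 5)]        # arbitrary row positions
first_two <- first(milk.xts, 2)          # first two rows
last_two  <- last(milk.xts, "2 months")  # last two months of the series
```

Passing a string such as "2 months" lets first() and last() count in calendar periods instead of rows, which matters when the series is not evenly spaced.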

Subsetting by Date

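We can also pass a date or a vector of dates to retrieve values from a time series. In the examples below, we first select by a single date, then by a vector of dates built with seq(). The sequence steps one day at a time, and only the dates that match an entry in the index return a row, which is why we get one observation per month.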

df.ts[as.Date("1962-01-01")]
##            [,1]
## Jan 1962 265.05
dates = seq(as.Date("1962-01-01"), as.Date("1962-08-01"), 1)
df.ts[dates]
##            [,1]
## Jan 1962 265.05
## Feb 1962 252.45
## Mar 1962 288.00
## Apr 1962 295.20
## May 1962 327.15
## Jun 1962 313.65
## Jul 1962 288.00
## Aug 1962 269.55
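
Since only matching dates return rows, the date vector can be generated at the same monthly step as the data. The sketch below assumes a small Date-indexed series made from the first six values of the data set (milk.xts, dates, by_dates, and from_march are illustrative names). It also shows a logical condition on the index as another way to select by date.

```r
library(xts)

# Small illustrative series: the first six values of the milk data
milk.xts <- xts(
  c(265.05, 252.45, 288.00, 295.20, 327.15, 313.65),
  order.by = seq(as.Date("1962-01-01"), by = "month", length.out = 6)
)

# One date per month instead of one per day
dates <- seq(as.Date("1962-01-01"), as.Date("1962-04-01"), by = "month")
by_dates <- milk.xts[dates]

# A logical vector computed from the index works as well
from_march <- milk.xts[index(milk.xts) >= as.Date("1962-03-01")]
```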

Subsetting by Date Range

xts also understands range strings of the form 'from/to', where each side is a date written at whatever precision we need (year, month, or day).

In this example, we select all rows from 1962 through 1963.

df.ts['1962/1963']
##            [,1]
## Jan 1962 265.05
## Feb 1962 252.45
## Mar 1962 288.00
## Apr 1962 295.20
## May 1962 327.15
## Jun 1962 313.65
## Jul 1962 288.00
## Aug 1962 269.55
## Sep 1962 255.60
## Oct 1962 259.65
## Nov 1962 248.85
## Dec 1962 261.90
## Jan 1963 270.00
## Feb 1963 254.70
## Mar 1963 293.85
## Apr 1963 302.85
## May 1963 333.90
## Jun 1963 322.20
## Jul 1963 297.00
## Aug 1963 277.65
## Sep 1963 262.35
## Oct 1963 264.15
## Nov 1963 254.25
## Dec 1963 269.10

In this example, we select the range from June 1962 to May 1963.

df.ts['196206/196305']
##            [,1]
## Jun 1962 313.65
## Jul 1962 288.00
## Aug 1962 269.55
## Sep 1962 255.60
## Oct 1962 259.65
## Nov 1962 248.85
## Dec 1962 261.90
## Jan 1963 270.00
## Feb 1963 254.70
## Mar 1963 293.85
## Apr 1963 302.85
## May 1963 333.90
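
Range strings can also be left open on either side, and a single period can be given without a slash. The following sketch assumes a small Date-indexed series made from the first six values of the data set; the object names are illustrative.

```r
library(xts)

# Small illustrative series: the first six values of the milk data
milk.xts <- xts(
  c(265.05, 252.45, 288.00, 295.20, 327.15, 313.65),
  order.by = seq(as.Date("1962-01-01"), by = "month", length.out = 6)
)

up_to_march <- milk.xts["/1962-03"]         # start of the series through March 1962
from_may    <- milk.xts["1962-05/"]         # May 1962 through the end of the series
february    <- milk.xts["1962-02"]          # everything inside February 1962
feb_to_apr  <- milk.xts["1962-02/1962-04"]  # closed range, both months included
```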

Using the Window Function

The final way we will look at subsetting data is the window function. We pass in our time series followed by a start and an end date, and we get back the observations within that range. Both endpoints are included, so the result below runs through June 1963.

window(df.ts, start = '1962-06-01', end = '1963-06-01')
##            [,1]
## Jun 1962 313.65
## Jul 1962 288.00
## Aug 1962 269.55
## Sep 1962 255.60
## Oct 1962 259.65
## Nov 1962 248.85
## Dec 1962 261.90
## Jan 1963 270.00
## Feb 1963 254.70
## Mar 1963 293.85
## Apr 1963 302.85
## May 1963 333.90
## Jun 1963 322.20
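
The window function also works on a plain ts object, where start and end are given as c(year, period) pairs instead of date strings. The sketch below assumes a short series made from the first six values of the data set (milk.ts, milk.xts, ts_slice, and xts_slice are illustrative names) and takes the same three months from both object types.

```r
library(xts)

values <- c(265.05, 252.45, 288.00, 295.20, 327.15, 313.65)

# Plain ts object: monthly data starting in January 1962
milk.ts <- ts(values, frequency = 12, start = c(1962, 1))
ts_slice <- window(milk.ts, start = c(1962, 3), end = c(1962, 5))

# Date-indexed xts object holding the same values
milk.xts <- xts(values, order.by = seq(as.Date("1962-01-01"), by = "month", length.out = 6))
xts_slice <- window(milk.xts, start = as.Date("1962-03-01"), end = as.Date("1962-05-01"))
```

Both slices cover March through May 1962; the ts version keeps its frequency and start attributes, while the xts version keeps its Date index.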