# Subsetting a Time Series in R

## Intro

When working with time series, we often want to access a subset of the data based on a range of dates. Data frames offer many ways to index and subset data, and the R ecosystem provides comparable methods for time series. In this article, we will learn how to subset a time series in R.

## Data

Let's load a data set of monthly milk production from the URL below. Each row holds a month and the kilograms of milk produced per cow in that month.

df <- read.csv('https://raw.githubusercontent.com/ourcodingclub/CC-time-series/master/monthly_milk.csv')
df$month = as.Date(df$month)
head(df)
##        month milk_prod_per_cow_kg
## 1 1962-01-01               265.05
## 2 1962-02-01               252.45
## 3 1962-03-01               288.00
## 4 1962-04-01               295.20
## 5 1962-05-01               327.15
## 6 1962-06-01               313.65

Now, we convert our data to a time series object and then to an xts object, which gives us access to the indexing methods explored below.

library(xts)
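# Monthly series starting in January 1962; start takes c(year, period)
df.ts = ts(df[, -1], frequency = 12, start = c(1962, 1))
# Convert to xts to get date-aware indexing
df.ts = as.xts(df.ts)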
head(df.ts)
##            [,1]
## Jan 1962 265.05
## Feb 1962 252.45
## Mar 1962 288.00
## Apr 1962 295.20
## May 1962 327.15
## Jun 1962 313.65
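
The conversion above goes through `ts`, which drops the column name and yields a month-year index. As a possible alternative, here is a minimal sketch that builds the xts object directly with `xts()` and its `order.by` argument, assuming the month column has already been parsed with `as.Date`. The data frame `toy` is a hypothetical stand-in that reuses the first three rows shown earlier, so the example runs without a download.

```r
library(xts)

# Hypothetical stand-in with the same column names as the milk data
toy <- data.frame(
  month = as.Date(c("1962-01-01", "1962-02-01", "1962-03-01")),
  milk_prod_per_cow_kg = c(265.05, 252.45, 288.00)
)

# order.by supplies the time index; the value column keeps its name
toy.xts <- xts(toy["milk_prod_per_cow_kg"], order.by = toy$month)
toy.xts
```

With this approach the index is a full `Date` rather than a month-year label, which can be convenient when the dates need to be compared with other `Date` values later.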

## Subsetting with Row Numbers

The most basic way to subset a time series is by row position. In the example below, we pass rows 1 through 4 to retrieve the first four observations.

df.ts[1:4]
##            [,1]
## Jan 1962 265.05
## Feb 1962 252.45
## Mar 1962 288.00
## Apr 1962 295.20
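
Positional subsetting also works from the end of the series. The sketch below uses a made-up monthly series `x` (the integers 1 to 24 standing in for real values) and shows `head()`, `tail()`, and the `last()` helper from xts alongside plain row indexing.

```r
library(xts)

# Made-up monthly series: 24 observations starting January 1962
x <- xts(1:24, order.by = seq(as.Date("1962-01-01"), by = "month", length.out = 24))

first_four <- head(x, 4)    # same rows as x[1:4]
last_three <- tail(x, 3)    # the three most recent observations
last_three_xts <- last(x, 3) # xts helper, equivalent here to tail(x, 3)
final_row <- x[nrow(x)]     # the single last observation
```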

## Subsetting by Date

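We can also pass a date or a vector of dates to retrieve values from a time series. In the examples below, we first select by a single date and then by a sequence of dates. The sequence steps one day at a time, and only the dates that appear in the index return rows.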

df.ts[as.Date("1962-01-01")]
##            [,1]
## Jan 1962 265.05
dates = seq(as.Date("1962-01-01"), as.Date("1962-08-01"), 1)
df.ts[dates]
##            [,1]
## Jan 1962 265.05
## Feb 1962 252.45
## Mar 1962 288.00
## Apr 1962 295.20
## May 1962 327.15
## Jun 1962 313.65
## Jul 1962 288.00
## Aug 1962 269.55
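
To make the matching behavior concrete, the sketch below compares a daily and a monthly date sequence on a made-up series `x` with a `Date` index (an assumption; the series above uses a month-year index). Both selections should return the same rows, since dates that are absent from the index are dropped.

```r
library(xts)

# Made-up monthly series: 24 observations starting January 1962
x <- xts(1:24, order.by = seq(as.Date("1962-01-01"), by = "month", length.out = 24))

# A daily sequence contains many dates that are not in the index
daily <- seq(as.Date("1962-01-01"), as.Date("1962-03-15"), by = "day")
# A monthly sequence lists only the month starts
monthly <- seq(as.Date("1962-01-01"), as.Date("1962-03-01"), by = "month")

by_day <- x[daily]
by_month <- x[monthly]
```

Stepping by month states the intent more directly and avoids building a long vector of dates that can never match.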

## Subsetting by Date Range

xts also accepts a range string of the form 'from/to', which selects every observation between the two dates, endpoints included.

In this example, we select the rows from 1962 through 1963.

df.ts['1962/1963']
##            [,1]
## Jan 1962 265.05
## Feb 1962 252.45
## Mar 1962 288.00
## Apr 1962 295.20
## May 1962 327.15
## Jun 1962 313.65
## Jul 1962 288.00
## Aug 1962 269.55
## Sep 1962 255.60
## Oct 1962 259.65
## Nov 1962 248.85
## Dec 1962 261.90
## Jan 1963 270.00
## Feb 1963 254.70
## Mar 1963 293.85
## Apr 1963 302.85
## May 1963 333.90
## Jun 1963 322.20
## Jul 1963 297.00
## Aug 1963 277.65
## Sep 1963 262.35
## Oct 1963 264.15
## Nov 1963 254.25
## Dec 1963 269.10

In this example, we select the range from June 1962 to May 1963.

df.ts['196206/196305']
##            [,1]
## Jun 1962 313.65
## Jul 1962 288.00
## Aug 1962 269.55
## Sep 1962 255.60
## Oct 1962 259.65
## Nov 1962 248.85
## Dec 1962 261.90
## Jan 1963 270.00
## Feb 1963 254.70
## Mar 1963 293.85
## Apr 1963 302.85
## May 1963 333.90
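
Range strings can also be left open on one side, or reduced to a single period. The sketch below illustrates this on a made-up monthly series `x` covering 1962 and 1963 (the values are placeholders, not the milk data).

```r
library(xts)

# Made-up monthly series: January 1962 through December 1963
x <- xts(1:24, order.by = seq(as.Date("1962-01-01"), by = "month", length.out = 24))

from_mid_1963 <- x["1963-07/"]  # July 1963 through the end of the series
up_to_march <- x["/1962-03"]    # start of the series through March 1962
whole_1962 <- x["1962"]         # every observation in 1962
```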

## Using the Window Function

The final way we will look at subsetting data is the window function. We pass in our time series followed by a start and end date, and we get back the observations within that range, endpoints included.

window(df.ts, start = '1962-06-01', end = '1963-06-01')
##            [,1]
## Jun 1962 313.65
## Jul 1962 288.00
## Aug 1962 269.55
## Sep 1962 255.60
## Oct 1962 259.65
## Nov 1962 248.85
## Dec 1962 261.90
## Jan 1963 270.00
## Feb 1963 254.70
## Mar 1963 293.85
## Apr 1963 302.85
## May 1963 333.90
## Jun 1963 322.20
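
`window()` is a base R generic, so it also works on a plain `ts` object, where the bounds are given as `c(year, period)` instead of date strings. The sketch below shows both forms on made-up data; `milk.ts` and `x` are hypothetical names, and the integers 1 to 24 stand in for real values.

```r
library(xts)

# Plain ts object: monthly, starting January 1962
milk.ts <- ts(1:24, frequency = 12, start = c(1962, 1))
# June 1962 through June 1963, bounds given as c(year, month)
sub.ts <- window(milk.ts, start = c(1962, 6), end = c(1963, 6))

# xts object with a Date index; omitting end keeps everything after start
x <- xts(1:24, order.by = seq(as.Date("1962-01-01"), by = "month", length.out = 24))
sub.xts <- window(x, start = as.Date("1963-07-01"))
```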