When working with time series, we often want to access a subset of our data based on a range of dates. Data frames offer many ways to index and subset data, and the R ecosystem provides many methods for subsetting time series as well. In this article, we will learn how to subset time series in R.
Let’s load a data set of monthly milk production from the URL below. Each row holds a month and the milk produced per cow, in kilograms, during that month.
df <- read.csv('https://raw.githubusercontent.com/ourcodingclub/CC-time-series/master/monthly_milk.csv')
df$month = as.Date(df$month)
head(df)
## month milk_prod_per_cow_kg
## 1 1962-01-01 265.05
## 2 1962-02-01 252.45
## 3 1962-03-01 288.00
## 4 1962-04-01 295.20
## 5 1962-05-01 327.15
## 6 1962-06-01 313.65
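If the URL above is unreachable, the sketch below rebuilds a partial stand-in for df from the 24 monthly values printed later in this article (January 1962 to December 1963). This is an assumption for offline practice, not the full data set, but it covers the 1962-1963 ranges used in the subsetting examples that follow.

```r
# Offline stand-in (assumption): only the 24 months printed in this article,
# Jan 1962 - Dec 1963, not the full downloaded series.
df <- data.frame(
  month = seq(as.Date("1962-01-01"), by = "month", length.out = 24),
  milk_prod_per_cow_kg = c(
    265.05, 252.45, 288.00, 295.20, 327.15, 313.65,  # Jan-Jun 1962
    288.00, 269.55, 255.60, 259.65, 248.85, 261.90,  # Jul-Dec 1962
    270.00, 254.70, 293.85, 302.85, 333.90, 322.20,  # Jan-Jun 1963
    297.00, 277.65, 262.35, 264.15, 254.25, 269.10   # Jul-Dec 1963
  )
)
head(df)
```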
Now, we convert our data to a ts object and then to an xts object, which gives us access to the indexing methods explored below.
library(xts)
## Warning: package 'xts' was built under R version 4.0.5
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 4.0.5
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
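# Drop the month column and build a monthly ts; start takes c(year, month)
df.ts = ts(df[, -1], frequency = 12, start = c(1962, 1))
# Convert to xts to enable date-based subsetting
df.ts = as.xts(df.ts)
head(df.ts)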
## [,1]
## Jan 1962 265.05
## Feb 1962 252.45
## Mar 1962 288.00
## Apr 1962 295.20
## May 1962 327.15
## Jun 1962 313.65
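Going through ts works here because the observations are regular and monthly. As an alternative sketch, an xts object can also be built directly from a date column with xts() and its order.by argument, which keeps the index as Date values. The series below is hypothetical toy data (the values 1 to 24) rather than the milk data, and milk.xts is an illustrative name.

```r
library(xts)

# Hypothetical toy data: one value per month, first day of each month from Jan 1962
months <- seq(as.Date("1962-01-01"), by = "month", length.out = 24)
milk <- data.frame(month = months, value = seq_len(24))

# order.by supplies the time index for each row
milk.xts <- xts(milk$value, order.by = milk$month)

class(index(milk.xts))  # the index keeps the Date class of the input column
head(milk.xts, 3)
```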
The basic way to subset a time series is to pass the row indices we would like to access. In the example below, we pass rows 1 to 4 to retrieve the first four observations.
df.ts[1:4]
## [,1]
## Jan 1962 265.05
## Feb 1962 252.45
## Mar 1962 288.00
## Apr 1962 295.20
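Positional indexing accepts any integer vector, not just a contiguous range. A sketch on hypothetical toy data (values 1 to 24, named milk.xts for illustration) that picks non-adjacent rows and drops rows with negative indices; tail() and head() are convenient shortcuts for the ends of a series.

```r
library(xts)

# Hypothetical toy series: values 1..24, monthly from Jan 1962
milk.xts <- xts(seq_len(24),
                order.by = seq(as.Date("1962-01-01"), by = "month", length.out = 24))

milk.xts[c(1, 13)]   # rows 1 and 13: January 1962 and January 1963
milk.xts[-(1:20)]    # negative indices drop rows 1-20, leaving the last four
tail(milk.xts, 4)    # the same last four rows via tail()
```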
We can also pass a date or vector of dates to retrieve values from a time series. In the examples below we first select by a single date, then by a range of dates.
df.ts[as.Date("1962-01-01")]
## [,1]
## Jan 1962 265.05
dates = seq(as.Date("1962-01-01"), as.Date("1962-08-01"), by = "month")
df.ts[dates]
## [,1]
## Jan 1962 265.05
## Feb 1962 252.45
## Mar 1962 288.00
## Apr 1962 295.20
## May 1962 327.15
## Jun 1962 313.65
## Jul 1962 288.00
## Aug 1962 269.55
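Date subsetting matches each Date against the index exactly, so a date with no observation is left out of the result rather than returned as an NA row. A sketch on a hypothetical toy series (values 1 to 24, named milk.xts for illustration), using seq(..., by = "month") to generate only first-of-month dates:

```r
library(xts)

# Hypothetical toy series: values 1..24, monthly from Jan 1962
milk.xts <- xts(seq_len(24),
                order.by = seq(as.Date("1962-01-01"), by = "month", length.out = 24))

# First-of-month dates, January to August 1962
dates <- seq(as.Date("1962-01-01"), as.Date("1962-08-01"), by = "month")
milk.xts[dates]  # eight rows

# 1962-01-15 has no observation, so only the 1962-01-01 row comes back
milk.xts[as.Date(c("1962-01-01", "1962-01-15"))]
```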
xts also accepts ISO 8601-style date strings. Passing a string of the form 'from/to' selects every observation between the two dates, inclusive.
In this example, we select all rows from 1962 through 1963.
df.ts['1962/1963']
## [,1]
## Jan 1962 265.05
## Feb 1962 252.45
## Mar 1962 288.00
## Apr 1962 295.20
## May 1962 327.15
## Jun 1962 313.65
## Jul 1962 288.00
## Aug 1962 269.55
## Sep 1962 255.60
## Oct 1962 259.65
## Nov 1962 248.85
## Dec 1962 261.90
## Jan 1963 270.00
## Feb 1963 254.70
## Mar 1963 293.85
## Apr 1963 302.85
## May 1963 333.90
## Jun 1963 322.20
## Jul 1963 297.00
## Aug 1963 277.65
## Sep 1963 262.35
## Oct 1963 264.15
## Nov 1963 254.25
## Dec 1963 269.10
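The 'from/to' syntax also works with one side omitted, and a single period such as a year can be passed on its own. A sketch of these variants on hypothetical toy data (values 1 to 24, January 1962 to December 1963, named milk.xts for illustration):

```r
library(xts)

# Hypothetical toy series: values 1..24, monthly from Jan 1962
milk.xts <- xts(seq_len(24),
                order.by = seq(as.Date("1962-01-01"), by = "month", length.out = 24))

milk.xts["1963"]             # a single year: all 12 months of 1963
milk.xts["/1962-03"]         # open start: everything up to and including March 1962
milk.xts["1963-10/"]         # open end: October 1963 through the last observation
milk.xts["1962-11/1963-02"]  # year-month bounds spanning a year boundary
```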
In this example, we select the range from June 1962 to May 1963, writing each bound in compact YYYYMM form.
df.ts['196206/196305']
## [,1]
## Jun 1962 313.65
## Jul 1962 288.00
## Aug 1962 269.55
## Sep 1962 255.60
## Oct 1962 259.65
## Nov 1962 248.85
## Dec 1962 261.90
## Jan 1963 270.00
## Feb 1963 254.70
## Mar 1963 293.85
## Apr 1963 302.85
## May 1963 333.90
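The compact YYYYMM bounds can also be written with a hyphen, as YYYY-MM. A minimal check on hypothetical toy data (values 1 to 24, named milk.xts for illustration) that both spellings select the same twelve months:

```r
library(xts)

# Hypothetical toy series: values 1..24, monthly from Jan 1962
milk.xts <- xts(seq_len(24),
                order.by = seq(as.Date("1962-01-01"), by = "month", length.out = 24))

compact    <- milk.xts["196206/196305"]     # YYYYMM bounds
hyphenated <- milk.xts["1962-06/1963-05"]   # YYYY-MM bounds

nrow(compact)                                # June 1962 through May 1963
all(index(compact) == index(hyphenated))     # same timestamps either way
```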
The final way we will look at subsetting data is the window() function. We pass in our time series followed by a start and end date, and get back the observations within that range, including both endpoints.
window(df.ts, start = '1962-06-01', end = '1963-06-01')
## [,1]
## Jun 1962 313.65
## Jul 1962 288.00
## Aug 1962 269.55
## Sep 1962 255.60
## Oct 1962 259.65
## Nov 1962 248.85
## Dec 1962 261.90
## Jan 1963 270.00
## Feb 1963 254.70
## Mar 1963 293.85
## Apr 1963 302.85
## May 1963 333.90
## Jun 1963 322.20
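For comparison, base R's stats::window() also works on a plain ts object, taking c(year, period) bounds instead of dates, so the slice can be taken before converting to xts. The sketch below uses hypothetical toy values 1 to 24 (milk.ts and milk.xts are illustrative names) and shows both versions selecting June 1962 through June 1963, endpoints included.

```r
library(xts)

# Hypothetical toy data: values 1..24, monthly from January 1962
milk.ts <- ts(seq_len(24), frequency = 12, start = c(1962, 1))
milk.xts <- xts(seq_len(24),
                order.by = seq(as.Date("1962-01-01"), by = "month", length.out = 24))

# Base R: window() on a ts takes c(year, period) bounds
ts.sub <- window(milk.ts, start = c(1962, 6), end = c(1963, 6))

# xts: window() takes time-based bounds; both endpoints are included
xts.sub <- window(milk.xts, start = as.Date("1962-06-01"), end = as.Date("1963-06-01"))

length(ts.sub)
nrow(xts.sub)
```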