How to Filter and Subset a Time Series in Python

07.10.2021

Intro

Using time series is a common task in data science with python. We often want to select specific information based on dates or a date range. In this article, we will learn how to index and subset time series data in Python.

Creating a Basic Time Series

Let's start by creating a basic time series. We will have two columns, a list of dates and the respective sales.

import pandas as pd
sales = [100, 300, 400, 200]
dates = pd.date_range("20180101", periods=4, freq="Y")
df = pd.DataFrame.from_dict({
    'dates': dates,
    'sales': sales
})
df.head()
dates sales
0 2018-12-31 100
1 2019-12-31 300
2 2020-12-31 400
3 2021-12-31 200

Now, we will set the index of our data frame to be dates so we can use the selectors for dates.

df['dates'] = pd.to_datetime(df['dates'])
df = df.set_index('dates')
df.head()
sales
dates
2018-12-31 100
2019-12-31 300
2020-12-31 400
2021-12-31 200

Select Data Based on Dates

The first way we can select data based on date is using the loc selector. We can pass in a specific date as follows.

df.loc['2018-12-31']
sales    100
Name: 2018-12-31 00:00:00, dtype: int64

We can also select groups of data using the loc selector. For examplem we can select year this way.

df.loc['2019']
sales
dates
2019-12-31 300

In a similar way we can select all days under a year and month using the following.

df.loc['2020-12']
sales
dates
2020-12-31 400

We can also use the normal bracket notation to select between a range of dates. For example, we can use the following to select between January 2019 and February 2021 (not including the final month/year.)

df["2019-1":"2021-2"]
sales
dates
2019-12-31 300
2020-12-31 400

In a similar way, we can select between two years.

df["2019":"2021"]
sales
dates
2019-12-31 300
2020-12-31 400
2021-12-31 200