When working with time series models, we often want to plot the data to see how it changes over time. This is a simple line plot with dates on the x-axis. In this article, we will learn how to plot a time series in Python using pandas.
Let's load a data set of monthly milk production from the URL below. Each row holds a month and the kilograms of milk produced.
import pandas as pd
data = 'https://raw.githubusercontent.com/ourcodingclub/CC-time-series/master/monthly_milk.csv'
df = pd.read_csv(data)
df.head()
Next, we want to convert the month column from strings to pandas datetime values. We can use the pd.to_datetime function to do this.
df['month'] = pd.to_datetime(df['month'])
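To see what the conversion changes, here is a minimal sketch on a small hand-built frame rather than the downloaded file. The column name milk_kg and the values are assumptions made up for illustration; only the month column name comes from the code above.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded file: dates stored as plain strings.
# "milk_kg" and its values are illustrative, not taken from the real data.
sample = pd.DataFrame({
    "month": ["1962-01-01", "1962-02-01", "1962-03-01"],
    "milk_kg": [600.0, 580.0, 640.0],
})
print(sample["month"].dtype)  # a generic string/object dtype before conversion

# Convert the strings to pandas datetime values.
sample["month"] = pd.to_datetime(sample["month"])
print(sample["month"].dtype)  # a datetime64 dtype after conversion

# Date components are now available through the .dt accessor.
print(sample["month"].dt.year.tolist())
print(sample["month"].dt.month.tolist())
```

Once the column holds datetime values, pandas can sort, filter, and group by date components instead of treating the dates as text.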
Next, we want to set the index of our data frame to the month column. This is not required to plot the data, but it is good practice when working with time series. A datetime index makes plotting, indexing, and aggregating by date easier in pandas.
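As a rough illustration of what a datetime index makes easier, the sketch below builds a hypothetical two-year monthly series (the milk_kg column and its values are made up) and shows date-based selection and grouping.

```python
import pandas as pd

# Hypothetical two-year monthly series; "milk_kg" is an illustrative column name.
months = pd.date_range("1962-01-01", periods=24, freq="MS")
demo = pd.DataFrame({"month": months, "milk_kg": list(range(600, 624))})
demo_ts = demo.set_index("month")

# Partial-string indexing: select every row from one year.
first_year = demo_ts.loc["1962"]
print(len(first_year))

# Slice a date range; both endpoints are included.
spring = demo_ts.loc["1962-03":"1962-05"]
print(spring["milk_kg"].tolist())

# Group on a component of the index, here the calendar year.
yearly_mean = demo_ts.groupby(demo_ts.index.year)["milk_kg"].mean()
print(yearly_mean)
```

None of these selections would work on a plain integer index without first filtering on the month column by hand.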
Once we have a data frame indexed on a date, we can call the plot method with kind='line' to create a line plot.
import matplotlib.pyplot as plt

# Index the frame on the date column, then draw the line plot
df_ts = df.set_index('month')
df_ts.plot(kind='line')
plt.show()
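The plot method returns a Matplotlib Axes object, which is one way to add a title and axis labels. The sketch below assumes a hypothetical monthly series in place of the milk data. The column name, label text, and output file name are all illustrative.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical monthly series standing in for the milk data.
months = pd.date_range("1962-01-01", periods=24, freq="MS")
plot_df = pd.DataFrame({"milk_kg": list(range(600, 624))}, index=months)

# DataFrame.plot returns the Matplotlib Axes it drew on.
ax = plot_df.plot(kind="line", figsize=(8, 4), legend=False)
ax.set_title("Monthly milk production")
ax.set_xlabel("Month")
ax.set_ylabel("Milk produced (kg)")

# Save the figure to a temporary location; the file name is arbitrary.
out_path = os.path.join(tempfile.gettempdir(), "milk_line_plot.png")
ax.figure.savefig(out_path)
```

In a notebook the backend line can be dropped, since the figure is displayed inline.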
Now that we know how to create a basic plot, we can customize it a bit. Let's say we would like to plot the data by year instead. We could downsample the monthly data to a yearly frequency (resampling will be covered in a separate article). Alternatively, we can convert our date column so that it displays only the year.
We will use the .dt.to_period method with the frequency code 'Y' to convert each date to its year. Then we will drop the month column and use the new year column as our index.
df['year'] = pd.to_datetime(df['month']).dt.to_period('Y')
year_df = df.drop('month', axis=1)
year_df.head()
year_df.set_index('year').plot(kind = 'line')
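Converting the dates to years does not reduce the number of rows: the frame still holds one row per month, and twelve rows share each year. If one value per year is wanted, one option is to group on the year column and aggregate. The sketch below uses a hypothetical series with made-up values, and summing is only one possible choice of aggregate.

```python
import pandas as pd

# Hypothetical two-year monthly series; "milk_kg" is an illustrative column name.
months = pd.date_range("1962-01-01", periods=24, freq="MS")
agg_df = pd.DataFrame({"month": months, "milk_kg": [600.0] * 12 + [650.0] * 12})

# Convert each date to a yearly Period, as in the step above.
agg_df["year"] = agg_df["month"].dt.to_period("Y")

# Collapse the twelve monthly rows of each year into a single total.
yearly_total = agg_df.groupby("year")["milk_kg"].sum()
print(yearly_total)

# yearly_total.plot(kind="line") would then draw one point per year.
```

Using the mean instead of the sum would give the average monthly production for each year.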