# Package names
packages <- c("tidyverse", "tidyquant", "tidymodels", "glue", "scales", "ggthemes")

# Install packages not yet installed
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
  install.packages(packages[!installed_packages])
}

# Load all packages
invisible(lapply(packages, library, character.only = TRUE))
Manipulating Time Series
About this Document
This file replicates the code discussed in the live-session lectures of the Practical Applications in Quantitative Finance course. To ensure you can run the code without issues, please install and load all required packages beforehand. It is always good practice to replicate this Quarto document and experiment by editing the parameters. At the end of this report, you will find a suggestion on how to tweak this report - try making some changes on your own!
In this lecture, we will be working with daily stock price data from several stocks included in the S&P 500 index. Instead of loading the data from a .csv file, we will pull the data directly into R using the tidyquant package. Before you start, make sure to follow the instructions from our previous replication to set up your working directory correctly.
Loading packages
As we get started, we will load all packages referred to on our official website.
Note that you could easily get around this by installing and loading all necessary packages using a simpler syntax:
#Install if not already available - these lines are commented out so that R does not attempt to install the packages every time
#install.packages('tidyverse')
#install.packages('tidyquant')
#install.packages('glue')
#install.packages('scales')
#install.packages('ggthemes')
#Load
library(tidyverse)
library(tidyquant)
library(tidymodels)
library(glue)
library(scales)
library(ggthemes)
Working with Time Series
In the previous lectures, you worked your way through the exercises by using the amazing dplyr functionalities on data.frames. In some cases, however, you had to do some workarounds with drop_na(), slice_tail() and lag() simply because you were manipulating time series data.
The xts package, which stands for eXtensible Time Series, is an R package that is widely used for handling and manipulating time series data. It extends the functionality of the zoo package by providing a structured framework for managing time-indexed data efficiently. The package provides a matrix-like structure where each row is indexed by a date-time value, allowing for efficient subsetting, merging, and manipulation of time series data. It is especially useful in financial applications, where time-stamped data is common.
library(xts) # Note that this is loaded together with the tidyquant package
# Create a vector of random values
data_values <- rnorm(10)

# Create a sequence of dates
dates <- as.Date("2024-01-01") + 0:9

# Convert to xts
xts_data <- xts(data_values, order.by = dates)

# Print the xts object
print(xts_data)
[,1]
2024-01-01 -0.8782227
2024-01-02 -0.4597098
2024-01-03 -0.3964338
2024-01-04 -1.0016150
2024-01-05 -1.5862849
2024-01-06 -0.1767326
2024-01-07 -0.9713712
2024-01-08 0.3229833
2024-01-09 -1.7203924
2024-01-10 -0.7158226
The output shows an interesting feature of the xts format in R:
- The first column contains the values
- The row names are timestamps
- The xts object retains an efficient internal structure
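You can inspect this internal structure directly by pulling the object apart. A minimal sketch, assuming the xts package is loaded as above:

```r
library(xts)

# A tiny xts object with three known values (hypothetical data)
x <- xts(c(10, 20, 30), order.by = as.Date("2024-01-01") + 0:2)

# coredata() returns the underlying numeric matrix of values
coredata(x)

# index() returns the date-time index stored alongside the matrix
index(x)

# class() shows that xts extends the zoo class
class(x)
```

Separating the value matrix (coredata) from the time index is exactly what makes the fast subsetting and merging below possible.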
Core features of xts
- Time-Based Indexing & Subsetting: you can subset an xts object using time-based indexing in a variety of ways. If you were to do this in a data.frame, R wouldn't be able to retrieve the data, as a data.frame does not carry any time series properties that are needed for the job:
# Subset a specific date
xts_data["2024-01-03"]
[,1]
2024-01-03 -0.3964338
# Subset a range
xts_data["2024-01-03/2024-01-06"]
[,1]
2024-01-03 -0.3964338
2024-01-04 -1.0016150
2024-01-05 -1.5862849
2024-01-06 -0.1767326
# Subset a custom Y-M-D definition
xts_data["2024"] # Returns all data from 2024
[,1]
2024-01-01 -0.8782227
2024-01-02 -0.4597098
2024-01-03 -0.3964338
2024-01-04 -1.0016150
2024-01-05 -1.5862849
2024-01-06 -0.1767326
2024-01-07 -0.9713712
2024-01-08 0.3229833
2024-01-09 -1.7203924
2024-01-10 -0.7158226
xts_data["2024-01"] # Returns all data from January 2024
[,1]
2024-01-01 -0.8782227
2024-01-02 -0.4597098
2024-01-03 -0.3964338
2024-01-04 -1.0016150
2024-01-05 -1.5862849
2024-01-06 -0.1767326
2024-01-07 -0.9713712
2024-01-08 0.3229833
2024-01-09 -1.7203924
2024-01-10 -0.7158226
- Merging and Combining Time Series: you can merge two xts objects using the timestamp component that is embedded in the xts structure:
data1 <- xts(rnorm(5), order.by = as.Date("2024-01-01") + 0:4)
data2 <- xts(rnorm(5), order.by = as.Date("2024-01-01") + 2:6)

merged_data <- merge(data1, data2, join = "outer")
print(merged_data)
data1 data2
2024-01-01 -0.3247350 NA
2024-01-02 -0.4150009 NA
2024-01-03 1.4498303 -1.47773093
2024-01-04 1.6398024 0.41617149
2024-01-05 -0.9977372 -0.06436564
2024-01-06 NA 0.81567706
2024-01-07 NA 1.17189255
- join = "outer" ensures all time points are included
- If a time point is missing in one dataset, it is filled with NA
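The join argument controls this behavior. A small sketch of the difference between "outer" and "inner" (hypothetical values; only the row counts matter here):

```r
library(xts)

a <- xts(1:5, order.by = as.Date("2024-01-01") + 0:4)  # Jan 1-5
b <- xts(1:5, order.by = as.Date("2024-01-01") + 2:6)  # Jan 3-7

# "outer" keeps the union of the two indexes (7 dates), padding gaps with NA
nrow(merge(a, b, join = "outer"))

# "inner" keeps only the dates present in both series (Jan 3-5)
nrow(merge(a, b, join = "inner"))
```

Choosing "inner" is the time-series analogue of an inner join on the date key, which is handy when you want only fully overlapping observations.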
- Leads and Lags: functions that retrieve leads and lags of a given variable are a key component of xts objects. Due to the timestamp component, one can shift variables backward (using the lag() function) as well as forward (using lag() with a negative lag argument):
lagged_data <- lag(data1) # Shift values by 1 day
print(lagged_data)
[,1]
2024-01-01 NA
2024-01-02 -0.3247350
2024-01-03 -0.4150009
2024-01-04 1.4498303
2024-01-05 1.6398024
merged_data <- merge(lagged_data, data1, join = "outer")
print(merged_data)
lagged_data data1
2024-01-01 NA -0.3247350
2024-01-02 -0.3247350 -0.4150009
2024-01-03 -0.4150009 1.4498303
2024-01-04 1.4498303 1.6398024
2024-01-05 1.6398024 -0.9977372
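For completeness, the forward shift (a lead) uses the same lag() function with a negative k, so both directions can be sketched on a small deterministic series:

```r
library(xts)

x <- xts(1:5, order.by = as.Date("2024-01-01") + 0:4)

# k = 1 shifts values backward in time: each row shows the previous day's value
lag(x, k = 1)

# k = -1 shifts values forward: each row shows the next day's value (a lead)
lag(x, k = -1)
```

Note that a lead introduces an NA at the end of the series rather than at the beginning, so drop_na() (or na.omit()) removes a different row in each case.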
All in all, when it comes to working with financial data, xts objects are particularly useful for:
- Handling stock price data from tidyquant
- Calculating log returns for portfolio management
- Aligning time series data (e.g., joining different financial datasets)
- Aggregating financial data (e.g., monthly returns)
- Subsetting data by years/months/days
- Calculating rolling functions (e.g., yearly averages)
- Aggregating data at different intervals (e.g., converting daily to weekly prices)
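Two of these use cases, rolling functions and changing the data frequency, can be sketched directly in xts. The block below uses deterministic toy data (the integers 1 to 30) so the numbers are easy to check:

```r
library(xts)
library(zoo)  # rollapply; loaded automatically with xts

daily <- xts(1:30, order.by = as.Date("2024-01-01") + 0:29)

# Rolling 5-day average, window ending at each date and NA-padded at the start
roll_mean <- rollapply(daily, width = 5, FUN = mean, align = "right", fill = NA)
head(roll_mean, 6)  # first 4 entries are NA; 2024-01-05 is mean(1:5) = 3

# Convert daily data to weekly OHLC bars
weekly <- to.weekly(daily)
ncol(weekly)  # 4 columns: Open, High, Low, Close
```

to.weekly() (a wrapper around to.period()) collapses each week into one open-high-low-close row, which is the standard way to downsample price series.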
Unfortunately, there is an issue: the tidyverse is not fully designed to work with time series classes, such as xts. Since xts is optimized for time series operations, some functions that work well when managing time series are not easily translated into the packages from the tidyverse:
- tidyverse: designed for tabular (data frame-like) structures, emphasizing "tidy" data (each row is an observation, each column is a variable)
- xts: designed for time series, optimizing time-based indexing and calculations but less compatible with tidyverse workflows
Introducing the tidyquant package
The tidyquant package (see official documentation here) helps integrate both paradigms, making financial analysis more intuitive in R. It integrates several financial packages, like zoo, xts, quantmod, TTR, and PerformanceAnalytics, with the tidy data infrastructure of the tidyverse, allowing for seamless interaction between tidyverse data manipulation and financial functions.
There are mainly three functions in the tidyquant package that we will be using throughout this lecture: tq_get(), tq_mutate(), and tq_transmute().
The tq_get() function
The tq_get() function is a powerful tool for retrieving financial data in a tidy format. It provides an easy way to import stock prices, economic data, and key financial metrics from multiple sources, including Yahoo Finance, FRED, Alpha Vantage, and Quandl.
tq_get(x, get = "stock.prices", from = NULL, to = NULL, ...)
- x: a character string or vector specifying the stock symbol(s) or identifier(s)
- get: the type of data to retrieve. Defaults to "stock.prices"
- from: start date, in YYYY-MM-DD format. Defaults to NULL (gets all available data)
- to: end date, in YYYY-MM-DD format. Defaults to NULL (gets all available data)
- ...: additional arguments specific to the data source
One of the most common uses of tq_get() is fetching stock price data. Say, for example, that you want to fetch data for Apple between January and February of 2024. It is easy to use tq_get() to retrieve such information:
# Assuming you have the tidyquant loaded in your session
# Fetch Apple (AAPL) stock data from 2024-01-01 to 2024-02-01
aapl_data <- tq_get("AAPL", from = "2024-01-01", to = "2024-02-01")
# Print first few rows
head(aapl_data)
# A tibble: 6 × 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AAPL 2024-01-02 187. 188. 184. 186. 82488700 185.
2 AAPL 2024-01-03 184. 186. 183. 184. 58414500 183.
3 AAPL 2024-01-04 182. 183. 181. 182. 71983600 181.
4 AAPL 2024-01-05 182. 183. 180. 181. 62303300 180.
5 AAPL 2024-01-08 182. 186. 182. 186. 59144500 184.
6 AAPL 2024-01-09 184. 185. 183. 185. 42841800 184.
- The result is a tidy tibble, unlike quantmod's xts format.
The tq_mutate() function
The tq_mutate() function adds new variables to an existing tibble:
tq_mutate(.data,      # The object you are performing the calculations on
          select,     # The columns to send to the mutation function
          mutate_fun, # The mutation function from either the xts, quantmod, or TTR package
          col_rename) # A string or character vector used to quickly rename columns
- The main advantage is that the results are returned as a tibble and the function can be used with the tidyverse
- It is used when you expect additional columns to be added to the resulting data frame
- You can use several time-series-related functions from other R packages - call tq_mutate_fun_options() to see the list of available options
- All in all, it is similar in spirit to mutate()
The tq_transmute() function
The tq_transmute() function returns only the newly created columns and is typically used when the periodicity changes. Its syntax is the following:
tq_transmute(.data,      # The object you are performing the calculations on
             select,     # The columns to send to the mutation function
             mutate_fun, # The mutation function from either the xts, quantmod, or TTR package
             col_rename) # A string or character vector used to quickly rename columns
- tq_transmute() works exactly like tq_mutate() except it only returns the newly created columns
- This is helpful when changing periodicity, where the new columns would not have the same number of rows as the original tibble
- All in all, it is similar in spirit to summarize()
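The difference is easiest to see when the periodicity actually changes. The sketch below uses a hand-made tibble of hypothetical prices (rather than downloaded data, so it runs offline); tq_transmute() collapses sixty daily rows into two monthly rows, something tq_mutate() could not do because the row counts no longer match:

```r
library(tidyverse)
library(tidyquant)

# Hypothetical daily prices spanning January and February 2024
prices <- tibble(
  date     = as.Date("2024-01-01") + 0:59,
  adjusted = 100 + cumsum(rep(0.1, 60))
)

# Changing periodicity: 60 daily rows collapse to 2 monthly OHLC rows
monthly <- prices %>%
  tq_transmute(select = adjusted, mutate_fun = to.monthly)

nrow(monthly)
```

Here to.monthly comes from the xts family of compatible functions; tq_transmute() detects the date column automatically and converts the result back to a tibble.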
Working with time series objects
An immediately useful example of combining a time-series-specific functionality with a tidyverse logic relates to filtering. Sometimes, we may be interested in getting only a subset of the data (for example, only GOOG information). Furthermore, we may be interested in subsetting only a specific time frame for our analysis.
It is relatively straightforward to do this with tidyquant:
- Use filter() to select only rows where symbol == 'GOOG'
- In the same call, filter for date >= min_date and date <= max_date
#Assuming you have the tidyverse and the tidyquant packages loaded
#Set up the list of assets
assets = c('AMZN','GOOG','META','GME')

#Filter out
assets %>%
  tq_get() %>%
  filter(symbol == 'GOOG',
         date >= '2020-01-01',
         date <= '2024-12-31')
# A tibble: 1,258 × 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 GOOG 2020-01-02 67.1 68.4 67.1 68.4 28132000 68.1
2 GOOG 2020-01-03 67.4 68.6 67.3 68.0 23728000 67.8
3 GOOG 2020-01-06 67.5 69.8 67.5 69.7 34646000 69.5
4 GOOG 2020-01-07 69.9 70.1 69.5 69.7 30054000 69.4
5 GOOG 2020-01-08 69.6 70.6 69.5 70.2 30560000 70.0
6 GOOG 2020-01-09 71.0 71.4 70.5 71.0 30018000 70.7
7 GOOG 2020-01-10 71.4 71.7 70.9 71.5 36414000 71.2
8 GOOG 2020-01-13 71.8 72.0 71.3 72.0 33046000 71.7
9 GOOG 2020-01-14 72.0 72.1 71.4 71.5 31178000 71.3
10 GOOG 2020-01-15 71.5 72.1 71.5 72.0 25654000 71.7
# ℹ 1,248 more rows
Another example of using a time-series-specific functionality is working with leads and lags: sometimes, we need to shift our variables by a specific interval, like getting the previous day's price. Say, for example, that you want to understand how S&P 500 return levels relate to NFLX returns one week ahead. It is relatively straightforward to do this with tidyquant:
- Download S&P 500 and NFLX data using the tq_get() function
- Use tq_transmute() to compute the weekly returns for each security based on daily data
- Use tq_transmute() again to generate a lagged series of S&P 500 returns
#Assuming you have the tidyverse and the tidyquant packages loaded
#Netflix Data
NFLX = tq_get('NFLX') %>%
  #Select only the necessary columns
  select(date, symbol, adjusted) %>%
  #Apply the weeklyReturn function and call the new column 'NFLX'
  tq_transmute(mutate_fun = weeklyReturn,
               col_rename = 'NFLX')

#S&P Data
SP500 = tq_get('^GSPC') %>%
  #Select only the necessary columns
  select(date, symbol, adjusted) %>%
  #Apply the weeklyReturn function and call the new column 'SP500'
  tq_transmute(mutate_fun = weeklyReturn,
               col_rename = 'SP500') %>%
  #Apply the lag function for n=1 week, keeping the column name 'SP500'
  tq_transmute(mutate_fun = lag.xts,
               n = 1,
               col_rename = 'SP500') %>%
  #Drop all rows with NA information (row 1, in this case)
  drop_na()

#Merge Data
inner_join(NFLX, SP500)
Joining with `by = join_by(date)`
# A tibble: 530 × 3
date NFLX SP500
<date> <dbl> <dbl>
1 2015-01-09 -0.0563 0
2 2015-01-16 0.0244 -0.00651
3 2015-01-23 0.297 -0.0124
4 2015-01-30 0.00992 0.0160
5 2015-02-06 0.00579 -0.0277
6 2015-02-13 0.0489 0.0303
7 2015-02-20 0.0260 0.0202
8 2015-02-27 -0.00688 0.00635
9 2015-03-06 -0.0438 -0.00275
10 2015-03-13 -0.0346 -0.0158
# ℹ 520 more rows
Rolling functions
Finance practitioners are often asked to perform analysis on a rolling basis: we may want to calculate a given signal on day \(t\) based on the past \(x\) periods of information. Say, for example, that you want to calculate simple and exponential moving averages of adjusted prices over the past 5 days for a given stock. It is relatively straightforward to do this with tidyquant:
- Download stock data using the tq_get() function
- Use tq_mutate() twice along with the SMA() and EMA() functions, setting n = 5
#Assuming you have the tidyverse and the tidyquant packages loaded
#Set up the list of assets
assets = c('AMZN')

assets %>%
  tq_get() %>%
  select(date, symbol, adjusted) %>%
  group_by(symbol) %>%
  tq_mutate(adjusted, mutate_fun = SMA, n = 5) %>%
  tq_mutate(adjusted, mutate_fun = EMA, n = 5)
# A tibble: 2,551 × 5
# Groups: symbol [1]
symbol date adjusted SMA EMA
<chr> <date> <dbl> <dbl> <dbl>
1 AMZN 2015-01-02 15.4 NA NA
2 AMZN 2015-01-05 15.1 NA NA
3 AMZN 2015-01-06 14.8 NA NA
4 AMZN 2015-01-07 14.9 NA NA
5 AMZN 2015-01-08 15.0 15.0 15.0
6 AMZN 2015-01-09 14.8 14.9 15.0
7 AMZN 2015-01-12 14.6 14.8 14.8
8 AMZN 2015-01-13 14.7 14.8 14.8
9 AMZN 2015-01-14 14.7 14.8 14.8
10 AMZN 2015-01-15 14.3 14.6 14.6
# ℹ 2,541 more rows
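SMA() and EMA() come from TTR, but the same tq_mutate() pattern also accepts generic rolling functions such as zoo's rollapply(). A minimal offline sketch on a hand-made tibble (hypothetical prices; width, FUN, align, and fill are passed straight through to rollapply):

```r
library(tidyverse)
library(tidyquant)

# Deterministic toy prices so the rolling values are easy to verify
prices <- tibble(
  date     = as.Date("2024-01-01") + 0:19,
  adjusted = as.numeric(1:20)
)

# Rolling 5-day standard deviation, window ending at each date
rolled <- prices %>%
  tq_mutate(select     = adjusted,
            mutate_fun = rollapply,
            width = 5, FUN = sd,
            align = "right", fill = NA,
            col_rename = "roll_sd_5")

head(rolled, 6)
```

Because the input is the sequence 1..20, every complete 5-day window has the same standard deviation, sd(1:5), which makes the output easy to sanity-check.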
Lastly, financial analysts often cover a collection of securities on a rolling basis. For example, a buy-side analyst will monitor stocks from a given industry so as to understand which ones are overvalued, and which ones are undervalued. Say, for example, that you want to focus on a subset of 4 stocks, and you need to compare the cumulative return up to the latest closing price.
It is easy to integrate the tidyquant functions along with the group_by() function you've learned when working with dplyr:
- Get the information using tq_get()
- Group the data by symbol
- Apply the tq_mutate() and tq_transmute() functions to pass time series functions to the data - in this case, the dailyReturn() and the Return.cumulative() functions
#Assuming you have the tidyverse and the tidyquant packages loaded
#Set up the list of assets
assets = c('AMZN','GOOG','META','GME')

assets %>%
  tq_get() %>%
  select(date, symbol, adjusted) %>%
  group_by(symbol) %>%
  tq_mutate(adjusted, mutate_fun = dailyReturn, col_rename = 'daily_return') %>%
  tq_transmute(daily_return, mutate_fun = Return.cumulative) %>%
  mutate(across(where(is.numeric), percent, big.mark = '.')) %>%
  setNames(c('Ticker','Cumulative Return up-to-date'))
Warning: There were 5 warnings in `mutate()`.
The first warning was:
ℹ In argument: `across(where(is.numeric), percent, big.mark = ".")`.
ℹ In group 1: `symbol = "AMZN"`.
Caused by warning:
! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
Supply arguments directly to `.fns` through an anonymous function instead.
# Previously
across(a:b, mean, na.rm = TRUE)
# Now
across(a:b, \(x) mean(x, na.rm = TRUE))
ℹ Run `dplyr::last_dplyr_warnings()` to see the 4 remaining warnings.
# A tibble: 4 × 2
# Groups: Ticker [4]
Ticker `Cumulative Return up-to-date`
<chr> <chr>
1 AMZN 1.279%
2 GOOG 595%
3 META 755%
4 GME 298%
Hands-on Exercise
Your manager (who has not lifted any weights in the past 5 years) wants to replicate the returns of the Deadlift ETF from 2020 to 2024. Your job is to create a simple table of yearly returns comparing the Deadlift ETF vis-a-vis the S&P 500 index. Follow the instructions and answer the following questions:
- When looking at the yearly results from both the Deadlift ETF and the S&P 500, which one performed better?
- What are the potential explanations for the result you have found?
To answer these questions, you will use a combination of the dplyr and tidyquant functions you have learned so far. The expected result is a data.frame object that shows both the Deadlift ETF and the S&P 500 returns (columns) on a yearly basis (rows).
Before you start, make sure to have the tidyverse and the tidyquant packages loaded in your session. Following the instructions from the previous lectures, you can either make a direct call to each package, library(tidyverse) and library(tidyquant), or copy-paste the script from the course's official website.
- Use tq_get() to load information on the S&P 500 index and the Deadlift ETF constituents in two separate objects. You can use the code ^GSPC to retrieve information for the index, and you can pass a vector c('ticker1','ticker2',...,'ticker_n') to get information on the Deadlift ETF constituents
- Filter for observations between 2020 (beginning of) and 2024 (end of) using the from and to arguments of the tq_get() function
- Group the Deadlift ETF data by symbol using the group_by() function
- For both data sets, create a yearly_ret variable that calculates the yearly return of a given security. You can use the tq_transmute() function, passing the yearlyReturn() function along the chain
- For the Deadlift data set, regroup the data by date and calculate the Deadlift returns using a mutate() function (Hint: it is an equally weighted portfolio)
- Merge both datasets using inner_join()
Solution walkthrough
#Assuming you have the tidyverse and the tidyquant packages loaded
#Set up the list of assets
deadlift = c('META','AMZN','GS','UBER','MSFT','AAPL','BLK','NVDA')

#Set up the starting and ending dates
start = '2020-01-01'
end = '2024-12-31'

#Step 1: Read the Deadlift data using tidyquant
Deadlift_Performance = deadlift %>%
  tq_get(from = start, to = end) %>%
  #Select only the columns of interest
  select(symbol, date, adjusted) %>%
  #Group by symbol
  group_by(symbol) %>%
  #Use tq_transmute to aggregate and calculate yearly returns
  tq_transmute(select = adjusted,
               mutate_fun = yearlyReturn,
               col_rename = 'Deadlift') %>%
  #Group by date
  group_by(date) %>%
  #Summarize the average return (since it is an equally-weighted portfolio)
  summarize(Deadlift = mean(Deadlift, na.rm = TRUE))

#Step 2: Read the S&P 500 data using tidyquant
SP500_Performance = tq_get('^GSPC', from = start, to = end) %>%
  #Select only the columns of interest
  select(symbol, date, adjusted) %>%
  #Group by symbol
  group_by(symbol) %>%
  #Use tq_transmute to aggregate and calculate yearly returns
  tq_transmute(select = adjusted,
               mutate_fun = yearlyReturn,
               col_rename = 'SP500') %>%
  ungroup() %>%
  select(-symbol)

#Merge
SP500_Performance %>%
  inner_join(Deadlift_Performance) %>%
  mutate(across(where(is.numeric), percent)) %>%
  mutate(date = year(date)) %>%
  setNames(c('Year','S&P 500','DeadLift ETF'))
Joining with `by = join_by(date)`
# A tibble: 5 × 3
Year `S&P 500` `DeadLift ETF`
<dbl> <chr> <chr>
1 2020 15.29% 57.9%
2 2021 26.89% 37.2%
3 2022 -19.44% -36.0%
4 2023 24.23% 100.5%
5 2024 23.84% 52.1%
This solution uses tidyquant and tidyverse to analyze the yearly returns of a custom portfolio (i.e., the "Deadlift ETF") consisting of eight stocks and compares it with the S&P 500.
- Define assets and time range: first, we define a custom portfolio (deadlift) containing eight stocks, and set the start and end dates for data collection.
- Fetch and process the Deadlift portfolio returns: starting with the Deadlift ETF, we fetch historical stock prices using tq_get() for each asset contained in the ETF. After that, we keep only the relevant columns - namely, symbol, date, and adjusted. Using group_by() to group by symbol, we use the tq_transmute() function to apply the yearlyReturn function to the adjusted column, renaming it as Deadlift.
- Calculate portfolio returns: since this ETF is an equally-weighted portfolio, and using the fact that a portfolio return is a weighted average of the individual securities, you can safely use the mean() function to calculate the return of the portfolio. To make sure the calculation is performed for each year, use group_by() again, grouping by date.
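The equal-weight shortcut in the last step can be verified with plain arithmetic: with N assets and weights w_i = 1/N, the portfolio return sum(w_i * r_i) is exactly mean(r). A base-R check with hypothetical one-period returns:

```r
# Hypothetical one-period returns for four securities
r <- c(0.10, -0.05, 0.02, 0.01)
w <- rep(1 / length(r), length(r))  # equal weights, w_i = 1/N

sum(w * r)  # weighted portfolio return
mean(r)     # identical for equal weights
```

This identity is why summarize(Deadlift = mean(Deadlift)) is a valid portfolio-return calculation here; with any other weighting scheme you would need the explicit weighted sum.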
Try doing some edits on your own!
Try thinking about changes you could make to improve either the code readability or the analysis. A couple of edits that can be made include, but are not limited to:
- Adding more time periods to the analysis
- Contrasting the DeadLift ETF with the S&P 500 in terms of variance and Sharpe Ratio
- Doing the comparison using value-weighted returns (i.e., weighting the securities inside the portfolio according to their market capitalization) or inverse volatility (i.e., riskier assets get lower weights)
Play around with these concepts to get familiar with all the data manipulation tools that come with the tidyquant package!