# Package names
packages <- c("tidyverse", "tidyquant", "tidymodels", "glue", "scales", "ggthemes")

# Install packages not yet installed
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
  install.packages(packages[!installed_packages])
}

# Load all packages
invisible(lapply(packages, library, character.only = TRUE))
Manipulating Time Series
About this Document
This file replicates the code discussed in the live-session lectures of the Practical Applications in Quantitative Finance course. To ensure you can run the code without issues, please install and load all required packages beforehand. It is always good practice to replicate this Quarto document and experiment by editing the parameters. At the end of this report, you will find a suggestion on how to tweak this report - try making some changes on your own!
In this lecture, we will be working with daily stock price data from several stocks included in the S&P 500 index. Instead of loading the data from a .csv file, we will pull the data directly into R using the tidyquant package. Before you start, make sure to follow the instructions from our previous replication to set up your working directory correctly.
Loading packages
As we get started, we will load all packages referred to on our official website.
Note that you could easily get around this by installing and loading all necessary packages using a simpler syntax:
#Install if not already available - these lines are commented out so that R does not attempt to install the packages every time
#install.packages('tidyverse')
#install.packages('tidyquant')
#install.packages('glue')
#install.packages('scales')
#install.packages('ggthemes')
#Load
library(tidyverse)
library(tidyquant)
library(tidymodels)
library(glue)
library(scales)
library(ggthemes)
Working with Time Series
In the previous lectures, you worked your way through the exercises by using the amazing dplyr functionalities on data.frames. In some cases, however, you had to do some workarounds with drop_na(), slice_tail() and lag() simply because you were manipulating time series data.
The xts package, which stands for eXtensible Time Series, is an R package that is widely used for handling and manipulating time series data. It extends the functionality of the zoo package by providing a structured framework for managing time-indexed data efficiently. The package provides a matrix-like structure where each row is indexed by a date-time value, allowing for efficient subsetting, merging, and manipulation of time series data. It is especially useful in financial applications, where time-stamped data is common.
library(xts) # Note that this is loaded together with the tidyquant package
# Create a vector of random values
data_values <- rnorm(10)

# Create a sequence of dates
dates <- as.Date("2024-01-01") + 0:9

# Convert to xts
xts_data <- xts(data_values, order.by = dates)

# Print the xts object
print(xts_data)
[,1]
2024-01-01 -0.8782227
2024-01-02 -0.4597098
2024-01-03 -0.3964338
2024-01-04 -1.0016150
2024-01-05 -1.5862849
2024-01-06 -0.1767326
2024-01-07 -0.9713712
2024-01-08 0.3229833
2024-01-09 -1.7203924
2024-01-10 -0.7158226
The output shows an interesting feature of the xts format in R:
- The first column contains the values
- The row names are timestamps
- The xts object retains an efficient internal structure
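You can inspect this internal structure directly by pulling the object apart. A minimal sketch, assuming the xts package is loaded as above:

```r
library(xts)

# A tiny xts object with three known values (hypothetical data)
x <- xts(c(10, 20, 30), order.by = as.Date("2024-01-01") + 0:2)

# coredata() returns the underlying numeric matrix of values
coredata(x)

# index() returns the date-time index stored alongside the matrix
index(x)

# class() shows that xts extends the zoo class
class(x)
```

Separating the value matrix (coredata) from the time index is exactly what makes the fast subsetting and merging below possible.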
Core features of xts
- Time-Based Indexing & Subsetting: you can subset an xts object using time-based indexing in a variety of ways. If you were to do this in a data.frame, R wouldn't be able to retrieve the data, as a data.frame does not carry any time series properties that are needed for the job:
# Subset a specific date
xts_data["2024-01-03"]
[,1]
2024-01-03 -0.3964338
# Subset a range
xts_data["2024-01-03/2024-01-06"]
[,1]
2024-01-03 -0.3964338
2024-01-04 -1.0016150
2024-01-05 -1.5862849
2024-01-06 -0.1767326
# Subset a custom Y-M-D definition
xts_data["2024"] # Returns all data from 2024
[,1]
2024-01-01 -0.8782227
2024-01-02 -0.4597098
2024-01-03 -0.3964338
2024-01-04 -1.0016150
2024-01-05 -1.5862849
2024-01-06 -0.1767326
2024-01-07 -0.9713712
2024-01-08 0.3229833
2024-01-09 -1.7203924
2024-01-10 -0.7158226
xts_data["2024-01"] # Returns all data from January 2024
[,1]
2024-01-01 -0.8782227
2024-01-02 -0.4597098
2024-01-03 -0.3964338
2024-01-04 -1.0016150
2024-01-05 -1.5862849
2024-01-06 -0.1767326
2024-01-07 -0.9713712
2024-01-08 0.3229833
2024-01-09 -1.7203924
2024-01-10 -0.7158226
- Merging and Combining Time Series: you can merge two xts objects using the timestamp component that is embedded in the xts structure:
data1 <- xts(rnorm(5), order.by = as.Date("2024-01-01") + 0:4)
data2 <- xts(rnorm(5), order.by = as.Date("2024-01-01") + 2:6)

merged_data <- merge(data1, data2, join = "outer")
print(merged_data)
data1 data2
2024-01-01 -0.3247350 NA
2024-01-02 -0.4150009 NA
2024-01-03 1.4498303 -1.47773093
2024-01-04 1.6398024 0.41617149
2024-01-05 -0.9977372 -0.06436564
2024-01-06 NA 0.81567706
2024-01-07 NA 1.17189255
- join = "outer" ensures all time points are included
- If a time point is missing in one dataset, it is filled with NA
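The join argument controls this behavior. A small sketch of the difference between "outer" and "inner" (hypothetical values; only the row counts matter here):

```r
library(xts)

a <- xts(1:5, order.by = as.Date("2024-01-01") + 0:4)  # Jan 1-5
b <- xts(1:5, order.by = as.Date("2024-01-01") + 2:6)  # Jan 3-7

# "outer" keeps the union of the two indexes (7 dates), padding gaps with NA
nrow(merge(a, b, join = "outer"))

# "inner" keeps only the dates present in both series (Jan 3-5)
nrow(merge(a, b, join = "inner"))
```

Choosing "inner" is the time-series analogue of an inner join on the date key, which is handy when you want only fully overlapping observations.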
- Leads and Lags: functions that retrieve leads and lags of a given variable are a key component of xts objects. Due to the timestamp component, one can shift variables backward (using the lag() function) as well as forward (using lag() with a negative lag argument):
lagged_data <- lag(data1) # Shift values by 1 day
print(lagged_data)
[,1]
2024-01-01 NA
2024-01-02 -0.3247350
2024-01-03 -0.4150009
2024-01-04 1.4498303
2024-01-05 1.6398024
merged_data <- merge(lagged_data, data1, join = "outer")
print(merged_data)
lagged_data data1
2024-01-01 NA -0.3247350
2024-01-02 -0.3247350 -0.4150009
2024-01-03 -0.4150009 1.4498303
2024-01-04 1.4498303 1.6398024
2024-01-05 1.6398024 -0.9977372
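For completeness, the forward shift (a lead) uses the same lag() function with a negative k, so both directions can be sketched on a small deterministic series:

```r
library(xts)

x <- xts(1:5, order.by = as.Date("2024-01-01") + 0:4)

# k = 1 shifts values backward in time: each row shows the previous day's value
lag(x, k = 1)

# k = -1 shifts values forward: each row shows the next day's value (a lead)
lag(x, k = -1)
```

Note that a lead introduces an NA at the end of the series rather than at the beginning, so drop_na() (or na.omit()) removes a different row in each case.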
All in all, when it comes to working with financial data, xts objects are particularly useful for:
- Handling stock price data from tidyquant
- Calculating log returns for portfolio management
- Aligning time series data (e.g., joining different financial datasets)
- Aggregating financial data (e.g., monthly returns)
- Subsetting data by years/months/days
- Calculating rolling functions (e.g., yearly averages)
- Aggregating data at different intervals (e.g., converting daily to weekly prices)
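Two of these use cases, rolling functions and changing the data frequency, can be sketched directly in xts. The block below uses deterministic toy data (the integers 1 to 30) so the numbers are easy to check:

```r
library(xts)
library(zoo)  # rollapply; loaded automatically with xts

daily <- xts(1:30, order.by = as.Date("2024-01-01") + 0:29)

# Rolling 5-day average, window ending at each date and NA-padded at the start
roll_mean <- rollapply(daily, width = 5, FUN = mean, align = "right", fill = NA)
head(roll_mean, 6)  # first 4 entries are NA; 2024-01-05 is mean(1:5) = 3

# Convert daily data to weekly OHLC bars
weekly <- to.weekly(daily)
ncol(weekly)  # 4 columns: Open, High, Low, Close
```

to.weekly() (a wrapper around to.period()) collapses each week into one open-high-low-close row, which is the standard way to downsample price series.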
Unfortunately, there is an issue: the tidyverse is not fully designed to work with time series classes, such as xts. Since xts is optimized for time series operations, some functions that work well when managing time series are not easily translated into the packages from the tidyverse:
- tidyverse: designed for tabular (data frame-like) structures, emphasizing "tidy" data (each row is an observation, each column is a variable)
- xts: designed for time series, optimizing time-based indexing and calculations but less compatible with tidyverse workflows
Introducing the tidyquant package
The tidyquant package (see official documentation here) helps integrate both paradigms, making financial analysis more intuitive in R. It integrates several financial packages, like zoo, xts, quantmod, TTR, and PerformanceAnalytics, with the tidy data infrastructure of the tidyverse, allowing for seamless interaction between tidyverse data manipulation and financial functions.
There are mainly three functions in the tidyquant package that we will be using throughout this lecture: tq_get(), tq_mutate(), and tq_transmute().
The tq_get() function
The tq_get() function is a powerful tool for retrieving financial data in a tidy format. It provides an easy way to import stock prices, economic data, and key financial metrics from multiple sources, including Yahoo Finance, FRED, Alpha Vantage, and Quandl.
tq_get(x, get = "stock.prices", from = NULL, to = NULL, ...)
- x: a character string or vector specifying the stock symbol(s) or identifier(s)
- get: the type of data to retrieve. Defaults to "stock.prices"
- from: start date, in YYYY-MM-DD format. Defaults to NULL (gets all available data)
- to: end date, in YYYY-MM-DD format. Defaults to NULL (gets all available data)
- ...: additional arguments specific to the data source
One of the most common uses of tq_get() is fetching stock price data. Say, for example, that you want to fetch data for Apple between January and February of 2024. It is easy to use tq_get() to retrieve such information:
# Assuming you have the tidyquant loaded in your session
# Fetch Apple (AAPL) stock data from 2024-01-01 to 2024-02-01
aapl_data <- tq_get("AAPL", from = "2024-01-01", to = "2024-02-01")
# Print first few rows
head(aapl_data)
# A tibble: 6 × 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AAPL 2024-01-02 187. 188. 184. 186. 82488700 185.
2 AAPL 2024-01-03 184. 186. 183. 184. 58414500 183.
3 AAPL 2024-01-04 182. 183. 181. 182. 71983600 181.
4 AAPL 2024-01-05 182. 183. 180. 181. 62303300 180.
5 AAPL 2024-01-08 182. 186. 182. 186. 59144500 184.
6 AAPL 2024-01-09 184. 185. 183. 185. 42841800 184.
- The result is a tidy tibble, unlike quantmod's xts format.
The tq_mutate() function
The tq_mutate() function adds new variables to an existing tibble:
tq_mutate(.data,      # The object you are performing the calculations on
          select,     # The columns to send to the mutation function
          mutate_fun, # The mutation function from either the xts, quantmod, or TTR package
          col_rename) # A string or character vector used to quickly rename columns
- The main advantage is that the results are returned as a tibble and the function can be used with the tidyverse
- It is used when you expect additional columns to be added to the resulting data frame
- You can use several time-series-related functions from other R packages - call tq_mutate_fun_options() to see the list of available options
- All in all, it is similar in spirit to mutate()
The tq_transmute() function
The tq_transmute() function returns only the newly created columns and is typically used when the periodicity changes. Its syntax is the following:
tq_transmute(.data,      # The object you are performing the calculations on
             select,     # The columns to send to the mutation function
             mutate_fun, # The mutation function from either the xts, quantmod, or TTR package
             col_rename) # A string or character vector used to quickly rename columns
- tq_transmute() works exactly like tq_mutate() except it only returns the newly created columns
- This is helpful when changing periodicity, where the new columns would not have the same number of rows as the original tibble
- All in all, it is similar in spirit to summarize()
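The difference is easiest to see when the periodicity actually changes. The sketch below uses a hand-made tibble of hypothetical prices (rather than downloaded data, so it runs offline); tq_transmute() collapses sixty daily rows into two monthly rows, something tq_mutate() could not do because the row counts no longer match:

```r
library(tidyverse)
library(tidyquant)

# Hypothetical daily prices spanning January and February 2024
prices <- tibble(
  date     = as.Date("2024-01-01") + 0:59,
  adjusted = 100 + cumsum(rep(0.1, 60))
)

# Changing periodicity: 60 daily rows collapse to 2 monthly OHLC rows
monthly <- prices %>%
  tq_transmute(select = adjusted, mutate_fun = to.monthly)

nrow(monthly)
```

Here to.monthly comes from the xts family of compatible functions; tq_transmute() detects the date column automatically and converts the result back to a tibble.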
Working with time series objects
An immediately useful example of combining a time-series-specific functionality with a tidyverse logic relates to filtering. Sometimes, we may be interested in getting only a subset of the data (for example, only GOOG information). Furthermore, we may be interested in subsetting only a specific time frame for our analysis.
It is relatively straightforward to do this with tidyquant:
- Use filter() to select only rows where symbol == 'GOOG'
- In the same call, filter for date >= min_date and date <= max_date
#Assuming you have the tidyverse and the tidyquant packages loaded
#Set up the list of assets
assets = c('AMZN','GOOG','META','GME')

#Filter out
assets %>%
  tq_get() %>%
  filter(symbol == 'GOOG',
         date >= '2020-01-01',
         date <= '2024-12-31')
# A tibble: 1,258 × 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 GOOG 2020-01-02 67.1 68.4 67.1 68.4 28132000 68.1
2 GOOG 2020-01-03 67.4 68.6 67.3 68.0 23728000 67.8
3 GOOG 2020-01-06 67.5 69.8 67.5 69.7 34646000 69.5
4 GOOG 2020-01-07 69.9 70.1 69.5 69.7 30054000 69.4
5 GOOG 2020-01-08 69.6 70.6 69.5 70.2 30560000 70.0
6 GOOG 2020-01-09 71.0 71.4 70.5 71.0 30018000 70.7
7 GOOG 2020-01-10 71.4 71.7 70.9 71.5 36414000 71.2
8 GOOG 2020-01-13 71.8 72.0 71.3 72.0 33046000 71.7
9 GOOG 2020-01-14 72.0 72.1 71.4 71.5 31178000 71.3
10 GOOG 2020-01-15 71.5 72.1 71.5 72.0 25654000 71.7
# ℹ 1,248 more rows
Another example of using a time-series-specific functionality is working with leads and lags: sometimes, we need to shift our variables by a specific interval, like getting the previous day's price. Say, for example, that you want to understand how S&P 500 return levels relate to NFLX returns one week ahead. It is relatively straightforward to do this with tidyquant:
- Download S&P 500 and NFLX data using the tq_get() function
- Use tq_transmute() to compute the weekly returns for each security based on daily data
- Use tq_transmute() again to generate a lagged series of S&P 500 returns
#Assuming you have the tidyverse and the tidyquant packages loaded
#Netflix Data
NFLX = tq_get('NFLX') %>%
  #Select only the necessary columns
  select(date, symbol, adjusted) %>%
  #Apply the weeklyReturn function and call the new column 'NFLX'
  tq_transmute(mutate_fun = weeklyReturn,
               col_rename = 'NFLX')

#S&P Data
SP500 = tq_get('^GSPC') %>%
  #Select only the necessary columns
  select(date, symbol, adjusted) %>%
  #Apply the weeklyReturn function and call the new column 'SP500'
  tq_transmute(mutate_fun = weeklyReturn,
               col_rename = 'SP500') %>%
  #Apply the lag function for n=1 week, keeping the column name 'SP500'
  tq_transmute(mutate_fun = lag.xts,
               n = 1,
               col_rename = 'SP500') %>%
  #Drop all rows with NA information (row 1, in this case)
  drop_na()

#Merge Data
inner_join(NFLX, SP500)
Joining with `by = join_by(date)`
# A tibble: 530 × 3
date NFLX SP500
<date> <dbl> <dbl>
1 2015-01-09 -0.0563 0
2 2015-01-16 0.0244 -0.00651
3 2015-01-23 0.297 -0.0124
4 2015-01-30 0.00992 0.0160
5 2015-02-06 0.00579 -0.0277
6 2015-02-13 0.0489 0.0303
7 2015-02-20 0.0260 0.0202
8 2015-02-27 -0.00688 0.00635
9 2015-03-06 -0.0438 -0.00275
10 2015-03-13 -0.0346 -0.0158
# ℹ 520 more rows
Rolling functions
Finance practitioners are often asked to perform analysis on a rolling basis: we may want to calculate a given signal on day \(t\) based on the past \(x\) periods of information. Say, for example, that you want to calculate simple and exponential moving averages of adjusted prices over the past 5 days for a given stock. It is relatively straightforward to do this with tidyquant:
- Download stock data using the tq_get() function
- Use tq_mutate() twice along with the SMA() and EMA() functions, setting n = 5
#Assuming you have the tidyverse and the tidyquant packages loaded
#Set up the list of assets
assets = c('AMZN')

assets %>%
  tq_get() %>%
  select(date, symbol, adjusted) %>%
  group_by(symbol) %>%
  tq_mutate(adjusted, mutate_fun = SMA, n = 5) %>%
  tq_mutate(adjusted, mutate_fun = EMA, n = 5)
# A tibble: 2,551 × 5
# Groups: symbol [1]
symbol date adjusted SMA EMA
<chr> <date> <dbl> <dbl> <dbl>
1 AMZN 2015-01-02 15.4 NA NA
2 AMZN 2015-01-05 15.1 NA NA
3 AMZN 2015-01-06 14.8 NA NA
4 AMZN 2015-01-07 14.9 NA NA
5 AMZN 2015-01-08 15.0 15.0 15.0
6 AMZN 2015-01-09 14.8 14.9 15.0
7 AMZN 2015-01-12 14.6 14.8 14.8
8 AMZN 2015-01-13 14.7 14.8 14.8
9 AMZN 2015-01-14 14.7 14.8 14.8
10 AMZN 2015-01-15 14.3 14.6 14.6
# ℹ 2,541 more rows
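SMA() and EMA() come from TTR, but the same tq_mutate() pattern also accepts generic rolling functions such as zoo's rollapply(). A minimal offline sketch on a hand-made tibble (hypothetical prices; width, FUN, align, and fill are passed straight through to rollapply):

```r
library(tidyverse)
library(tidyquant)

# Deterministic toy prices so the rolling values are easy to verify
prices <- tibble(
  date     = as.Date("2024-01-01") + 0:19,
  adjusted = as.numeric(1:20)
)

# Rolling 5-day standard deviation, window ending at each date
rolled <- prices %>%
  tq_mutate(select     = adjusted,
            mutate_fun = rollapply,
            width = 5, FUN = sd,
            align = "right", fill = NA,
            col_rename = "roll_sd_5")

head(rolled, 6)
```

Because the input is the sequence 1..20, every complete 5-day window has the same standard deviation, sd(1:5), which makes the output easy to sanity-check.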
Lastly, financial analysts often cover a collection of securities on a rolling basis. For example, a buy-side analyst will monitor stocks from a given industry so as to understand which ones are overvalued, and which ones are undervalued. Say, for example, that you want to focus on a subset of 4 stocks, and you need to compare the cumulative return up to the latest closing price.
It is easy to integrate the tidyquant functions along with the group_by() function you've learned when working with dplyr:
- Get the information using tq_get()
- Group the data by symbol
- Apply the tq_mutate() and tq_transmute() functions to pass time series functions to the data - in this case, the dailyReturn() and the Return.cumulative() functions
#Assuming you have the tidyverse and the tidyquant packages loaded
#Set up the list of assets
assets = c('AMZN','GOOG','META','GME')

assets %>%
  tq_get() %>%
  select(date, symbol, adjusted) %>%
  group_by(symbol) %>%
  tq_mutate(adjusted, mutate_fun = dailyReturn, col_rename = 'daily_return') %>%
  tq_transmute(daily_return, mutate_fun = Return.cumulative) %>%
  mutate(across(where(is.numeric), percent, big.mark = '.')) %>%
  setNames(c('Ticker','Cumulative Return up-to-date'))
Warning: There were 5 warnings in `mutate()`.
The first warning was:
ℹ In argument: `across(where(is.numeric), percent, big.mark = ".")`.
ℹ In group 1: `symbol = "AMZN"`.
Caused by warning:
! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
Supply arguments directly to `.fns` through an anonymous function instead.
# Previously
across(a:b, mean, na.rm = TRUE)
# Now
across(a:b, \(x) mean(x, na.rm = TRUE))
ℹ Run `dplyr::last_dplyr_warnings()` to see the 4 remaining warnings.
# A tibble: 4 × 2
# Groups: Ticker [4]
Ticker `Cumulative Return up-to-date`
<chr> <chr>
1 AMZN 1.279%
2 GOOG 595%
3 META 755%
4 GME 298%
Hands-on Exercise
Your manager (who has not lifted any weights in the past 5 years) wants to replicate the returns of the Deadlift ETF from 2020 to 2024. Your job is to create a simple table of yearly returns comparing the Deadlift ETF vis-a-vis the S&P 500 index. Follow the instructions and answer the following questions:
- When looking at the yearly results from both the Deadlift ETF and the S&P 500, which one performed better?
- What are the potential explanations for the result you have found?
To answer these questions, you will use a combination of the dplyr and tidyquant functions you have learned so far. The expected result is a data.frame object that shows both the Deadlift ETF and the S&P 500 returns (columns) on a yearly basis (rows).
Before you start, make sure to have the tidyverse and the tidyquant packages loaded in your session. Following the instructions from the previous lectures, you can either make a direct call to each package, library(tidyverse) and library(tidyquant), or copy-paste the script from the course's official website.
- Use tq_get() to load information on the S&P 500 index and the Deadlift ETF constituents in two separate objects. You can use the code ^GSPC to retrieve information for the index, and you can pass a vector c('ticker1','ticker2',...,'ticker_n') to get information on the Deadlift ETF constituents
- Filter for observations between 2020 (beginning of) and 2024 (end of) using the from and to arguments of the tq_get() function
- Group the Deadlift ETF data by symbol using the group_by() function
- For both data sets, create a yearly_ret variable that calculates the yearly return of a given security. You can use the tq_transmute() function, passing the yearlyReturn() function along the chain
- For the Deadlift data set, regroup the data by date and calculate the Deadlift returns using a mutate() function (Hint: it is an equally weighted portfolio)
- Merge both datasets using inner_join()
Solution walkthrough
#Assuming you have the tidyverse and the tidyquant packages loaded
#Set up the list of assets
deadlift = c('META','AMZN','GS','UBER','MSFT','AAPL','BLK','NVDA')

#Set up the starting and ending dates
start = '2020-01-01'
end = '2024-12-31'

#Step 1: Read the Deadlift data using tidyquant
Deadlift_Performance = deadlift %>%
  tq_get(from = start, to = end) %>%
  #Select only the columns of interest
  select(symbol, date, adjusted) %>%
  #Group by symbol
  group_by(symbol) %>%
  #Use tq_transmute to aggregate and calculate yearly returns
  tq_transmute(select = adjusted,
               mutate_fun = yearlyReturn,
               col_rename = 'Deadlift') %>%
  #Group by date
  group_by(date) %>%
  #Summarize the average return (since it is an equally-weighted portfolio)
  summarize(Deadlift = mean(Deadlift, na.rm = TRUE))

#Step 2: Read the S&P 500 data using tidyquant
SP500_Performance = tq_get('^GSPC', from = start, to = end) %>%
  #Select only the columns of interest
  select(symbol, date, adjusted) %>%
  #Group by symbol
  group_by(symbol) %>%
  #Use tq_transmute to aggregate and calculate yearly returns
  tq_transmute(select = adjusted,
               mutate_fun = yearlyReturn,
               col_rename = 'SP500') %>%
  ungroup() %>%
  select(-symbol)

#Merge
SP500_Performance %>%
  inner_join(Deadlift_Performance) %>%
  mutate(across(where(is.numeric), percent)) %>%
  mutate(date = year(date)) %>%
  setNames(c('Year','S&P 500','DeadLift ETF'))
Joining with `by = join_by(date)`
# A tibble: 5 × 3
Year `S&P 500` `DeadLift ETF`
<dbl> <chr> <chr>
1 2020 15.29% 57.9%
2 2021 26.89% 37.2%
3 2022 -19.44% -36.0%
4 2023 24.23% 100.5%
5 2024 23.84% 52.1%
This solution uses tidyquant and tidyverse to analyze the yearly returns of a custom portfolio (i.e., the "Deadlift ETF") consisting of eight stocks and compares it with the S&P 500.
- Define assets and time range: first, we define a custom portfolio (deadlift) containing eight stocks, and set the start and end dates for data collection.
- Fetch and process the Deadlift portfolio returns: starting with the Deadlift ETF, we fetch historical stock prices using tq_get() for each asset contained in the ETF. After that, we keep only the relevant columns - namely, symbol, date, and adjusted. Using group_by() to group by symbol, we use the tq_transmute() function to apply the yearlyReturn function to the adjusted column, renaming it as Deadlift.
- Calculate portfolio returns: since this ETF is an equally-weighted portfolio, and using the fact that a portfolio return is a weighted average of the individual securities, you can safely use the mean() function to calculate the return of the portfolio. To make sure the calculation is performed for each year, use group_by() again, grouping by date.
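The equal-weight shortcut in the last step can be verified with plain arithmetic: with N assets and weights w_i = 1/N, the portfolio return sum(w_i * r_i) is exactly mean(r). A base-R check with hypothetical one-period returns:

```r
# Hypothetical one-period returns for four securities
r <- c(0.10, -0.05, 0.02, 0.01)
w <- rep(1 / length(r), length(r))  # equal weights, w_i = 1/N

sum(w * r)  # weighted portfolio return
mean(r)     # identical for equal weights
```

This identity is why summarize(Deadlift = mean(Deadlift)) is a valid portfolio-return calculation here; with any other weighting scheme you would need the explicit weighted sum.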
Try doing some edits on your own!
Try thinking about changes you could make to improve either the code readability or the analysis. A couple of edits that can be made include, but are not limited to:
- Adding more time periods to the analysis
- Contrasting the DeadLift ETF with the S&P 500 in terms of variance and Sharpe Ratio
- Doing the comparison using value-weighted returns (i.e., weighting the securities inside the portfolio according to their market capitalization) or inverse volatility (i.e., riskier assets get lower weights)
Play around with these concepts to get familiar with all the data manipulation tools that come with the tidyquant package!