Replicating CRSP Volatility Decile Portfolios in R

Introduction

In this post, I provide R code that enables the replication of the Center for Research in Security Prices (CRSP) Volatility Deciles using Yahoo! Finance data. This post builds on my last blog post in that it generates the CRSP low volatility decile portfolio, thereby facilitating the replication of the associated EMA trading strategy.

There are a few caveats to this replication:

  • There will be survivorship bias, since Yahoo! Finance does not contain de-listed stocks
  • This code takes a long time to run (over 2 hours on my machine, excluding downloading the symbols), so it should not all be executed at once
  • The parallel processing libraries in this code run on Linux
    • For Windows, library(snow) should be used instead of library(parallel), and library(doSNOW) instead of library(doParallel)
    • A SOCK cluster will likely need to be used instead of a FORK cluster
    • Consequently, objects, functions and libraries will have to be explicitly exported to the cores, thereby consuming more RAM
  • There are over 5,000 securities to download from Yahoo! Finance, and any one of them can fail randomly, yielding different results each time the code is executed
    • It is up to the individual user to decide whether to re-download the failed symbols
  • The quality control [QC] for this code is not exhaustive; my goal was to quickly get data for a time-sensitive paper-replication project, so there is room for more QC, which I will indicate in the post

This code is related to a paper replication project for a course I took over the summer of 2016 (Advanced Trading System Design). I will be sharing the project in a subsequent post.

To run the following script, one of my packages, titled tsconv (which stands for “Time Series Convenience”), needs to be installed. The following line of code installs the package, provided that the devtools library is already installed.

devtools::install_github("erolbicero/tsconv")

Download Security Tickers

In order to determine the universe of stocks to include, a list of security tickers that trade on the NYSE and NASDAQ is required. This was located by reading the relevant post by Louis Marascio, and, as per that discussion, the list of stock tickers was then obtained from http://www.nasdaqtrader.com/trader.aspx?id=symboldirdefs on July 8th, 2016.

The file format is a pipe-delimited text file. It contains preferred shares and ETFs, which should be excluded from the replication. Therefore, the file was imported into Microsoft Excel and filtered for tickers that were not ETFs and whose “ACT Symbol” excluded the “$” character, as those were predominantly preferred shares. Lastly, the exchange needed to be part of the NYSE or NASDAQ; the only exchange code in the file that is not part of these exchanges is “Z”, the label for BATS Global Markets, so any stock with exchange code “Z” was excluded. After these exclusions, 6,157 ticker symbols remained in the file. The symbols from this cleaned list were exported to a Comma-Separated-Value [CSV] file.
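For those who prefer to stay in R, the same filtering could be done with a sketch along these lines (assumptions on my part: the nasdaqtrader file layout with “ACT Symbol”, “ETF” and “Exchange” columns, and the file name otherlisted.txt; adjust to the actual download):

#Sketch: filter the pipe-delimited nasdaqtrader file in R instead of Excel
#(assumes "ACT Symbol", "ETF" and "Exchange" columns, as in otherlisted.txt)
rawListing <- read.delim("otherlisted.txt", sep = "|", stringsAsFactors = FALSE)
keep <- rawListing$ETF == "N" &                #exclude ETFs
        !grepl("\\$", rawListing$ACT.Symbol) & #exclude preferred shares
        rawListing$Exchange != "Z"             #exclude BATS listings
write.table(rawListing[keep, "ACT.Symbol"]
            , file = "StockList2016-07-08.csv"
            , sep = ","
            , row.names = FALSE
            , col.names = FALSE)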

For convenience, I am providing a clean CSV file at this link containing all the tickers I used from the July 8th, 2016 extract.

Load the Tickers and Download the Data

Next, the tickers are loaded into the global environment, and then downloaded one by one from Yahoo! Finance to avoid getting blocked by the website. It may be possible to download more symbols at a time, thereby speeding up the process; however, I did not test that, to ensure I was not cut off. The prices are stored in a separate environment titled “stockPriceEnv”. I have added code to save and load the downloaded tickers, since the download is time-consuming, taking almost 2 hours on my machine.

library(quantmod)
library(xts)
library(PerformanceAnalytics)
library(tsconv)
library(stringr)
library(foreach)
library(doParallel)
library(timeDate)

stockSymbols <- read.csv(file = paste0(getwd(),"/StockList2016-07-08.csv"), stringsAsFactors = FALSE, header = FALSE)
stockSymbols <- as.vector(stockSymbols[,1])

stockPriceEnv <- new.env()

successSymbols <- NULL
failSymbols <- NULL

#Use tryCatch since some symbols will cause the function to fail
#Also run painfully slowly so you don't get cut off from Yahoo!
startTime <- Sys.time()
for(stock in stockSymbols)
{
  tryCatch(expr = {
              getSymbols.yahoo(Symbols = stock
                                , env = stockPriceEnv
                                , from = "1963-07-01" # July 1st, 1963
                                , to = "2016-07-11" # July 11th, 2016

                              )
              successSymbols <- c(successSymbols, stock)

              print(paste(stock, "Succeeds"))

                  }
           , error = function(err){
             print(paste(stock, "Fails"))
             failSymbols <<- c(failSymbols, stock)
           }
  )
}
endTime <- Sys.time()
endTime - startTime
# Time difference of 1.93681 hours

#list all stocks
ls(envir = stockPriceEnv)

#save the downloaded tickers
saveRDS(stockPriceEnv, file = "stockPriceEnv-2016-12-01.RData")

################################################################
# Use this commented code to load stock symbols
#stockPriceEnv <- readRDS("stockPriceEnv-2016-12-01.RData")
#failSymbols <- stockSymbols[!(stockSymbols %in% ls(envir = stockPriceEnv))]
################################################################

The following symbols failed during the download. This is important, since the failed symbols appear to change each time the code is run, which ultimately yields slightly different volatility decile performance. This means that to replicate the numbers in this post, exactly the same symbols need to fail/be present.
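If you choose to retry the failed symbols, a minimal sketch along these lines could be used (an assumption on my part, reusing the download loop above; results will still vary, since different symbols may fail on each pass):

#Optionally retry the symbols that failed on the first pass
retrySymbols <- failSymbols
failSymbols <- NULL
for(stock in retrySymbols)
{
  tryCatch(expr = {
              getSymbols.yahoo(Symbols = stock
                                , env = stockPriceEnv
                                , from = "1963-07-01"
                                , to = "2016-07-11"
                              )
              successSymbols <- c(successSymbols, stock)
                  }
           , error = function(err){
             failSymbols <<- c(failSymbols, stock)
           }
  )
}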

Failed Download Symbols

##   [1] "AA"     "AGM.A"  "AKO.A"  "AKO.B"  "APHB"   "ARP"    "ASB.W"
##   [8] "AST.W"  "BAC.A"  "BAC.B"  "BF.A"   "BF.B"   "BGX"    "BIO.B"
##  [15] "BIOA.W" "BNK"    "BRK.A"  "BRK.B"  "BTX.W"  "BWL.A"  "C.A"
##  [22] "CBO"    "CJES"   "CMA.W"  "COF.W"  "CRD.A"  "CRD.B"  "CVM.W"
##  [29] "DEG"    "DWRE"   "DXI"    "DYN.W"  "FCE.A"  "FCE.B"  "FPP.W"
##  [36] "FUR"    "GEF.B"  "GM.B"   "GSI"    "GTN.A"  "HBM.W"  "HEI.A"
##  [43] "HIVE"   "HLS.W"  "HTS"    "HVT.A"  "IBO"    "IHS"    "JW.A"
##  [50] "JW.B"   "KEG"    "KKD"    "KODK.A" "KODK.W" "LEN.B"  "LNC.W"
##  [57] "MKC.V"  "MOG.A"  "MOG.B"  "MSK"    "MTB.W"  "NPD"    "ONE"
##  [64] "PBR.A"  "PVCT.W" "QIHU"   "RDS.A"  "RDS.B"  "SBNB"   "SCQ"
##  [71] "STI.A"  "STI.B"  "STZ.B"  "TAL"    "TCB.W"  "TI.A"   "TUMI"
##  [78] "VALE.P" "VLY.W"  "WSO.B"  "ABEOW"  "ADXSW"  "AGFSW"  "ANDAR"
##  [85] "ANDAU"  "ANDAW"  "ARWAR"  "ARWAU"  "ARWAW"  "AUMAW"  "BBCN"
##  [92] "BHACR"  "BHACU"  "BHACW"  "BIND"   "BLVDW"  "BNTCW"  "BVXVW"
##  [99] "CADT"   "CADTW"  "CATYW"  "CERCW"  "CERCZ"  "CERE"   "CHEKW"
## [106] "CLACW"  "CLRBZ"  "CNYD"   "COWNL"  "COYNW"  "CPXX"   "CYHHZ"
## [113] "CYRXW"  "DRIOW"  "DRWIW"  "EACQU"  "EACQW"  "EAGLW"  "ECACR"
## [120] "ELECU"  "ELECW"  "EPRS"   "EYEGW"  "FNFG"   "FNTC"   "FNTCU"
## [127] "FNTCW"  "GFNSL"  "GGAC"   "GGACR"  "GIGA"   "GPACU"  "GPACW"
## [134] "GPIAW"  "GRSHW"  "HCACU"  "HCACW"  "HCAPL"  "HDRAR"  "HDRAU"
## [141] "HDRAW"  "HMPR"   "HNSN"   "HRMNW"  "IMOS"   "IRCP"   "JASNW"
## [148] "JSYNR"  "JSYNW"  "KLREW"  "KTOVW"  "KUTV"   "LBTYA"  "LDRH"
## [155] "LINDW"  "LMFAW"  "LPSB"   "MDVXW"  "MFLX"   "MIIIU"  "MRKT"
## [162] "NUROW"  "NXEOW"  "NXTDW"  "OACQR"  "OACQU"  "OACQW"  "OCLSW"
## [169] "ONSIW"  "ONSIZ"  "OPGNW"  "OXBRW"  "PAACR"  "PAACW"  "PACEW"
## [176] "PAVMU"  "PBIP"   "QDEL"   "QPACU"  "QPACW"  "RNVAW"  "SHLDW"
## [183] "SNFCA"  "SNHNL"  "SNI"    "SNTA"   "SOHOM"  "SQI"    "SRAQW"
## [190] "SRTSU"  "TACOW"  "TALL"   "TFSC"   "TFSCR"  "TFSCU"  "TFSCW"
## [197] "TRTLW"  "VKTXW"  "WIBC"   "WVVIP"  "WYIGW"  "ZAZZT"  "ZBZZT"
## [204] "ZCZZT"  "ZIONZ"  "ZJZZT"  "ZNWAA"  "ZVZZC"  "ZVZZT"  "ZWZZT"
## [211] "ZXYZ.A" "ZXZZT"

Quality Control

There are two known issues I have come across while working on this code. The first is a symbol named “TRUE”, which, not surprisingly, causes issues; the fix is to rename it to “TRUE.”.

The second issue is that the ticker OHGI appears to have “garbage” data, so it is excluded completely from the replication.

OHGI Adjusted Closing Prices
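For reference, a chart like the one above could be generated as follows (a sketch; the original figure may have been produced differently). Note that it must be run before the clean-up code below removes OHGI:

#Sketch: inspect the suspect OHGI series before it is removed
plot(stockPriceEnv$OHGI[, grep("Adjusted", colnames(stockPriceEnv$OHGI))]
     , main = "OHGI Adjusted Closing Prices")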

It is possible that there are more anomalies within the data. However, I was working toward a time-sensitive deliverable, and since the numbers were sensible, insofar as they were comparable to results generated by the actual CRSP volatility decile indices, they were used for the replication project. Any additional investigation and QC should be performed at this stage.

#*********IMPORTANT
#clean-up known error-causing stocks
stockPriceEnv$TRUE. <- stockPriceEnv$`TRUE`
rm(`TRUE`, envir = stockPriceEnv)
rm(OHGI, envir = stockPriceEnv)

#This is the time for quality control, since it gets more challenging to fix any issues later on
#**********

This next block of code uses roughly 17 GB of RAM on my machine using a fork cluster with 8 cores, so please ensure that there is adequate memory before running it. The code checks that each stock has at least 80% of trading days populated within a year (as per CRSP’s requirements), and then calculates the annualized volatility of that stock for each year. Fortunately, the code only takes about 1.5 minutes to run using parallel processing.

# Adjusted prices
# stock needs valid returns for eighty percent of trading days
# CRSP uses linear returns

#*****Warning, lots of RAM required

annualizationScaling <- 252
percentOfYear <- 0.8

startTime <- Sys.time()

cl <- makeForkCluster(nnodes = detectCores())
registerDoParallel(cl = cl)

stockVolatility <-
  foreach(stockName = ls(envir = stockPriceEnv), .errorhandling = "pass") %dopar%
  {

    stock <- get(x = stockName, envir = stockPriceEnv)

    typeIndex <- grep("Adjusted",colnames(stock))

    adjustedPrices <- stock[,typeIndex]

    #Compute linear returns
    stockLinearReturns <- retVectorL(adjustedPrices)

    #extract time index and slice into years
    yearsAvail <- unique(str_sub(string = index(stockLinearReturns), end=4))

    #check that stock has 80% of trading days
    validYearTable <-
      lapply(yearsAvail
             , function(year){
               numDaysInYear <- nrow(stockLinearReturns[year])
               #need >= 80%
               validYearFlag <- ifelse(numDaysInYear >= floor(annualizationScaling*percentOfYear),1,0)
               return(c(as.numeric(year),validYearFlag))
             }
      )

    validYearTable <- do.call(rbind, validYearTable)

    #Take valid years
    validYears <- as.character(validYearTable[validYearTable[,2]==1,1])

    volatilityAnnual <-
      lapply(validYears
             , function(year){
               timeSeries <- stockLinearReturns[year]

               volatilityAnn <- sd(coredata(timeSeries))*sqrt(annualizationScaling)

               volYear <- as.yearmon(paste0(year,"-","12"))

               volatilityAnn <- xts(x = volatilityAnn, order.by = volYear)

               return(volatilityAnn)
             }
      )
    volatilityAnnual <- do.call(rbind, volatilityAnnual)

    return(volatilityAnnual)
  }

stopCluster(cl = cl)
cl <- NULL
gc()

endTime <- Sys.time()
endTime - startTime
#Time difference of 1.408692 mins
#On 8 cores

#name the list
names(stockVolatility) <- ls(envir = stockPriceEnv)
# http://www.crsp.com/products/documentation/stock-file-indexes-0

Volatility Decile Replication

Next, the data is checked for continuity, i.e. that there are no missing years.

#Grab min and max year from each stock
stockVolatilityYears <-
lapply(stockVolatility
       ,function(stockVolVector){
         years <- index(stockVolVector)
         years <- str_sub(string = years, start = 5)
         yearRange <- c(min(years)
                        ,max(years)
                        )
         return(yearRange)
       }

)
#merge min and max into a table
stockVolatilityYears <- do.call(rbind, stockVolatilityYears)

The output below shows a break before 1973, indicating that there are gaps in the data prior to that year.

##  [1] "1964" "1970" "1973" "1974" "1975" "1976" "1977" "1978" "1979" "1980"
## [11] "1981" "1982" "1983" "1984" "1985" "1986" "1987" "1988" "1989" "1990"
## [21] "1991" "1992" "1993" "1994" "1995" "1996" "1997" "1998" "1999" "2000"
## [31] "2001" "2002" "2003" "2004" "2005" "2006" "2007" "2008" "2009" "2010"
## [41] "2011" "2012" "2013" "2014" "2015"

Thus, the data is filtered to ensure that the time series begins in 1973.

stockVolatilityYearsSeq <- sort(unique(stockVolatilityYears[,1]))
#exclude first two entries, we want a continuous time series
stockVolatilityYearsSeq <- stockVolatilityYearsSeq[3:length(stockVolatilityYearsSeq)]

Next, stocks are assigned to deciles. In order to accomplish this, the code below extracts all volatilities for the stocks for each year…

#now build a list of volatility vectors

#loop through the years
cl <- makeForkCluster(nnodes = detectCores())
startTime <- Sys.time()

stockVolatilityTimeSeriesList <-
   parLapply(cl
  , stockVolatilityYearsSeq
  ,function(year){
    #here we'll loop through the stocks, and pull out all the years
    volList <-
    lapply(stockVolatility
           , function(stockVolVector){
             return(stockVolVector[year])
           }
    )

    volVector <- do.call(rbind, volList)

    return(volVector)

  }
)

stopCluster(cl = cl)
cl <- NULL
gc()

#set names
names(stockVolatilityTimeSeriesList) <- stockVolatilityYearsSeq

endTime <- Sys.time()
endTime - startTime
# Time difference of 53.64748 sec
# On 8 cores

…and then assigns the volatilities to deciles for each year.

decilesByYear <-
lapply(stockVolatilityTimeSeriesList
       , function(stockVolatilityTimeSeries){
                  quantile(x = stockVolatilityTimeSeries
                          , probs = seq(0,1,0.1)
                          , na.rm = TRUE
                          )
                  }
      )
decilesByYear <- do.call(rbind,decilesByYear)

years <- rownames(decilesByYear)
deciles <- colnames(decilesByYear)

By summarizing the results, outliers can clearly be observed in the more recent years. This indicates that the high volatility stocks would very likely also benefit from additional QC; however, to keep this post brief, no additional QC will be conducted.

Volatility Decile by Year (1995 and onward)

0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
1995 0.04 0.16 0.19 0.22 0.27 0.32 0.38 0.46 0.57 0.75 5.28
1996 0.10 0.16 0.20 0.24 0.28 0.33 0.40 0.50 0.63 0.79 31.40
1997 0.00 0.18 0.24 0.27 0.31 0.37 0.43 0.51 0.62 0.80 6.27
1998 0.00 0.22 0.30 0.35 0.40 0.45 0.53 0.61 0.73 0.95 23.28
1999 0.00 0.21 0.28 0.34 0.39 0.44 0.52 0.61 0.74 1.01 14.21
2000 0.08 0.23 0.33 0.40 0.46 0.53 0.63 0.76 0.97 1.29 92.44
2001 0.07 0.22 0.29 0.35 0.41 0.48 0.57 0.69 0.85 1.11 4.98
2002 0.06 0.21 0.28 0.34 0.38 0.45 0.52 0.62 0.79 1.02 8.00
2003 0.06 0.16 0.21 0.26 0.30 0.34 0.40 0.49 0.62 0.85 12.00
2004 0.07 0.15 0.20 0.23 0.27 0.31 0.36 0.44 0.54 0.71 36.57
2005 0.04 0.14 0.19 0.22 0.26 0.30 0.34 0.39 0.47 0.63 39.75
2006 0.03 0.13 0.18 0.22 0.26 0.30 0.35 0.41 0.47 0.61 4.79
2007 0.06 0.18 0.24 0.28 0.31 0.35 0.39 0.44 0.51 0.65 11434.96
2008 0.07 0.41 0.50 0.57 0.63 0.70 0.77 0.85 0.96 1.15 83.85
2009 0.05 0.29 0.38 0.45 0.52 0.59 0.67 0.77 0.91 1.18 103.81
2010 0.00 0.19 0.24 0.28 0.33 0.37 0.42 0.48 0.56 0.70 13.77
2011 0.04 0.21 0.27 0.33 0.37 0.42 0.47 0.53 0.62 0.76 188.01
2012 0.04 0.14 0.19 0.23 0.28 0.32 0.37 0.43 0.53 0.69 501.00
2013 0.01 0.14 0.18 0.21 0.24 0.28 0.32 0.38 0.48 0.66 38.60
2014 0.01 0.12 0.17 0.21 0.25 0.29 0.34 0.41 0.51 0.68 424.01
2015 0.03 0.15 0.21 0.24 0.28 0.33 0.39 0.47 0.58 0.75 1493.12
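For reference, the table above can be reproduced from the decilesByYear matrix, whose row names are the years, e.g.:

#Reproduce the volatility decile table from 1995 onward
round(decilesByYear[rownames(decilesByYear) >= "1995", ], 2)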

Now that the volatility decile breakpoints have been computed for each year, stocks can be assigned to deciles based on their volatilities in each year. This takes roughly 37 minutes on my machine using all 8 cores.

#now assign the deciles to each of the stocks by year
cl <- makeForkCluster(nnodes = detectCores())
startTime <- Sys.time()
registerDoParallel(cl)

stockDeciles <- foreach(stockVolVector = stockVolatility, .errorhandling = "pass") %dopar%
       {

         if(prod(is.na(stockVolVector))){return(NA)} else {

           stockDecileVector <- stockVolVector
           stockDecileVector <- xts(x=rep(NA,nrow(stockDecileVector)),order.by = index(stockDecileVector))

                  for(dec in deciles){
                      for(yr in years){
                        #assign the lowest decile whose breakpoint covers the stock's volatility for that year
                        stockDecileVector[yr] <- ifelse(stockVolVector[yr] <= decilesByYear[yr,dec] &
                                                          is.na(stockDecileVector[yr])
                                                        , dec
                                                        , stockDecileVector[yr])
                      }
                  }
                  return(stockDecileVector)
         }
       }

stopCluster(cl = cl)
cl <- NULL
gc()

#assign names
names(stockDeciles) <- names(stockVolatility)

endTime <- Sys.time()
endTime - startTime
# Time difference of 37.39486 mins
# on 8 cores

So far, we have lost 212 stocks to download failures, and after running the code block above, we have lost an additional 392 symbols due to NA or NULL values, for an overall loss of 9.81% of symbols.
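The attrition above can be tallied with a quick sketch like the following (assuming the objects built earlier; the 212 and 392 counts are specific to this run):

#Tally the lost symbols
numFailedDownload <- length(failSymbols)
numLostToNAorError <- sum(sapply(stockDeciles
                                 , function(x){
                                   inherits(x, "simpleError") || is.null(x) || (is.logical(x) && all(is.na(x)))
                                 }))
round(100*(numFailedDownload + numLostToNAorError)/length(stockSymbols), 2)
# 9.81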

Based on the results in the previous block of code, decile portfolios can be constructed for each year. The following code takes roughly 24 minutes to run on my machine (using 8 cores).

#assign stocks to decile portfolios
cl <- makeForkCluster(nnodes = detectCores())
startTime <- Sys.time()

decilePortfolioWeights <-
parLapply(cl
       , deciles[-1]
       , function(dec){

         matrixWeights <-
         lapply(years
                ,function(yr){

                            rowVectorWeights <-
                            lapply(names(stockDeciles)
                                   , function(stock){

                                     assignedStockDecile <- stockDeciles[[stock]][yr]

                                     #There will be an error if the stock previously didn't meet a condition to get assigned a volatility (too few obs)
                                     if(length(assignedStockDecile) == 0 |
                                        class(stockDeciles[[stock]])[1] == "simpleError" |
                                        prod(is.na(stockDeciles[[stock]]))){
                                       dummyResult <- xts(x = 0
                                                          , order.by = as.yearmon(paste0(yr,"-12"))
                                                          )
                                       colnames(dummyResult) <- stock
                                       return(dummyResult)
                                     } else {
                                       assignedWeight <- ifelse(assignedStockDecile == dec,1,0)
                                     }
                                     colnames(assignedWeight) <- stock
                                     return(assignedWeight)
                                   }

                            )
                            rowVectorWeights <- do.call(cbind, rowVectorWeights)
                            return(rowVectorWeights)
                }
         )

         matrixWeights <- do.call(rbind, matrixWeights)
         return(matrixWeights)

       }
       )

stopCluster(cl = cl)
cl <- NULL
gc()

#assign names
names(decilePortfolioWeights) <- deciles[-1]

endTime <- Sys.time()
endTime - startTime

# Time difference of 24.0372 mins
# on 8 cores

The code above generates a matrix of 1/0 flags, where the rows are the years and the columns are the stocks. Two things now need to happen. First, the dates need to be converted to end-of-year dates, since CRSP assigns the portfolio weights in the subsequent year (later, PerformanceAnalytics::Return.portfolio() will assign the weights to the subsequent year). Second, the 1/0 flags need to be converted to weights, so that they sum to 1 for each year (each row).

#convert to date at end of year
decilePortfolioWeights <- lapply(decilePortfolioWeights
                                 , function(decilePortfolio){
                                   decilePortfolioIDX <- index(decilePortfolio)
                                   decilePortfolioIDX <- as.Date(timeLastDayInMonth(as.Date(decilePortfolioIDX)))
                                   decilePortfolio <- xts(x = coredata(decilePortfolio)
                                                          , order.by = decilePortfolioIDX
                                                          )
                                   return(decilePortfolio)
                                 }
                                 )

#Now convert them from flags to weights
decilePortfolioWeights <- lapply(decilePortfolioWeights
                                 , function(decilePortfolio){
                                   decilePortfolioIDX <- index(decilePortfolio)
                                   decilePortfolioMatrix <- coredata(decilePortfolio)

                                   sumVector <- apply(decilePortfolioMatrix,1,sum)

                                   scaledDecilePortfolioMatrix <- lapply(1:length(sumVector)
                                                                         ,function(rowNum){
                                                                           scaledVector <- decilePortfolioMatrix[rowNum,]/sumVector[rowNum]
                                                                           return(scaledVector)
                                                                         }
                                                                         )
                                   scaledDecilePortfolioMatrix <- do.call(rbind, scaledDecilePortfolioMatrix)

                                   decilePortfolio <- xts(x = coredata(scaledDecilePortfolioMatrix)
                                                          , order.by = decilePortfolioIDX
                                   )
                                   return(decilePortfolio)
                                 }
)

Some checks are performed to ensure that the weights sum to 1 across all years.

Verification of Weight-Sums Across Deciles and Years

#check, should see 1's across the board
xts(
  do.call(cbind,
lapply(decilePortfolioWeights
      , function(x){apply(coredata(x),1,sum)}
))
, index(decilePortfolioWeights[[1]])
)


##            10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
## 1973-12-31   1   1   1   1   1   1   1   1   1    1
## 1974-12-31   1   1   1   1   1   1   1   1   1    1
## 1975-12-31   1   1   1   1   1   1   1   1   1    1
## 1976-12-31   1   1   1   1   1   1   1   1   1    1
## 1977-12-31   1   1   1   1   1   1   1   1   1    1
## 1978-12-31   1   1   1   1   1   1   1   1   1    1
## 1979-12-31   1   1   1   1   1   1   1   1   1    1
## 1980-12-31   1   1   1   1   1   1   1   1   1    1
## 1981-12-31   1   1   1   1   1   1   1   1   1    1
## 1982-12-31   1   1   1   1   1   1   1   1   1    1
## 1983-12-31   1   1   1   1   1   1   1   1   1    1
## 1984-12-31   1   1   1   1   1   1   1   1   1    1
## 1985-12-31   1   1   1   1   1   1   1   1   1    1
## 1986-12-31   1   1   1   1   1   1   1   1   1    1
## 1987-12-31   1   1   1   1   1   1   1   1   1    1
## 1988-12-31   1   1   1   1   1   1   1   1   1    1
## 1989-12-31   1   1   1   1   1   1   1   1   1    1
## 1990-12-31   1   1   1   1   1   1   1   1   1    1
## 1991-12-31   1   1   1   1   1   1   1   1   1    1
## 1992-12-31   1   1   1   1   1   1   1   1   1    1
## 1993-12-31   1   1   1   1   1   1   1   1   1    1
## 1994-12-31   1   1   1   1   1   1   1   1   1    1
## 1995-12-31   1   1   1   1   1   1   1   1   1    1
## 1996-12-31   1   1   1   1   1   1   1   1   1    1
## 1997-12-31   1   1   1   1   1   1   1   1   1    1
## 1998-12-31   1   1   1   1   1   1   1   1   1    1
## 1999-12-31   1   1   1   1   1   1   1   1   1    1
## 2000-12-31   1   1   1   1   1   1   1   1   1    1
## 2001-12-31   1   1   1   1   1   1   1   1   1    1
## 2002-12-31   1   1   1   1   1   1   1   1   1    1
## 2003-12-31   1   1   1   1   1   1   1   1   1    1
## 2004-12-31   1   1   1   1   1   1   1   1   1    1
## 2005-12-31   1   1   1   1   1   1   1   1   1    1
## 2006-12-31   1   1   1   1   1   1   1   1   1    1
## 2007-12-31   1   1   1   1   1   1   1   1   1    1
## 2008-12-31   1   1   1   1   1   1   1   1   1    1
## 2009-12-31   1   1   1   1   1   1   1   1   1    1
## 2010-12-31   1   1   1   1   1   1   1   1   1    1
## 2011-12-31   1   1   1   1   1   1   1   1   1    1
## 2012-12-31   1   1   1   1   1   1   1   1   1    1
## 2013-12-31   1   1   1   1   1   1   1   1   1    1
## 2014-12-31   1   1   1   1   1   1   1   1   1    1
## 2015-12-31   1   1   1   1   1   1   1   1   1    1

Here is another check, where the number of stocks in each decile is computed by year. What we should observe, and ultimately do see, is an approximately equal distribution of stocks across all deciles within a year. In other words, the counts across each row (year) should be nearly equal.

Count of Stocks Across Deciles and Years

#Number of stocks in each portfolio by year
do.call(cbind,
lapply(decilePortfolioWeights
       , function(x){vector <- apply(coredata(x),1,function(y){length(which(y>0))})
                      names(vector) <- years
                      return(vector)
         }
))


##      10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
## 1973   4   5   5   5   5   4   5   5   5    5
## 1974   7   8   7   8   7   8   7   8   7    8
## 1975   7   8   7   8   8   7   8   7   8    8
## 1976   7   8   8   7   8   8   7   8   8    8
## 1977   8   9   9   8   9   9   8   9   9    9
## 1978   8   9   9   9   9   9   9   9   9    9
## 1979   9   9   9   9   9   9   9   9   9    9
## 1980  10  10  10  10  10  10  10  10  10   10
## 1981  18  19  19  19  19  19  19  19  19   19
## 1982  22  22  22  22  23  22  22  22  22   23
## 1983  22  23  23  23  23  22  23  23  23   23
## 1984  26  27  27  26  27  27  26  27  27   27
## 1985  35  35  35  35  36  35  35  35  35   36
## 1986  38  40  40  39  40  40  39  40  40   40
## 1987  44  44  44  44  45  44  44  44  44   45
## 1988  62  62  62  62  62  62  62  62  62   62
## 1989  64  65  65  65  65  65  65  65  65   65
## 1990  72  72  72  73  72  72  73  72  72   73
## 1991  95  95  95  95  95  95  95  95  95   95
## 1992 115 115 115 116 115 115 116 115 115  116
## 1993 129 129 130 129 130 129 129 130 129  130
## 1994 146 146 146 147 146 146 147 146 146  147
## 1995 161 162 162 161 162 162 161 162 162  162
## 1996 182 182 182 182 183 182 182 182 182  183
## 1997 201 201 201 201 202 201 201 201 201  202
## 1998 215 216 216 216 216 215 216 216 216  216
## 1999 231 232 231 232 232 231 232 231 232  232
## 2000 250 251 250 251 251 250 251 250 251  251
## 2001 264 265 264 265 264 265 264 265 264  265
## 2002 272 272 273 272 273 272 272 273 272  273
## 2003 284 285 285 285 285 284 285 285 285  285
## 2004 301 301 302 301 302 301 301 302 301  302
## 2005 320 320 320 321 320 320 321 320 320  321
## 2006 339 339 339 339 340 339 339 339 339  340
## 2007 358 358 358 358 358 358 358 358 358  359
## 2008 383 383 383 383 384 383 383 383 383  384
## 2009 395 396 395 396 396 395 396 395 396  396
## 2010 406 406 407 406 407 406 406 407 406  407
## 2011 429 430 429 430 430 429 430 429 430  430
## 2012 451 452 451 452 452 451 452 451 452  452
## 2013 477 477 477 478 477 477 478 477 477  478
## 2014 512 513 513 512 513 513 512 513 513  513
## 2015 554 555 555 555 555 555 555 555 555  555

We are almost ready to compute the volatility decile portfolios. First, the column names are verified to be identical across all decile portfolios; these names will be used to match weights to stock returns when computing a portfolio.

#Make sure they match decilePortfolioWeights names

#Should return empty values (since it is looking for where it is NOT TRUE)
which(
!apply(
  do.call(rbind,lapply(decilePortfolioWeights
       , function(x){colnames(x)}
))
,2
,function(y){isTRUE(all.equal( max(y) ,min(y)) )
  #This function taken from here:
  #http://stackoverflow.com/questions/4752275/test-for-equality-among-all-elements-of-a-single-vector
  }
))


## integer(0)

It returns an empty result, indicating that any one of the list item’s names can be used as a reference.

#So we can take the first since we know they're all equal
stockNames <- colnames(decilePortfolioWeights[[1]])

Next, the “stockNames” variable contents are matched to the stock price names so that matrix multiplication can be performed later (i.e. so the order of the weights matches the order of the stock returns).

stockEnvIndex <-
do.call(c,
        lapply(stockNames
               ,function(stock){which(stock == names(as.list(stockPriceEnv)))})
)

Now, a very large matrix of prices, in the same order as the stock weights, is created. Ideally, the xts::merge() function would be used on all series at once; however, given the large volume of data, my machine cannot handle it. The workaround is to loop through all the stocks and join them together, very slowly. The code below runs in sequence (on one core) and takes just over 30 minutes on my machine. A sketch of the one-shot alternative follows the loop.

#CRSP uses adjusted prices (for dividends)
#http://www.crsp.com/files/data_descriptions_guide_0.pdf #p.102
adjustedColumn <- grep(".Adjusted",colnames(as.list(stockPriceEnv)[[stockEnvIndex[1]]]))

#First one...
stockPriceMatrix <- as.list(stockPriceEnv)[[stockEnvIndex[1]]][,adjustedColumn]
colnames(stockPriceMatrix) <- gsub(".Adjusted","",colnames(stockPriceMatrix))
##head(stockPriceMatrix)

startTime <- Sys.time()
#the rest
for(stockIDX in 2:length(stockEnvIndex)){
  stockVector <- as.list(stockPriceEnv)[[stockEnvIndex[stockIDX]]][,adjustedColumn]
  colnames(stockVector) <- gsub(".Adjusted","",colnames(stockVector))
  #head(stockVector)
  stockPriceMatrix <- merge(stockPriceMatrix,stockVector)
}
rm(stockVector)
endTime <- Sys.time()
endTime - startTime
# Time difference of 30.50445 mins
# on 1 core
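For reference, the one-shot merge alluded to above would look something like this sketch (memory permitting; on 5,000+ symbols it was too much for my machine):

#One-shot alternative: merge all adjusted-price columns at once
allAdjusted <- lapply(as.list(stockPriceEnv)[stockEnvIndex]
                      , function(stock){ stock[, grep("Adjusted", colnames(stock))] })
stockPriceMatrixOneShot <- do.call(merge, allAdjusted)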

We now have a matrix of prices in the same order as the weights that were calculated. The code below verifies that the order is the same across all decile portfolios.

#check that names and order of names match
do.call(c,
lapply(decilePortfolioWeights
       ,function(col){
identical(
colnames(stockPriceMatrix)
, colnames(col)
)
       })
)


##  10%  20%  30%  40%  50%  60%  70%  80%  90% 100%
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

Returns then need to be calculated from the matrix of prices.

#Now we can calculate linear returns
stockLRetMatrix <- retMatrixL(stockPriceMatrix)
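For reference, retMatrixL computes simple (linear) returns, i.e. P(t)/P(t-1) – 1, for each column. A minimal sketch of an equivalent computation (an assumption on my part; the tsconv implementation may differ, e.g. in NA handling):

#Sketch of a linear-return computation equivalent in spirit to tsconv::retMatrixL
linearReturnsSketch <- function(priceMatrix){
  priceMatrix/lag.xts(priceMatrix, k = 1) - 1
}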

Finally, the decile portfolio returns can be calculated, using the stock returns and weights that were calculated previously.

decilePortfolioMatrix <-
  do.call(merge,
          lapply(decilePortfolioWeights
                 ,function(portfolioWeights){
                   Return.rebalancing(R = stockLRetMatrix
                                      , weights = portfolioWeights
                   )

                 }
          ))
names(decilePortfolioMatrix) <- names(decilePortfolioWeights)

Below, the cumulative returns are plotted on a log scale, which facilitates a visual comparison of decile performance. We can see that the higher volatility deciles exhibit higher cumulative returns, as expected.

Cumulative Returns of Volatility Decile Portfolios (Log Scale)
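A chart along these lines can be produced with PerformanceAnalytics (a sketch; the original figure may have been generated differently):

#Plot cumulative returns on a log scale
chart.CumReturns(decilePortfolioMatrix
                 , wealth.index = TRUE
                 , ylog = TRUE
                 , legend.loc = "topleft"
                 , main = "Cumulative Returns of Volatility Decile Portfolios (Log Scale)")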

Numeric results can be generated using the following line of code.

computeTimeSeriesStatistics(linearReturnMatrix = decilePortfolioMatrix)

The Sharpe ratio declines as volatility increases, which is consistent with the low-volatility anomaly. The peculiar result is that the Sharpe ratio then rises again for the highest volatility portfolios. One reason for this could be the embedded survivorship bias, since riskier stocks in the high volatility portfolios would be excluded, thereby inflating risk-adjusted returns. However, the actual CRSP data results, seen in the table below, exhibit the same initial decline and subsequent rise in Sharpe ratio for the higher volatility deciles, which likewise conflicts with the low-volatility anomaly. Overall, the replicated volatility decile returns, risk, and tail risk increase as volatility increases, which is consistent with expectations and with the actual CRSP volatility decile performance.

Replicated Volatility Decile Results

10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Annualized Return 9.77 11.92 11.85 12.60 13.51 14.48 15.22 16.69 18.85 33.46
Annualized Standard Deviation 8.62 11.48 14.13 14.93 15.80 16.80 17.83 19.08 20.15 25.30
Semi-Deviation 6.28 8.43 10.20 10.84 11.52 12.18 12.93 13.91 14.79 17.65
Annualized Sharpe Ratio (Rf=0%) 1.13 1.04 0.84 0.84 0.86 0.86 0.85 0.87 0.94 1.32
Adjusted Sharpe ratio (Risk free = 0) -3.10 -0.30 0.22 0.35 0.43 0.51 0.53 0.57 0.57 0.33
Average Drawdown 1.13 1.40 1.78 2.04 2.18 2.56 2.84 3.01 3.33 4.05
Average Recovery 11.73 10.07 10.50 10.75 10.63 11.08 12.39 12.60 13.50 13.81
VaR 95 -0.68 -1.01 -1.27 -1.39 -1.43 -1.57 -1.66 -1.83 -1.97 -2.29
VaR 99 -1.51 -2.05 -2.39 -2.47 -2.70 -2.86 -2.97 -3.14 -3.47 -4.08
ETL 95 -1.26 -1.71 -2.05 -2.16 -2.31 -2.45 -2.60 -2.78 -2.98 -3.56
ETL 99 -2.44 -3.07 -3.68 -3.80 -4.07 -4.22 -4.41 -4.67 -4.98 -6.03
Worst Loss -11.13 -13.77 -15.23 -15.98 -15.02 -15.04 -15.54 -15.44 -16.08 -12.95
Skewness -0.02 -0.77 -0.28 -0.61 -0.58 -0.44 -0.42 -0.46 -0.54 0.45
Excess Kurtosis 71.10 25.61 22.85 16.06 13.08 10.96 10.31 8.76 8.26 12.49
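The annualized Sharpe ratio row in the table above can be cross-checked directly with PerformanceAnalytics (a sketch):

#Cross-check the annualized Sharpe ratios (Rf = 0)
round(SharpeRatio.annualized(decilePortfolioMatrix, Rf = 0), 2)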

Actual Volatility Decile Results

Table extracted from:

Han, Yufeng, Ke Yang, and Guofu Zhou. 2013. “A New Anomaly: The Cross-Sectional Profitability of Technical Analysis.” Journal of Financial and Quantitative Analysis 48 (5). Cambridge University Press: 1433–61. http://dx.doi.org/10.1017/S0022109013000586.

Conclusion

This post presents R code which uses free Yahoo! Finance data to replicate the performance of CRSP volatility deciles. Though the replicated portfolios contain survivorship bias, the overall risk and return characteristics match those exhibited by actual CRSP volatility decile portfolios.

© 2016 Erol Biceroglu

