**Adventures in Statistical Computing**, and kindly contributed to R-bloggers)

A few days ago, I wrote a piece on finding the minimum expected shortfall portfolio. A few astute commenters quickly picked up where I was going with this — using this as an alternative to low/minimum volatility portfolios. What follows are the results of a small study I put together. This post will deal with the portfolio selection and calculation of portfolio weights. The next post will deal with computing portfolio statistics and comparing that portfolio to alternatives.

Note: In my last post, I mentioned the lack of speed in R during the NL optimization. I was unable to rectify that situation. I got tired of waiting on R while working on the code so I switched to SAS IML for the NL optimization. I use the SAS provided methods to export the needed data into R objects. I will use R and the handy PerformanceAnalytics package in the next post. If anyone has ways to speed this up in R, please post it below.

**Portfolio Selection**

I wanted a set of stocks that were representative of the larger market, but not too many to bog down the analysis. I choose the stocks of the Dow.

I don’t have access to the historic components of the Dow, so I have to stick to the current composition. I want a fairly long history, but I don’t want to add too much survivors bias. I.E. If I go back to the late 80s, Cisco is basically a penny stock and will skew the results positive. However, I wanted long enough to capture large market swings. I decided to run the portfolio from 2000 through the end of 2011.

One problem is that Google is part of the Dow and didn’t exist before 2004. So I dropped Google, leaving me with 29 stocks.

**Methodology**

We will use weekly return data. This smooths some of the noise from the series and hopefully some of the time dependence. Using 2 years of weekly returns (approximately 104 data points) we subtract the mean. From that fit a Student-T copula to the empirical distributions (previous example here). Should the MLE fail to converge, fail over to a Gaussian copula.

We will rebalance the portfolio quarterly. We assume there is no time dependence in the return series. Using the fitted copula and empirical distributions, simulate 13 weeks ahead for 5000 iterations (65,000 total draws in the simulation). The 13 weekly returns will be aggregated into 1 quarterly return. From the 5000 simulated quarterly returns, we will run the minimum expected shortfall optimization.

Because the mean was taken out of the empirical distribution, our simulation assumes a 0 expected return over the period.

**Code**

The following SAS code makes use of the macros in the copula example with a few modifications. You can download the new version of the macros here.

The SAS code is can be downloaded here. It is fairly well commented, so I will not now subject you to a walk through. If there are questions, feel free to post them.

odshtml close;%letstocks =mmm aa axp t bac ba cat cvx csco ko dd xom ge hpq hd intc ibm jnj jpm mcd mrk msft pfe pg trv utx vz wmt dis;libnamedow30 “C:\temp\9.3\dow30”;procdatasetslib=work nolist kill;quit;/*Download stocks and prices*/%(&stocks,get_stocks01JAN1998,30JUL2012,keepPrice=1);/*output prices and returns to permanent location*/datadow30.returns;setreturns;run;/*Put Week, Quarter, and Year on returns for aggregation*/datareturns;formatweek qtr year4.;setreturns;week = week(date);year = year(date);qtr = qtr(date);run;datadow30.prices;formatweek qtr year4.;setprices;week = week(date);year = year(date);qtr = qtr(date);run;datadow30.prices_wk;setdow30.prices;byyear week;iflast.week;run;/*Sum returns over each week to create weekly returns*/procsummarydata=returns;var&stocks;byyear week;outputout=weekly(drop=_type_ _freq_) sum=;run;/*Copy weekly returns to perm library*/procdatasetslib=work nolist;copyout=dow30;selectweekly ;run;quit;/*Add a record index*/dataweekly(index=(indx));setweekly;indx = _N_;run;/*Select the index values for each quarter end,starting with year end 1999, and ending withyear end 2011 */procsqlnoprint;selectindx, year, weekinto:starts separated by ‘ ‘fromweeklywhere(2012> year >1999andweek in (13,26,39,52))or (year =1999and week =52)orderby year, week;quit;%put&starts;/*Macro will loop over the starting index valuesstarts = period starting index valuesobs = number of observations for copula fittingfwd = number of periods forward to simulatedraws = number of draws for simulationmax = number of times to loop, m<1 means loop over allvalues in starts */%macroloop(starts,obs=100,fwd=1,draws=100,max=0);/*Total draws needed in sumulation*/%letnDraws = %eval(&fwd*&draws);/*Number of items in starts*/%letnSims = %sysfunc(countw(&starts));/*update max with nSims if max < 1*/%if%sysevalf(&max <1) %then%letmax=&nSims;;/*Delete optES if it exists*/proc datasets lib=work nolist;delete optES;run;quit;/*Delete cov if it exists*/proc datasets lib=dow30 nolist;delete cov;run;quit;/*Start main loop*/%doi=1%to &max;/*start= current starting index*/%letstart = %scan(&starts,&i);/*e= ending index*/%lete = %eval(&start – &obs);/*create subset of weekly returnsget the as-of week and year*/data test;set weekly(where=(&e <= indx <=&start)) end=last;if _n_ =1then do;call symput(“foy”,put(year,4.));call symput(“fow”,put(week,2.));end;if last then do;call symput(“aoy”, put(year,4.));call symput(“aow”, put(week,2.));end;run;/*de-mean the weekly returns over the subset*/proc standard mean=0data=test out=test;var &stocks;run;/*Create the covariance matrix for the subset,store in perm location */proc corr data=testout=cov(where=(_type_=“COV”)) cov noprint;var &stocks;run;data cov;format year week4.;set cov;year=&aoy;week=&aow;run;proc append base=dow30.cov data=cov force;run;%putAS OF = &start;%putFrom &foy, Week &fow: To &aoy, Week &aow;ods select NONE ;/*Fit a T copula and simulate from it*/proc copula data=test;var &stocks;fit t /marginals=empiricalmethod=MLE;simulate /ndraws=&nDrawsseed=54321out=sim;run;/*Check the sim data. If no observations then thefitting failed. Fail over to a Normal/GaussianCopula */proc sql noprint;select count(*) format=5.into :nsim from sim;quit;%if&nsim =0%then%do;%putNSIM=&nsim, switching to NORMAL Model;proc copula data=test ;var &stocks;fit normal /marginals=empirical;simulate /ndraws=&nDrawsseed=54321out=sim;run;%end;ods select default;/*Create an iteration number in the simulations*/data sim_&aoy._%(&aow);leftformat iter8.;set sim;iter = mod(_N_,&draws);run;proc sort data=sim_&aoy._%(&aow);leftby iter;run;/*Aggregate the iterations into a final number forthe forward period*/proc summary data=sim_&aoy._%(&aow);leftvar &stocks;by iter;output out=sim_&aoy._%(&aow)(drop=_type_ _freq_ iter) sum=;leftrun;/*Use IML for the NL Optimization */proc iml noprint;/*Read the simulations into a matrix*/use sim_&aoy._%(&aow);leftread all var _num_ into sim[colname=cnames];close sim_&aoy._%(&aow);lefts = ncol(sim);w0 = repeat(1/s,s);/*Function calculating expected shortfall*/start F_ES(x) global(sim);v = sim*x`;call sort(v);cutoff = floor(nrow(sim)*.05);es = v[1:cutoff][:];return (-es);finish F_ES;/*Setup constraints*/lb = repeat(0,1,s);ub = repeat(.1,1,s);ones = repeat(1,1,s);addc = ones || {01};con = (lb || {..}) //(ub || {..}) //addc;optn = {00};x = w0`;/*Call the nl quasi-Newton optimizer*/call nlpqn(rc,w,“F_ES”,x,optn,con);/*Put the returns into a data set*/create results_&aoy._%(&aow) from w[colname=cnames];leftappend from w;close results_&aoy._%(&aow);leftquit;/*Add the week and year to the results*/data results_&aoy._%(&aow);leftyear = &aoy;week = &aow;set results_&aoy._%(&aow);leftrun;/*Append the results into the final results*/proc append data=results_&aoy._%(&aow) base=optES force;leftrun;%end;/*Delete the simulation and result files*/proc datasets lib=work nolist;delete sim: results:;run;quit;%mend;/*Call the simulations and optimizationsUse 104 weeks (2 years) in each subsetsimulate 13 weeks (1 qtr) forwarddraw 5000 iterations */%letst = %sysfunc(datetime());optionsnonotes nomprint;*loop(starts,obs=100,fwd=1,draws=100,max=0);%(&starts,obs=loop104,fwd=13,draws=5000,max=0);optionsnotes nomprint;%letel = %sysevalf(%sysfunc(datetime()) – &st);%putTook: ⪙/*Copy the final results and covariance matrices tooutput locationthe */procdatasetslib=work nolist;copyout=dow30;selectoptES ;run;quit;/*Download DIA returns*/%(dia spy,get_stocks03JAN2000,30JUL2012,keepPrice=0);/*Finally, export the optimal portfolio weightsinto R and save the DataFrame*/prociml;callExportDataSetToR(“dow30.optES”, “optES” );callExportDataSetToR(“dow30.cov”,“cov”);callExportDataSetToR(“dow30.prices”,“prices”);callExportDataSetToR(“dow30.prices_wk”,“prices_wk”);callExportDataSetToR(“returns”,“dia_spy”);submit/ R;save(optES, file=“C:\\temp\\dow30\\optES”);save(cov, file=“C:\\temp\\dow30\\cov”);save(prices, file=“C:\\temp\\dow30\\prices”);save(prices_wk, file=“C:\\temp\\dow30\\prices_wk”);save(dia_spy, file=“C:\\temp\\dow30\\dia”);endsubmit;quit;

The next post, I will post the R code to analyse the performance and compare it to a minimum variance portfolio, equal weight portfolio, cap weight portfolio, the Dow (DIA), and the S&P500 (SPY).

**leave a comment**for the author, please follow the link and comment on their blog:

**Adventures in Statistical Computing**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...