Learning R: Project 1, Part 2

October 30, 2011

So it’s been a week since I started down this path. I worked most of this out over last weekend, went to a conference, had a hectic week at work, and then realized I had lost my work. Gah.

I’ll be posting my general thoughts on R later. Mostly it seems to be a neat language, with lots of ways to do things. The ability to create output seems limited, though. I played with a number of things trying to create rich HTML output like I did with SAS. R2HTML might be what I need, but I couldn’t get it to work.

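So here is what I have:

library(fImport)
library(PerformanceAnalytics)

These two packages seem to do a lot of what I need. fImport provides yahooSeries() for pulling the price data, and PerformanceAnalytics has a wealth of charting tools for financial data.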

#Function to load stock data into a Time Series object
importSeries = function (symbol,from,to) {

  #Read data from Yahoo! Finance
  input = yahooSeries(symbol,from=from,to=to)

  #Character Strings for Column Names
  adjClose = paste(symbol,".Adj.Close",sep="")
  inputReturn = paste(symbol,".Return",sep="")
  CReturn = paste(symbol,".CReturn",sep="")

  #Calculate the Returns and put it on the time series
  input.Return = returns(input[,adjClose])
  colnames(input.Return)[1] = inputReturn
  input = merge(input,input.Return)

  #Calculate the cumulative return and put it on the time series
  input.first = input[,adjClose][1]
  input.CReturn = fapply(input[,adjClose],FUN=function(x) log(x) - log(input.first))
  colnames(input.CReturn)[1] = CReturn
  input = merge(input,input.CReturn)

  #Deleting things (not sure I need to do this, but I can’t not delete things if
  # given a way to…
  rm(input.first,input.Return,input.CReturn,adjClose,inputReturn,CReturn)

  #Return the timeseries
  return(input)
}

I learned a lot about data handling in R putting this function together.
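The arithmetic inside importSeries() can be checked on a plain numeric vector. This is a minimal base-R sketch with made-up prices (not Yahoo! data). It assumes the returns are continuously compounded (log) returns, which as far as I know is what returns() gives by default; the variable names are only for illustration.

```r
# Made-up adjusted closing prices, for illustration only
prices <- c(100, 102, 101, 105, 107)

# Daily log returns: log(P_t) - log(P_{t-1}); the first day has no return
log_returns <- c(NA, diff(log(prices)))

# Cumulative log return relative to the first price: log(P_t) - log(P_1)
cum_returns <- log(prices) - log(prices[1])

# The cumulative return is also the running sum of the daily log returns
running_sum <- cumsum(ifelse(is.na(log_returns), 0, log_returns))

print(data.frame(prices, log_returns, cum_returns, running_sum))
```

The last two columns agree, which is a handy sanity check on the cumulative return column.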

#Load SPY data
spy = importSeries("spy",from="2010-01-01",to="2011-10-22")
#Load Google data
goog = importSeries("goog",from="2010-01-01",to="2011-10-22")

#merge the time series
merged = merge(spy,goog)

Nothing fancy here. The merge() function is nice, but I have no idea how to do anything but the “full” join that it defaults to. If anyone knows of a good tutorial on doing more advanced SQL-style joins, please let me know.
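For plain data frames, at least, base R's merge() covers the usual SQL joins through its by, all, all.x and all.y arguments. The sketch below uses two small made-up data frames. Whether the merge() method for time series objects accepts the same arguments is not verified here, so treat this as a data-frame example only.

```r
# Two small made-up data frames keyed by date (illustrative values)
spy_df <- data.frame(
  date = as.Date(c("2011-10-17", "2011-10-18", "2011-10-19")),
  spy.Return = c(0.010, -0.005, 0.002)
)
goog_df <- data.frame(
  date = as.Date(c("2011-10-18", "2011-10-19", "2011-10-20")),
  goog.Return = c(0.004, 0.001, -0.003)
)

# Inner join: only dates present in both
inner_join <- merge(spy_df, goog_df, by = "date")

# Left join: every SPY date, NA where Google has no row
left_join <- merge(spy_df, goog_df, by = "date", all.x = TRUE)

# Right join: every Google date, NA where SPY has no row
right_join <- merge(spy_df, goog_df, by = "date", all.y = TRUE)

# Full outer join: every date from either side
full_join <- merge(spy_df, goog_df, by = "date", all = TRUE)

print(full_join)
```

Note that for data frames the default is the inner join, so the full join has to be requested with all = TRUE.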

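#Chart the Cumulative Returns
#(only the main= line of this call survived; the data argument is assumed)
chart.CumReturns(merged[,c("spy.Return","goog.Return")],
                            main="Total Returns SPY vs Google")

#Create the Correlation plot (data argument assumed)
chart.Correlation(merged[,c("spy.Return","goog.Return")])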

First, the chart.CumReturns() produces a nice graph. Better than I was able to do with plot().

Second, chart.Correlation() also gives a neat output. I would really like to find a comparable method to produce the alpha ellipses that I did in SAS.

Third, I cannot find a good method that is comparable to PROC CORR. Can I get a single, good output with the correlation, covariance, mean, standard deviation, etc.? Please let me know.
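Short of a single PROC CORR-style procedure, the pieces can be assembled from base R. This is a rough sketch on simulated returns (not market data), and the layout is just one guess at what a combined report could look like; the data frame and column names are made up for the example.

```r
set.seed(42)  # simulated returns, not market data
x <- rnorm(250, mean = 0.0005, sd = 0.01)
returns_df <- data.frame(
  spy.Return = x,
  goog.Return = 0.9 * x + rnorm(250, sd = 0.01)
)

# Simple statistics, one row per variable
simple_stats <- data.frame(
  N    = sapply(returns_df, function(v) sum(!is.na(v))),
  Mean = sapply(returns_df, mean, na.rm = TRUE),
  Std  = sapply(returns_df, sd, na.rm = TRUE),
  Min  = sapply(returns_df, min, na.rm = TRUE),
  Max  = sapply(returns_df, max, na.rm = TRUE)
)

# Covariance and correlation matrices, dropping incomplete rows
cov_matrix <- cov(returns_df, use = "complete.obs")
cor_matrix <- cor(returns_df, use = "complete.obs")

# Test of the correlation against zero (gives the p-value)
cor_test <- cor.test(returns_df$spy.Return, returns_df$goog.Return)

print(simple_stats)
print(cov_matrix)
print(cor_matrix)
print(cor_test$p.value)
```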

#Regress Google on SPY
reg = lm(merged[,"goog.Return"]~merged[,"spy.Return"])

#Create the confidence interval
newx = merged[,"spy.Return"]
prd = predict(reg,newdata=newx,interval="confidence",level=.95, type="response")

#Print the Regression Summary
summary(reg)
Linear Regression seems pretty easy. It took me a while to decipher the R help to figure out the confidence interval stuff. Again, if there is a way to produce a rich set of output from a regression like SAS and PROC REG, please show me.
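On the confidence interval: predict() looks up the predictors by name in newdata, so it tends to behave more predictably when the model is fit with bare column names and a data= argument. When the formula indexes an object directly, newdata has no column with a matching name and, as far as I understand it, is effectively ignored. A minimal sketch on simulated returns (not the SPY/Google data; returns_df and grid are made-up names):

```r
set.seed(1)  # simulated returns, not market data
returns_df <- data.frame(spy.Return = rnorm(200, sd = 0.01))
returns_df$goog.Return <- 0.95 * returns_df$spy.Return + rnorm(200, sd = 0.012)

# Name the columns in the formula and pass the data frame via data=
fit <- lm(goog.Return ~ spy.Return, data = returns_df)

# newdata must be a data frame whose column name matches the predictor
grid <- data.frame(spy.Return = seq(-0.03, 0.03, length.out = 50))

# 95% confidence interval for the fitted mean at each grid point
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

# band is a matrix with columns fit, lwr and upr
print(head(cbind(grid, band)))

# Confidence intervals for the coefficients themselves
print(confint(fit, level = 0.95))
```

Because grid is sorted, the lwr and upr columns can be drawn with lines() over a scatter plot without the band zig-zagging.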

Here is the R output:

Call:
lm(formula = merged[, "goog.Return"] ~ merged[, "spy.Return"])

Residuals:
      Min        1Q    Median        3Q       Max
-0.089348 -0.005702 -0.000083  0.005513  0.116929

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)            -0.0003841  0.0006424  -0.598     0.55
merged[, "spy.Return"]  0.9641218  0.0509346  18.929   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0137 on 453 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.4416, Adjusted R-squared: 0.4404
F-statistic: 358.3 on 1 and 453 DF,  p-value: < 2.2e-16

This matches the SAS results. The numbers are not identical, but they are very close. That’s good.

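#Chart the regression
#(first line of the call is assumed; only main/xlab/ylab survived)
chart.Regression(merged[,"goog.Return"],merged[,"spy.Return"],
                          main="Google ~ SPY",
                          xlab="SPY Return",
                          ylab="Google Return")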

#add the confidence interval

This uses chart.Regression() from PerformanceAnalytics. The fit interval looks suspect. Maybe I did something wrong.

