Learning R: Project 1, Part 2

October 30, 2011

(This article was first published on Adventures in Statistical Computing, and kindly contributed to R-bloggers)

So it’s been a week since I started down this path.  I worked most of this out over last weekend, went to a conference, had a hectic week at work, and then realized I had lost my work.  Gah.

I’ll be posting my general thoughts on R later.  Mostly it seems to be a neat language, with lots of ways to do things.  The ability to create output seems limited, though.  I played with a number of things trying to create rich HTML output like I did with SAS.  R2HTML might be what I need, but I couldn’t get it to work.

So here is what I have:

library(fImport)
library(PerformanceAnalytics)

These two packages seem to do a lot of what I need. PerformanceAnalytics has a wealth of charting tools for financial data.

#Function to load stock data into a Time Series object
importSeries = function (symbol,from,to) {

  #Read data from Yahoo! Finance
  input = yahooSeries(symbol,from=from,to=to)

  #Character Strings for Column Names
  adjClose = paste(symbol,".Adj.Close",sep="")
  inputReturn = paste(symbol,".Return",sep="")
  CReturn = paste(symbol,".CReturn",sep="")

  #Calculate the Returns and put it on the time series
  input.Return = returns(input[,adjClose])
  colnames(input.Return)[1] = inputReturn
  input = merge(input,input.Return)

  #Calculate the cumulative return and put it on the time series
  input.first = input[,adjClose][1]
  input.CReturn = fapply(input[,adjClose],FUN=function(x) log(x) - log(input.first))
  colnames(input.CReturn)[1] = CReturn
  input = merge(input,input.CReturn)

  #Deleting things (not sure I need to do this, but I can't not delete things if
  # given a way to…
  rm(input.first,input.Return,input.CReturn,adjClose,inputReturn,CReturn)

  #Return the timeseries
  return(input)
}

I learned a lot about data handling in R putting this function together.
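
For anyone checking the arithmetic in that function: the cumulative return column is just the running sum of the daily log returns. Here is a minimal sketch in base R. The `prices` vector is made up for illustration (it is not Yahoo! data), and it assumes returns() is producing continuous (log) returns.

```r
# Hypothetical adjusted closing prices (illustration only)
prices <- c(100, 102, 101, 105, 107)

# Daily log returns: log(P_t) - log(P_{t-1})
daily <- diff(log(prices))

# Cumulative log return relative to the first price,
# the same calculation the function applies to the Adj.Close column
cumulative <- log(prices) - log(prices[1])

# The running sum of the daily returns reproduces the cumulative series
all.equal(cumsum(daily), cumulative[-1])
```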

#Load SPY data
spy = importSeries("spy",from="2010-01-01",to="2011-10-22")
#Load Google data
goog = importSeries("goog",from="2010-01-01",to="2011-10-22")

#merge the time series
merged = merge(spy,goog)

Nothing fancy here.  The merge() function is nice, but I have no idea how to do anything but the “full” join that it defaults to.  If anyone knows of a good tutorial on doing more advanced SQL-style joins, please let me know.
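
For what it is worth, base R's merge() on plain data frames does cover the usual SQL joins through its `all`, `all.x`, and `all.y` arguments. The sketch below uses two made-up data frames (`spy_df` and `goog_df` are hypothetical names and values). It only demonstrates the data frame method; I am not claiming the time series merge() accepts the same arguments.

```r
# Two hypothetical data frames keyed by date (made-up values)
spy_df  <- data.frame(date = as.Date("2011-10-17") + 0:3,
                      spy.Return = c(0.010, -0.020, 0.005, 0.012))
goog_df <- data.frame(date = as.Date("2011-10-18") + 0:3,
                      goog.Return = c(0.030, -0.010, 0.020, 0.004))

inner <- merge(spy_df, goog_df, by = "date")                # inner join
left  <- merge(spy_df, goog_df, by = "date", all.x = TRUE)  # left join
right <- merge(spy_df, goog_df, by = "date", all.y = TRUE)  # right join
full  <- merge(spy_df, goog_df, by = "date", all = TRUE)    # full outer join

# Row counts show which dates each join keeps
sapply(list(inner = inner, left = left, right = right, full = full), nrow)
```

Unmatched rows in the left, right, and full joins are filled with NA.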

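#Chart the Cumulative Returns
#(data argument assumed: the two daily return columns)
chart.CumReturns(merged[,c("spy.Return","goog.Return")],
                 main="Total Returns SPY vs Google")

#Create the Correlation plot
#(same assumed columns)
chart.Correlation(merged[,c("spy.Return","goog.Return")])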

First, the chart.CumReturns() produces a nice graph. Better than I was able to do with plot().

Second, chart.Correlation() also gives neat output. I would really like to find a comparable method to produce the alpha ellipses that I did in SAS.

Third, I cannot find a good method that is comparable to PROC CORR. Can I get a single output with correlation, covariance, mean, standard deviation, etc.? Please let me know.
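
In the meantime, the pieces of a PROC CORR report can be assembled by hand from base R functions. This is a rough sketch on simulated returns (the `rets` data frame and its values are made up), not a drop-in replacement for the SAS output.

```r
# Simulated daily returns standing in for the real series (made-up data)
set.seed(1)
x <- rnorm(100, sd = 0.01)
rets <- data.frame(spy.Return  = x,
                   goog.Return = 0.9 * x + rnorm(100, sd = 0.01))

# Simple statistics, one row per variable
stats <- data.frame(N      = sapply(rets, length),
                    Mean   = sapply(rets, mean),
                    StdDev = sapply(rets, sd),
                    Min    = sapply(rets, min),
                    Max    = sapply(rets, max))
print(stats)

# Covariance and correlation matrices
print(cov(rets))
print(cor(rets))

# Pearson correlation with its p-value and confidence interval
ct <- cor.test(rets$spy.Return, rets$goog.Return)
print(ct)
```

With real data that contains missing values, cov() and cor() take a `use = "complete.obs"` argument, and mean() and sd() take `na.rm = TRUE`.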

#Regress Google on SPY
reg = lm(merged[,"goog.Return"]~merged[,"spy.Return"])

#Create the confidence interval
newx = merged[,"spy.Return"]
prd = predict(reg,newdata=newx,interval="confidence",level=.95, type="response")

#Print the Regression Summary
summary(reg)

Linear Regression seems pretty easy. It took me a while to decipher the R help to figure out the confidence interval stuff. Again, if there is a way to produce a rich set of output from a regression like SAS and PROC REG, please show me.
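
Some of what PROC REG prints can be pulled out of the fitted model piece by piece with base R. The sketch below runs on simulated data (the data frame `d` and its coefficients are made up) and shows the coefficient table, the confidence limits for the estimates, and the ANOVA table.

```r
# Simulated returns standing in for the merged series (made-up data)
set.seed(42)
d <- data.frame(spy = rnorm(200, sd = 0.01))
d$goog <- 0.95 * d$spy + rnorm(200, sd = 0.012)

fit <- lm(goog ~ spy, data = d)

# Parameter estimates with standard errors, t values and p-values
print(coef(summary(fit)))

# 95% confidence limits for the parameter estimates
ci <- confint(fit, level = 0.95)
print(ci)

# Analysis of variance table
print(anova(fit))
```

Calling plot(fit) on the same object draws the standard residual diagnostic plots.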

Here is the R output:

Call:
lm(formula = merged[, "goog.Return"] ~ merged[, "spy.Return"])

Residuals:
      Min        1Q    Median        3Q       Max
-0.089348 -0.005702 -0.000083  0.005513  0.116929

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)            -0.0003841  0.0006424  -0.598     0.55
merged[, "spy.Return"]  0.9641218  0.0509346  18.929   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0137 on 453 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.4416, Adjusted R-squared: 0.4404
F-statistic: 358.3 on 1 and 453 DF,  p-value: < 2.2e-16

Matches SAS. It’s not exact, but very close.  That’s good.

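#Chart the regression
#(first two arguments assumed: Google returns on y, SPY returns on x)
chart.Regression(merged[,"goog.Return"],
                 merged[,"spy.Return"],
                 main="Google ~ SPY",
                 xlab="SPY Return",
                 ylab="Google Return")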

#add the confidence interval

This uses chart.Regression() from PerformanceAnalytics. The fit interval looks suspect. Maybe I did something wrong.
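
One possible explanation, offered as a guess: when the formula refers to merged[,"spy.Return"] directly, predict() has no variable name to match against newdata, so the new x values may be ignored. Below is a minimal sketch of the usual pattern, fitting on a data frame with named columns and predicting over a sorted grid. The data is simulated, and `d`, `grid`, and `band` are hypothetical names.

```r
# Simulated returns standing in for the merged series (made-up data)
set.seed(7)
d <- data.frame(spy = rnorm(200, sd = 0.01))
d$goog <- 0.95 * d$spy + rnorm(200, sd = 0.012)

# Fit with column names so predict() can match newdata by name
fit <- lm(goog ~ spy, data = d)

# Evenly spaced, sorted x values so the band draws as smooth lines
grid <- data.frame(spy = seq(min(d$spy), max(d$spy), length.out = 50))
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

# Scatter plot, fitted line, and the 95% confidence band
plot(goog ~ spy, data = d, main = "Google ~ SPY",
     xlab = "SPY Return", ylab = "Google Return")
abline(fit)
lines(grid$spy, band[, "lwr"], lty = 2)
lines(grid$spy, band[, "upr"], lty = 2)
```

The band is narrowest near the mean of the x values and widens toward the extremes, which is a quick sanity check on the result.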
