Learning R: Project 1, Part 2

October 30, 2011
(This article was first published on Adventures in Statistical Computing, and kindly contributed to R-bloggers)

So it's been a week since I started down this path.  I worked most of this out over last weekend, went to a conference, had a hectic week at work, and then realized I had lost my work.  Gah.

I'll be posting my general thoughts on R later.  Mostly it seems to be a neat language, with lots of ways to do things.  The ability to create output seems limited, though.  I played with a number of things trying to create rich HTML output like I did with SAS.  R2HTML might be what I need, but I couldn't get it to work.

So here is what I have:

require(fImport)
require(PerformanceAnalytics)

These two packages seem to do a lot of what I need. PerformanceAnalytics has a wealth of charting tools for financial data.

#Function to load stock data into a Time Series object
importSeries = function (symbol,from,to) {



#Read data from Yahoo! Finance
input = yahooSeries(symbol,from=from,to=to)

#Character Strings for Column Names
adjClose = paste(symbol,".Adj.Close",sep="")
inputReturn = paste(symbol,".Return",sep="")
CReturn = paste(symbol,".CReturn",sep="")

#Calculate the Returns and put it on the time series
input.Return = returns(input[,adjClose])
colnames(input.Return)[1] = inputReturn
input = merge(input,input.Return)

#Calculate the cumulative return and put it on the time series
input.first = input[,adjClose][1]
input.CReturn = fapply(input[,adjClose],FUN=function(x) log(x) - log(input.first))
colnames(input.CReturn)[1] = CReturn
input = merge(input,input.CReturn)

#Delete the temporary objects (not sure I need to do this, since they are local
# to the function, but I can't not delete things if given a way to...)
rm(input.first,input.Return,input.CReturn,adjClose,inputReturn,CReturn)

#Return the timeseries
return(input)

}
I learned a lot about data handling in R putting this function together.
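
To make the two derived columns concrete, here is a small base R sketch on made-up prices (the `prices` vector is hypothetical, not Yahoo! data). It assumes returns() is producing continuous (log) returns, which I believe is the timeSeries default. Under that assumption the cumulative column is just the running sum of the return column.

```r
# Hypothetical adjusted closing prices, for illustration only
prices <- c(100, 102, 101, 105)

# Continuous (log) returns: log(P_t) - log(P_{t-1}); the first day has no predecessor
log_ret <- c(NA, diff(log(prices)))

# Cumulative log return relative to the first price, as in importSeries()
cum_ret <- log(prices) - log(prices[1])

# Check that the cumulative return equals the running sum of the log returns
all.equal(cum_ret[-1], cumsum(log_ret[-1]))
```

The leading NA in the return column is also why the regression later reports one observation deleted due to missingness.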

#Load SPY data
spy = importSeries("spy",from="2010-01-01",to="2011-10-22")
#Load Google data
goog = importSeries("goog",from="2010-01-01",to="2011-10-22")

#merge the time series
merged = merge(spy,goog)
Nothing fancy here.  The merge() function is nice, but I have no idea how to do anything but the "full" join that it defaults to.  If anyone knows of a good tutorial on doing more advanced SQL-style joins, please let me know.
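
On the join question, one possible route is base R's merge() on plain data frames, where the `all`, `all.x` and `all.y` arguments select the join type. This is only a sketch: the two data frames and their values are made up, and I have not checked whether the merge() method for timeSeries objects accepts the same arguments.

```r
# Two hypothetical data frames keyed by date (values made up for illustration)
spy_df  <- data.frame(date = as.Date(c("2011-10-03", "2011-10-04", "2011-10-05")),
                      spy.Return = c(0.010, -0.020, 0.005))
goog_df <- data.frame(date = as.Date(c("2011-10-04", "2011-10-05", "2011-10-06")),
                      goog.Return = c(-0.015, 0.012, 0.003))

# For data frames the default (all = FALSE) is an inner join
inner_join <- merge(spy_df, goog_df, by = "date")                # dates in both
left_join  <- merge(spy_df, goog_df, by = "date", all.x = TRUE)  # every spy_df date
right_join <- merge(spy_df, goog_df, by = "date", all.y = TRUE)  # every goog_df date
full_join  <- merge(spy_df, goog_df, by = "date", all = TRUE)    # dates in either

print(full_join)
```

Rows with no match on the other side get NA in the missing columns, much like an outer join in SQL.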

#Chart the Cumulative Returns
png("c:\\temp\\Returns_r.png")
chart.CumReturns(merged[,c("spy.Return","goog.Return"),drop=FALSE],
                            main="Total Returns SPY vs Google",
                            legend.loc="topleft")
dev.off()

#Create the Correlation plot
png("c:\\temp\\Corr.png")
chart.Correlation(merged[,c("spy.Return","goog.Return")],histogram=TRUE,pch="+")
dev.off()
First, chart.CumReturns() produces a nice graph, better than I was able to do with plot().

Second, chart.Correlation() also gives neat output. I would really like to find a comparable method to produce the alpha ellipses that I did in SAS.

Third, I cannot find a good method that is comparable to PROC CORR. Can I get one good output with correlation, covariance, mean, standard deviation, etc.? Please let me know.
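
As a rough stand-in for PROC CORR, the pieces can be assembled from base R functions. This is a minimal sketch on a made-up data frame (`rets` and its values are hypothetical), not a single drop-in replacement; packages may offer something closer.

```r
# Hypothetical daily returns, for illustration only
rets <- data.frame(spy.Return  = c(0.010, -0.020, 0.005, 0.015, -0.007),
                   goog.Return = c(0.012, -0.025, 0.001, 0.020, -0.010))

# Simple statistics per column: N, mean, standard deviation, min, max
simple_stats <- t(sapply(rets, function(x) c(N      = sum(!is.na(x)),
                                             Mean   = mean(x, na.rm = TRUE),
                                             StdDev = sd(x, na.rm = TRUE),
                                             Min    = min(x, na.rm = TRUE),
                                             Max    = max(x, na.rm = TRUE))))

# Covariance and Pearson correlation matrices, skipping rows with missing values
cov_mat <- cov(rets, use = "complete.obs")
cor_mat <- cor(rets, use = "complete.obs")

print(simple_stats)
print(cov_mat)
print(cor_mat)

# p-value for the test that the correlation is zero
print(cor.test(rets$spy.Return, rets$goog.Return)$p.value)
```
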
#Regress Google on SPY
reg = lm(merged[,"goog.Return"]~merged[,"spy.Return"])

#Create the confidence interval
newx = merged[,"spy.Return"]
prd = predict(reg,newdata=newx,interval="confidence",level=.95, type="response")

#Print the Regression Summary
summary(reg)
Linear regression seems pretty easy. It took me a while to decipher the R help to figure out the confidence interval stuff. Again, if there is a way to produce a rich set of output from a regression, like SAS does with PROC REG, please show me.

Here is the R output:
Call:
lm(formula = merged[, "goog.Return"] ~ merged[, "spy.Return"])

Residuals:
      Min        1Q    Median        3Q       Max
-0.089348 -0.005702 -0.000083  0.005513  0.116929

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)            -0.0003841  0.0006424  -0.598     0.55
merged[, "spy.Return"]  0.9641218  0.0509346  18.929   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0137 on 453 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.4416, Adjusted R-squared: 0.4404
F-statistic: 358.3 on 1 and 453 DF, p-value: < 2.2e-16

This matches the SAS results. It's not exact, but very close.  That's good.
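
Coming back to the question of richer regression output: beyond summary(), a fitted lm object can be passed to a few other base R functions that cover some of what PROC REG prints. The sketch below uses simulated data (`dat` and `fit` are hypothetical names, and the numbers are random, not the SPY and Google returns).

```r
# Simulated returns standing in for the merged series
set.seed(1)
dat <- data.frame(spy = rnorm(50, sd = 0.01))
dat$goog <- 0.9 * dat$spy + rnorm(50, sd = 0.01)

fit <- lm(goog ~ spy, data = dat)

print(coef(summary(fit)))          # coefficient table as a numeric matrix
print(confint(fit, level = 0.95))  # confidence intervals for the coefficients
print(anova(fit))                  # analysis-of-variance table
```
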
#Chart the regression
png("c:\\temp\\Regression.png")
chart.Regression(merged[,"goog.Return",drop=FALSE],
                          merged[,"spy.Return",drop=FALSE],
                          fit=c("linear"),
                          main="Google ~ SPY",
                          xlab="SPY Return",
                          ylab="Google Return")

#add the confidence interval
lines(newx$spy.Return,prd[,2],col="Red",lty=2)
lines(newx$spy.Return,prd[,3],col="Red",lty=2)
dev.off()
This uses chart.Regression() from PerformanceAnalytics. The confidence interval looks suspect. Maybe I did something wrong.
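
One possible cause, which I have not verified against the data above: lines() connects points in the order they are given, and returns stored in date order are not sorted by x, so the band can zigzag across the plot. Separately, predict() matches `newdata` to the model by variable name, which only works cleanly when the formula uses plain column names. The sketch below works around both with base R on simulated data (`dat`, `fit`, `grid` and `band` are hypothetical names). It uses plot() rather than chart.Regression().

```r
# Simulated returns standing in for the merged series
set.seed(1)
dat <- data.frame(spy = rnorm(100, sd = 0.01))
dat$goog <- 0.9 * dat$spy + rnorm(100, sd = 0.01)

# Fit with named columns so predict() can match newdata by name
fit <- lm(goog ~ spy, data = dat)

# Evenly spaced, increasing x values across the observed range
grid <- data.frame(spy = seq(min(dat$spy), max(dat$spy), length.out = 50))
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

# Scatter plot, fitted line, and the lower/upper confidence limits
plot(dat$spy, dat$goog, main = "Google ~ SPY",
     xlab = "SPY Return", ylab = "Google Return")
lines(grid$spy, band[, "fit"])
lines(grid$spy, band[, "lwr"], col = "red", lty = 2)
lines(grid$spy, band[, "upr"], col = "red", lty = 2)
```

Because the grid is sorted, each lines() call draws a single smooth curve instead of jumping back and forth between dates.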
