# Yeah Sure, Maybe, Well … Okay

March 23, 2011
By

Whoever wrote the book on statistics, probably avoided getting a proper education in literature. At least that's my null hypothesis. The cryptic and awkward presentation of probabilities common amongst the Frequentists (no, not the Latin American Socialist faction you're thinking of) is legend. They speak of the null hypothesis and having a 95% confidence level that you can accept it. Some aren't happy with that obfuscation and insist that it's not proper to accept a null hypothesis, but only to reject or fail to reject the null hypothesis. Hmm, well okay I guess. There is another way though. It's called Bayesian inference. Give it a look. It's probably the only way known to mankind that you can be a nerd and cool at the same time.

Not that the Bayesian approach is easy to understand at first glance. It's not difficult, but it's also easy to get it wrong. My approach to learning was to write a program in R, which I appropriately named the "quote" -- bayes_a_nator. This is a simple program and it's assumptions are ridiculously naive, but you don't build a killer algo without getting some basic machinery running first.

The following function takes two stocks from Yahoo Finance and calculates how often they have positive returns. It starts with the first stock's positive return probability, and then gets the second stock's positive return probability given that the first stock returned positive. Then it solves the riddle to the question: What is the probability that first stock is up, given that second stock is up? The algorithm does not calculate this implicitly, so the probability has to be inferred.

This program has five parts, and instead of pontificating about it, I'll just let the comments speak for themselves. Enjoy the ASCII art.

bayes_a_nator <- function(sym1, sym2) {  require ("quantmod")# FIRST get the objects (price series returns) correctly formatted    x    <- getSymbols(sym1, auto.assign=FALSE)                                      y    <- getSymbols(sym2, auto.assign=FALSE)                                                                                                                       x    <- Delt(x[,4])                                                       y    <- Delt(y[,4])                                                                                                                                         x    <- na.locf(x, na.rm=TRUE)                                            y    <- na.locf(y, na.rm=TRUE)                                                                                                                                        xy   <- cbind(x,y)                                                # SECOND: get the positive values isolated      x.num   <- 0    xy.num  <- 0      x_y.num <- 0        for(i in 1:NROW(x))    if (x[i] > 0)      x.num     <- x.num + 1             # x.num   for(i in 1:NROW(xy)) if (xy[i,1] > 0 &                                         xy[i,2] > 0)                                                                                                xy.num    <- xy.num + 1             # xy.num        for(i in 1:NROW(xy)) if (xy[i,1] < 0 &&                                         xy[i,2] > 0)                                                                                                x_y.num   <- x_y.num + 1            # x_y.num                                                                                                                # THIRD derive other important values with trivial algebra# but first, some ASCII art                     #                     /-- x.up.y.up#        /--- x.up__/#      /            \# ___/               \___ x.up.y.dn#     \ #      \              /-- x.dn.y.up#       \___  x.dn__/#                   \#                    \___ x.dn.y.dn# x.up       <- x.num/NROW(x)  x.dn       <- 1 - x.up      x.up.y.up  <- xy.num/x.num   x.up.y.dn  <- 1 - x.up.y.up  x.dn.y.up  <- x_y.num/(NROW(x)-x.num) x.dn.y.dn  <- 1 - x.dn.y.up # FOURTH implement Bayes Theorem# P(x|y) is the standard Probability notation that reads# "Probability of x given y"# The diagram's equivalent is references inside <>#                    P(y.up | x.up)  * P(x.up)            x.up.y.up  *  x.up # P(x.up | y.up)  = ___________________________     =    _________________________#                         P(y.up)                       (x.up * x.up.y.up) + (x.dn * x.dn.y.up)# We know what x.up is unconditionally, but what is it if we have# super secret information that y is up? nugget <- (x.up.y.up * x.up)/((x.up.y.up * x.up) + (x.dn.y.up * x.dn))# FIFTH return bayesian wisdom (along with some sanity checking)  cat(" Probability of "   , sym1, " going up : "                       ,round(x.up     *100, digits=2), "\n",      "Probability of "   , sym1, " going dn : "                        ,round(x.dn     *100, digits=2), "\n",       "Probability of "   , sym1, " going up and", sym2, " going up : " ,round(x.up.y.up*100, digits=2), "\n",     "Probability of "   , sym1, " going up and", sym2, " going dn : " ,round(x.up.y.dn*100, digits=2), "\n",     "Probability of "   , sym1, " going dn and", sym2, " going up: "  ,round(x.dn.y.up*100, digits=2), "\n",     "Probability of "   , sym1, " going dn and", sym2, " going dn: "  ,round(x.dn.y.dn*100, digits=2), "\n",     "Probability that " , sym1, " goes up when", sym2, " goes up: "   ,round(nugget   *100, digits=2), "\n")                                                                                  }

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...