Yeah Sure, Maybe, Well … Okay

[This article was first published on Milk Trader, and kindly contributed to R-bloggers.]

Whoever wrote the book on statistics probably avoided getting a proper education in literature. At least that’s my null hypothesis. The cryptic and awkward presentation of probabilities common amongst the Frequentists (no, not the Latin American Socialist faction you’re thinking of) is the stuff of legend. They speak of the null hypothesis, and of a 95% confidence level at which you can accept it. Some aren’t happy with that obfuscation and insist that it’s not proper to accept a null hypothesis, but only to reject or fail to reject it. Hmm, well okay I guess. There is another way, though. It’s called Bayesian inference. Give it a look. It’s probably the only way known to mankind that you can be a nerd and cool at the same time.

Not that the Bayesian approach is easy to understand at first glance. It’s not difficult, but it’s also easy to get wrong. My approach to learning was to write a program in R, which I appropriately named bayes_a_nator. It’s a simple program and its assumptions are ridiculously naive, but you don’t build a killer algo without getting some basic machinery running first.

The following function takes two stocks from Yahoo Finance and calculates how often they have positive returns. It starts with the first stock’s positive return probability, then gets the second stock’s positive return probability given that the first stock returned positive. Then it solves the riddle: what is the probability that the first stock is up, given that the second stock is up? The algorithm does not calculate this directly, so the probability has to be inferred.
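Before the code, the arithmetic is worth seeing once on its own. The numbers below are hypothetical, made up purely to show Bayes’ theorem turning over, not taken from any real ticker:

```r
# Hypothetical inputs: P(x up), P(y up | x up), P(y up | x dn)
p_x_up      <- 0.55
p_y_up_x_up <- 0.60
p_y_up_x_dn <- 0.50

# Bayes: P(x up | y up) = P(y up | x up) * P(x up) / P(y up),
# where P(y up) expands by total probability over x up / x dn
p_y_up    <- p_y_up_x_up * p_x_up + p_y_up_x_dn * (1 - p_x_up)
posterior <- p_y_up_x_up * p_x_up / p_y_up

round(posterior, 4)   # 0.5946
```

Knowing that y is up nudges the probability of x being up from 55% to roughly 59%, which is exactly the kind of nugget the function below digs for.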

This program has five parts, and instead of pontificating about it, I’ll just let the comments speak for themselves. Enjoy the ASCII art.

bayes_a_nator <- function(sym1, sym2) {
    require("quantmod")

# FIRST get the objects (price series returns) correctly formatted

    x  <- getSymbols(sym1, auto.assign=FALSE)
    y  <- getSymbols(sym2, auto.assign=FALSE)
    x  <- Delt(x[,4])               # daily returns from the closing price
    y  <- Delt(y[,4])
    x  <- na.locf(x, na.rm=TRUE)    # drop the leading NA that Delt produces
    y  <- na.locf(y, na.rm=TRUE)
    xy <- cbind(x, y)

# SECOND: get the positive values isolated

    x.num   <- 0
    xy.num  <- 0
    x_y.num <- 0
    for (i in 1:NROW(x))
        if (x[i] > 0)
            x.num <- x.num + 1      # days x is up
    for (i in 1:NROW(xy))
        if (xy[i,1] > 0 && xy[i,2] > 0)
            xy.num <- xy.num + 1    # days x is up and y is up
    for (i in 1:NROW(xy))
        if (xy[i,1] < 0 && xy[i,2] > 0)
            x_y.num <- x_y.num + 1  # days x is down and y is up
# THIRD derive other important values with trivial algebra

# but first, some ASCII art
#                     /-- x.up.y.up
#        /--- x.up__/
#      /            \
# ___/               \___ x.up.y.dn
#     \ 
#      \              /-- x.dn.y.up
#       \___  x.dn__/
#                   \
#                    \___ x.dn.y.dn

    x.up      <- x.num/NROW(x)            # P(x up)
    x.dn      <- 1 - x.up                 # P(x dn)
    x.up.y.up <- xy.num/x.num             # P(y up | x up)
    x.up.y.dn <- 1 - x.up.y.up            # P(y dn | x up)
    x.dn.y.up <- x_y.num/(NROW(x)-x.num)  # P(y up | x dn)
    x.dn.y.dn <- 1 - x.dn.y.up            # P(y dn | x dn)
# FOURTH implement Bayes Theorem

# P(x|y) is the standard Probability notation that reads
# "Probability of x given y"

# The diagram's equivalents are referenced inside the formula below

#                    P(y.up | x.up) * P(x.up)               x.up.y.up * x.up
# P(x.up | y.up)  = __________________________  =  _____________________________________
#                           P(y.up)                (x.up * x.up.y.up) + (x.dn * x.dn.y.up)

# We know what x.up is unconditionally, but what is it if we have
# super secret information that y is up?

    nugget <- (x.up.y.up * x.up)/((x.up.y.up * x.up) + (x.dn.y.up * x.dn))

# FIFTH return bayesian wisdom (along with some sanity checking)

    cat(" Probability of "  , sym1, " going up : "                        , round(x.up     *100, digits=2), "\n",
        "Probability of "   , sym1, " going dn : "                        , round(x.dn     *100, digits=2), "\n",
        "Probability of "   , sym2, " going up when", sym1, " goes up : " , round(x.up.y.up*100, digits=2), "\n",
        "Probability of "   , sym2, " going dn when", sym1, " goes up : " , round(x.up.y.dn*100, digits=2), "\n",
        "Probability of "   , sym2, " going up when", sym1, " goes dn : " , round(x.dn.y.up*100, digits=2), "\n",
        "Probability of "   , sym2, " going dn when", sym1, " goes dn : " , round(x.dn.y.dn*100, digits=2), "\n",
        "Probability that " , sym1, " goes up when", sym2, " goes up : "  , round(nugget   *100, digits=2), "\n")
}
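As a sanity check on the counting loops in part SECOND, R’s vectorized comparisons produce the same tallies in one line apiece. The toy return vectors here are made up for illustration, not pulled from Yahoo:

```r
# Hypothetical daily returns for two series
x <- c(0.01, -0.02,  0.03, -0.01, 0.02)
y <- c(0.02,  0.01, -0.01,  0.03, 0.01)

x.num   <- sum(x > 0)           # days x is up
xy.num  <- sum(x > 0 & y > 0)   # days x and y are both up
x_y.num <- sum(x < 0 & y > 0)   # days x is down and y is up

c(x.num, xy.num, x_y.num)       # 3 2 2
```

With quantmod installed and a network connection, a call like bayes_a_nator("SPY", "GLD") (example tickers, pick your own pair) prints all seven probabilities.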
