Making Stuff is Scary

[This article was first published on Milk Trader, and kindly contributed to R-bloggers.]

My daughter’s best friend lives just down the street. Her mother runs a cupcake shop that’s a little further down the street. Being eleven going on sixteen, my daughter fancies herself a, quote, “worker” at the shop. She’s not paid in actual money, given child labor laws and all, but sometimes she brings home some cupcakes. The family enjoys them, fighting over certain ones but eating them all in the end. When it’s scary-story night at the dinner table, I’m always ready with a fan-cheeky paluki tale, if not a binblebot story. Our daughter tells the scariest stories, though: the stories of what happens to the ingredients that come in the back door of the cupcake shop.

As she relates it, all manner of horrifying things happen to the unsuspecting goods. Eggs get broken, their shells carelessly discarded. Butter gets melted, sugar gets pulverized to dust, and flour gets churned while wet stuff gets incorporated into it. My youngest son can hardly bear these stories. His eyes drop to table level when the story starts.

After dinner and our night-time routine, when the lights are dimmed and the doors locked, I sometimes go into my own laboratory. What happens there, though, is simply too scary to tell the children. Our bed is not large enough for three extra children, all of whom would likely show up in the middle of the night with nightmares if they knew this story. In my lab, I do things to data that must not be spoken of.

My intentions are good. I’m not trying to create the bizarre creatures that sometimes form from seemingly nowhere. I’m trying to manufacture nice things. Sweet things that people can enjoy. But sometimes the ingredients get mixed in just the wrong way. It’s why the lab is double-bolted and hidden in a secret location. Protected with lasers and in-the-wall machine guns. Alright, no lasers or guns, but it is, shall we say, secure.

The latest malformed creature to escape is the product of my Bayesian Prolog machine. Either I put in the wrong ingredients (not likely) or my logic machine is missing a critical sprocket or gear. I’m not sure yet. The plan, once the initial, requisite clean-up effort is complete, is to build a smaller version of this machine where I know what the answer should be. If I get a working model at the simple level, I can expand it. Here is the Prolog code I have so far. The mysterious ! symbol in the rules section is a cut, which commits Prolog to the current clause, much like an if/else branch in an imperative language. Careful: like I said, it’s either missing something or something is in the wrong place.

prob(spx(bear), 0.3039216).
prob(spx(bull), 0.6960784).
prob(spx(up_two), 0.02173633).
conprob(spx(bear) ^ spx(up_two), 0.0125129).
conprob(spx(bull) ^ spx(up_two), 0.009158927).

isprob(A,P) :-
    prob(A,P), !.
isprob(A^B, P) :-
    conprob(A^B, P), !.
isprob(A^B, P) :-
    conprob(B^A, PBA),
    isprob(A, PA),
    isprob(B, PB),
    P is PBA * PA / PB.
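For readers without a Prolog handy, the arithmetic that the third clause performs for a query like isprob(spx(up_two) ^ spx(bull), P) can be traced in a few lines. This is just a sketch of the resolution, not the machine itself: the numbers are copied from the facts above, and the clause is Bayes' rule, P(A|B) = P(B|A) · P(A) / P(B), with conprob(B ^ A) read as P(B|A).

```python
# Trace of isprob(spx(up_two) ^ spx(bull), P), using the fact
# values from the Prolog program above.
p_up_two    = 0.02173633    # prob(spx(up_two), ...)  -> PA
p_bull      = 0.6960784     # prob(spx(bull), ...)    -> PB
con_bull_up = 0.009158927   # conprob(spx(bull) ^ spx(up_two), ...) -> PBA

# Third clause: P is PBA * PA / PB, i.e. Bayes' rule with
# conprob(B ^ A) standing in for P(B | A).
p = con_bull_up * p_up_two / p_bull
print(p)  # ~0.000286, matching the Prolog query's output below
```

Whether the machine's gears are right is a separate question from whether it turns; this only confirms what it computes.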

I used my new R data-mining script to get the percentage values you see in the fact section at the top, and I also used R to get the conditional fact values. The function downloads all the SPX data (the ticker is passed in as the parameter sym="^GSPC") from Yahoo with the following line:

library(quantmod)  # provides getSymbols()
x <- getSymbols(sym, from="1900-01-01", auto.assign=FALSE)

I set auto.assign to FALSE because I need to manipulate the object inside the function. After some simple look-forward and look-backward manipulations, I return the object. I also employ the SMA() function from the TTR package to keep running 50-day and 200-day moving averages, and I use these averages to define bull and bear markets. Let’s say the S variable contains the complete set of data from the data-mining function. Getting a slice of it (i.e., the bear or bull market days) requires a simple indexing operation. Like this:

> bull <- S[S$r.50 > S$r.200]

The r.50 and r.200 vectors are where the 50- and 200-day average values are stored. I’m not sure why I called them r.50 and r.200. Come to think of it, it’s not very intuitive, is it? I’ll get to that later. In any case, getting the percentage of time the SPX is in a bull market requires a grade-school calculation:

> nrow(bull)/nrow(S)
[1] 0.683243
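The slice-and-count above translates almost word for word into other data-frame tools. Here is a minimal Python/pandas analogue; the column names r.50, r.200 and RET mirror the R object, but the six rows of numbers are invented for illustration, not SPX data:

```python
import pandas as pd

# Toy stand-in for the mined S object: daily return plus
# 50- and 200-day moving averages (invented numbers).
S = pd.DataFrame({
    "RET":   [0.011, -0.004, 0.025, 0.003, -0.012, 0.021],
    "r.50":  [101.0, 102.0, 103.0,  99.0,  98.0, 104.0],
    "r.200": [100.0, 100.5, 101.0, 100.0, 100.5, 101.0],
})

# Bull regime, as in R's S[S$r.50 > S$r.200]: short average
# above long average.
bull = S[S["r.50"] > S["r.200"]]

# Fraction of days spent in a bull market, as in nrow(bull)/nrow(S).
print(len(bull) / len(S))  # 4 of the 6 toy rows -> 0.666...
```

The boolean mask plays the same role as R's logical index; the grade-school division is the same in either language.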

Now you know where I get the probability fact that goes in at the beginning of the Prolog script. The contingency fact (I think this is where my problem is) is calculated below. The bull.up object holds all instances where the return was greater than 2% and the market was in a bull regime.

bull.up <- S[S$RET > 0.02 & S$r.50 > S$r.200]

Then to get the value:

> nrow(bull.up)/nrow(S)
[1] 0.009158927

Well, after all of this is compiled in a Prolog session, the following query generates the following result:

 ?- isprob(spx(up_two) ^ spx(bull), P).
P = 0.000286004363470997

But a direct calculation in R gives the following result (which I trust more): 

> ans <- bull[bull$RET > 0.02]
> nrow(ans)/nrow(bull)
[1] 0.01340508
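A hedged guess for the clean-up effort, not a verdict: nrow(bull.up)/nrow(S) counts up-days-in-a-bull-regime over all days, which is a joint probability P(bull ∧ up_two), while the third Prolog clause reads conprob(B ^ A) as a conditional P(B|A). If the facts are joints, the conditional falls out by dividing by the regime's probability instead, which lands in the neighborhood of the direct R figure:

```python
# Values from the article: joint P(bull and up_two) over all days,
# and the two bull-market figures that appear in the post.
joint_bull_up  = 0.009158927  # nrow(bull.up)/nrow(S)
p_bull_fact    = 0.6960784    # the prob(spx(bull), ...) Prolog fact
p_bull_counted = 0.683243     # nrow(bull)/nrow(S) computed above

# If conprob stores joints, P(up_two | bull) = joint / P(bull):
print(joint_bull_up / p_bull_fact)     # ~0.01316, near the direct figure

# The post's two bull figures differ slightly; dividing by the
# counted one reproduces the direct R result:
print(joint_bull_up / p_bull_counted)  # ~0.01340508
```

Whether that is the missing sprocket is for the lab to confirm; the arithmetic at least points at the conprob facts rather than the fan belts.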

So it’s back to the algorithm drawing board I suppose. Tomorrow night is joke night at the dinner table. I think I’ve got some good material.
