Think of something observable – countable – that you care about and that has only two possible outcomes. It could be the votes cast in a two-way election in your town, or the free throws the center on your favorite...

Following the crash of my hard drive right before leaving Kyoto, I bought a cheap Compaq Presario CQ57 to reinstall Ubuntu 12.04 over the weekend (and have a laptop available before leaving for Australia…). It took about one hour to install from the DVD, and everything seems to be working out of the box. The

For me, Kaggle has become a social network for data scientists, as stackoverflow.com or github.com are for programmers. If you are a data scientist, machine learner, or statistician, you had better have a profile there; otherwise you do not exist. Nevertheless, I won’t bet on as rosy a future for data scientists as journalists suggest (sexy job for next

Part II – Solving Big Problems with Oracle R Enterprise
In the first post in this series (see https://blogs.oracle.com/R/entry/solving_big_problems_with_oracle), we showed how you can use R to perform historical rate of return calculations against investment data sourced from a spreadsheet. We demonstrated the calculations against sample data for a small set of accounts. While this worked...

I want to continue with the Factor Attribution theme that I presented in the Factor Attribution post. I have re-organized the code logic into the following 4 functions:
- factor.rolling.regression – Factor Attribution over given rolling window
- factor.rolling.regression.detail.plot – detail time-series plot and histogram for each factor
- factor.rolling.regression.style.plot – historical style plot for selected 2 factors
- factor.rolling.regression.bt.plot
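
To make the idea of factor attribution over a rolling window concrete, here is a minimal Python sketch. It is not the R code behind the functions named above: the names `ols` and `rolling_factor_betas` are hypothetical, and it assumes plain ordinary least squares with an intercept, refit on each trailing window.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so each row operation updates both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: factors are collinear in this window")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            ratio = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= ratio * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(y: Sequence[float], factors: Sequence[Sequence[float]]) -> List[float]:
    """Least-squares fit of y on an intercept plus the factor columns.

    Returns [intercept, beta_1, ..., beta_k] from the normal equations.
    """
    rows = [[1.0] + list(f) for f in factors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def rolling_factor_betas(
    y: Sequence[float], factors: Sequence[Sequence[float]], window: int
) -> List[List[float]]:
    """Refit the regression on every trailing window of the given length.

    Entry i holds the coefficients estimated from observations i .. i + window - 1.
    """
    if window > len(y):
        raise ValueError("window is longer than the series")
    return [
        ols(y[end - window:end], factors[end - window:end])
        for end in range(window, len(y) + 1)
    ]
```

Here `y` would hold the returns being explained and `factors` one row of factor returns per period; plotting each coefficient against the window end date gives the kind of time series that the detail and style plots describe.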

On July 25th, I’ll be presenting at the Seattle R Meetup about implementing Bayesian nonparametrics in R. If you’re not sure what Bayesian nonparametric methods are, they’re a family of methods that allow you to fit traditional statistical models, such as mixture models or latent factor models, without having to fully specify the number of

Here is the second screencast episode of the R-Podcast to accompany episode 8 of the R-Podcast: Visualization with ggplot2. In this screencast I demonstrate a real-time session of using ggplot2 to create boxplots for a visualization of hockey attendance in the NHL. The R code created in this screencast is available in our GitHub repository,