As I was getting a root canal last week, my dental X-Rays reminded me anew of an optical illusion that stumped us for a short time recently when we were developing our heatmapping engine. My X-Rays, before, during and after a recent root canal. The...

If there are two classes (this generalizes to n) and both follow the same distribution, but with different parameters, it is possible to predict which class an observation comes from. Here I’ll try to predict a sample’s gender based on their height. The distribution of a person’s height is more or less normal. There
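
A minimal sketch of the idea, not the post's own code: evaluate each class's normal density at the observed height, weight it by a prior, and pick the class with the larger posterior. The parameter values, the equal priors, and the function names posterior_height and classify_height are all illustrative assumptions.

```r
# Hypothetical class parameters in cm; illustrative assumptions, not fitted values.
params <- list(
  male   = c(mean = 178, sd = 7),
  female = c(mean = 165, sd = 6.5)
)

# Posterior probability of each class for a single height.
# Equal priors are an assumption; pass other priors if the classes are unbalanced.
posterior_height <- function(height, params,
                             prior = c(male = 0.5, female = 0.5)) {
  # Likelihood of the height under each class's normal distribution.
  lik <- sapply(params, function(p) {
    dnorm(height, mean = p[["mean"]], sd = p[["sd"]])
  })
  post <- lik * prior[names(lik)]
  post / sum(post)  # normalize so the probabilities sum to 1
}

# Predicted class: the one with the largest posterior probability.
classify_height <- function(height, params) {
  post <- posterior_height(height, params)
  names(post)[which.max(post)]
}

classify_height(185, params)
classify_height(155, params)
```

With more than two classes, the same functions work once params and prior have one named entry per class.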

Any frequent reader of R-bloggers will have come across several posts concerning the optimization of code, in particular the avoidance of loops. Here's another aspect of the same issue. If you have experience programming in other languages besides R, this is probably a no-brainer, but for laymen like myself, the following example was...

The Comprehensive R Archive Network, or CRAN for short, has been a major driver in the success and rapid proliferation of the R statistical language and environment. CRAN currently hosts around 3400 packages, and is growing at a rapid rate. Not too ...

Last time we looked at the Matrix package and dug a little into the chol(), Cholesky Decomposition, function. I noted that often in finance we do not have a positive definite (PD) matrix. The chol() function in both the Base and Matrix...

The need to analyze time-series or other forms of streaming data arises frequently in many different application areas. Examples include economic time-series like stock prices, exchange rates, or unemployment figures, biomedical data sequences like electrocardiograms or electroencephalograms, or industrial process operating data sequences like temperatures, pressures or concentrations. As a specific example, the figure below shows four data sequences:...

Here are the presentation slides I used for my talk on “Using R for Data Mining Competitions” at Ryerson University as part of the Greater Toronto Area (GTA) R User’s Meetup Group. Presentation (Prezi), Presentation (PDF), Meetup Event page. Special thanks to Anthony Goldbloom from Kaggle and various competition winners for sharing their code and

I follow some 80-odd people and news sources on my Twitter account. For a while I wondered which of these sources are most active on Twitter. I picked a simple metric, '# of status messages posted to Twitter', as the measure of activity. Using R I quickly wrote a program to generate my top 10 most active...

This is another pretty simple function, written to help me solve the simplest representation of a trivial but tedious task. Most biologists are probably familiar with this task: how many nucleotide differences exist between two given sequences? I only faced the easiest part of the problem, i.e. I do not perform alignment, I just assume that
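
A minimal sketch of such a count, not the author's function: it assumes the two sequences are plain character strings of equal length that are already aligned position by position. The name count_differences and the example sequences are hypothetical.

```r
# Count positions at which two nucleotide sequences differ.
# Assumption: both sequences are already aligned and have the same length;
# no alignment is attempted here.
count_differences <- function(seq1, seq2) {
  # Split each string into single characters, ignoring case.
  a <- strsplit(toupper(seq1), "")[[1]]
  b <- strsplit(toupper(seq2), "")[[1]]
  if (length(a) != length(b)) {
    stop("sequences must have the same length (no alignment is performed)")
  }
  sum(a != b)
}

# Made-up example sequences that differ at positions 4 and 7.
count_differences("ACGTACGT", "ACGAACCT")  # returns 2
```

Stopping on unequal lengths, rather than silently recycling the shorter sequence, keeps the count from being reported for inputs that would first need alignment.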

ConPA is an asset allocation application using the classic Markowitz approach. The calculations use the open-source statistical programming language R. R scripts are executed on cloudnumbers.com’s computer clusters in the Cloud and the results are displayed by the ConPA frontend. ConPA allows you to set the investment date of the portfolio, the target return and

Happy Thanksgiving, everyone. Earlier today, I found an interesting post from Bo Allen on pseudo-random vs random numbers, where the author uses a simple bitmap (heat map) to show that the rand function in PHP has a systematic pattern and compares these to truly random numbers obtained from random.org. The post’s results suggest that pseudo-randomness in PHP is

I wrote a simple Backtesting library to evaluate and analyze Trading Strategies. I will use this library to present the performance of trading strategies that I will study in the next series of posts. It is very easy to write a simple Backtesting routine in R, for example: The code I implemented in the Systematic

Trying not to fall into Thanksgiving Day, football, coma. So I started looking at the Matrix package. Started out by changing my code from before to create a matrix using the Matrix() function from the Matrix package.
n = 4000
c = Matrix(.9,n,n)
for(...