GTA R Users Group – Using R for Data Mining Competitions

November 27, 2011

Here are the presentation slides I used for my talk on “Using R for Data Mining Competitions” at Ryerson University as part of the Greater Toronto Area (GTA) R User’s Meetup Group. Links: Presentation (Prezi), Presentation (PDF), Meetup Event page. Special thanks to Anthony Goldbloom from Kaggle and various competition winners for sharing their code and ...

Read more »

Analytics using R: Most active in my Twitter list

November 27, 2011

I follow some 80-odd people and news sources on my Twitter account. For a while I wondered which of these sources were most active on Twitter. I picked a simple metric, the number of status messages posted to Twitter, as the measure of activity. Using R, I quickly wrote a program to generate my top 10 most active...

Read more »

Putting it all together: concise code to make dotplots with weighted bootstrapped standard errors

November 27, 2011

I analyze a lot of experiments, and there are many times when I want to quickly look at means and standard errors for each cell (experimental condition), or the same for each combination of cell and individual-level attribute (e.g., Democrat, Independent, ...

Read more »
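
The post's own code is not part of this excerpt. As a minimal sketch of the idea, assuming a plain nonparametric bootstrap in which observations are resampled with replacement and keep their own weights, a weighted cell mean, its bootstrapped standard error and a dotplot can be produced in base R. The helper `boot_weighted_se()`, the number of replicates and the toy data are all hypothetical.

```r
# Hypothetical helper: weighted mean plus a bootstrapped standard error.
# Observations are resampled with replacement and keep their own weights.
boot_weighted_se <- function(x, w, R = 1000) {
  stopifnot(length(x) == length(w))
  n <- length(x)
  boot_means <- replicate(R, {
    i <- sample.int(n, n, replace = TRUE)
    weighted.mean(x[i], w[i])
  })
  c(mean = weighted.mean(x, w), se = sd(boot_means))
}

# Made-up data: one outcome, three experimental conditions, survey weights
set.seed(42)
d <- data.frame(cond = rep(c("A", "B", "C"), each = 50),
                y = rnorm(150), wt = runif(150, 0.5, 2))

# One row per cell: weighted mean and bootstrapped SE
res <- t(sapply(split(d, d$cond), function(s) boot_weighted_se(s$y, s$wt)))

# Dotplot of the cell means with +/- 2 SE bars
lo <- res[, "mean"] - 2 * res[, "se"]
hi <- res[, "mean"] + 2 * res[, "se"]
dotchart(res[, "mean"], xlim = range(lo, hi), xlab = "Weighted mean (+/- 2 SE)")
segments(lo, seq_len(nrow(res)), hi, seq_len(nrow(res)))
```

Splitting by an extra individual-level attribute would only change the `split()` call (for example, splitting on the interaction of two columns).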

..A Quick Geo-Trick for GoogleMaps in R (using dismo)

November 26, 2011

... I thought this geocoding bit might be worth sharing (found HERE when searching the web for dismo documentation).

Read more »

Comparing StackOverflow and the R-help mailing list

November 26, 2011

Only recently did I discover StackOverflow. I know that, for a nerd who has been programming for many years, that is quite late. For those who are not familiar with StackOverflow (aka SO), it is a question and answer site for programmers. It is...

Read more »

Count different positions between two strings of equal length

November 26, 2011

This is another pretty simple function, written to solve the simplest form of a trivial but tedious task that most biologists are probably familiar with: how many nucleotide differences exist between two given sequences? I only tackled the easiest part of the problem, i.e. I do not perform alignment; I just assume that ...

Read more »
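
A minimal sketch of such a function, assuming the two sequences are single character strings that are already aligned (position i is compared with position i, as the excerpt describes). The name `count_differences()` is hypothetical, and the comparison is case-sensitive.

```r
# Count the positions at which two equal-length sequences differ
# (a Hamming distance). No alignment is performed.
count_differences <- function(a, b) {
  stopifnot(is.character(a), is.character(b),
            length(a) == 1, length(b) == 1,
            nchar(a) == nchar(b))
  x <- strsplit(a, "")[[1]]   # split into single characters
  y <- strsplit(b, "")[[1]]
  sum(x != y)                 # number of mismatching positions
}

count_differences("ACGTACGT", "ACGTTCGA")  # 2 (positions 5 and 8)
```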

Deductive imputation with the deducorrect package

November 26, 2011

Missing data hinders statistical analyses. Estimating missing values (imputation) prior to analysis is one way to deal with that. In some cases, however, the missing values need not be estimated at all, since they can be derived with certainty from other ...

Read more »
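
The deducorrect package itself is not shown here, and this is not its API. As a base-R illustration of the underlying idea only, assume a single hypothetical balance edit `total == part1 + part2`: whenever exactly one of the three values in a record is missing, the edit determines it exactly, so it can be filled in without any estimation. Records with two or more gaps are left alone.

```r
# Hypothetical balance edit: total == part1 + part2.
# A value is imputed only when it is the single missing entry in its record,
# because only then does the edit fix it with certainty.
impute_balance <- function(df) {
  ok1 <- !is.na(df$part1); ok2 <- !is.na(df$part2); okt <- !is.na(df$total)

  i <- !okt & ok1 & ok2                      # only total missing
  df$total[i] <- df$part1[i] + df$part2[i]

  i <- !ok1 & ok2 & okt                      # only part1 missing
  df$part1[i] <- df$total[i] - df$part2[i]

  i <- !ok2 & ok1 & okt                      # only part2 missing
  df$part2[i] <- df$total[i] - df$part1[i]

  df                                         # rows with 2+ gaps are unchanged
}

dat <- data.frame(part1 = c(1, NA, 3, NA),
                  part2 = c(2, 5, NA, 1),
                  total = c(NA, 9, 10, NA))
out <- impute_balance(dat)
```

The last record has two missing values, so the edit alone cannot pin either of them down and both stay NA.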

int64: 64 bit integer vectors for R

November 26, 2011

The Google Open Source Programs Office sponsored me to create the new int64 package, which was released to CRAN a few days ago. The package has been mentioned in an article on Google's open source blog. The package defines classes i...

Read more »

The Global Earthquake Desktop

November 25, 2011

One of the first things I do over coffee each morning is scroll through the USGS earthquake RSS feeds. In the era of free data and open source computing, I asked myself, "Wouldn't it be better to visualize all of the earthquakes around the world r...

Read more »

..Some More Regex Examples Added to Collection

November 25, 2011

The examples below have been added to my list of regex examples HERE.

Read more »

Working with isTRUE

November 25, 2011

This week I was running computations that transform input files into output files. The process was repeated: whenever new input files were generated or old ones were updated, I needed to calculate new output files. The transformation ...

Read more »
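
The post's code is cut off in this excerpt. One place where isTRUE() helps in this kind of workflow, sketched here under our own assumptions (the helper `needs_update()` is hypothetical): file.mtime() returns NA for a file that does not exist, so a bare time comparison can be NA, and `if (NA)` stops with an error. Wrapping the comparison in isTRUE() turns NA into FALSE.

```r
# Hypothetical helper: does `output` need to be (re)built from `input`?
# file.mtime() gives NA for a missing file; isTRUE(NA) is FALSE, so the
# comparison below never hands NA to if().
needs_update <- function(input, output) {
  !file.exists(output) || isTRUE(file.mtime(input) > file.mtime(output))
}

isTRUE(NA > 1)   # FALSE, whereas if (NA > 1) ... would be an error

# Typical use in a rebuild loop (file names are made up):
# if (needs_update("data/in_01.csv", "out/out_01.rds")) { ... transform ... }
```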

ConPA uses cloudnumbers.com as calculation backend

November 25, 2011

ConPA is an asset allocation application using the classic Markowitz approach. The calculations are done in the open-source statistical programming language R: R scripts are executed on cloudnumbers.com’s computer clusters in the Cloud, and the results are displayed by the ConPA frontend. ConPA lets the user set the investment date of the portfolio, the target return and ...

Read more »
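
ConPA's own R scripts are not shown. For readers who want to see what "the classic Markowitz approach" with a target return computes, here is a minimal closed-form sketch under simplifying assumptions of our own: fully invested, short sales allowed, no position bounds (ConPA may well impose constraints this sketch ignores). With A = 1'Σ⁻¹1, B = 1'Σ⁻¹μ, C = μ'Σ⁻¹μ and D = AC − B², the minimum-variance weights for target return r are w = ((C − Br)Σ⁻¹1 + (Ar − B)Σ⁻¹μ) / D. The function name and the inputs are hypothetical.

```r
# Minimum-variance weights for a target expected return (Markowitz),
# with sum(w) == 1 and no bounds on individual weights.
markowitz_weights <- function(mu, Sigma, target) {
  ones  <- rep(1, length(mu))
  Si_1  <- solve(Sigma, ones)   # Sigma^{-1} 1
  Si_mu <- solve(Sigma, mu)     # Sigma^{-1} mu
  A <- sum(ones * Si_1)
  B <- sum(ones * Si_mu)
  C <- sum(mu * Si_mu)
  D <- A * C - B^2
  ((C - B * target) * Si_1 + (A * target - B) * Si_mu) / D
}

mu <- c(0.05, 0.08, 0.12)                       # made-up expected returns
Sigma <- matrix(c(0.040, 0.006, 0.010,
                  0.006, 0.090, 0.020,
                  0.010, 0.020, 0.160), nrow = 3) # made-up covariance matrix
w <- markowitz_weights(mu, Sigma, target = 0.09)
```

By construction the weights sum to 1 and `sum(w * mu)` equals the target; adding long-only bounds would require a quadratic programming solver instead of this closed form.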

Sending Email from R (using sendEmail)

November 25, 2011

Like a lot of other R users, I’ve felt the need to send email from R. I haven’t surveyed CRAN for such a package, but looked for a way to send email from the command line in Windows. I found a nice application called sendEmail that can be found here. Below are code snippets in R that will ...

Read more »
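
The post's snippets are not included in this excerpt. A minimal sketch of calling sendEmail through system2(), assuming the executable is on the PATH and assuming its usual flags (-f from, -t to, -u subject, -m message, -s SMTP server); please check these against `sendEmail --help` on your machine. The wrapper name, its `dry_run` argument and the addresses are hypothetical.

```r
# Hypothetical wrapper around the sendEmail command-line tool.
# Flags are assumptions taken from sendEmail's usage text; verify locally.
send_email <- function(from, to, subject, body, server,
                       exe = "sendEmail", dry_run = TRUE) {
  args <- c("-f", from,
            "-t", to,
            "-u", shQuote(subject),   # quote values that may contain spaces
            "-m", shQuote(body),
            "-s", server)             # e.g. "smtp.example.com:25"
  if (dry_run) return(args)           # inspect the command without sending
  system2(exe, args)                  # returns the tool's exit status
}

cmd <- send_email("me@example.com", "you@example.com", "Model finished",
                  "The overnight run is done.", "smtp.example.com:25")
```

Keeping `dry_run = TRUE` as the default makes it easy to check the assembled command before any mail goes out.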

Pseudo-Random vs. Random Numbers in R

November 25, 2011

Earlier, I found an interesting post from Bo Allen on pseudo-random vs random numbers, where the author uses a simple bitmap (heat map) to show that the rand function in PHP has a systematic pattern and compares these to truly random numbers obtained from random.org. The post’s results suggest that pseudo-randomness in ...

Read more »
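
A minimal R version of the bitmap idea, assuming R's default generator (Mersenne-Twister) via runif() and an arbitrary 256 x 256 grid: each pixel is black or white with probability 1/2, so any visible structure in the image would hint at a weak generator. The random.org comparison from the original post is not reproduced here.

```r
set.seed(2011)                     # reproducible pseudo-random stream
n <- 256
# Each cell is 1 or 0 with probability 1/2
bits <- matrix(as.integer(runif(n * n) > 0.5), nrow = n)

# Draw the bitmap; a good generator should look like featureless noise
image(bits, col = c("black", "white"), axes = FALSE)
```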

Introduction to Backtesting library in the Systematic Investor Toolbox

November 24, 2011

I wrote a simple backtesting library to evaluate and analyze trading strategies. I will use this library to present the performance of the trading strategies that I will study in the next series of posts. It is very easy to write a simple backtesting routine in R, for example: ... The code I implemented in the Systematic ...

Read more »
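
The post's example routine did not survive in this excerpt. As a hedged stand-in (not the Systematic Investor Toolbox code), a simple backtest can be as short as: turn a 0/1 signal into positions, lag them one bar so that today's signal is only traded tomorrow (avoiding look-ahead bias), and compound the resulting returns. The function name, the moving-average rule and the prices are hypothetical.

```r
# Hypothetical minimal backtest: long (1) or flat (0) from a signal.
backtest <- function(prices, signal) {
  stopifnot(length(prices) == length(signal), length(prices) >= 2)
  ret <- c(0, diff(prices) / head(prices, -1))  # simple one-period returns
  pos <- c(0, head(signal, -1))                 # trade on the next bar
  cumprod(1 + pos * ret)                        # equity curve, starting at 1
}

# Example rule: be long when the price is above its 3-day moving average
prices <- c(100, 101, 103, 102, 104, 107, 105, 108)   # made-up prices
ma3 <- as.numeric(stats::filter(prices, rep(1 / 3, 3), sides = 1))
signal <- ifelse(is.na(ma3), 0, as.numeric(prices > ma3))
equity <- backtest(prices, signal)
```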

Matrix Package Doodling

November 24, 2011

Trying not to fall into a Thanksgiving Day football coma, I started looking at the Matrix package. I started out by changing my code from before to create a matrix using the Matrix() function from the Matrix package:
n = 4000
c = Matrix(.9,n,n)
for(...

Read more »

bounded normal mean

November 24, 2011

A few days ago, one of my students, Jacopo Primavera (from La Sapienza, Roma), presented his “reading the classic” paper, namely the terrific bounded normal mean paper by my friends George Casella and Bill Strawderman (1981, Annals of Statistics). Even though I knew this paper quite well, having read (and studied) it myself many times, ...

Read more »

A Function for Adding up Matrices with Different Dimensions

November 24, 2011

I had no luck finding a function that can handle matrices with different dimensions, so I coded a little function that sums up matrices, including matrices whose dimensions differ.

Read more »
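
The post's function is not included in this excerpt, and it is not stated how matrices of different sizes are aligned. A minimal sketch under one explicit assumption: all matrices are aligned at the top-left corner and padded with zeros up to the largest row and column counts. Aligning by row and column names would be a different, equally valid design. The name `add_matrices()` is hypothetical.

```r
# Sum any number of matrices of possibly different dimensions.
# Assumption: top-left alignment, missing cells count as zero.
add_matrices <- function(...) {
  mats <- list(...)
  nr <- max(vapply(mats, nrow, integer(1)))
  nc <- max(vapply(mats, ncol, integer(1)))
  out <- matrix(0, nr, nc)
  for (m in mats) {
    r <- seq_len(nrow(m)); k <- seq_len(ncol(m))
    out[r, k] <- out[r, k] + m     # add m into its top-left block
  }
  out
}

a <- matrix(1:4, nrow = 2)        # 2 x 2
b <- matrix(1, nrow = 3, ncol = 1) # 3 x 1
s <- add_matrices(a, b)            # 3 x 2 result
```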

Book shoppin’…

November 24, 2011

I honestly have no book on R programming. In fact, I don't have a single book on programming at all (my coding proves that ;x). I am pretty sure that I am gonna order that book (just did!). You can get a look at Matloff’s text here (= pdf for ya)

Read more »

Bringing 64-bit data to R

November 24, 2011

The R programming language has become one of the standard tools for statistical data analysis and visualization, and is widely used by Google and many others. The language includes extensive support for working with vectors of integers, numerics (doubles), and many other types, but has lacked support for 64-bit integers. ...

Read more »
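
The gap described above is easy to see in base R without any extra packages: R's native integers are 32-bit, and doubles represent whole numbers exactly only up to 2^53, so large 64-bit identifiers cannot be stored faithfully in either type.

```r
# R's native integer type is a 32-bit signed integer
.Machine$integer.max                 # 2147483647
.Machine$integer.max + 1L            # NA, with an integer-overflow warning

# Doubles hold integers exactly only up to 2^53
2^53 == 2^53 + 1                     # TRUE: the +1 is lost to rounding
```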

“Home Runs by Park – 2011 Season” or “Man the Astros Sucked This Year”

November 24, 2011

I hate the Giants. Let this be known. What I was hoping to find was another reason to support my claim that their WS win in 2010 was a complete fluke. So when digging through the game logs for the ...

Read more »

Happy Thanksgiving!

November 24, 2011

It's Thanksgiving day here in the US:
> library(timeDate)
> holiday(2011,"USThanksgivingDay")
GMT
So we're taking a little break today here at Revolutions. We have a special "Because it's Friday" post queued up for tomorrow, and then we'll be back to the usual schedule on Monday. For readers in the US, enjoy the Thanksgiving holiday!

Read more »
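
For readers without timeDate installed, the same date can be worked out in base R. This is our own sketch, not part of the post: it hard-codes the current US rule (Thanksgiving is the fourth Thursday of November), and the function name is hypothetical.

```r
# US Thanksgiving: fourth Thursday of November
thanksgiving_us <- function(year) {
  nov1 <- as.Date(sprintf("%d-11-01", year))
  # as.POSIXlt()$wday: 0 = Sunday, ..., 4 = Thursday
  offset <- (4 - as.POSIXlt(nov1)$wday) %% 7   # days to the first Thursday
  nov1 + offset + 21                           # three weeks later
}

thanksgiving_us(2011)   # "2011-11-24"
```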

Define intermediate color steps for colorRampPalette

November 24, 2011

The following function, color.palette(), is a wrapper for colorRampPalette() that allows more flexibility in defining the spacing between main color levels. One defines both the main color levels (as with colorRampPalette) and an optional vector containing the number of color levels that should be put in between them at equal distances. The above...

Read more »
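
The wrapper's code is not part of this excerpt. One way to get the behaviour described, sketched under our own assumptions (the name `color_palette()` and the argument `n_between` are hypothetical, not the post's signature): ramp each pair of neighbouring main colors separately, insert the requested number of intermediate colors, and pass the expanded list to colorRampPalette(). Because colorRampPalette() spaces its input colors evenly, a pair with more intermediate steps occupies a wider share of the final ramp.

```r
# Wrapper around colorRampPalette() with per-segment spacing control.
# steps: main colors; n_between[i]: extra colors between steps[i] and steps[i + 1].
color_palette <- function(steps, n_between = NULL, ...) {
  if (is.null(n_between)) n_between <- rep(0L, length(steps) - 1L)
  stopifnot(length(steps) >= 2, length(n_between) == length(steps) - 1L)
  fill_steps <- character(0)
  for (i in seq_len(length(steps) - 1L)) {
    seg <- colorRampPalette(steps[i:(i + 1L)])(n_between[i] + 2L)
    fill_steps <- c(fill_steps, head(seg, -1L))  # drop right end to avoid duplicates
  }
  fill_steps <- c(fill_steps, steps[length(steps)])
  colorRampPalette(fill_steps, ...)              # returns a function of n
}

# The white-to-red half gets twice as many anchor colors as blue-to-white
pal <- color_palette(c("blue", "white", "red"), n_between = c(1, 3))
cols <- pal(10)
```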

Empirical Orthogonal Function (EOF) Analysis for gappy data

November 24, 2011

The following is a function for the calculation of Empirical Orthogonal Functions (EOF). For those who come from a more biologically oriented background and are familiar with Principal Component Analysis (PCA), the methods are similar. In the climate sciences, the method is usually used to decompose a data field into dominant spatio-temporal modes. ...

Read more »
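
The post's function is not reproduced in this excerpt. A minimal sketch of EOF analysis for gappy data, under assumptions of our own: time steps are rows and locations are columns, the covariance between locations is estimated from pairwise-complete observations, and gaps are treated as zero anomaly when projecting the field onto the EOFs. The post may handle gaps differently, and the function name is hypothetical.

```r
# Hypothetical EOF helper for a field with gaps (NA values).
# F: matrix with time steps in rows and spatial locations in columns.
eof_gappy <- function(F) {
  # Anomalies: remove each location's mean (NAs are ignored in the means)
  anom <- scale(F, center = TRUE, scale = FALSE)
  # Covariance from pairwise-complete observations; with many gaps this
  # matrix is not guaranteed to be positive semi-definite
  C <- cov(anom, use = "pairwise.complete.obs")
  e <- eigen(C, symmetric = TRUE)
  # Assumption: gaps count as zero anomaly in the projection
  anom0 <- anom
  anom0[is.na(anom0)] <- 0
  list(
    eof      = e$vectors,                 # spatial patterns, one per column
    pc       = anom0 %*% e$vectors,       # time series of each mode
    lambda   = e$values,                  # variance of each mode
    expl_var = e$values / sum(e$values)   # fraction of total variance
  )
}

# Made-up field: 20 time steps x 10 locations, with one gap
set.seed(1)
Fld <- matrix(rnorm(200), nrow = 20, ncol = 10)
Fld[3, 2] <- NA
res_eof <- eof_gappy(Fld)
```

Without gaps, this reduces to ordinary PCA of the centred field: the eigenvalues sum to the total variance across locations.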

source_https(): Sourcing an R Script from github over HTTPS

November 24, 2011

The objective: I wanted to source R scripts hosted on my github repository for use in my blog (i.e. a github version of ?source). This would make it easier for anyone wishing to test out my code snippets on their own computers without having to manually go to my github repo and retrieve a series of R ...

Read more »
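
The post's implementation is not shown in this excerpt and may rely on add-on packages. A minimal base-R sketch of the same idea, assuming an R build whose url() connections support HTTPS (current R versions use libcurl for this): read each script as text, then parse and evaluate it in the global environment, as source() would. The function body here is our own, and any URL you pass is your own.

```r
# Hypothetical base-R version: read one or more scripts from URLs and
# evaluate them in the global environment.
source_https <- function(u, ...) {
  for (x in c(u, ...)) {
    con <- url(x)
    code <- tryCatch(readLines(con, warn = FALSE), finally = close(con))
    eval(parse(text = code), envir = .GlobalEnv)
  }
  invisible(NULL)
}

# Usage (URL is a placeholder):
# source_https("https://raw.githubusercontent.com/<user>/<repo>/master/script.R")
```

Since the code is evaluated as-is, only source scripts from repositories you trust.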

If you are writing a book on Bayesian statistics

November 23, 2011

This post is somewhat marginal to R in that there are several statistical systems that could be used to tackle the problem. Bayesian statistics is one of those topics that I would like to understand better, much better, in fact. ...

Read more »