The rbinding race: for vs. do.call vs. rbind.fill

May 14, 2013

Which function rbinds dataframes together fastest? First competitor: classic rbind in a for loop over a list of dataframes. Second competitor: do.call("rbind", <list of dataframes>). Third competitor: rbind.fill(<list of dataframes>) f...
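The three competitors can be sketched as follows. This is a minimal illustration, not the post's benchmark: the sample data, the helper names bind_loop and bind_docall, and the list size are assumptions, and rbind.fill is only called if the plyr package happens to be installed.

```r
# Illustrative data (an assumption): 50 small data frames with identical columns.
set.seed(1)
dfs <- lapply(1:50, function(i) data.frame(id = i, x = rnorm(10)))

# 1. rbind in a for loop: grows the result one data frame at a time.
bind_loop <- function(lst) {
  out <- lst[[1]]
  for (df in lst[-1]) out <- rbind(out, df)
  out
}

# 2. do.call: hands every list element to a single rbind() call.
bind_docall <- function(lst) do.call("rbind", lst)

res_loop   <- bind_loop(dfs)
res_docall <- bind_docall(dfs)

# 3. rbind.fill from plyr, run only when that package is available.
if (requireNamespace("plyr", quietly = TRUE)) {
  res_fill <- plyr::rbind.fill(dfs)
}

# Rough timings; the numbers depend on the machine and the data size.
print(system.time(bind_loop(dfs)))
print(system.time(bind_docall(dfs)))
```

On a list this small the differences are negligible; any real comparison would need much larger inputs and repeated runs.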

Read more »

Visualizing your websites’ ecommerce performance with R

May 14, 2013

In this blogpost, I want to dive deeper into the explanation of the relationship between Frequency and Recency of Visits with the Conversion Rate and Average Order Value. I have used the RGA package for data extraction and Dr. Hadley Wickham’s ggplot2 package to achieve the visualizations. Here’s the data aggregation script: #transactions dataframe

Read more »

Claims Inflation – a known unknown

May 14, 2013

Over the last year I worked with two colleagues of mine on the subject of inflation and claims inflation in particular. I didn't expect it to be such a challenging topic, but we ended up with more questions than answers. The key question and biggest ch...

Read more »

RcppArmadillo 0.3.820

Conrad rolled up a new Armadillo release 3.820 (following two minor fix releases in the 0.3.810 series, of which we packaged the one that was relevant for us). This new version is now out in a release 0.3.820 of RcppArmadillo which is already on CRAN a...

Read more »

Stan!

May 13, 2013

Guy Freeman writes: I thought you’d all like to know that Stan was used and referenced in a peer-reviewed Rapid Communications paper on influenza. Thank you for this excellent modelling language and sampler, which made it possible to carry out this work quickly! I haven’t actually read the paper, but I’m happy to see Stan...

Read more »

Rによるモンテカルロ法入門

May 13, 2013

Here is the cover of the Japanese translation of our Introducing Monte Carlo methods with R book, a few years after the French translation. It actually appeared last year in August, but I was not informed of this until a few weeks ago. The publisher is Maruzen, with an associated webpage if you want to

Read more »

Integration take two – Shiny application

May 13, 2013

My last post discussed a technique for integrating functions in R using a Monte Carlo or randomization approach. The mc.int function (available here) estimated the area underneath a curve by multiplying the proportion of random points below the curve by the total area covered by points within the interval: The estimated integration (bottom plot) is
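The hit-or-miss idea described above can be sketched in a few lines. This is not the post's mc.int function: the helper name mc_area, its arguments, and the test function x^2 are assumptions chosen for illustration, and the sketch only handles a non-negative function bounded above by y_max.

```r
# Hypothetical helper (not the post's mc.int): hit-or-miss Monte Carlo
# estimate of the area under a non-negative function f on [a, b].
mc_area <- function(f, a, b, y_max, n = 1e5) {
  x <- runif(n, a, b)             # random x positions inside the interval
  y <- runif(n, 0, y_max)         # random heights inside the bounding box
  prop_below <- mean(y <= f(x))   # proportion of points under the curve
  prop_below * (b - a) * y_max    # scaled by the total area of the box
}

set.seed(42)
est   <- mc_area(function(x) x^2, a = 0, b = 1, y_max = 1)
exact <- integrate(function(x) x^2, 0, 1)$value
print(c(estimate = est, exact = exact))
```

The estimate should land close to the exact value of 1/3, with the error shrinking roughly as one over the square root of the number of points.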

Read more »

In case you missed it: April 2013 Roundup

May 13, 2013

In case you missed them, here are some articles from April of particular interest to R users: A critique of a SAS whitepaper comparing the performance of SAS, R and Mahout. A video presentation from statistician Tess Nesbitt at UpStream, who uses GAM survival models in R for marketing attribution analysis. The April edition of the Revolution Analytics newsletter....

Read more »

Shiny App for CRAN packages

May 13, 2013

Over the past few days, I have been introduced to a few new-to-me R packages, via some comments from the Shiny guys and the R-bloggers site. This seems a rather haphazard way of acquiring knowledge and I cannot be alone in thinking that this is not the most productive way to become aware of new/better

Read more »

Stack Exchange: Why I dropped out

May 13, 2013

Stack Exchange is a series of question-and-answer sites, including Stack Overflow for programming and Cross Validated for statistics. I was introduced to these sites at a short talk by Barry Rowlingson at the 2011 UseR! meeting, “Why R-help must die!” These sites have a lot of advantages over R-help: The format is easier to read,

Read more »

Global Indicator Analyses with R

May 13, 2013

I was recently asked by a client to create a large number of “proof of concept” visualizations that illustrated the power of R for compiling and analyzing disparate datasets. The client was specifically interested in automated analyses of global data. A little research led me to the WDI package. The WDI package is a tool to “search, extract and...

Read more »

Living it up with computational errors

May 13, 2013

How to have a better chance of a good outcome. Making mistakes: There’s been a lot of talk recently about data analysis problems with spreadsheets. If you’ve not stuck your head out of your cave lately, then you can catch some of the discussion by doing an internet search for “Reinhart Rogoff”. There are several...

Read more »

Combining dataframes when the columns don’t match

May 13, 2013

Most of my work recently has involved downloading large datasets of species occurrences from online databases and attempting to smoodge them together to create distribution maps for parts of Australia. Online databases typically have a ridiculous number of columns with…

Read more »

Who Has the Best Fantasy Football Projections: ESPN, CBS, NFL.com, or FantasyPros?

May 12, 2013

In prior posts, I demonstrated how to download, calculate, and compare fantasy football projections from ESPN, CBS, and NFL.com.  In my last post, I demonstrated how to download FantasyPros projections, which aggregate projections from many different sources to increase prediction accuracy.  In this post, I will compare fantasy football projections from ESPN, CBS, NFL, and FantasyPros, including our average...

Read more »

The Guerilla Guide to R

May 12, 2013

Update: Okay. I've uploaded a new template and things seem to be fine now. Update: I am aware the table of contents is not being displayed in bullet form as I intended. The web template I'm using seems to be buggy. It also seems to think this page is in Indonesian...Working on it! Table of Contents: Reading/Writing Files

Read more »

Recent Rcpp talks at U of C and MCW

A couple of days ago, I had an opportunity to give a guest lecture on our Rcpp package for R and C++ integration. This was in CMSC 12300 Computer Science with Applications-3 in the Department of Computer Science at University of Chicago. The course i...

Read more »

awalé

May 12, 2013

Following Le Monde puzzle #810, I tried to code an R program (not reproduced here) to optimise an awalé game but the recursion was too rich for R: even with a very small number of holes and seeds in the awalé… Searching on the internet, it seems the computer simulation of a winning strategy for

Read more »

Using C libraries in R with rdyncall

May 12, 2013

One reason I like using R for data analysis is that R has a great collection of packages that let you easily apply state-of-the-art methods to your problems. But once in a while you find a library that you would like to use that does not have an R wrapper yet. While the great Rcpp

Read more »

A new package : Quandl

May 12, 2013

Quandl is a new database management tool which seeks to become the place to find datasets. That is, each unique indicator is considered an independent data set. This helps them to seem to have a ginormous quantity of data sets. Source : Blog Econo...

Read more »

Reshaping data

May 12, 2013

Preparing and reshaping data is the ever continuing task of a data analyst. Luckily we have many tools for it. The default tool in R would be reshape(), although this is so user friendly that a reshape package has been added too. I try to use reshape()...
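As a small illustration of the base reshape() function mentioned above, here is a sketch that converts a wide data frame to long format. The data frame and its column names (id, score.1, score.2) are made up for the example; the call relies on reshape() splitting names of the form name.time at its default separator.

```r
# Illustrative wide data (an assumption): one row per id, one column per time point.
wide <- data.frame(id = 1:3,
                   score.1 = c(5, 6, 7),
                   score.2 = c(8, 9, 10))

# Wide to long: "score.1" and "score.2" are split at the "." into the
# variable name (score) and the time value (1 or 2).
long <- reshape(wide,
                direction = "long",
                varying   = c("score.1", "score.2"),
                idvar     = "id",
                timevar   = "time")

print(long)
```

The number of arguments needed for even this simple case gives some sense of why dedicated reshaping packages exist.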

Read more »

Playing cards, with R

May 11, 2013

In my courses on R, I usually show how to insert a picture as a background for a graph. But it is also possible to treat the picture as an object, and to insert it in a graph anywhere we like, as explained on the awesome blog http://rsnippets.blogspot.ca/…. (in a post published in January 2012). I wanted...

Read more »

Animations Understood

May 11, 2013

When I first saw a graphic made from Yihui’s animation package (Xie, 2013) I was amazed at the magic and thought “I could never do that”. Passage of time… One night I found myself bored and as usual avoiding work…

Read more »

Reproducibility and randomness

May 11, 2013

With Stéphane Tufféry, we were working this week on a chapter of a book, entitled Statistical Learning in Actuarial Science. The chapter should be based on R functions, and we wanted to reproduce some outputs he previously obtained with SAS. The good thing is that even complex functions (logistic regression, regression trees, etc) produce the same kind of outputs....

Read more »

Veterinary Epidemiologic Research: Count and Rate Data – Poisson Regression and Risk Ratios

May 10, 2013

As noted in paragraph 18.4.1 of the book Veterinary Epidemiologic Research, logistic regression is widely used for binary data, with the estimates reported as odds ratios (OR). While this is appropriate for case-control studies, risk ratios (RR) are preferred for cohort studies, as RR provides estimates of probabilities directly. Moreover, what is often forgotten is the assumption

Read more »

Spatial Critter Swarming Simulation

May 10, 2013

# I am interested in how small bits of individualized instructions can create collective action.
# In this simulation I will give a single instruction to each individual in the swarm.
# Choose another individual who is not too close, then accelerate towards that individual.
# I also control momentum causing the previous movement and direction to...

Read more »

A guide to speeding up R code

May 10, 2013

Noam Ross recently shared a very useful guide to speeding up your R code. The suggestions: get a bigger computer (for example, renting an instance on the Amazon cloud for a few cents an hour); use parallel programming techniques; use the R byte-compiler; profile and benchmark your code; use high-performance packages (like xts, for time series); and lastly, rewriting your code...

Read more »

Tutorials on git/github and GNU make

May 10, 2013

If you’re not using version control, you should be. Learn git. If you’re not on github, you should be. That’s real open source. To help some colleagues get started with git and github, I wrote a minimal tutorial. There are lots of git and github resources available, but I thought I’d give just the bare

Read more »
