Using maps and ggplot2 to visualize college hockey championships

March 13, 2013

Short: I plot the frequency of college hockey championships by state using the maps package and ggplot2.
Note: this example is based heavily on the example provided at http://www.dataincolour.com/2011/07/maps-with-ggplot2/
Data reference: http://en.wikipedia.org/wiki/NCAA_Men%27s_Ice_Hockey_Championship
Question of interest: As a good Minnesotan, I've believed for quite some time that the colder, northern states enjoy a competitive advantage when it...

Webinar tomorrow: 100% R and More

March 13, 2013

A quick note that I'll be hosting our regularly scheduled webinar, Revolution R Enterprise, 100% R and More, at 10 AM Pacific tomorrow. If you're new to R, or want to learn about the power, scalability and productivity features of Revolution R Enterprise, this is a great place to start.
Revolution Analytics webinars: Revolution R Enterprise, 100% R and More

New package for ensembling R models

March 13, 2013

I've written a new R package called caretEnsemble for creating ensembles of caret models in R.  It currently works well for regression models, and I've written some preliminary support for binary classification models. At th...

R needs some bureaucracy

March 12, 2013

Writing a program in R is almost bureaucracy-free: variables don’t need to be declared, the language does a reasonable job of guessing which type a value should automatically be converted to, there is no need to create a function with a special name that gets called at program startup, the commonly...

Rcpp master class in New York last weekend

March 12, 2013

On Saturday I had the opportunity to teach another one-day master class on Rcpp. The class had been organized by Jared Lander, and organized very well, I might add. The weekend started with a slight disappointment. I had taken Friday off, and hoped t...

RcppArmadillo 0.3.800.1

March 12, 2013

Conrad released a first bug-fix release 3.800.1 of Armadillo earlier today. This has been wrapped up in release 0.3.800.1 of RcppArmadillo as usual. This release also contains a very nice function sample() (contributed by Christian Gunning) which p...

A map of worldwide email traffic, created with R

March 12, 2013

The Washington Post reports that by analyzing more than 10 million emails sent through the Yahoo! Mail service in 2012, a team of researchers used the R language to create a map of countries whose citizens email each other most frequently. The chart shows the top 1000 country-country pairs by email frequency, arranged in a clustered network using...

Distrust of R

March 12, 2013

I guess I've been living in a bubble for a bit, but apparently there are a lot of people who still mistrust R. I got asked this week why I used R (and, specifically, the package rpart) to generate classification and regression trees instead of SAS Ente...

R to Latex packages: Coverage

March 12, 2013

There are now quite a few R packages to turn cross-tables and fitted models into nicely formatted LaTeX. In a previous post I showed how to use one of them to display regression tables on the fly. In this post I summarise what types of R object each of the major packages can deal with.

AQP News and Updates

March 12, 2013

The AQP family of R packages has seen a lot of development over the last 3 months. Some of the highlights include: HTML manual pages with syntax highlighting and figures (c/o knitr); new vignettes: "dealing with bad data", gridded SSURGO (gSSURGO) demo,...

Job advert

March 12, 2013

We finally got around to preparing everything we needed to advertise the position that will be available under the MRC grant we were awarded last year. The project will run for 30 months, and we're looking for a post-doctoral candidate to work on the Rese...

reports 0.1.2 Released

March 12, 2013

I’m very pleased to announce the release of reports: an R package to assist in the workflow of writing academic articles and other reports. This is the first CRAN release of reports: http://cran.r-project.org/web/packages/reports/index.html The reports package assists in writing …

Third Milano R net meeting to be held on April 18, 2013

March 12, 2013

Third Milano R net meeting
April 18, 2013 @ 6.00 PM
Fiori Oscuri Bistrot & Bar, Via Fiori Oscuri, 3, Milano
Further details will be published shortly. Stay connected!

How to use optim in R

March 12, 2013

A friend of mine asked me the other day how she could use the function optim in R to fit data. Of course, there are dedicated functions for fitting data in R, and I wrote about this earlier. However, she wanted to understand how to do this from scratch using optim. The function optim provides algorithms for general...
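
A minimal sketch of the from-scratch approach, assuming the simplest case of a straight-line least-squares fit: write the residual sum of squares as a function of the parameters and hand it to optim(). The data are simulated for illustration, rss is a hypothetical helper name, and method = "BFGS" is chosen here instead of the default Nelder-Mead only because it converges reliably on a smooth quadratic objective.

```r
# Simulated data, for illustration only
set.seed(1)
x <- 1:20
y <- 3 + 2 * x + rnorm(20, sd = 0.5)

# Hypothetical objective: residual sum of squares for par = c(intercept, slope)
rss <- function(par, x, y) {
  sum((y - (par[1] + par[2] * x))^2)
}

# Minimise rss starting from (0, 0); the extra arguments x and y are passed on to fn
fit <- optim(par = c(0, 0), fn = rss, x = x, y = y, method = "BFGS")

fit$par          # optim's estimates of intercept and slope
coef(lm(y ~ x))  # closed-form least-squares estimates, for comparison
```

The same pattern carries over to other models: only the objective function changes, for example a negative log-likelihood instead of a sum of squares.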

Generating a multivariate gaussian distribution using RcppArmadillo

March 12, 2013

There are many ways to simulate a multivariate Gaussian distribution, assuming that you can simulate from independent univariate normal distributions. One of the most popular methods is based on the Cholesky decomposition. Let’s see how Rcpp and Armadillo perform on this task. #include <RcppArmadillo.h> // [[Rcpp::depends(RcppArmadillo)]] using namespace Rcpp; // [[Rcpp::export]] arma::mat mvrnormArma(int n, arma::vec mu, arma::mat sigma) { int ncols...
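
For comparison, the same Cholesky-based idea can be sketched in plain R using only base functions: draw an n-by-p matrix of independent standard normals, multiply it by the upper-triangular factor returned by chol(), and add the mean vector to each row. The function name mvrnorm_chol is hypothetical; this is not the RcppArmadillo code from the post, and MASS::mvrnorm() remains the usual ready-made alternative.

```r
# Hypothetical helper: n draws from a multivariate normal N(mu, sigma)
mvrnorm_chol <- function(n, mu, sigma) {
  p <- length(mu)
  # n x p matrix of independent N(0, 1) draws
  z <- matrix(rnorm(n * p), nrow = n, ncol = p)
  # chol() returns upper-triangular R with t(R) %*% R == sigma,
  # so each row of z %*% R has covariance sigma
  x <- z %*% chol(sigma)
  # shift every row by the mean vector
  sweep(x, 2, mu, "+")
}

set.seed(42)
sigma <- matrix(c(1, 0.5, 0.5, 2), nrow = 2)
draws <- mvrnorm_chol(1e5, mu = c(1, -1), sigma = sigma)
colMeans(draws)  # close to c(1, -1)
cov(draws)       # close to sigma
```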

High Resolution Figures in R

March 12, 2013

As I was recently preparing a manuscript for PLOS ONE, I realized the default resolution of R and RStudio images is insufficient for publication. PLOS ONE requires 300 ppi images in TIFF or EPS (Encapsulated PostScript) format. In R plots …
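
A minimal sketch, assuming the figure is written with base R's tiff() device and that this R build has TIFF support (capabilities("tiff") should be TRUE): with units = "in" and res = 300 the device works at 300 ppi, so a 6 x 4 inch figure comes out at 1800 x 1200 pixels. The file name, dimensions and the built-in pressure dataset are illustrative choices only.

```r
# Illustrative output path; a temporary file keeps the example self-contained
out <- file.path(tempdir(), "figure1.tif")

# 6 x 4 inches at 300 pixels per inch gives 1800 x 1200 pixels;
# LZW compression keeps the TIFF file size manageable
tiff(out, width = 6, height = 4, units = "in", res = 300, compression = "lzw")
plot(pressure, type = "b", main = "pressure dataset")
dev.off()

file.exists(out)
```

For EPS, the postscript() device with onefile = FALSE, horizontal = FALSE and paper = "special" is the usual route.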

R 101

March 11, 2013

as.character() is your friend. Sometimes when you open a data file (let's say a .csv), variables will be recognized as factors when they should be numeric. It is therefore tempting to simply convert the variable to numeric using as.numeric(). Big mistake! If...
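
A short illustration of the pitfall, using a made-up factor: as.numeric() on a factor returns the internal level codes, not the values printed on screen, so the values have to pass through as.character() first. Indexing the numeric levels by the factor, as suggested in the R FAQ, gives the same result.

```r
# A factor whose labels look numeric, e.g. after reading a .csv
f <- factor(c("10", "5", "20"))

levels(f)                    # "10" "20" "5"  (levels are sorted as strings)
as.numeric(f)                # 1 3 2: the level codes, not the values!
as.numeric(as.character(f))  # 10 5 20: the intended values
as.numeric(levels(f))[f]     # 10 5 20: same result, converting each level once
```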

Simulating Random Multivariate Correlated Data (Categorical Variables)

March 11, 2013

This is a repost of the second part of an example that I posted last year, but at the time I only had the PDF document. This is the second example of generating multivariate random associated data. This example shows how to generate ordinal (categorical) data. It is a little more complex than generating continuous...

Interview with Boulder BI Brain Trust

March 11, 2013

On Friday I traveled to Boulder, CO to update the Boulder BI Brain Trust on the latest news and updates from Revolution R Enterprise. While I was there, I was interviewed by BBBT president Claudia Imhoff. In a wide-ranging chat, we discussed: What's behind the Revolution Analytics momentum over the past year? How Business Intelligence relates to Data Science...

Simulating Allele Counts in a population using R

March 11, 2013

This post is inspired by the Week 7 lectures of the Coursera course "Introduction to Genetics and Evolution" (I highly recommend this course for anyone interested in genetics, BTW). Professor Noor uses a University of Washington program called AlleleA1 for try...
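
The post's own code isn't shown in this excerpt, but one minimal way to simulate allele counts, assuming a basic Wright-Fisher model of pure genetic drift (no selection, mutation or migration) in a diploid population of constant size N, is to draw each generation's allele count from a binomial distribution over the 2N gene copies. The helper simulate_drift() and its parameters are hypothetical.

```r
# Hypothetical helper: allele frequency trajectory under Wright-Fisher drift
simulate_drift <- function(N, p0, generations) {
  p <- numeric(generations + 1)
  p[1] <- p0
  for (g in seq_len(generations)) {
    # 2N gene copies; next generation's count ~ Binomial(2N, current frequency)
    count <- rbinom(1, size = 2 * N, prob = p[g])
    p[g + 1] <- count / (2 * N)
  }
  p
}

set.seed(123)
traj <- simulate_drift(N = 50, p0 = 0.5, generations = 100)
range(traj)
```

Once the frequency reaches 0 or 1 the allele is lost or fixed and stays there, because rbinom() with prob 0 or 1 always returns 0 or 2N.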

Reproducible Research at ENAR

March 11, 2013

I gave a talk at the Spring ENAR meetings this morning on some of the technical aspects of creating the book. The session was on reproducible research, and the slides are here. I was dinged for not using git for version control (we used Dropbox for simp...

Lipsyncing for your life: a survival analysis of RuPaul’s Drag Race

March 11, 2013

If you follow me on Twitter, you know that I’m a big fan of RuPaul’s Drag Race. The transformation, the glamour, the sheer eleganza extravaganza is something my life needs to interrupt the monotony of grad school. I was able to catch up on nearly four seasons in a little less than a month, and I’ve been watching the…

Veterinary Epidemiologic Research: Linear Regression Part 3 – Box-Cox and Matrix Representation

March 11, 2013

In the previous post, I forgot to show an example of a Box-Cox transformation when there’s a lack of normality. The Box-Cox procedure computes the value of λ which best “normalises” the errors:

λ value   Transformed value of Y
2         Y^2
1         Y
0.5       √Y
0         log(Y)
-0.5      1/√Y
-1        1/Y
-2        1/Y^2

For example, the resulting plot indicates a log transformation.

Matrix Representation
We can use...
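
For the Box-Cox step described above, a minimal sketch using boxcox() from the MASS package (shipped with R as a recommended package), assuming simulated data with multiplicative errors, so the profile log-likelihood should peak near λ = 0, i.e. a log transformation. This is not the post's dataset.

```r
library(MASS)

# Simulated data, for illustration only: log(y) is linear in x
set.seed(1)
x <- runif(100, min = 1, max = 10)
y <- exp(0.3 * x + rnorm(100, sd = 0.2))

# Profile log-likelihood over a grid of lambda values, without plotting
bc <- boxcox(y ~ x, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)

# lambda with the highest log-likelihood; a value near 0 suggests log(y)
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat
```

Leaving plotit at its default of TRUE draws the usual log-likelihood curve with its 95% confidence interval for λ.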

Simulating Random Multivariate Correlated Data (Continuous Variables)

March 11, 2013

This is a repost of an example that I posted last year, but at the time I only had the PDF document. I’m reposting it directly into WordPress and I’m including the graphs. From time to time a researcher needs to develop a script or an application to collect and analyze data. They may also need...

Hexadecimal literals in GNU R

March 11, 2013

Recently I have used hexadecimal numbers in GNU R. The way they are parsed surprised me and is inconsistent with Java. As the R Language Definition (PDF) only briefly mentions hexadecimal numbers, here is what I have found. First, I checked the following c...
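
A few hexadecimal literals you can check at the R prompt, offered as a minimal sketch independent of the post's own tests: an unsuffixed hex literal is parsed as a double, the L suffix makes it an integer, and 0xFFFFFFFF is read as the positive number 4294967295, whereas the same literal in Java is an int equal to -1.

```r
a <- 0x10          # hexadecimal literal without suffix
a                  # 16
is.double(a)       # TRUE: stored as a double, just like 16

b <- 0x10L         # the L suffix requests an integer
is.integer(b)      # TRUE

c32 <- 0xFFFFFFFF
c32 == 4294967295  # TRUE: no wrap-around to -1 as with a 32-bit Java int
```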

FBit: GitHub repo for posts with R code for this blog

March 11, 2013

This is a test post since I want to improve upon Jeffrey Horner’s strategy for posting R code in Tumblr. The only minor improvement I wanted to try out is hosting the images directly on the web. I mean, right now the images won’t show in RSS readers. I’m not doing anything new at all, just using the...

Discovering Argon with the 2-Sample t-Test

I learned about Lord Rayleigh’s discovery of argon in my 2nd-year analytical chemistry class while reading “Quantitative Chemical Analysis” by Daniel Harris. (William Ramsay was also responsible for this discovery.) This is one of my favourite stories in chemistry; it illustrates how diligence in measurement can lead to an elegant and surprising discovery. I find...
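
Rayleigh's measurements are not reproduced here. As a minimal sketch with hypothetical simulated numbers, the pooled-variance two-sample t statistic can be computed by hand and checked against t.test(..., var.equal = TRUE). The helper pooled_t() and both groups are assumptions for illustration.

```r
# Hypothetical helper: pooled-variance two-sample t statistic
pooled_t <- function(a, b) {
  na <- length(a)
  nb <- length(b)
  # pooled estimate of the common variance
  sp2 <- ((na - 1) * var(a) + (nb - 1) * var(b)) / (na + nb - 2)
  (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
}

# Simulated measurements, for illustration only
set.seed(7)
a <- rnorm(10, mean = 5.0, sd = 0.1)
b <- rnorm(10, mean = 5.2, sd = 0.1)

t_manual <- pooled_t(a, b)
t_builtin <- unname(t.test(a, b, var.equal = TRUE)$statistic)
c(t_manual, t_builtin)  # the two values agree

# two-sided p-value from the t distribution with na + nb - 2 = 18 df
p_manual <- 2 * pt(-abs(t_manual), df = 18)
p_manual
```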

Is CTA trend following Dead?

March 10, 2013

This i...