## Introducing the BH package

January 31, 2013
Earlier today a new package BH arrived on CRAN. Over the years, Jay Emerson, Michael Kane and I had numerous discussions about a basic Boost infrastructure package providing Boost headers for other CRAN packages (and yes, we are talking packages usin...

## Flowchart: How to learn survey analysis with R

January 31, 2013
In a recent talk to the DC R User Group, Anthony Damico presented the following handy flowchart for learning to do survey analysis with R (actually, it's a pretty good flowchart for learning R for any application): Since they're not clickable above, here are the resource links: Learn R by watching two‐minute videos on http://twotorials.com Read the “Getting Started...

## Taking Expectations to the Next Level

January 31, 2013
Higher Expectations I came across this post on Thursday and found it to be quite interesting. Clearly rental prices vary according to where you live. That isn't too surprising. I started thinking a bit more about it and thought that Boston and the nearby communities would have to...

## Using R: writing a table with odd lines (again)

January 31, 2013
Let’s look at my GFF track headers again. Why not do it with plyr instead? d_ply splits the data frame by the feature column and applies a nameless function that writes subsets to the file (and returns nothing, hence the “_” in the name). This isn’t shorter or necessarily better, but it appeals to me.
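The split-and-write idea described here could look roughly like the sketch below. It assumes the plyr package is installed; the toy data frame, the `track name=` header format and the temporary output file are hypothetical stand-ins, not the original post's data or code.

```r
library(plyr)

# Hypothetical toy data standing in for a GFF-like table
d <- data.frame(
  seqname = c("chr1", "chr1", "chr2", "chr2"),
  feature = c("exon", "gene", "exon", "gene"),
  start   = c(100, 50, 300, 250),
  end     = c(200, 500, 400, 900),
  stringsAsFactors = FALSE
)

out_file <- tempfile(fileext = ".gff")

# d_ply splits d by the feature column and calls the anonymous function
# on each piece; the function is used only for its side effect (writing).
d_ply(d, "feature", function(part) {
  # One header line per feature (assumed format), then that feature's rows
  cat("track name=", part$feature[1], "\n",
      sep = "", file = out_file, append = TRUE)
  write.table(part, file = out_file, append = TRUE, sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
})

written <- readLines(out_file)
```

Each feature ends up as a header line followed by its rows, which is the "odd lines" layout the post title refers to.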

## Using Line Segments to Compare Values in R

January 31, 2013
Sometimes you want to create a graph that will allow the viewer to see at a glance: the original value of a variable, the new value of the variable, and the change between old and new. One method I like to use to do this is using geom_segment and geom_poin...
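A minimal sketch of such a plot, assuming ggplot2 is installed. The data frame and its column names (`item`, `old`, `new`) are made up for illustration and are not taken from the original post.

```r
library(ggplot2)

# Hypothetical before/after values for three items
df <- data.frame(
  item = c("A", "B", "C"),
  old  = c(10, 25, 18),
  new  = c(14, 20, 30)
)

p <- ggplot(df) +
  # The segment shows the change between old and new
  geom_segment(aes(x = old, xend = new, y = item, yend = item)) +
  # Hollow point marks the original value
  geom_point(aes(x = old, y = item), shape = 1, size = 3) +
  # Filled point marks the new value
  geom_point(aes(x = new, y = item), shape = 16, size = 3) +
  labs(x = "Value", y = NULL)

print(p)
```

Using different point shapes for old and new lets the reader tell the direction of the change without a legend.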

## Scatterplot Matrices

January 31, 2013
Scatterplot matrices are a great way to roughly determine if you have a linear correlation between multiple variables. This is particularly helpful in pinpointing specific variables that might have similar correlations to your genomic or proteomic data. If you already have data with multiple variables, load it up as described here. If not, no worries
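If you have no data at hand, a built-in dataset is enough to try the idea. The sketch below uses base R's `pairs()` on a few columns of `mtcars`; the choice of dataset and columns is an assumption for illustration only.

```r
# Pick a few numeric columns from the built-in mtcars dataset
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Draw one scatterplot for every pair of variables
pairs(vars, main = "Scatterplot matrix of selected mtcars variables")

# The correlation matrix puts numbers on what the plot suggests
cor_mat <- cor(vars)
print(round(cor_mat, 2))
```

Panels that show a tight, straight band of points correspond to entries of `cor_mat` close to 1 or -1.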

## How to install packages on R + screenshots

January 31, 2013
Have no fear, the screenshots are here! (For the original tutorial, click here) Method 1 (less typing) Part 1: Getting the Package onto Your Computer. Open R via your preferred method (icon on desktop, Start Menu, dock, etc.). Click “Packages” in the top menu, then click “Install package(s)”. Choose a mirror that is closest to your geographical location. Now

## Soup up your R environment: how to install packages

January 31, 2013
Today we are going to make additions to our R environment in a common process called installing packages. The transition won’t be as long, drastic, or emotional as an episode of Extreme Makeover: Home Edition, but it does add more capabilities to your R environment. A package is a bunch of code combined and distributed
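From the console, the same process can be scripted. The helper below is a hypothetical convenience function (`ensure_package` is not part of R); it relies only on the base functions `requireNamespace()` and `install.packages()`, and the CRAN mirror URL is an assumption you may want to change.

```r
# Install a package only if it is not already available, then report
# whether it can be loaded. ensure_package is a made-up helper name.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this call does not touch the network
ok <- ensure_package("stats")

# After installation, library() attaches the package for use
library(stats)
```

Checking first avoids re-downloading a package every time a script runs.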

## Using SPARQL Query Libraries to Generate Simple Linked Data API Wrappers

January 31, 2013
A handful of open Linked Data have appeared through my feeds in the last couple of days, including (via RBloggers) SPARQL with R in less than 5 minutes, which shows how to query US data.gov Linked Data and then Leigh Dodds’ Brief Review of the Land Registry Linked Data. I was going to post a

## Sorting Numeric Vectors in C++ and R

January 31, 2013
Consider the problem of sorting all elements of a given vector in ascending order. We can simply use the function std::sort from the C++ STL.

    #include <Rcpp.h>
    using namespace Rcpp;

    // [[Rcpp::export]]
    NumericVector stl_sort(NumericVector x) {
        NumericVector y = clone(x);      // copy, so the input is not modified
        std::sort(y.begin(), y.end());
        return y;
    }

    library(rbenchmark)
    set.seed(123)
    z <- rnorm(100000)
    x <- rnorm(100)

    # check that stl_sort is the same as sort
    stopifnot(all.equal(stl_sort(x), sort(x)))
    #...

## Using Boost via the new BH package

January 31, 2013
Earlier today the new BH package arrived on CRAN. Over the years, Jay Emerson, Michael Kane and I had numerous discussions about a basic Boost infrastructure package providing Boost headers for other CRAN packages. JJ and Romain chipped in as well, and Jay finally took the lead by first creating a repo on...

## repmis: misc. tools for reproducible research in R

January 30, 2013
I've started to put together an R package called repmis. It has miscellaneous tools for reproducible research with R. The idea behind the package is to collate commands that simplify some of the common R code used within knitr-type reproducible research papers. It's still very much in the early stages of development and has two commands: LoadandCite:...

## R installation + screenshots

January 30, 2013
Feeling faint of heart without photos depicting what to do? No worries, here they are. Go to the R website and click “Download R” under “Getting Started”. Choose a place to download R. Even though we’re on the limitless and borderless interweb, choosing a location close to you helps speed things up. Choose which R package to download based

## R users: Be counted in Rexer’s 2013 Data Miner Survey

January 30, 2013
Since 2007, Rexer Analytics has been conducting periodic surveys to measure the analytic behaviors, views and preferences of data miners and analytic professionals. In the last survey, conducted in 2011, more than 1300 analysts shared information about the data analysis software packages they use. (The results of all Rexer surveys are available free to anyone who requests them.) In...

January 30, 2013
A new Armadillo version 3.6.2 came out yesterday, and the corresponding RcppArmadillo version is now on CRAN. Changes are mostly incremental: Changes in RcppArmadillo version 0.3.6.2 (2013-01-29) Upgraded to Armadillo release Version 3.6.2 ...

January 30, 2013
A Problem A major problem in secondary data analysis is that you didn't get to decide what data was collected. Let's say you were interested in how many times a student has read the Twilight books. Specifically, you want to know how effective the ads for...

## F1Stats – Visually Comparing Qualifying and Grid Positions with Race Classification

January 30, 2013
Following the roundabout tour of F1Stats – A Prequel to Getting Started With Rank Correlations, here’s a walk through of my attempt to replicate the first part of A Tale of Two

## R finals

January 30, 2013
On the morning I returned from Varanasi and the ISBA meeting there, I had to give my R final exam (along with three of my colleagues in Paris-Dauphine). This year, the R course was completely in English, exam included, which means I can post it here as it may attract more interest than the French

## Modeling Residential Electricity Usage with R

January 30, 2013
Wow, I can’t believe it has been 11 months since my last blog posting! The next series of postings will be related to the retail energy field. Residential power usage is satisfying to model as it can be forecast fairly accurately with the right inputs. Partly as a consequence of deregulation there is now more data available than...

## Regression on categorical variables

January 30, 2013
$N_{x,t}\sim\mathcal{P}(E_{x,t}\cdot \exp[\alpha_x+\beta_x \kappa_t + \gamma_x \delta_{t-x}])$

This morning, Stéphane asked me a tricky question about extracting coefficients from a regression with categorical explanatory variates. More precisely, he asked me if it was possible to store the coefficients in a nice table, with information on the variable and the modality (those two pieces of information being in two different columns). Here is some code I wrote to produce the...
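One possible way to build such a table is sketched below. This is not the code from the original post: the toy data, the variable names and the use of `fit$xlevels` are assumptions, and the sketch only covers main effects of factors under R's default treatment contrasts.

```r
# Hypothetical toy data with two categorical explanatory variables
set.seed(1)
df <- data.frame(
  y      = rnorm(60),
  colour = factor(rep(c("red", "green", "blue"), times = 20)),
  size   = factor(rep(c("S", "M", "L"), each = 20))
)

fit <- lm(y ~ colour + size, data = df)

# fit$xlevels lists the levels of each factor in the model. With the
# default treatment contrasts the first level is the reference and has
# no coefficient, so it is dropped with [-1].
coef_table <- do.call(rbind, lapply(names(fit$xlevels), function(v) {
  lev <- fit$xlevels[[v]][-1]
  data.frame(
    variable = v,
    modality = lev,
    estimate = unname(coef(fit)[paste0(v, lev)]),
    stringsAsFactors = FALSE
  )
}))

print(coef_table)
```

The coefficient names produced by `lm()` are the variable name pasted to the level name, which is why `paste0(v, lev)` can be used to look each one up.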

## Approaching the Zero Bound – Bonds

January 30, 2013
As bonds approach the artificial zero bound, where do we go next especially after the record setting +30% in 2011?  The rolling 250-day total return has rarely gone negative since the inception of the Vanguard Funds VBMFX and VUSTX.  I am int...

## The magic empty bracket

January 30, 2013

I have been working with R for some time now, but once in a while, basic functions catch my eye that I was not aware of… For some project I wanted to transform a correlation matrix into a covariance matrix. Now, since cor2cov does not exist, I thought about “reversing” the cov2cor function (stats:::cov2cor). Inside
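A minimal sketch of the "reversed" function, under the assumption that the standard deviations are supplied separately. The name `cor2cov_sketch` is hypothetical (as the post notes, base R has no `cor2cov`); the empty bracket assignment `V[] <-` mirrors the idiom used inside `cov2cor`.

```r
# Turn a correlation matrix R back into a covariance matrix, given the
# standard deviations sds. cor2cov_sketch is a made-up name.
cor2cov_sketch <- function(R, sds) {
  p <- nrow(R)
  V <- R
  # Assigning into V[] replaces the values but keeps dim and dimnames.
  # sds * R scales row i by sds[i]; rep(sds, each = p) scales column j
  # by sds[j], so V[i, j] = sds[i] * R[i, j] * sds[j].
  V[] <- sds * R * rep(sds, each = p)
  V
}

S <- cov(mtcars[, c("mpg", "hp", "wt")])
R <- cov2cor(S)

# Round trip: covariance -> correlation -> covariance
S_back <- cor2cov_sketch(R, sqrt(diag(S)))
```

Without the empty bracket, one would have to restore the row and column names by hand after the arithmetic.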

## Speed up for loops in R

January 30, 2013
Are your for loops too slow in R? Are loops that should take seconds actually taking hours? As I found out recently, how you structure your code can make a huge difference in execution times. Fortunately making a few small changes to your code can speed up these loops by several orders of
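The excerpt does not say which changes the post recommends. One widely used change, shown here as an assumption rather than as the author's method, is to preallocate the result instead of growing it inside the loop. The function names are hypothetical.

```r
# Grows the result one element at a time; each c() call copies the vector
grow_result <- function(n) {
  out <- c()
  for (i in seq_len(n)) {
    out <- c(out, i^2)
  }
  out
}

# Allocates the full result once, then fills it in place
preallocate_result <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    out[i] <- i^2
  }
  out
}

n <- 20000
time_grow     <- system.time(res_grow <- grow_result(n))
time_prealloc <- system.time(res_prealloc <- preallocate_result(n))

print(time_grow)
print(time_prealloc)
```

Both functions return the same values; only the way memory is handled differs, and the gap tends to widen as `n` grows.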

## R’s range and loop behaviour: Zero, One, NULL

January 30, 2013
One of the most common patterns in programming languages is the ability to iterate over a given set (usually a vector) by using 'for' loops. In most modern scripting languages a range is a built-in data structure and trivial to use with 'for' lo...
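The zero, one and NULL cases named in the title can be checked directly in base R. The sketch below is an illustration of the well-known `1:n` pitfall, not a reproduction of the post's examples.

```r
n <- 0

bad_idx  <- 1:n          # counts down: 1, 0 (two elements, not zero)
good_idx <- seq_len(n)   # empty integer vector

runs_bad <- 0
for (i in bad_idx) runs_bad <- runs_bad + 1

runs_good <- 0
for (i in good_idx) runs_good <- runs_good + 1

# Looping over NULL is allowed and simply does nothing
runs_null <- 0
for (i in NULL) runs_null <- runs_null + 1

# A range of length one behaves as expected with either form
runs_one <- 0
for (i in seq_len(1)) runs_one <- runs_one + 1
```

Using `seq_len(n)` or `seq_along(x)` instead of `1:n` keeps the loop body from running when there is nothing to iterate over.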

## Building a package in RStudio is actually very easy

January 30, 2013
So, you’ve written some code and you use it routinely. Now you’ve written some code and you’d like to use version control to ensure that development continues in a robust fashion. You do that and you use Github or something so that not only are changes tracked, but the general public receives the benefit of

## The three-dots construct in R

January 30, 2013
There is a mechanism that allows variability in the arguments given to R functions. Technically it is an ellipsis, but it is more commonly called “…”, dots, dot-dot-dot or three-dots. Basics: the three-dots construct allows an arbitrary number and variety of arguments, and passing arguments on to other functions. Arbitrary arguments: the two prime cases are the c and list ...
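Both uses can be shown in a few lines of base R. The function names below are hypothetical and chosen only for illustration.

```r
# 1. Arbitrary number and variety of arguments: capture them with list(...)
count_args <- function(...) {
  args <- list(...)
  length(args)
}

# 2. Passing arguments on: anything in ... is handed to mean() unchanged
mean_of <- function(x, ...) {
  mean(x, ...)
}

n_args   <- count_args(1, "a", TRUE)
avg_na   <- mean_of(c(1, NA, 3))                 # NA, because na.rm defaults to FALSE
avg_safe <- mean_of(c(1, NA, 3), na.rm = TRUE)   # na.rm travels through ...
```

The wrapper `mean_of` never has to name `na.rm` itself, which is what makes the construct handy for thin wrappers around existing functions.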

## A shiny app to display the human body map dataset

January 30, 2013
There was quite a lot of buzz around when the guys from RStudio launched Shiny, a new web framework for R that promises to “make it super simple for R users like you to turn analyses into interactive web applications …”

## Using Boost’s foreach macro

January 30, 2013
Boost provides a macro, BOOST_FOREACH, that allows us to easily iterate over elements in a container, similar to what we might do in R with sapply. In particular, it frees us from having to deal with iterators as we do with std::for_each and std::transform. The macro is also compatible with the objects exposed by Rcpp. Side note: C++11 has introduced...