## Maize trade Part II: Comparison and analysis

February 3, 2013

My last post about the maize network, although interesting, was not very informative. Today we are going to contrast the maize network with the wine trade network. The reason we have chosen wine will become clear after the...

## Visualising 2012 NFL Quarterback performance with R heat maps

February 3, 2013

With only 24 hours remaining in the 2012 NFL season, this is a good time to review how the league's QBs performed during the regular season, using performance data from KFFL and the heat mapping capabilities of R.

    # scale data to mean=0, sd=1 and convert to matrix
    QBscaled <- as.matrix(scale(QB2012))
    # create heatmap and don't reorder
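
As a rough illustration of the scale-then-plot idea in this excerpt, here is a minimal sketch. The small data frame stands in for the post's `QB2012` object, and its player labels and numbers are made up rather than taken from KFFL. The `heatmap()` arguments are an assumption, since the excerpt does not show the original call.

```r
# Hypothetical stand-in for QB2012: rows are players, columns are stats.
qb <- data.frame(
  yards = c(4200, 3900, 3100, 2800),
  tds   = c(34, 27, 20, 15),
  ints  = c(10, 14, 12, 17),
  row.names = c("QB A", "QB B", "QB C", "QB D")
)

# Scale each column to mean 0 and sd 1, then convert to a matrix.
qb_scaled <- as.matrix(scale(qb))

# Draw the heat map without reordering rows or columns.
# scale = "none" because the data are already standardised.
heatmap(qb_scaled, Rowv = NA, Colv = NA, scale = "none",
        margins = c(6, 6))
```

Setting `Rowv = NA` and `Colv = NA` suppresses the dendrograms, so rows and columns stay in their original order.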

## InstallOldPackages: a repmis command for installing old R package versions

February 3, 2013

A big problem in reproducible research is that software changes. The code you used to do a piece of research may depend on a specific version of software that has since been changed. This is an annoying problem in R because install.packages only installs the most recent version of a package. It can be tedious to collect the old...

## Comparing individual team run production

February 3, 2013

Or, The 2010 Mariners: How Bad Were They? In earlier posts, I used the statistical software R to plot the trends in league average run scoring since 1901. This was the first step to answering other questions I had on my mind: How poor was the offensive performance of the 2010 Seattle Mariners? Are they showing any signs...

## data.table or data.frame?

February 2, 2013

I spent a portion of today trying to convince a colleague that there are times when the data.table package is faster than traditional methods in R. It took a few of the tests below to prove the point. Generate a data.frame of characters and numbers for easy plotting. df <- data.frame(letters = as.character(sample(letters, 1e+08, replace = TRUE)), ...

## A random walk ? What else ?

February 2, 2013

Consider the following time series. What does it look like? I know, this is a stupid game, but I keep using it in my time series courses. It does look like a random walk, doesn’t it? If we use the Phillips-Perron test, yes, it does, > PP.test(x) Phillips-Perron Unit Root Test data: x Dickey-Fuller = -2.2421, Truncation lag parameter = 6,...
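
To show what such a test call looks like end to end, here is a minimal sketch that runs `PP.test()` from base R's `stats` package on a simulated random walk. The series, seed, and length are assumptions for illustration; the series used in the post is not shown in the excerpt.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Simulate a Gaussian random walk: cumulative sum of white noise.
x <- cumsum(rnorm(500))

# Phillips-Perron test; the null hypothesis is that x has a unit root.
pp <- PP.test(x)
print(pp)

# A large p-value means the unit-root null is not rejected.
pp$p.value
```

Failing to reject the null is not proof that a series is a random walk, which is presumably where the post is heading.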

## R scripts for analyzing survey data

February 2, 2013

Another site pops up with open code for analyzing public survey data (http://www.asdfree.com/). It will be interesting to see whether this gets used by the general public (given the growing trend of data journalism and so forth) versus academics. It is...

## A slightly different introduction to R, part III

February 2, 2013

I think you’ve noticed by now that a normal interactive R session is quite messy. If you don’t believe me, try playing around for a while and then give the history() command, which will show you the commands you’ve typed. If you’re anything like me, a lot of them are malformed attempts that generated some

## qdap 0.2.0 released

February 2, 2013

This is the first CRAN release of qdap (qdap 0.2.0). qdap (Quantitative Discourse Analysis Package) is an R package designed to assist in quantitative discourse analysis. The package stands as a bridge between qualitative transcripts of dialogue and …

## RcppExamples 0.1.6

February 1, 2013

A pure maintenance release 0.1.6 of RcppExamples was made two weeks ago, and never announced. We merely moved the NEWS.Rd file into the proper location in the inst/ directory, and, while we were at it, mentioned the new Rcpp Gallery in the DESCRIPTION fi...

## digest 0.6.2

February 1, 2013

digest version 0.6.2 came out a few days ago as an almost immediate follow-up to release 0.6.1. We used paste0() in a few places, and this is only available with newer versions of R. To avoid introducing a somewhat unnecessary dependency, we reverted thi...

## Bootstrap Confidence Intervals

February 1, 2013

Here is an example of nonparametric bootstrapping. It’s a powerful technique that is similar to the jackknife. With the bootstrap, however, the approach re-samples the data with replacement. It is generally less efficient than a correctly specified parametric approach, but it gets the job done. This can be used in a variety of situations ranging from variance estimation to model selection. John
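
The resampling idea can be sketched in a few lines of base R. This is a minimal percentile-bootstrap example, not the post's own code. The sample `x`, the number of replicates `B`, and the choice of the mean as the statistic are all assumptions for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical sample; any numeric vector would do.
x <- rexp(50, rate = 1)

# Resample with replacement B times and recompute the statistic each time.
B <- 2000
boot_means <- replicate(B, mean(sample(x, size = length(x), replace = TRUE)))

# Percentile 95% confidence interval for the mean.
ci <- quantile(boot_means, probs = c(0.025, 0.975))
ci
```

The percentile interval is the simplest variant. Other constructions, such as the basic or BCa intervals, adjust for bias and skewness in the bootstrap distribution.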

## Visualizing MLB Hall of Fame votes with R

February 1, 2013

Carlos Scheidegger and Kenny Shirley created this visualization of votes for the Major League Baseball Hall of Fame. They describe the chart as follows: The main figure above is a plot of BBWAA Hall of Fame voting by year for all 1,070 players who have appeared on the ballot since Hall of Fame voting began in 1936. The circular...

February 1, 2013

So you’ve finally managed to install the pesky environment but have no idea what you are looking at when you open the program. This tutorial is for you. (Again, here is a version with screenshot pictures). When you open R, it might look different than the screenshots in the picture version of the tutorial. This

## Yen and JGBs Short-Term vs Long Term

February 1, 2013

I have read some articles arguing that the recent move in the Japanese Yen is overdone.  However, considering the short-term without regard to the long-term context is naïve and potentially dangerous.  Although I do not have significant proo...

## Overdispersion with different exposures

February 1, 2013

In actuarial science, and insurance ratemaking, taking into account the exposure can be a nightmare (in datasets, some clients have been here for a few years – we call that exposure – while others have been here for a few months, or weeks). Somehow, simple results become more complicated to compute just because we have to take into account...

## Bayesian model choice for the Poisson model

February 1, 2013

Following Arthur Charpentier’s example, I am going to try to post occasionally on material covered during my courses, in the hope that it might be useful to my students, but also to others. In the second practical of the Bayesian Case Studies course, we looked at Bayesian model choice and basic Monte Carlo methods, looking

## #13 Mapping in R: Representing geospatial data together with ggplot

February 1, 2013

I have been trawling around for a while now trying to find a simple and understandable way of representing geospatial data in R, whilst retaining the ability to manipulate the visualisation in ggplot. After much searching I came across some articles which got me to a working product only after a lot of ball ache.

## "I don’t wanna grow up": Age / value relationships for football players

February 1, 2013

Let's get back to the age-value relationship from my last post. I did some more plotting to see for which position this inverted U-shaped relationship is strongest. Please note that I use a dataframe called eu.players throughout this post, which holds downloaded football player information from transfermarkt.de. But first, let us get back to the original graph.

## Converting a dataset from wide to long

February 1, 2013

I recently had to convert a dataset that I was working with from a wide format to a long format for my analysis.  I struggled with this a bit, but finally found the right sources and the right package to do it, so I thought I'd share my practical ...
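
For readers unfamiliar with the wide-to-long conversion, here is a minimal sketch using base R's `reshape()`. The excerpt does not name the package the author settled on, so this base-R route is a swapped-in alternative, and the data frame and its column names are hypothetical.

```r
# Hypothetical wide data: one row per subject, one column per year.
wide <- data.frame(
  id         = 1:3,
  score_2011 = c(10, 20, 30),
  score_2012 = c(11, 21, 31)
)

# Stack the two score columns into one, recording the year alongside.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("score_2011", "score_2012"),  # columns to stack
  v.names   = "score",                        # name of the stacked column
  timevar   = "year",                         # new column holding the times
  times     = c(2011, 2012),
  idvar     = "id"
)

# One row per (id, year) combination.
long
```

Each of the three subjects now contributes two rows, one per year.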

## Show me the pdf already

February 1, 2013

You’ve got a pdf file and you’d like to view it with whatever the system viewer is. As usual, that requires something special for Windows and something general for the rest of us. Here goes… openPDF <- function(f) { os <- .Platform$OS.type if (os=="windows") shell.exec(normalizePath(f)) else { pdf <- getOption("pdfviewer", default='') if (nchar(pdf)==0) stop("The 'pdfviewer'

## Introducing the BH package

January 31, 2013

Earlier today a new package BH arrived on CRAN. Over the years, Jay Emerson, Michael Kane and I had numerous discussions about a basic Boost infrastructure package providing Boost headers for other CRAN packages (and yes, we are talking packages usin...

## Flowchart: How to learn survey analysis with R

January 31, 2013

In a recent talk to the DC R User Group, Anthony Damico presented the following handy flowchart for learning to do survey analysis with R (actually, it's a pretty good flowchart for learning R for any application): Since they're not clickable above, here are the resource links: Learn R by watching two‐minute videos on http://twotorials.com Read the “Getting Started...

## Taking Expectations to the Next Level

January 31, 2013

Higher Expectations I came across this post on Thursday and found it to be quite interesting. Clearly rental prices vary according to where you live. That isn't too surprising. I started thinking a bit more about it and thought that Boston and the nearby communities would have to...

## Using R: writing a table with odd lines (again)

January 31, 2013

Let’s look at my gff track headers again. Why not do it with plyr instead? d_ply splits the data frame by the feature column and applies a nameless function that writes subsets to the file (and returns nothing, hence the "_" in the name). This isn’t shorter or necessarily better, but it appeals to me.

## Using Line Segments to Compare Values in R

January 31, 2013

Sometimes you want to create a graph that will allow the viewer to see in one glance:

- The original value of a variable
- The new value of the variable
- The change between old and new

One method I like to use to do this is using geom_segment and geom_poin...

## Scatterplot Matrices

January 31, 2013

Scatterplot matrices are a great way to roughly determine if you have a linear correlation between multiple variables. This is particularly helpful in pinpointing specific variables that might have similar correlations to your genomic or proteomic data. If you already have data with multiple variables, load it up as described here. If not, no worries
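
A minimal sketch of a scatterplot matrix with base R's `pairs()` follows. The built-in `iris` measurements stand in for the reader's own data, which is an assumption for illustration, and the correlation matrix is added only as a numeric companion to the plot.

```r
# Built-in iris data stands in for "data with multiple variables".
vars <- iris[, 1:4]

# Scatterplot matrix: one panel per pair of variables.
pairs(vars, main = "Scatterplot matrix of the iris measurements")

# Pairwise Pearson correlations to read alongside the panels.
cor_mat <- cor(vars)
round(cor_mat, 2)
```

Panels that look like a tight diagonal band correspond to correlations near 1 or -1, while shapeless clouds correspond to values near 0.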