Maize trade Part II: Comparison and analysis

February 3, 2013

My last post about the maize network, although interesting, was not very informative. Today we are going to contrast the maize network with the wine trade network. The reason we chose wine will become clear after the...

Visualising 2012 NFL Quarterback performance with R heat maps

February 3, 2013

With only 24 hours remaining in the 2012 NFL season, this is a good time to review how the league's QBs performed during the regular season using performance data from KFFL and the heat mapping capabilities of R.

    # scale data to mean=0, sd=1 and convert to matrix
    QBscaled <- as.matrix(scale(QB2012))
    # create heatmap and don't reorder
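
To make the scaling step concrete, here is a minimal sketch. It assumes a small placeholder QB2012 data frame filled with random numbers (not real KFFL data, and the column names are made up), and it uses base R's heatmap() with reordering switched off, which may differ from the call in the full post.

```r
# Placeholder data: random numbers standing in for the KFFL statistics.
set.seed(1)
QB2012 <- data.frame(
  yards         = rnorm(8, mean = 3500, sd = 600),
  touchdowns    = rnorm(8, mean = 22, sd = 6),
  interceptions = rnorm(8, mean = 12, sd = 4),
  row.names     = paste("QB", 1:8)
)

# Scale each column to mean 0 and sd 1, then convert to a matrix.
QBscaled <- as.matrix(scale(QB2012))

# Draw to a null device when run in batch mode.
if (!interactive()) pdf(NULL)

# Rowv = NA and Colv = NA suppress the dendrograms and the reordering;
# scale = "none" avoids rescaling data that is already standardised.
heatmap(QBscaled, Rowv = NA, Colv = NA, scale = "none", margins = c(8, 6))

if (!interactive()) invisible(dev.off())
```

Scaling first puts statistics measured in very different units on one colour scale.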

InstallOldPackages: a repmis command for installing old R package versions

February 3, 2013

A big problem in reproducible research is that software changes. The code you used to do a piece of research may depend on a specific version of software that has since been changed. This is an annoying problem in R because install.packages only installs the most recent version of a package. It can be tedious to collect the old...
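
For context, a rough sketch of the manual route that such a command would save you from. This is not the repmis implementation. The helper name cran_archive_url is hypothetical, and the package and version are only examples. It relies on the CRAN source archive layout and on install.packages() accepting a source tarball when repos = NULL.

```r
# Hypothetical helper: build the CRAN archive URL for an old source release.
cran_archive_url <- function(pkg, version, cran = "https://cran.r-project.org") {
  sprintf("%s/src/contrib/Archive/%s/%s_%s.tar.gz", cran, pkg, pkg, version)
}

old_url <- cran_archive_url("digest", "0.6.1")
print(old_url)

# Not run here: needs network access and the tools to build source packages.
# Dependencies are not resolved to matching old versions automatically.
# install.packages(old_url, repos = NULL, type = "source")
```

Doing this by hand for every package and its dependencies is the tedious part the excerpt refers to.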

Comparing individual team run production

February 3, 2013

Or, The 2010 Mariners: How Bad Were They?

In earlier posts, I used the statistical software R to plot the trends in league average run scoring since 1901. This was the first step to answering other questions I had on my mind: How poor was the offensive performance of the 2010 Seattle Mariners? Are they showing any signs...

data.table or data.frame?

February 2, 2013

I spent a portion of today trying to convince a colleague that there are times when the data.table package is faster than traditional methods in R. It took a few of the tests below to prove the point. Generate a data.frame of characters and numbers for easy plotting. df <- data.frame(letters = as.character(sample(letters, 1e+08, replace = TRUE)), ...
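
A scaled-down sketch of this kind of comparison, under several assumptions: 1e5 rows instead of 1e8, a group mean as the test operation (the tests in the full post are not shown in the excerpt), and data.table being installed. If it is not, only the base R timing runs.

```r
set.seed(42)
n <- 1e5  # much smaller than the 1e+08 rows in the post

df <- data.frame(
  letters = sample(letters, n, replace = TRUE),
  numbers = rnorm(n),
  stringsAsFactors = FALSE
)

# Base R: mean of 'numbers' within each letter.
base_time <- system.time(
  base_means <- tapply(df$numbers, df$letters, mean)
)

# data.table: the same grouped mean, only if the package is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(df)
  dt_time <- system.time(
    dt_means <- dt[, list(mean_numbers = mean(numbers)), by = "letters"]
  )
  print(c(base = base_time[["elapsed"]], data.table = dt_time[["elapsed"]]))
}
```

At this size both approaches finish quickly, so any gap is more likely to show up on much larger inputs.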

A random walk ? What else ?

February 2, 2013

Consider the following time series. What does it look like? I know, this is a stupid game, but I keep using it in my time series courses. It does look like a random walk, doesn’t it? If we use the Phillips-Perron test, yes, it does:

    > PP.test(x)
    Phillips-Perron Unit Root Test
    data: x
    Dickey-Fuller = -2.2421, Truncation lag parameter = 6,...
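
The series x from the post is not reproduced in this excerpt, so the sketch below plays the same game with a simulated Gaussian random walk (an assumption, as are the seed and length). It calls PP.test() from the stats package.

```r
set.seed(123)

# A simulated random walk: cumulative sum of independent Gaussian steps.
x <- cumsum(rnorm(500))

# Phillips-Perron test; the null hypothesis is that x has a unit root.
pp_result <- PP.test(x)
print(pp_result)

# A large p-value means the unit-root null is not rejected.
pp_result$p.value
```

Failing to reject the null is not proof that a series is a random walk, which may be the point of the title.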

R scripts for analyzing survey data

February 2, 2013

Another site pops up with open code for analyzing public survey data: http://www.asdfree.com/ It will be interesting to see whether this gets used by the general public--given the growing trend of data journalism and so forth--versus academics. It is...

A slightly different introduction to R, part III

February 2, 2013

I think you’ve noticed by now that a normal interactive R session is quite messy. If you don’t believe me, try playing around for a while and then give the history() command, which will show you the commands you’ve typed. If you’re anything like me, a lot of them are malformed attempts that generated some

qdap 0.2.0 released

February 2, 2013

This is the first CRAN release of qdap (qdap 0.2.0). qdap (Quantitative Discourse Analysis Package) is an R package designed to assist in quantitative discourse analysis. The package stands as a bridge between qualitative transcripts of dialogue and …

RcppExamples 0.1.6

February 1, 2013

A pure maintenance release 0.1.6 of RcppExamples was made two weeks ago, and never announced. We merely moved the NEWS.Rd file into the proper location in the inst/ directory, and, while we were at it, mentioned the new Rcpp Gallery in the DESCRIPTION fi...

digest 0.6.2

February 1, 2013

digest version 0.6.2 came out a few days ago as an almost immediate follow-up to release 0.6.1. We used paste0() in a few places, and this is only available with newer versions of R. To not introduce a somewhat unnecessary dependency, we reverted thi...
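
For readers wondering what such a revert could look like: paste0() is equivalent to paste() with sep = "", so code can avoid it and still run on older R. The wrapper below is a hypothetical illustration with an invented name, not the change that went into digest.

```r
# Hypothetical stand-in for paste0() that also works on older R versions.
paste0_compat <- function(..., collapse = NULL) {
  paste(..., sep = "", collapse = collapse)
}

file_name <- paste0_compat("digest_", "0.6.2", ".tar.gz")
print(file_name)
```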

Bootstrap Confidence Intervals

February 1, 2013

Here is an example of nonparametric bootstrapping. It’s a powerful technique that is similar to the Jackknife. With the bootstrap, however, the approach uses re-sampling with replacement. It’s clearly not as good as parametric approaches but it gets the job done. This can be used in a variety of situations ranging from variance estimation to model selection. John
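
As a rough illustration of the idea, and not the example from the full post, here is a percentile bootstrap interval for a mean in base R. The simulated data, the number of resamples and the 95% level are all assumptions made for the sketch.

```r
set.seed(1)

# Simulated, skewed sample standing in for real data.
x <- rexp(200, rate = 1 / 5)

# Resample with replacement and recompute the mean each time.
n_boot <- 2000
boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))

# Percentile interval: the middle 95% of the bootstrap means.
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))
print(boot_ci)

# Bootstrap estimate of the standard error of the mean.
boot_se <- sd(boot_means)
print(boot_se)
```

The same loop works for other statistics if mean() is swapped out, which is much of the appeal.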

Visualizing MLB Hall of Fame votes with R

February 1, 2013

Carlos Scheidegger and Kenny Shirley created this visualization of votes for the Major League Baseball hall of fame: They describe the chart as follows: The main figure above is a plot of BBWAA Hall of Fame voting by year for all 1,070 players who have appeared on the ballot since Hall of Fame voting began in 1936. The circular...

General Navigation in R

February 1, 2013

So you’ve finally managed to install the pesky environment but have no idea what you are looking at when you open the program. This tutorial is for you. (Again, here is a version with screenshot pictures). When you open R, it might look different than the screenshots in the picture version of the tutorial. This

Yen and JGBs Short-Term vs Long Term

February 1, 2013

I have read some articles arguing that the recent move in the Japanese Yen is overdone.  However, considering the short-term without regard to the long-term context is naïve and potentially dangerous.  Although I do not have significant proo...

Overdispersion with different exposures

February 1, 2013

In actuarial science, and insurance ratemaking, taking into account the exposure can be a nightmare (in datasets, some clients have been here for a few years – we call that exposure – while others have been here for a few months, or weeks). Somehow, simple results become more complicated to compute just because we have to take into account...
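
One common way to account for exposure, sketched here on simulated data, is a Poisson regression with log(exposure) as an offset, followed by a Pearson-based dispersion estimate. The variable names, the simulated portfolio and the choice of diagnostic are assumptions, and the computations in the full post may differ.

```r
set.seed(2013)
n <- 5000

# Simulated portfolio: exposure in years, one covariate, Poisson claim counts.
exposure <- runif(n, min = 0.1, max = 3)
x <- rnorm(n)
claims <- rpois(n, lambda = exposure * exp(0.5 * x))

# The offset makes the model describe the claim rate per unit of exposure.
fit <- glm(claims ~ x + offset(log(exposure)), family = poisson)
print(coef(fit))

# Pearson dispersion estimate: values well above 1 suggest overdispersion.
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
print(dispersion)
```

Because the counts here are simulated as Poisson, the estimate should sit close to 1.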

Bayesian model choice for the Poisson model

February 1, 2013

Following Arthur Charpentier's example, I am going to try to post occasionally on material covered during my courses, in the hope that it might be useful to my students, but also to others. In the second practical of the Bayesian Case Studies course, we looked at Bayesian model choice and basic Monte Carlo methods, looking

#13 Mapping in R: Representing geospatial data together with ggplot

February 1, 2013

I have been trawling around for a while now trying to find a simple and understandable way of representing geospatial data in R, whilst retaining the ability to manipulate the visualisation in ggplot. After much searching I came across some articles which got me to a working product only after a lot of ball ache.

"I don’t wanna grow up": Age / value relationships for football players

February 1, 2013

Let's get back to the age-value relationship from my last post. I did some more plotting to see for which position this inverted U-shaped relationship is strongest. Please note that I use a data frame called eu.players throughout this post, which holds football player information downloaded from transfermarkt.de. But first, let us get back to the original graph.

Converting a dataset from wide to long

February 1, 2013

I recently had to convert a dataset that I was working with from a wide format to a long format for my analysis.  I struggled with this a bit, but finally found the right sources and the right package to do it, so I thought I'd share my practical ...
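
The excerpt does not name the package the author settled on, so this sketch uses base R's reshape() instead, on a made-up three-row data set. The column names (id, score_2011, score_2012) are assumptions chosen for illustration.

```r
# Made-up wide data: one row per subject, one column per year.
wide <- data.frame(
  id         = 1:3,
  score_2011 = c(10, 20, 30),
  score_2012 = c(11, 21, 31)
)

# Stack the two score columns into one, with a 'year' column marking the source.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("score_2011", "score_2012"),
  v.names   = "score",
  timevar   = "year",
  times     = c(2011, 2012),
  idvar     = "id"
)

print(long)
```

Each subject now has one row per year, the layout most modelling and plotting functions expect.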

Show me the pdf already

February 1, 2013

You’ve got a pdf file and you’d like to view it with whatever the system viewer is. As usual, that requires something special for Windows and something general for the rest of us. Here goes…

    openPDF <- function(f) {
      os <- .Platform$OS.type
      if (os == "windows")
        shell.exec(normalizePath(f))
      else {
        pdf <- getOption("pdfviewer", default = '')
        if (nchar(pdf) == 0)
          stop("The 'pdfviewer'
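
Since the excerpt cuts off mid-function, here is one way the idea could be completed. It is a sketch, not the author's code. The name open_pdf, the dry_run argument, the error message and the use of system2() are all assumptions, and "xdg-open" is just a common opener on Linux desktops.

```r
# Hypothetical completion: open a PDF with the platform's viewer.
open_pdf <- function(f, viewer = getOption("pdfviewer", default = ""),
                     dry_run = FALSE) {
  path <- normalizePath(f, mustWork = FALSE)
  if (.Platform$OS.type == "windows") {
    # shell.exec() exists only on Windows and opens the associated program.
    if (!dry_run) shell.exec(path)
  } else {
    if (!nzchar(viewer)) {
      stop("The 'pdfviewer' option is not set; cannot open ", path)
    }
    # Launch the viewer without blocking the R session.
    if (!dry_run) system2(viewer, shQuote(path), wait = FALSE)
  }
  invisible(path)
}

# dry_run = TRUE resolves the path without launching anything.
demo_path <- open_pdf(file.path(tempdir(), "report.pdf"),
                      viewer = "xdg-open", dry_run = TRUE)
print(demo_path)
```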

Introducing the BH package

January 31, 2013

Earlier today a new package BH arrived on CRAN. Over the years, Jay Emerson, Michael Kane and I had numerous discussions about a basic Boost infrastructure package providing Boost headers for other CRAN packages (and yes, we are talking packages usin...

Flowchart: How to learn survey analysis with R

January 31, 2013

In a recent talk to the DC R User Group, Anthony Damico presented the following handy flowchart for learning to do survey analysis with R (actually, it's a pretty good flowchart for learning R for any application):

Since they're not clickable above, here are the resource links:

Learn R by watching two‐minute videos on http://twotorials.com
Read the “Getting Started...

Data analysis approaches to modeling changes in primary metabolism

January 31, 2013

Taking Expectations to the Next Level

January 31, 2013

Higher Expectations

I came across this post on Thursday and found it to be quite interesting. Clearly rental prices vary according to where you live. That isn't too surprising. I started thinking a bit more about it and thought that Boston and the nearby communities would have to...

Using R: writing a table with odd lines (again)

January 31, 2013

Let’s look at my gff track headers again. Why not do it with plyr instead? d_ply splits the data frame by the feature column and applies a nameless function that writes subsets to the file (and returns nothing, hence the “_” in the name). This isn’t shorter or necessarily better, but it appeals to me.
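
The plyr code itself is not in the excerpt, so the sketch below shows the same split-then-write pattern in base R with split() and a loop. The tiny features data frame, the "track name=" header text and the function name write_tracks are invented for illustration and are not the author's gff format.

```r
# Made-up feature table standing in for the real gff data.
features <- data.frame(
  feature = c("gene", "exon", "gene", "exon"),
  start   = c(1, 5, 20, 25),
  end     = c(10, 8, 30, 28),
  stringsAsFactors = FALSE
)

# Write one header line per feature type, followed by that type's rows.
write_tracks <- function(d, path) {
  for (piece in split(d, d$feature)) {
    cat(sprintf("track name=%s\n", piece$feature[1]), file = path, append = TRUE)
    write.table(piece, file = path, append = TRUE, quote = FALSE,
                sep = "\t", row.names = FALSE, col.names = FALSE)
  }
  invisible(NULL)  # like d_ply, called only for its side effect
}

out_file <- tempfile(fileext = ".gff")
write_tracks(features, out_file)
written <- readLines(out_file)
print(written)
```

The output file interleaves a header line with each block of rows, which is the odd-lines problem from the title.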

Using Line Segments to Compare Values in R

January 31, 2013

Sometimes you want to create a graph that will allow the viewer to see in one glance:

The original value of a variable
The new value of the variable
The change between old and new

One method I like to use to do this is using geom_segment and geom_poin...
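
The post uses ggplot2, and its code is cut off in the excerpt. As a stand-in, here is the same kind of chart in base graphics with segments() and points(). The three groups and their values are invented for illustration.

```r
# Made-up old and new values for three groups.
before <- c(A = 10, B = 15, C = 8)
after  <- c(A = 14, B = 12, C = 8.5)
change <- after - before
rows   <- seq_along(before)

# Draw to a null device when run in batch mode.
if (!interactive()) pdf(NULL)

# Empty canvas sized to hold both sets of values.
plot(NA, xlim = range(before, after), ylim = c(0.5, length(before) + 0.5),
     yaxt = "n", xlab = "Value", ylab = "")
axis(2, at = rows, labels = names(before), las = 1)

# One segment per group from old to new value; open dot = old, filled = new.
segments(x0 = before, y0 = rows, x1 = after, y1 = rows)
points(before, rows, pch = 1)
points(after, rows, pch = 19)

if (!interactive()) invisible(dev.off())

print(change)
```

The segment length shows the size of the change, and the dot styles show its direction.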

Scatterplot Matrices

January 31, 2013

Scatterplot matrices are a great way to roughly determine if you have a linear correlation between multiple variables. This is particularly helpful in pinpointing specific variables that might have similar correlations to your genomic or proteomic data. If you already have data with multiple variables, load it up as described here. If not, no worries
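
A minimal sketch with base R's pairs(), using the built-in iris measurements as a stand-in for genomic or proteomic data (an assumption, since the tutorial's own data set is not shown in the excerpt).

```r
# Four numeric columns from a built-in data set.
measurements <- iris[, 1:4]

# Draw to a null device when run in batch mode.
if (!interactive()) pdf(NULL)

# One scatterplot for every pair of variables.
pairs(measurements, main = "Scatterplot matrix of iris measurements")

if (!interactive()) invisible(dev.off())

# The correlation matrix puts numbers on what the panels show.
cor_matrix <- round(cor(measurements), 2)
print(cor_matrix)
```

Panels with a tight diagonal band correspond to correlations near 1 or -1 in the matrix.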

How to install packages on R + screenshots

January 31, 2013

Have no fear, the screenshots are here! (For the original tutorial, click here)

Method 1 (less typing)

Part 1-Getting the Package onto Your Computer

Open R via your preferred method (icon on desktop, Start Menu, dock, etc.)
Click “Packages” in the top menu then click “Install package(s)”.
Choose a mirror that is closest to your geographical location.
Now
