## The Rcpp Gallery and my Seinfeld Streak

February 3, 2013

A good three weeks ago, we introduced the Rcpp Gallery. While this is a joint effort by several of us on the Rcpp team, the backend was conceived and implemented entirely by JJ, who also bootstrapped it with some first content, drawing on posts by Ha...

## Checking validation statistics (Monitor function 030220139)

February 3, 2013

(This article was first published on NIR-Quimiometria, and kindly contributed to R-bloggers)

## Clustering using dynamic tree cut

February 3, 2013

Summary: Two methods for hierarchical clustering are introduced: (i) dynamic tree cut; and (ii) dynamic hybrid cut. Dynamic tree cut is a top-down algorithm that relies solely on the dendrogram. The algorithm implements an adaptive, iterative process of cluster decomposition …

## Arc Diagrams in R: Les Miserables

February 3, 2013

In this post we will talk about the R package “arcdiagram” for plotting pretty arc diagrams. An arc diagram is a graphical display to visualize graphs or networks in a one-dimensional layout. The main idea is to display nodes along a single axis, while representing the edges or connections …

## XLConnect 0.2-4

February 3, 2013

Mirai Solutions GmbH (http://www.mirai-solutions.com) is very pleased to announce the release of XLConnect 0.2-4, which is available from CRAN. This newest release comes along with a number of new features: Ability to read cached cell values. There is a new …

## For descriptive statistics, values below LLOQ set to …

February 3, 2013

That is what I read the other day. For calculation of descriptive statistics, values below the LLOQ (lower limit of quantification) were set to ... Then I wondered, wasn't there a trick in JAGS to incorporate the presence of missing data while es...

## A package for agricultural statistics: FAOSTAT

February 3, 2013

After 8 years of using R, today I finally became a contributor to the community and released my first package, FAOSTAT. The package is designed to provide users with direct access to the FAOSTAT database via R and to support the...

## Scatterplot with marginal boxplots

February 3, 2013

Using R and ggplot2 to draw a scatterplot with the two marginal boxplots. Drawing a scatterplot with the marginal boxplots (or marginal histograms or marginal density plots) has always been a bit tricky (well, for me anyway). The approach I take here is, first, to draw the three separate plots using ggplot2: the scatterplot; the horizontal boxplot to appear in the...

## Maize trade Part II: Comparison and analysis

February 3, 2013

My last post about the maize network, although interesting, was not very informative. What we are going to do today is to contrast the maize network with the wine trade network. Why we chose wine will become clear after the...

## Visualising 2012 NFL Quarterback performance with R heat maps

February 3, 2013

With only 24 hours remaining in the 2012 NFL season, this is a good time to review how the league's QBs performed during the regular season using performance data from KFFL and the heat mapping capabilities of R. `#scale data to mean=0, sd=1 and convert to matrix` `QBscaled <- as.matrix(scale(QB2012))` `#create heatmap and don't reorder`

## InstallOldPackages: a repmis command for installing old R package versions

February 3, 2013

A big problem in reproducible research is that software changes. The code you used to do a piece of research may depend on a specific version of software that has since been changed. This is an annoying problem in R because install.packages only installs the most recent version of a package. It can be tedious to collect the old...

## Comparing individual team run production

February 3, 2013

Or, The 2010 Mariners: How Bad Were They? In earlier posts, I used the statistical software R to plot the trends in league average run scoring since 1901. This was the first step to answering other questions I had on my mind: How poor was the offensive performance of the 2010 Seattle Mariners? Are they showing any signs...

## data.table or data.frame?

February 2, 2013

I spent a portion of today trying to convince a colleague that there are times when the data.table package is faster than traditional methods in R. It took a few of the tests below to prove the point. Generate a data.frame of characters and numbers for easy plotting. df <- data.frame(letters = as.character(sample(letters, 1e+08, replace = TRUE)), ...

## A random walk ? What else ?

February 2, 2013

Consider the following time series. What does it look like? I know, this is a stupid game, but I keep using it in my time series courses. It does look like a random walk, doesn’t it? If we use the Phillips-Perron test, yes, it does, > PP.test(x) Phillips-Perron Unit Root Test data: x Dickey-Fuller = -2.2421, Truncation lag parameter = 6,...

## R scripts for analyzing survey data

February 2, 2013

Another site pops up with open code for analyzing public survey data: http://www.asdfree.com/ It will be interesting to see whether this gets used by the general public (given the growing trend of data journalism and so forth) versus academics. It is...

## A slightly different introduction to R, part III

February 2, 2013

I think you’ve noticed by now that a normal interactive R session is quite messy. If you don’t believe me, try playing around for a while and then give the history() command, which will show you the commands you’ve typed. If you’re anything like me, a lot of them are malformed attempts that generated some

## qdap 0.2.0 released

February 2, 2013

This is the first CRAN release of qdap (qdap 0.2.0) found here. qdap (Quantitative Discourse Analysis Package) is an R package designed to assist in quantitative discourse analysis. The package stands as a bridge between qualitative transcripts of dialogue and …

## RcppExamples 0.1.6

February 1, 2013

A pure maintenance release 0.1.6 of RcppExamples was made two weeks ago, and never announced. We merely moved the NEWS.Rd file into the proper location in the inst/ directory, and, while we were at it, mentioned the new Rcpp Gallery in the DESCRIPTION fi...

## digest 0.6.2

February 1, 2013

digest version 0.6.2 came out a few days ago as an almost immediate follow-up to release 0.6.1. We used paste0() in a few places, and this is only available with newer versions of R. To not introduce a somewhat unnecessary dependency, we reverted thi...

## Bootstrap Confidence Intervals

February 1, 2013

Here is an example of nonparametric bootstrapping. It’s a powerful technique that is similar to the Jackknife. The bootstrap, however, re-samples the data with replacement. It’s clearly not as good as parametric approaches but it gets the job done. This can be used in a variety of situations ranging from variance estimation to model selection. John
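As a rough illustration of the idea in this teaser, here is a minimal percentile-bootstrap sketch in base R. It is not the code from the original post: the sample `x`, the number of replicates `B`, and the choice of the mean as the statistic are all assumptions made for the example.

```r
# Minimal percentile bootstrap for the mean (illustrative sketch only).
# `x` is a hypothetical sample; the original post's data are not shown here.
set.seed(42)
x <- rexp(50, rate = 1)

B <- 2000  # number of bootstrap replicates (assumed value)

# Re-sample x with replacement B times and recompute the statistic each time.
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))

# Percentile interval: the 2.5% and 97.5% quantiles of the bootstrap distribution.
ci <- quantile(boot_means, probs = c(0.025, 0.975))
print(ci)
```

The percentile interval is only one of several bootstrap intervals; it is used here because it needs nothing beyond `sample()`, `replicate()` and `quantile()`.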

## Visualizing MLB Hall of Fame votes with R

February 1, 2013

Carlos Scheidegger and Kenny Shirley created this visualization of votes for the Major League Baseball hall of fame: They describe the chart as follows: The main figure above is a plot of BBWAA Hall of Fame voting by year for all 1,070 players who have appeared on the ballot since Hall of Fame voting began in 1936. The circular...

February 1, 2013

So you’ve finally managed to install the pesky environment but have no idea what you are looking at when you open the program. This tutorial is for you. (Again, here is a version with screenshot pictures). When you open R, it might look different than the screenshots in the picture version of the tutorial. This

## Yen and JGBs Short-Term vs Long Term

February 1, 2013

I have read some articles arguing that the recent move in the Japanese Yen is overdone.  However, considering the short-term without regard to the long-term context is naïve and potentially dangerous.  Although I do not have significant proo...

## Overdispersion with different exposures

February 1, 2013

In actuarial science, and insurance ratemaking, taking into account the exposure can be a nightmare (in datasets, some clients have been here for a few years, which we call exposure, while others have been here for a few months, or weeks). Somehow, simple results become more complicated to compute just because we have to take into account...

## Bayesian model choice for the Poisson model

February 1, 2013

Following Arthur Charpentier’s example, I am going to try to post occasionally on material covered during my courses, in the hope that it might be useful to my students, but also to others. In the second practical of the Bayesian Case Studies course, we looked at Bayesian model choice and basic Monte Carlo methods, looking

## #13 Mapping in R: Representing geospatial data together with ggplot

February 1, 2013

I have been trawling around for a while now trying to find a simple and understandable way of representing geospatial data in R, whilst retaining the ability to manipulate the visualisation in ggplot. After much searching I came across some articles which got me to a working product only after a lot of ball ache.

## "I don’t wanna grow up": Age / value relationships for football players

February 1, 2013

Let's get back to the age-value relationship from my last post. I did some more plotting to see for which position this inverse U-shaped relationship is strongest. Please note that I use a dataframe called eu.players throughout this post, which holds downloaded football player information from transfermarkt.de. But first, let us get back to the original graph.

## Converting a dataset from wide to long

February 1, 2013

I recently had to convert a dataset that I was working with from a wide format to a long format for my analysis.  I struggled with this a bit, but finally found the right sources and the right package to do it, so I thought I'd share my practical ...
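For readers who want a feel for the conversion, here is a small sketch using base R's `reshape()`. The excerpt does not name the package the author ended up using, so this is a stand-in technique, not the post's method; the data frame `wide` and its column names are hypothetical.

```r
# Wide-to-long conversion with base R's reshape() (illustrative sketch only).
# `wide` and its columns are made-up example data.
wide <- data.frame(
  id      = 1:3,
  score_1 = c(5, 6, 7),
  score_2 = c(8, 9, 10)
)

long <- reshape(
  wide,
  direction = "long",
  varying   = c("score_1", "score_2"),  # wide columns to stack
  v.names   = "score",                  # name of the stacked value column
  timevar   = "wave",                   # new column recording which wide column a row came from
  times     = 1:2,
  idvar     = "id"
)

print(long)
```

Each `id` now appears once per `wave`, with the measurement in a single `score` column.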