## Visualizing Principal Components

December 22, 2012
By

Principal Component Analysis (PCA) is a procedure that converts observations of possibly correlated variables into linearly uncorrelated variables called principal components (Wikipedia). PCA is a useful descriptive tool for examining your data. Today I will show how to find and visualize principal components. Let’s look at the components of the Dow Jones Industrial Average index over 2012. First,

## Basics of Histograms

December 22, 2012
By

Histograms are used very often in public health to show the distributions of your independent and dependent variables.  Although the basic command for histograms (hist()) in R is simple, getting your histogram to look exactly like you want takes g...
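As a rough illustration of what customizing `hist()` can look like, here is a minimal sketch in base R. The variable `bmi` is simulated purely for illustration and is not from the original post; the bin width, labels, and colours are likewise assumptions.

```r
# Simulated data standing in for a real public-health variable (assumption).
set.seed(1)
bmi <- rnorm(200, mean = 27, sd = 4)

# plot = FALSE returns the histogram object without drawing it,
# which makes the bin structure easy to inspect before plotting.
h <- hist(bmi,
          breaks = seq(floor(min(bmi)), ceiling(max(bmi)), by = 1),
          plot = FALSE)

# Draw the stored histogram with explicit title, axis label, and colours.
plot(h,
     main = "Distribution of BMI (simulated)",
     xlab = "BMI",
     col = "grey80",
     border = "white")
```

Building the breaks explicitly with `seq()` keeps the bin width fixed at one unit, rather than leaving the choice to the default rule.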

## Get the party started

December 22, 2012
By

Have you already used trees or random forests to model the relationship between a response and some covariates? Then you might like conditional trees, which are implemented in the party package. In contrast to the CART (Classification and Regression ...

## The definitive guide to plotting confidence intervals in R

December 22, 2012
By

Here at is.R(), we have produced countless posts that feature plots with confidence intervals, but apparently none of those are easy to find with Google. So, today, for the purposes of SEO, we’ve put “plotting confidence intervals” in the title of our post. We also cannot resist an earnest plea from our...

## Chocolate and Nobel prize – a true story?

December 22, 2012
By

Few of us can resist chocolate, but the real question is: should we even try to resist it? The image is CC by Tasumi1968. As a dark chocolate addict I was relieved to see Messerli's ecological study on chocolate consumption and the...

December 21, 2012
By

Coursera is offering free courses about R, among other interesting subjects. The first one, on the application of R in financial econometrics, is happening this week (but you can still enroll). Two more courses starting in January 2013 are more about using R to analyse data. The differences between the two are

## Simple data simulator for the 2PL model

December 21, 2012
By

The function: This is a very simple data simulator for a 2PL Model. This is just to get you started; from here it is easy to add function parameters for indicating item locations and slopes or person distribution characteristics. The function accepts on...
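The post's own function is not shown in this excerpt, so the following is only a sketch of what a 2PL simulator might look like in base R. The name `sim_2pl`, its arguments, and the default distributions for slopes, locations, and abilities are all assumptions made for illustration.

```r
# Hypothetical helper (not the original author's function): simulate
# dichotomous responses under a 2PL model, where
#   P(X_ij = 1) = plogis(a_j * (theta_i - b_j)).
sim_2pl <- function(n_persons, n_items,
                    a = runif(n_items, 0.5, 2),   # item slopes (assumed range)
                    b = rnorm(n_items),           # item locations (assumed N(0, 1))
                    theta = rnorm(n_persons)) {   # person abilities (assumed N(0, 1))
  # Rows are persons, columns are items: a_j * (theta_i - b_j)
  eta <- sweep(outer(theta, b, "-"), 2, a, "*")
  p <- plogis(eta)
  # One Bernoulli draw per person-item cell, filled column by column
  matrix(rbinom(length(p), size = 1, prob = p),
         nrow = n_persons, ncol = n_items)
}

set.seed(42)
x <- sim_2pl(500, 10)
dim(x)
```

Exposing `a`, `b`, and `theta` as arguments with defaults is one way to allow custom item locations, slopes, or person distributions later without changing the function body.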

## Rcpp 0.10.2

December 21, 2012
By

Release 0.10.2 of Rcpp provides the second update to the 0.10.* series, and has arrived on CRAN and in Debian. It brings another great set of enhancements and extensions, building on the recent 0.10.0 and 0.10.1 releases. The new Rcpp attributes were rewritten to not require Rcpp modules (as we encountered an issue with exceptions on Windows when built this...

## R for inquisition

December 21, 2012
By

A post on high-dimensional arrays by @isomorphisms reminded me of APL and, more generally, of matrix languages, which took me back to inquisitive computing: computing not in the sense of software engineering, or databases, or formats, but of learning by poking problems through a computer. I like languages not because I can get a job

## Create optical illusions with R

December 21, 2012
By

I love optical illusions (like this and this and these), not just because they're fun, but also beca...

## A simple web application using Rook

December 21, 2012
By

by Ben Ogorek. I'm grateful to Rook for helping me, a simple statistician, learn a few fundamentals of web technology. For R web application development, there are increasingly polished methods available (most notably Shiny), but you can build one...

## Generating a non-homogeneous Poisson process

December 21, 2012
By

Consider a Poisson process $(N_t)_{t\geq 0}$, with non-homogeneous intensity $\lambda(t)$. Here, we consider a deterministic function, not a stochastic intensity. Define the cumulated intensity $\Lambda(t)=\int_0^t \lambda(u)\,du$, in the sense that the number of events that occurred between time $s$ and $t$ is a random variable that is Poisson distributed with parameter $\Lambda(t)-\Lambda(s)$. For example, consider here a cyclical Poisson process, with intensity lambda=function(x) 100*(sin(x*pi)+1). To compute...
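The excerpt stops before the post's own method, so the sketch below uses a different, standard technique, thinning (Lewis and Shedler), to simulate the cyclical process. Only the intensity function comes from the post; the horizon `t_end`, the bound `lambda_max`, and the variable names are assumptions.

```r
# Intensity taken from the post; everything else here is illustrative.
lambda <- function(x) 100 * (sin(x * pi) + 1)
lambda_max <- 200   # upper bound on lambda, since sin(.) <= 1
t_end <- 10         # assumed time horizon

set.seed(123)
# Step 1: simulate a homogeneous Poisson process with rate lambda_max on [0, t_end].
n <- rpois(1, lambda_max * t_end)
candidates <- sort(runif(n, 0, t_end))

# Step 2 (thinning): keep each candidate with probability lambda(t) / lambda_max.
keep <- runif(n) < lambda(candidates) / lambda_max
events <- candidates[keep]

# The integral of the intensity over [0, t_end] is the expected number of events.
Lambda <- integrate(lambda, 0, t_end)$value
c(simulated = length(events), expected = Lambda)
```

Thinning only needs an upper bound on the intensity, which makes it convenient when the cumulated intensity is awkward to invert.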

## Computing an empirical pFDR in R

December 21, 2012
By

The positive false discovery rate (pFDR) has become a classical procedure to test for false positives. It is one of my favourites because it relies on a re-sampling approach. I base my implementation on John Storey's PNAS paper and the technical report he published with Rob Tibshirani while at Stanford (I find the technical report...

## Working with geographical Data. Part 1: Simple National Infomaps

December 21, 2012
By

There is a popular expression in my country called “Gastar polvora en chimangos”, whose translation in English would be “spending gunpowder in chimangos”. Chimango is a kind of bird whose meat is useless for humans. So “spending gunpowder in chimangos” …

## Beautiful network diagrams with ggplot2

December 21, 2012
By

I don’t usually like describing my own work as “beautiful,” but with your permission I will make an exception today. There have been some requests for scripts illustrating the plotting of network diagrams with ggplot2, and today (for the winter solstice) we’re bringing you a really nice-looking way of doing just that. In fact, this Gist...

## Y2K38: Our Own Mayan Calendar…Again

December 21, 2012
By

It’s not quite the end of the world as we know it. We made it through December 21, 2012 unscathed. It’s not going to be the last time we will make it through such a pseudo-calamity. After all, we have built our own end of the world before (e.g. Y2K). Next up: January 19, 2038.
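The January 19, 2038 date follows from storing time as a signed 32-bit count of seconds since January 1, 1970 (UTC). A small base R sketch shows where the count runs out; the variable names are illustrative only.

```r
# R integers are signed 32-bit, so .Machine$integer.max is 2^31 - 1.
max_int32 <- .Machine$integer.max

# Interpret that value as seconds since the Unix epoch, in UTC.
rollover <- as.POSIXct(as.numeric(max_int32), origin = "1970-01-01", tz = "UTC")
format(rollover, "%Y-%m-%d %H:%M:%S", tz = "UTC")
# "2038-01-19 03:14:07"

# One more second does not fit: R reports integer overflow as NA (with a warning).
overflowed <- suppressWarnings(max_int32 + 1L)
is.na(overflowed)
# TRUE
```

R itself stores date-times as doubles, so the sketch only mimics the limit that affects systems using a 32-bit `time_t`.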

December 21, 2012
By

The Italian BioR Day took place on November 30, with almost sixty R enthusiasts joining the event! Thanks to all the participants and a special thanks to the speakers who shared their knowledge with us and to the Parco Tecnologico …

## Simple data simulator for the Rasch Model

December 21, 2012
By

The function: This is a very simple data simulator for the Rasch Model. This is just to get you started; from here it is easy to add function parameters for indicating item locations or person distribution characteristics. The function accepts only two p...
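Since the original function is cut off here, the following is only a guess at the general shape of such a simulator in base R. The name `sim_rasch`, its two arguments, and the standard normal distributions for abilities and difficulties are assumptions, not the author's code.

```r
# Hypothetical helper: simulate dichotomous responses under the Rasch model,
#   P(X_ij = 1) = plogis(theta_i - b_j).
sim_rasch <- function(n_persons, n_items) {
  theta <- rnorm(n_persons)   # person abilities (assumed standard normal)
  b <- rnorm(n_items)         # item difficulties (assumed standard normal)
  p <- plogis(outer(theta, b, "-"))   # persons in rows, items in columns
  # One Bernoulli draw per cell, filled column by column to match p
  matrix(rbinom(n_persons * n_items, size = 1, prob = p), nrow = n_persons)
}

set.seed(7)
resp <- sim_rasch(200, 5)
colMeans(resp)   # proportion correct per item
```

Items with higher simulated difficulty should tend to show lower proportions correct, which is a quick sanity check on the output.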

## Removing Records by Duplicate Values in R – An Efficiency Comparison

December 20, 2012
By

After posting “Removing Records by Duplicate Values” yesterday, I had an interesting communication thread with my friend Jeffrey Allard tonight regarding how to code this in R, a combination of order() and duplicated() or sqldf(). Afterward, I did a simple efficiency comparison between two methods as below. The comparison result is pretty self-explanatory. In terms

## Querying, parsimony and golden hammers

December 20, 2012
By

I love it when things are easy. I love it so much that I’ll spend a great deal of time and effort to keep things simple. At the same time, though, I think there’s some value in expending effort in pursuit of something. If you want to understand a thing, you have to spend time

## Visualising Tourism Data using R with googleVis package

December 20, 2012
By

Inspired by Mages’s post on Accessing and plotting World bank data with R (using googleVis package), I created one visualising tourism receipts and international tourist arrivals of various countries since 1995. The data used are from the World Bank’s country indicators. To see the motion chart, double click a picture below.

## moRe

December 20, 2012
By

Hopefully my first R post whetted your appetite for open source data software. I’m gearing up for more R posts regardless. I thought I’d do a quick post about a couple of useful commands, ‘View’ and ‘fix’. When you first break the shackles of Excel one of the toughest things is not being able to

## R Journal Volume 4, Issue 2

December 20, 2012
By

The latest issue of the bi-annual, peer-reviewed journal about R, the R Journal, is now available for download. This issue includes three articles on graphics from R-core member and R Graphics author Paul Murrell. He writes about accessing individual elements of an R chart by the component names, drawing complex symbols with the polypath function (useful for map icons,...

## Stealing from the internet: Part 1

December 20, 2012
By

Well, not stealing but rather some handy tools for data mining… About a year ago I came across the package XML as I was struggling to get some data from various web pages. The purpose of this blog is to describe how this package can be used to quickly gather data from the internet. I’ll

## Shiny/R Conversion of Another One of My Favorite Mike Bostock d3 Examples

December 20, 2012
By

Mike Bostock has revolutionized visualization with his d3 and his seemingly infinite examples.  In another adaptation of his amazing work, I will adapt one of my favorite examples to supplement the interactive scatterplot with data supplied by R t...

## Shiny SVG no d3–New and Improved

December 20, 2012
By

The fine author Joe Cheng of RStudio Shiny suggested in this Google Groups message to use htmlOutput rather than the ugly hack in my last post R Shiny svg with no d3.  As I should have known, it works great and eliminates all the useless javascrip...

## Turnovers are poison

December 20, 2012
By

This is probably a slightly useless post, but a bit of fun all the same. If nothing else, it allows me to take a stab at learning a bit more about logistic regression. I’m still trying to unravel the mystery of why the Bears lost to the Vikings two weeks ago. This mystery is compounded

## Generation of E-Learning Exams in R for Moodle, OLAT, etc.

December 20, 2012
By

(Guest post by Achim Zeileis) Development of the R package exams for automatic generation of (statistical) exams in R started in 2006 and version 1 was published in JSS by Grün and Zeileis (2009). It was based on standalone Sweave exercises that can be combined …

## Influence.ME: Tools for Detecting Influential Data in Multilevel Regression Models

December 20, 2012
By

Despite the increasing popularity of multilevel regression models, the development of diagnostic tools has lagged behind. Typically, in the social sciences multilevel regression models are used to account for the nesting structure of the data, such as students in classes, migrants ...