## Example 9.21: The birthday "problem" re-examined

February 23, 2012

The so-called birthday paradox or birthday problem is simply the counter-intuitive discovery that the probability of (at least) two people in a group sharing a birthday goes up surprisingly fast as the group size increases. If the group is only 23 peo...
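As a quick check of the claim, here is a minimal sketch of the standard calculation, assuming 365 equally likely birthdays and no leap years. The function name `p_shared` is made up for this illustration and is not taken from the original post.

```r
# Probability that at least two of n people share a birthday,
# assuming `days` equally likely birthdays (an idealisation) and n <= days.
p_shared <- function(n, days = 365) {
  # P(all birthdays distinct) = (days/days) * ((days-1)/days) * ... for n people
  p_distinct <- prod((days - seq_len(n) + 1) / days)
  1 - p_distinct
}

p_shared(22)  # just under one half
p_shared(23)  # just over one half

# Base R's stats::pbirthday() answers the same question:
pbirthday(23)
```

Under these assumptions, 23 is the smallest group size for which a shared birthday is more likely than not.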

## Gini index and Lorenz curve with R

February 23, 2012

You can do anything pretty easily with R, for instance, calculate concentration indexes such as the Gini index or display the Lorenz curve (dedicated to my students). Although I did not explain it during my lectures, calculating a Gini index or displaying the Lorenz curve can be done very easily with R. All you have
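For readers who want to see what is involved, here is a base-R sketch of both quantities. It is not the code from the post: the helper names `gini` and `lorenz` and the income vector are invented for illustration, and the formula used is the usual sample Gini index for non-negative values.

```r
# Sample Gini index for a vector of non-negative values.
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

# Points of the Lorenz curve: cumulative population share p
# against cumulative share L of the total.
lorenz <- function(x) {
  x <- sort(x)
  data.frame(p = c(0, seq_along(x) / length(x)),
             L = c(0, cumsum(x) / sum(x)))
}

incomes <- c(10, 20, 30, 40, 100)  # made-up values, for illustration only
gini(incomes)

lc <- lorenz(incomes)
plot(lc$p, lc$L, type = "l",
     xlab = "Cumulative share of population",
     ylab = "Cumulative share of income")
abline(0, 1, lty = 2)  # line of perfect equality
```

The further the curve sags below the dashed diagonal, the larger the Gini index.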

## Maps with R (III)

February 23, 2012

In my previous posts (1 and 2) I wrote about maps with complex legends but without any kind of interactivity. … Continue reading »

## another X’idated question

February 23, 2012

An X’idated reader of Monte Carlo Statistical Methods had trouble with our Example 3.13, the very one our academic book reviewer disliked so much as to “diverse a 2 star”. The issue is with computing the integral when f is the Student’s t(5) distribution density. In our book, we compare a few importance sampling solutions,
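The integrand of Example 3.13 is not reproduced in this excerpt, so the sketch below only illustrates the general idea of importance sampling against a Student's t(5) target. Both the integrand `h` (here `x^2`, whose expectation under t(5) is known to be 5/3) and the Cauchy proposal are assumptions chosen for this illustration.

```r
set.seed(1)
n <- 1e5

h <- function(x) x^2               # hypothetical integrand, NOT the one in the book

x <- rcauchy(n)                    # draws from the proposal g (standard Cauchy)
w <- dt(x, df = 5) / dcauchy(x)    # importance weights f(x) / g(x)

is_estimate <- mean(h(x) * w)      # estimates E_f[h(X)] for f = t(5)
is_estimate

# Running mean, useful for eyeballing how the estimate stabilises
running <- cumsum(h(x) * w) / seq_len(n)
plot(running, type = "l", xlab = "iterations", ylab = "estimate")
abline(h = 5 / 3, lty = 2)         # exact value for this particular h
```

The Cauchy proposal has heavier tails than the t(5) target, so the weights stay bounded, which is the usual reason for picking it in this kind of comparison.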

## Visualization in regression analysis

February 23, 2012

Visualization is a key to success in regression analysis. This is one of the (many) reasons I am also suspicious when I read an article with a quantitative (econometric) analysis without any graph. Consider for instance the following dataset, obtained from http://data.worldbank.org/, with, for each country, the GDP per capita (in some common currency) and the infant mortality rate...

## Setting up FastRWeb on Mac OS X

February 23, 2012

FastRWeb is an infrastructure that allows any webserver to use R scripts for generating dynamic content, such as web pages and graphics. In this post, you’ll learn how to install and set up FastRWeb on a Mac. This tutorial is extendable to any Unix-like operating system. It is an adaptation from Jay Emerson’s post, Setting

## What to do about publishing data?

February 22, 2012

There seems to be a scale that’s tipping at the moment: Data or code that should be published* is often not.  This should seem like an odd statement for anyone in science, but it should be easy to show that most publications … Continue reading →

## Introduction to R and Revolution R Enterprise: Slides

February 22, 2012

If you missed this morning's webinar, Revolution R Enterprise, 100% R and More, I've embedded the slides below. Interestingly, about half of today's participants were SAS users, and the remainder R users. The first section introduces open-source R, and the second describes the additional features of Revolution R Enterprise. View more presentations from Revolution Analytics Unfortunately we had a...

## Diamonds vs. water smackdown in playitbyr-powered podcast

February 22, 2012

Apparent Reason, my new monthly podcast, is a boisterous and non-technical discussion of economics and statistics. In that format I don't have the luxury of showing charts and graphs to complement my discussion, so I use the playitbyr package to represent the data as sound. (Apparently February is a great month to start R-related podcasts!

## A Sequence Clustering Model in R

February 22, 2012

I’ve just released my first R package! Over the past 1.5 years or so, I’ve been studying an obscure statistical model over ranking (full, or partial) data called Mallows’ model.  It hypothesizes that a set of sequence data has a “modal” sequence about which the data cluster, and that the data fall away from that
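To make the idea of data "falling away" from a modal sequence concrete, here is a small sketch using the Kendall tau distance, the distance most commonly paired with Mallows' model. It is independent of the package announced in the post; the function names and the spread parameter `lambda` are assumptions made for this illustration.

```r
# Kendall tau distance between two rankings, given as rank vectors
# (element i holds the rank assigned to item i): the number of item
# pairs that the two rankings order differently.
kendall_distance <- function(r1, r2) {
  pairs <- combn(length(r1), 2)
  d1 <- r1[pairs[1, ]] - r1[pairs[2, ]]
  d2 <- r2[pairs[1, ]] - r2[pairs[2, ]]
  sum(d1 * d2 < 0)
}

# Unnormalised Mallows weight: rankings further from the modal
# ranking get exponentially less mass.
mallows_weight <- function(r, modal, lambda) {
  exp(-lambda * kendall_distance(r, modal))
}

modal <- 1:4
kendall_distance(modal, c(2, 1, 3, 4))   # one adjacent swap
kendall_distance(modal, 4:1)             # full reversal
mallows_weight(c(2, 1, 3, 4), modal, lambda = 1)
```

Dividing each weight by the sum of the weights over all permutations would turn these into probabilities.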

February 22, 2012

There are different algorithms to calculate the Principal Components (PCs). Kurt Varmuza & Peter Filzmoser explain them in their book: “Introduction to Multivariate Statistical Analysis in Chemometrics”. I’m going to apply one of them, to...
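The excerpt does not say which algorithm the post applies, so as an illustration only, here is one common route: PCA through the singular value decomposition of the centred data matrix, checked against base R's `prcomp()`. The use of the built-in `iris` measurements is an assumption made for the sake of a runnable example.

```r
# Centre the columns (no scaling), then decompose X = U D V'
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
s <- svd(X)

loadings <- s$v                        # principal directions (columns)
scores   <- X %*% s$v                  # data projected onto the PCs
sdev     <- s$d / sqrt(nrow(X) - 1)    # standard deviation of each PC

# Reference result from base R
ref <- prcomp(iris[, 1:4])

# Loadings agree up to the sign of each column
all.equal(abs(loadings), abs(unname(ref$rotation)))
all.equal(sdev, ref$sdev)
```

Other algorithms for the same decomposition should give the same components up to sign and numerical tolerance.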

## The New York Yankees Payroll vs Everyone Else (Major League Baseball)

February 22, 2012

Description: Major League Baseball payrolls for all teams since 1985. The New York Yankees payroll is highlighted with results defined by the shape of the point. Data: http://www.baseball-databank.org/ Analysis: For years fans of Major League Baseball (MLB) have been crying...

## Non overlapping labels on a ggplot scatterplot

February 22, 2012

This is a very quick post just to share a quick tip on how to add non overlapping labels to a scatterplot in ggplot using a great package called directlabels. The trick is to make each point a single member group using an aesthetic like colour and then apply the direct.label function with the first.qp … Continue reading...

## Log File Analysis with R

February 21, 2012

R often comes up in discussions of heavy duty scientific and statistical analysis (and so it should).  However, it is also incredibly handy for a variety of more routine developer activities.   And so I give you… log file analysis with R!  I was just involved in the launch of gradesquare.com (go ahead – click...

## A Heartfelt Thank You and the Resulting GSoC Project

February 21, 2012

PerformanceAnalytics has long enjoyed contributions from users who would like to see specific functionality included. Diethelm Wuertz at ETHZ, who is the author and sponsor of all the various R/Metrics packages, is one of those contributors. I first met Diethelm when he hosted a conference on high-frequency data in the early 1990s (where we fretted

## Webinar Wednesday: Introduction to Revolution R Enterprise

February 21, 2012

If you haven't yet had a chance to catch my regularly-scheduled webinar, "Revolution R Enterprise - 100% R and More", it's a quick 30-minute introduction to the R language and the added features of Revolution R Enterprise. It's also a chance to ask me any questions you might have about R or Revolution Analytics during the live broadcast (starts...

## Berkeley Earth Surface Temperature: V1.5

February 21, 2012

My R package designed to import all of the Berkeley Earth Surface temperature data is officially on CRAN, as BerkeleyEarth.  The version there is 1.3 and I’ve completed some testing with the help of David Vavra. The result of that is version 1.5 which is available here at the drop box. I’ll be posting that

## polar histogram: pretty and useful

February 21, 2012

Do you have tens of histograms to show but no room to put them all on the page? As I was reading this paper in Nature Genetics, I came across a simple and clever way of packing all this information … Continue reading →

## Multiple Factor Model – Building Risk Model

February 20, 2012

This is the fourth post in the series about Multiple Factor Models. I will build on the code presented in the prior post, Multiple Factor Model – Building CSFB Factors, and I will show how to build a multiple factor risk model. For an example of the multiple factor risk models, please read the following references:

## Taking a Ride on the Wild Function – Introducing the dostats package

February 20, 2012

Lately I have been rather productive in my programming and frustrated at the same time. Trying to solve the problems of creating a demographics summary table proved to be a lesson in frustration with R. Since I love R, this was disheartening. I did eventually find the reporttools package which does make a great latex

## DNA methylation (RRBS or target capture) analysis with R

February 20, 2012

Reduced Representation Bisulfite Sequencing (RRBS) is a popular technique for measuring methylation levels across the genome. Although it does not have full genome coverage, it covers many important regions for methylation. Below, I shared a tutorial a...

## how to install and load a package in r

February 20, 2012

(This article was first published on twotorials by anthony damico, and kindly contributed to R-bloggers)
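The teaser carries no content of its own, so here is a minimal sketch of the usual two steps, `install.packages()` followed by `library()`. The wrapper name `use_package` is hypothetical and not from the original tutorial; it installs only when the package is missing, which requires a configured CRAN mirror and network access.

```r
# Install a package if it is not already available, then attach it.
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # downloads from the configured CRAN mirror
  }
  library(pkg, character.only = TRUE)
}

# "stats" ships with R, so nothing is downloaded here
use_package("stats")
"package:stats" %in% search()
```

For a one-off interactive session, typing `install.packages("name")` once and `library(name)` in each new session does the same job.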

## how to work with data tables in r

February 20, 2012

(This article was first published on twotorials by anthony damico, and kindly contributed to R-bloggers)

## weird [lack of] control…

February 20, 2012

When I ran I was expecting the same output as So this means that the dummy index in R “for” loops cannot be tweaked that easily. I seem to remember doing this kind of (dirty) tricks with earlier versions… Now, Alessandra and Robin think this is a good thing that the for loop is robust
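The code from the original post did not survive in this excerpt, so the following is only a reconstruction of the behaviour being described, with variable names invented for the illustration. Reassigning the loop variable inside an R `for` loop does not change which values the loop visits, whereas a `while` loop lets you move the index yourself.

```r
# Attempt to skip ahead by modifying the loop variable
visited_for <- integer(0)
for (i in 1:5) {
  visited_for <- c(visited_for, i)
  i <- i + 2          # has no effect on the next iteration
}
visited_for           # still visits every value of 1:5

# A while loop gives explicit control over the index
visited_while <- numeric(0)
i <- 1
while (i <= 5) {
  visited_while <- c(visited_while, i)
  i <- i + 2
}
visited_while         # visits 1, 3, 5
```

R evaluates the sequence `1:5` once and assigns its next element to `i` at the start of every iteration, overwriting whatever the loop body did to `i`.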

## GUI building in R: gWidgets vs Deducer

February 20, 2012

I’ve been a user (and fan) of gWidgets for a couple of years now for GUI building in R. (See my introduction to it here.) However, it’s always good to check out the competition so I’ve been playing around with Deducer to see how they compare. R can access a number of GUI building frameworks

## R_inferno

February 20, 2012

Knowing the weakness of your tool is a shortcut toward getting really good at it.

## Big data seminar in London on 1 March 2012

February 20, 2012

David Chan from City University is organising an interdisciplinary symposium on tackling the ‘Big Data’ challenge on 1 March 2012. It is an open seminar trying to bring ...

## New R User Group at Berkeley

February 20, 2012

There's a new R user group in Berkeley, CA: The Berkeley R Language Beginner Study Group. Join this small group for a step-by-step approach to learn the language R. Each session will be filled with examples and participants are welcome to suggest and present topics. If you have just started with R this is the perfect chance to find...

## Simplifying spatial polygons in R

February 20, 2012

Polygon simplification is something others have written about, using R packages such as shapefiles. This explores how package ‘rgeos’ uses the Ramer–Douglas–Peucker algorithm, a method commonly used in GIS systems for simplifying shapefiles. It works by imposing a deviation tolerance … Continue reading →
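To show what a deviation tolerance means in practice, here is a self-contained base-R sketch of the Ramer–Douglas–Peucker idea applied to a single line. It is written from the textbook description of the algorithm and is not the implementation used by 'rgeos'; the function names and the sample coordinates are invented for this illustration.

```r
# Perpendicular distance from each row of p to the line through a and b
perp_dist <- function(p, a, b) {
  d <- b - a
  len <- sqrt(sum(d^2))
  if (len == 0) {
    # a and b coincide (e.g. a closed ring): fall back to point distance
    return(sqrt(rowSums(sweep(p, 2, a)^2)))
  }
  abs(d[1] * (p[, 2] - a[2]) - d[2] * (p[, 1] - a[1])) / len
}

# Keep the end points; recursively keep the interior point that deviates
# most from the chord whenever that deviation exceeds `tol`.
rdp <- function(pts, tol) {
  n <- nrow(pts)
  if (n < 3) return(pts)
  dists <- perp_dist(pts[2:(n - 1), , drop = FALSE], pts[1, ], pts[n, ])
  k <- which.max(dists)
  if (dists[k] > tol) {
    idx <- k + 1                       # row index of the furthest point
    left  <- rdp(pts[1:idx, , drop = FALSE], tol)
    right <- rdp(pts[idx:n, , drop = FALSE], tol)
    rbind(left[-nrow(left), , drop = FALSE], right)  # drop duplicated joint
  } else {
    pts[c(1, n), , drop = FALSE]
  }
}

path <- cbind(x = 0:4, y = c(0, 0.01, 0, 3, 0))  # made-up coordinates
simplified <- rdp(path, tol = 0.1)
simplified
```

With this tolerance the tiny wobble at x = 1 is dropped while the spike at x = 3 is kept; raising `tol` removes progressively larger features.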