How to: Debug in R

June 23, 2010
By

Revolution Analytics is proud to sponsor the New York R User Group. The last meeting was on the theme of debugging in R, and some videos of the talks are now available at the Video Rchive. Jay Emerson gave a talk on basic debugging in R, and Harlan Harris dived deeper into advanced debugging techniques. Also presenting were Peter...

Read more »

Scoping Bugs

June 22, 2010
By

I ran across a strange bug in R recently. Like all the best programming languages, R treats functions as first class objects. That is to say that functions can be passed as arguments and return values from functions, named as variables, and, while not part of the strict definition of first class...

Read more »

Linear Modeling in R and the Hubble Bubble

June 22, 2010
By

Here is a scatter plot with the coordinate labels deliberately omitted. Figure 1. Do you see any trends? How would you model these data? It just so happens that this scatterplot is arguably the most famous scatterplot in history. One aficionado, writing more than forty years after its publication, commented skeptically: "data points were consequently spread...

Read more »

Reaching escape velocity

June 22, 2010
By

Sample once from the Uniform(0,1) distribution. Call the resulting value x. Multiply this result by some constant c. Repeat the process, this time sampling from Uniform(0, cx). What happens when the multiplier is 2? How big does the multiplier have to be to force divergence? Try it and see: iters = 200; locations = rep(0,iters)
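A minimal sketch of the experiment described above, assuming each new draw's upper bound is the multiplier times the previous draw. The function name `simulate_path` and its arguments are illustrative and do not come from the original post; only `iters` and `locations` mirror the truncated snippet.

```r
# Iterated uniform sampling: draw from Uniform(0, upper), then set the next
# upper bound to multiplier * draw. `simulate_path` is a hypothetical helper.
simulate_path <- function(multiplier, iters = 200) {
  locations <- rep(0, iters)
  upper <- 1                          # the first draw is from Uniform(0, 1)
  for (i in seq_len(iters)) {
    locations[i] <- runif(1, min = 0, max = upper)
    upper <- multiplier * locations[i]
  }
  locations
}

set.seed(42)                          # fixed seed so the run is repeatable
path2 <- simulate_path(2)             # multiplier 2
path3 <- simulate_path(3)             # multiplier 3

# Plot on a log scale; pmax() guards against log(0) if a path underflows.
plot(log(pmax(path2, .Machine$double.xmin)), type = "l",
     xlab = "iteration", ylab = "log(value)")
```

Because the mean of log(U) is -1 for U drawn from Uniform(0,1), the log of the value drifts by about log(multiplier) - 1 per step, so one would expect paths to collapse toward zero for multipliers below e (about 2.718) and to grow above it. Very large multipliers or iteration counts can overflow to `Inf`, which this sketch does not handle.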

Read more »

Analyzing competitive nordic skiing with R

June 22, 2010
By

Here's another great example of R being used to analyze sports data. Statistician and skier Joran Elias has started a project to analyze and visualize international cross country ski racing results, and he publishes his analysis at the blog Statistical Skier. All of the analyses are done using R (and for data, SQLite via the RSQLite package). As much...

Read more »

Employee productivity as function of number of workers revisited

June 22, 2010
By

We have a mild obsession with employee productivity and how it declines as companies get bigger. We have previously found that when you treble the number of workers, you halve their individual productivity, which is mildly scary. We revisit the analysis for the...

Read more »

The most violent municipalities in Mexico (2008)

June 21, 2010
By

The top six most violent municipalities are near the US border. Ciudad Juárez is in a class by itself with 113 homicides per 100,000 people. José Azueta is the municipality where Zihuatanejo is located. Mazatlán, another popular tourist destination, also appears on the list. Lázaro Cárdenas is the largest seaport in Mexico and ever since the...

Read more »

R Layout command.

June 21, 2010
By

In the previous post I created a chart but could not figure out how to fit the legend in the chart area. Peter Carl pointed me to the layout command, which partitions the display area and allowed the legend to be included. Source code to produce the c...
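As a rough illustration of the idea (not the post's own source code, which is truncated here), base R's `layout()` can split the device into a wide plotting panel and a narrow panel reserved for the legend. The data, colors, and panel widths below are assumptions chosen for the example.

```r
# Illustrative data; not taken from the original post.
x <- seq(0, 2 * pi, length.out = 100)

# Two panels side by side: panel 1 gets 4/5 of the width, panel 2 gets 1/5.
n_panels <- layout(matrix(c(1, 2), nrow = 1), widths = c(4, 1))

# Panel 1: the chart itself.
plot(x, sin(x), type = "l", col = "blue", ylab = "value")
lines(x, cos(x), col = "red")

# Panel 2: an empty plot with no margins that only hosts the legend.
old_par <- par(mar = c(0, 0, 0, 0))
plot.new()
legend("center", legend = c("sin", "cos"), col = c("blue", "red"),
       lty = 1, bty = "n")

# Restore the margins and return to a single-panel layout.
par(old_par)
layout(1)
```

Keeping the legend in its own panel means it can never overlap the plotted lines, at the cost of giving up some horizontal space.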

Read more »

MMDS 2010

June 21, 2010
By

The 2010 Workshop on Algorithms for Modern Massive Data Sets (MMDS 2010) finished up this past Friday (June 18th) at Stanford. This was an exceptionally well organized conference: four days of mind-stretching talks on algorithm development and the challenges of working with massive data sets approached from almost every conceivable angle. The approximately 100 attendees were a diverse group...

Read more »

New blog from Rmetrics Foundation

June 21, 2010
By

The Rmetrics Foundation (the sharp minds behind the Rmetrics suite of packages for financial analysis in R) have just launched a new blog where you can keep up with the latest Rmetrics news. Amongst the recent news: a new eBook about data management of Indian financial market data, and a new interface between Rmetrics and AMPL. You can also...

Read more »

Example 7.42: Testing the proportionality assumption

June 21, 2010
By

In addition to the non-parametric tools discussed in recent entries, it's common to use proportional hazards regression (section 4.3.1), also called Cox regression, in evaluating survival data. It's important in such models to test the proportionality a...

Read more »

A question

June 21, 2010
By

I posted a question to the R-Mixed-Model mailing list but have not had any responses yet: Dear All, I wonder if anybody has tried to make glmmBUGS work with JAGS. Mya...

Read more »

The police records for 2009 are out.

June 20, 2010
By

The 2009 homicide numbers collected by the SNSP (National System of Public Security) are finally out. You can download the data from the ICESI, which is a civic institution not affiliated with the government. If you remember, one of the conclusio...

Read more »

Here’s the distribution of the first million digits of the…

June 20, 2010
By

Here’s the distribution of the first million digits of the square root of two’s decimal expansion.

Number of digits | is:
  0's |  99 818
  1's |  98 926
  2's | 100 442
  3's | 100 191
  4's | 100 031
  5's | 100 059
  6's |  99 885
  ...

Read more »

QSPR modeling with signatures

June 20, 2010
By

I had to dig deep to find posts on QSAR modeling. There are quite a few on QSAR in Bioclipse, but those focus on the descriptor calculation. In a quick scan, I could only spot two modeling posts: The CDK/Metabolomics/Chemometrics Unconference results; Wh...

Read more »

R-INLA package

June 19, 2010
By

Another R package for mixed effect modeling. Looks promising.

Read more »

Estimating Probability of Drawdown

June 19, 2010
By

I've shown several examples of how to use LSPM's probDrawdown function as a constraint when optimizing a leverage space portfolio.  Those posts implicitly assume the probDrawdown function produces an accurate estimate of actual drawdo...

Read more »

More powerful iconv in R

June 19, 2010
By

The R function iconv converts between character string encodings, for example, from the locale-dependent encoding to UTF-8:

> iconv("foo", to="UTF-8")
[1] "foo"

However, R has long-running trouble with embedded null characters ('\0') in strings. Hence, if we try to convert to an encoding that permits embedded null characters, iconv will fail:

> iconv("foo", to="UTF-16")
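One way to sidestep the embedded-null problem in base R is to ask `iconv` for raw bytes instead of a character string, using its `toRaw = TRUE` argument. This is a sketch of that standard option and is not necessarily the approach the original post goes on to describe. The example targets UTF-16LE rather than UTF-16 so that the byte order is fixed and no byte-order mark is involved; that choice is an assumption made to keep the output predictable.

```r
# Convert to UTF-16LE and keep the result as raw bytes, so the zero bytes
# never have to be stored inside an R character string.
utf16_bytes <- iconv("foo", from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
print(utf16_bytes)

# iconv() also accepts a list of raw vectors as input, which allows a
# round trip back to an ordinary UTF-8 string.
back <- iconv(list(utf16_bytes), from = "UTF-16LE", to = "UTF-8")
print(back)
```

With `toRaw = TRUE` the return value is a list with one raw vector per input string, which is why the example indexes it with `[[1]]`.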

Read more »

What I need to know…

June 19, 2010
By

is maps and geographical data representation in R. In case you’re curious too, here is some good study material from R-Bloggers: maps; geographical; spatial. OK, this could be a tweet rather than a post…

Read more »

ggplot2 GUI progress

June 19, 2010
By

(Written by Ian Fellows) Below is a link to the first of a weekly (or bi-weekly) screen-cast vlog of my progress building a GUI for the ggplot2 package. http://neolab.stat.ucla.edu/cranstats/gsoc_vlog1.mov Comments and suggestions are more than welcome, and can be e-mailed to me at: [email protected]

Read more »

The perfect fake

June 19, 2010
By

Usually when you are doing Monte Carlo testing, you want fake data that’s good, but not too good. You may want a sample taken from the Uniform distribution, but you don’t want your values to be uniformly distributed. In other words, if you were to order your sample values from lowest to highest, you don’t

Read more »

Why R doesn’t suck

June 19, 2010
By

I first encountered the R programming language a few years ago when I needed to make some plots. Although I’ve used it occasionally since, I always considered it a sort of “Perl for statisticians”: a useful Swiss Army knife with …

Read more »

Those dice aren’t loaded, they’re just strange

June 18, 2010
By

I must confess to feeling an almost obsessive fascination with intransitive games, dice, and other artifacts. The most famous intransitive game is rock, scissors, paper. Rock beats scissors.  Scissors beats paper. Paper beats rock. Everyone older than 7 seems to know this, but very few people are aware that dice can exhibit this same behavior,
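To make the idea concrete, here is a small sketch using Efron's dice, a well-known intransitive set. The excerpt does not say which dice the post uses, so the face values below are an assumption chosen for illustration, and `win_prob` is a hypothetical helper name.

```r
# Efron's dice (assumed example set; not taken from the original post).
dice <- list(
  A = c(4, 4, 4, 4, 0, 0),
  B = c(3, 3, 3, 3, 3, 3),
  C = c(6, 6, 2, 2, 2, 2),
  D = c(5, 5, 5, 1, 1, 1)
)

# Exact probability that die `a` rolls higher than die `b`:
# compare all 36 face pairs and take the fraction where a wins.
win_prob <- function(a, b) mean(outer(a, b, ">"))

# Each die beats the next one in the cycle A -> B -> C -> D -> A.
cycle <- c(
  A_beats_B = win_prob(dice$A, dice$B),
  B_beats_C = win_prob(dice$B, dice$C),
  C_beats_D = win_prob(dice$C, dice$D),
  D_beats_A = win_prob(dice$D, dice$A)
)
print(cycle)
```

Every entry in `cycle` comes out to 2/3, so no die in the set is best: whichever one an opponent picks, there is another that beats it two times out of three.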

Read more »

Revolution Analytics: Startup to watch

June 18, 2010
By

Jack Germain of LinuxInsider interviewed Revolution CEO Norman Nie for his "Startup to Watch" column. Amongst the topics covered: the R language (Norman: "There are no statistical expressions that can not be written in R"), Revolution's recent name-change and announcement of our development roadmap, and the challenges of competing with SAS and Norman's former company, SPSS. Read the full...

Read more »

The impact of the drug war in Mexico

June 18, 2010
By

For the last couple of years, Mexico has been in the midst of an escalating drug war, with violent crime on the upswing in many areas. But tracking the impact quantitatively is difficult: in Mexico, about 85% of crimes go unreported, and corruption leads to inaccurate reporting in some districts. Diego Valle has taken on the task of visualizing...

Read more »