Rcpp 0.8.7

October 15, 2010

With the scheduled release of R 2.12.0 this morning, we have just uploaded version 0.8.7 of Rcpp to CRAN; Debian will follow shortly once the autobuilders have processed R 2.12.0. This Rcpp release depends on R 2.12.0 as two things have cha...

Nightlights: First Principles

October 15, 2010

With the publication of Hansen2010 forthcoming, it is critical to examine the subject afresh. The global temperature index product from NASA is known as GISTEMP. GISTEMP, like the temperature indices from Hadley/CRU and NCDC, attempts to estimate the average temperature of the globe using historical data archived in the GHCN (Global Historical Climatology Network)

The S3 OOP system

October 15, 2010

R currently supports two internal OOP systems (S3 and S4), and several others as add-on packages, such as R.oo and OOP. S3 is easy to use but not reliable enough for large software projects. The S3 system emphasizes generic functions and polymorphism. It is a function-centric system, unlike class-centric systems such as Java.
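
To make the "function-centric" idea concrete, here is a minimal sketch of S3 dispatch. The generic `area` and the classes `circle` and `rectangle` are hypothetical names chosen for illustration and do not come from the original post.

```r
# Generic function: UseMethod() dispatches on the class attribute
# of the first argument. (All names here are illustrative assumptions.)
area <- function(shape, ...) UseMethod("area")

# Methods are ordinary functions named <generic>.<class>.
area.circle <- function(shape, ...) pi * shape$r^2
area.rectangle <- function(shape, ...) shape$w * shape$h

# Fallback used when no class-specific method is found.
area.default <- function(shape, ...) {
  stop("no area method for objects of this class")
}

# An S3 object is a base object with a class attribute attached.
# Nothing validates its fields, which is one reason S3 is considered
# less robust for large projects.
circle <- structure(list(r = 1), class = "circle")
rectangle <- structure(list(w = 2, h = 3), class = "rectangle")

# The same call works on both objects: polymorphism through the generic.
areas <- sapply(list(circle, rectangle), area)
```

The behaviour lives in the generic and its methods rather than inside a class definition, so a new class can be supported by adding one more `area.<class>` function without touching existing code.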

Nightlights: cool data, bad geocoding

October 14, 2010

A global source of population density has been on my low-priority wish list for some time, so I was very excited when I found Steve Mosher’s work with the Nightlights data set. “Nightlights” refers to the artificial lights seen from space at night. Astronomers call it “light pollution”, which is pretty accurate since it’s decidedly

Which parents are satisfied with their child’s education? Those who know their class representative well. Especially in poor schools.

October 14, 2010

Another result from our OSI / ESP survey of nearly 11000 parents in ten countries. Dots are individual parents. The y-axis is individual parents’ overall satisfaction with their children’s education. Red dots are parents who know their parent representatives well, blue dots are parents who do not; i.e. colour is mapped to the variable par.rep.1

Liquidity Premium vs Liquidity of Corporate Bonds

All else equal, investors should require higher returns on assets with lower liquidity. In other words, investors demand a higher expected return, and hence a larger liquidity premium, for holding a less liquid asset. Risk and return co-exist. Is this really true for corporate bonds? I run a simple regression using R to test my data, where US corporate bonds...

How to build a world-beating predictive model using R

Many modern data analysis problems in both industry and academia involve building a model that can predict the future based on historical variables. The 2009 KDD Cup was an international data mining competition devoted to this type of problem, where …

Boxplots or raw data graphs?

October 14, 2010

We recently had a dilemma for an OSI publication about the design for the graphs. There will be dozens of these graphs showing the mean score on a given variable for nearly 11000 parents from 10 countries. This example is for household wealth which has values ranging from 0 to 16. These are the three

R is Hot: Part 2

October 14, 2010

This is Part 2 of a five-part article series, with new parts published each Thursday. You can download the complete article from the Revolution Analytics website. Critical Mass and Going Viral: R was created in 1993 by Ross Ihaka and Robert Gentleman at the University of Auckland in New Zealand. It’s called R for the simple reason that both...

Postdoc position in computational Bayesian statistics

October 14, 2010

Here is an announcement I received that should interest potential postdocs (willing to come to Paris). The location is on the Orsay campus, south of Paris. In the framework of the ANR-funded Metacoli project which aims at identifying the metabolic underpinnings of the lifestyle diversity in the E. coli species, Genoscope (the genomics institute of

Kuwait Airport

October 14, 2010

Kuwait International Airport. GISS has it as nightlights = 0, and so do I. Comparing nightlights with the station centered against a static Google map with the station centered shows mismatches between GISS and me, and between Nightlights and the world. Subtle shifts here and there. Annoying. Also, you can see

Nightlights, Contours, and Rgooglemap

October 14, 2010

I am continuing the investigation of nightlights using some additional packages from CRAN. Here we add RgoogleMaps to the mix. RgoogleMaps is a neat tool that gives you a simple (needs better docs) interface to the static map server. Perhaps I’ll modify the code to my liking; for now I use it as

R wanted for an intern at Barron’s

October 13, 2010

R/SQL/scripting, oodles of data, a willing outlet for write-ups. Any takers? Do some good. Intern at Barron’s, the New York financial publication with a decades-long tradition of investigative journalism and a more recent commitment to data analytic exposure of fraud in finance, business and healthcare. Bring us your zeal and your data munging skills and

Impact of Google Instant on paid search

October 13, 2010

When Google introduced Google Instant (where search results are displayed as you type), it was certainly a boon for searchers. Personally, I've started visiting the Google homepage after years of just using the search box in Firefox (and now Chrome), and enjoying the improved search experience. (And I get to see those neat Doodles, too.) But not everyone was...

Reassembling logical operations on boolean vectors in Gnu R

October 13, 2010

What a headline... It's about combining boolean vectors in R.
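
As a rough illustration of what combining boolean vectors can look like in base R, here is a small sketch. The teaser does not show the post's own approach, so the vectors `a`, `b`, `d` and the result names below are assumptions made up for this example.

```r
# Three logical vectors of equal length (illustrative data only).
a <- c(TRUE, FALSE, TRUE, FALSE)
b <- c(TRUE, TRUE, FALSE, FALSE)
d <- c(TRUE, TRUE, TRUE, FALSE)

# Element-wise operators return a vector of the same length.
both <- a & b            # TRUE where both are TRUE
either <- a | b          # TRUE where at least one is TRUE
exactly_one <- xor(a, b) # TRUE where exactly one is TRUE

# Reduce() folds an operator over a list, which combines
# any number of logical vectors in one step.
all_of <- Reduce(`&`, list(a, b, d))
```

Note that `&` and `|` work element by element; the doubled forms `&&` and `||` are meant for single values in control flow, not for vectors.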

Animated plots in R and LaTeX

October 12, 2010

I like to use animated plots in my talks on functional time series, partly because it is the only way to really see what is going on with changes in the shapes of curves over time, and also because audiences love them! Here is how it is done. For LaTeX, you need to create every

Lists of English Words

October 12, 2010

When I was a kid, I went through an 80s music phase…well, some things never change. “People just love to play with words…” Know that song? Anyway… One of the biggest pains of text mining and NLP is colloquialism: language that is appropriate only in casual conversation and not in formal speech or writing. Words such as informal contractions...

RHIPE in the SD Times

October 12, 2010

Saptarshi Guha, whom we profiled yesterday, is at the Hadoop World conference in New York City today. At 4PM, Saptarshi will give a presentation on RHIPE, his link between R and Hadoop. Saptarshi was interviewed yesterday by Alex Handy of the SD Times, where he talked about his background and his motivation to create RHIPE. Saptarshi was sponsored by...

In case you missed it: September Roundup

October 12, 2010

In case you missed them, here are some articles from September of particular interest to R users. We presented a profile of Hadley Wickham, author of many popular R packages including ggplot2 and reshape. We riffed the design of the new Twitter website into a discussion on calculating the Golden Mean with R. Several readers contributed 1-liners based on...

Example 8.9: Contrasts

October 12, 2010

In example 8.6 we showed how to change the reference category. This is the natural first thought analysts have when their primary comparisons aren't represented in the default output. But our interest might center on a number of comparisons which don...

Export R data to tex code

October 12, 2010

We often use GNU R to work on different things and to solve various exercises. It's always a disgusting job to export, e.g., a matrix of probabilities to a LaTeX document to send to our supervisors, but Rumpel just gave me a little hint.

ClusterProfiles

October 12, 2010

It is very common to cluster genes based on their expression profiles, and also very common to integrate Gene Ontology to observe the distribution of biological processes, molecular functions and cellular components for a given gene list. But what about the two in combination? The Gene Ontology distributions across a variety of gene clusters may give us a new...

Chicago Marathon 2010

October 11, 2010

It's the Monday of the Columbus Day weekend here, so I must have been running a Chicago Marathon yesterday. Indeed -- the 34th annual Chicago Marathon took place yesterday but everything was about its 10/10/10 date. The symmetric set of numbers was i...

Parallel processing of independent Metropolis-Hastings algorithms

October 11, 2010

With Pierre Jacob, my PhD student, and Murray Smith, from National Institute of Water and Atmospheric Research, Wellington, who actually started us on this project at the last and latest Valencia meeting, we have completed a paper on using parallel computing in independent Metropolis-Hastings algorithms. The paper is arXived and the abstract goes as follows:

The R-Files: Saptarshi Guha

October 11, 2010

"The R-Files" is an occasional series from Revolution Analytics, where we profile prominent members of the R Community.

Name: Saptarshi Guha
Background: Ph.D. in Statistics, Purdue University
Nationality: India
Years Using R: 6
Known for: Developing RHIPE package for R + Hadoop integration

At just 31 years old, Saptarshi Guha has emerged as a cutting-edge contributor to the R...
