Using Regular Expressions in R: Case Study in Cleaning a BibTeX Database

March 9, 2010

I recently had to clean up a BibTeX database containing around 1,000 references. One of the clean-up tasks was to ensure that page numbers were separated with en-dashes rather than hyphens. This post sets out how I used regular expressions in R to co...

Read more »
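
The excerpt above is cut off before the post's own code, so the following is only a minimal sketch of the idea in base R, not the author's method. It assumes each `pages` field sits on its own line and that BibTeX's `--` stands for the en-dash; the sample lines and the `fix_pages` helper are hypothetical.

```r
# Hypothetical BibTeX lines; only the pages field should change
bib <- c(
  "  pages = {123-145},",
  "  pages = {200--210},",
  "  title = {A well-known result},"
)

# Replace a run of one or more hyphens between two page numbers with "--"
# (assumption: "--" is how the en-dash is written in the .bib file)
fix_pages <- function(lines) {
  is_pages <- grepl("^[[:space:]]*pages[[:space:]]*=", lines, ignore.case = TRUE)
  lines[is_pages] <- gsub("([0-9]+)[[:space:]]*-+[[:space:]]*([0-9]+)",
                          "\\1--\\2", lines[is_pages])
  lines
}

cleaned <- fix_pages(bib)
print(cleaned)
```

Restricting the substitution to lines that start with `pages` keeps hyphens in titles and author names untouched.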

Rcpp 0.7.8

March 9, 2010

Version 0.7.8 of the Rcpp R / C++ interface classes is now on CRAN and in Debian. As of right now, Debian has already built packages for eight more architectures, and CRAN has built the Windows binary. Oh, and cran2deb had Debian packages for 'testing'...

Read more »

principal components and image reconstruction

March 9, 2010

Jeff Lewis at UCLA told me he teaches principal components with an image reconstruction example. This got me inspired to try it myself. A snapshot appears below, showing how the image quality improves quickly with a relatively small number of principal components. A full, Sweaved write up is here, making use of the biOps package

Read more »

Learning R by video

March 9, 2010

For those people who prefer to be shown how to do something rather than read the instructions, there are some videos on using R available online. Here are the ones I know about. Please add links to other similar resources in the comments. R videos Learn R Toolkit What is R? from Revolution Analytics R

Read more »

Introducing R on video

March 9, 2010

Darren Wraith pointed me to this site offering a whole series of videos introducing R. (Unfortunately in a Windows environment.) This can be handy when facing students with no R background…

Read more »

Getting the basics from readAligned

March 9, 2010

The UCR guide is a little sparse with regard to getting basic information from readAligned. I'd like to add to the general cookbook. If some bioc people out there can contribute some alignment recipes or can fill me in on some more basics, please comment: alignedReads # how many reads did I attempt to align # i don't think you can't get this from...

Read more »

Cluster analysis of what the world eats

March 9, 2010

Keeping with the theme of the post below, I used a clustering algorithm to group the different countries according to what they eat. I simply played around with the number of clusters until I got something I thought resembled reality, so don't interpre...

Read more »

Open Source is Opening Data to Predictive Analytics

March 9, 2010

This article by REvolution Computing CEO Norman Nie is crossposted from the Future of Open Source Forum. The R Project: despite there being over 2 million users of this open-source language for statistical data analysis, you might not have heard of it ... yet. You might have seen this feature in the New York Times last year, and you...

Read more »

Chinese versus Japanese editions

March 8, 2010

Last week, I got news from Springer Verlag about possibly two new editions of my books, one in Chinese and one in Japanese. This was bad news and good news: the bad news was that the Chinese edition was actually a reprint of our original book, Monte Carlo Statistical Methods, by a Chinese publishing company.

Read more »

White House taps Edward Tufte to explain the stimulus

March 8, 2010

Edward Tufte, a pioneer of effective data visualization (and a personal hero) has just been appointed by the White House to the Recovery Independent Advisory Panel. This panel advises The Recovery Accountability and Transparency Board, whose job is to track and explain $787 billion in recovery stimulus funds. Tufte explains: I'm doing this because I like accountability and transparency,...

Read more »

Weird dietary habits in the US

March 8, 2010

Using this database of food consumption data the blog Canibais e Reis kindly put together, I calculated all values for which the US was at least 2 standard deviations from the world average. Here are the outliers in standard deviations from the w...

Read more »

Chilean earthquake: impact of the tsunami

March 8, 2010

The National Oceanic and Atmospheric Administration (NOAA) has a page with some interesting information about last week's earthquake in Chile, but what really stood out for me was this chart of the predicted wave heights around the globe resulting from the associated tsunami: Click to enlarge: it's a fascinating chart. Although labelled a forecast, from the explanations on the...

Read more »

Example 7.26: probability question

March 8, 2010

Here's a surprising problem, from the xkcd blog. Suppose I choose two (different) real numbers, by any process I choose. Then I select one at random (p = .5) to show Nick. Nick must guess whether the other is smaller or larger. Being right 50% of the ...

Read more »

R: Eliminating observed values with zero variance

March 8, 2010

I needed a fast way of eliminating observed values with zero variance from large data sets using the R statistical computing and analysis platform. In other words, I want to find the columns in a data frame that have zero variance. And as fast as possible, because my data sets are large, many, and changing fast....

Read more »
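
The teaser above does not show the post's solution, so what follows is only a simple baseline sketch in base R, not the author's fast method. It assumes every column is numeric and free of missing values; the data frame `df` is made up for illustration.

```r
# Hypothetical data frame: column b is constant, so its variance is zero
df <- data.frame(a = c(1, 2, 3), b = c(5, 5, 5), c = c(0.1, 0.1, 0.2))

# TRUE for every column whose sample variance is exactly zero
# (assumption: numeric columns without NA; var() returns NA otherwise)
zero_var <- vapply(df, function(col) var(col) == 0, logical(1))

# Keep only the columns that vary; drop = FALSE keeps the data frame shape
df_clean <- df[, !zero_var, drop = FALSE]
print(names(df_clean))
```

`vapply` is used instead of `sapply` so the result is guaranteed to be a logical vector with one entry per column.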

InfoChimps

March 7, 2010

This looks interesting: http://infochimps.org/search?query=soil

Read more »

ggplot and concepts — what’s right, and what’s wrong

March 7, 2010

A few months back I gave a presentation to the NYC R Meetup. (R is a statistical programming language. If this means nothing to you, feel free to stop reading now.) The presentation was on ggplot2, a popular package for generating graphs of data and statistics. In the talk (which you can see here, including

Read more »

A nice link: “Some hints for the R beginner”

March 7, 2010

Patrick Burns just posted the following message to the mailing list: There is now a document called “Some hints for the R beginner” whose purpose is to get people up and running with R as quickly as possible. Direct access to it is: http://www.burns-stat.com/pages/Tutor/hints_R_begin.html JRR Tolkien wrote a story (sans hobbits) called ‘Leaf by Niggle’ that has always resonated with me. I...

Read more »

One R Tip A Day meets Tecnica Arcana

March 7, 2010

For Italian-speaking people only (sorry!). Carlo, the curator of the excellent technology podcast Tecnica Arcana, interviewed me about my profession and about R. You can download the interview in mp3 format here.

Read more »

Ecological Modelling with “R”

March 7, 2010

Here I present some books and articles about ecological modelling and “R”. Since “R” is integrated in Bio7, all the methods presented in the books and articles can also be useful together with Bio7. Books: Ellner, Stephen P. & Guckenheimer, John (2006). Dynamic Models in Biology. Princeton University Press. Bolker B (2008) Ecological Models and

Read more »

Intermarket Whac-A-Mole

March 6, 2010

Every trader that looks at more than one market throughout the day will recognize that there is a certain symmetrical relationship between certain markets at certain times. The confounding thing about these intermarket relationships is that they are fl...

Read more »

schoolmath

March 6, 2010

In connection with the Le Monde puzzle of last week, I was looking for an R function that would give me the prime factor decomposition of any integer. Such a function exists within the package schoolmath, developed by Joerg Schlarmann and Josef Wienand. It is called prime.factor and it returns the prime factors of any

Read more »
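
To make the idea of a prime factor decomposition concrete, here is a small base-R sketch using plain trial division. It is not the schoolmath implementation of prime.factor; the function name `prime_factors` is hypothetical and the approach is only practical for moderately sized integers.

```r
# Trial-division factorisation (illustrative helper, not schoolmath's code)
prime_factors <- function(n) {
  stopifnot(n >= 2, n == floor(n))
  factors <- c()
  d <- 2
  # Only divisors up to sqrt(n) need testing
  while (d * d <= n) {
    # Divide out d as many times as it goes into n
    while (n %% d == 0) {
      factors <- c(factors, d)
      n <- n / d
    }
    d <- d + 1
  }
  # Whatever remains above 1 is itself prime
  if (n > 1) factors <- c(factors, n)
  factors
}

print(prime_factors(360))  # 2 2 2 3 3 5
```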

Visualizing Drought

March 6, 2010

The impacts of drought depend on time-scale. On short time-scales, drought means dry soil. On long time-scales, it means dry rivers and empty reservoirs. A region may simultaneously experience dry conditions on one time-scale and wet conditions on another, e.g. wet soil but low streamflow or vice versa. Standardized Precipitation Index (SPI) is a widely

Read more »

Contingency Tables – Fisher’s Exact Test

March 6, 2010

A contingency table is used in statistics to provide a tabular summary of categorical data, and the cells in the table are the number of occasions on which a particular combination of variables occurs together in a set of data. The relationship between variables in a contingency table is often investigated using Chi-squared tests. The simplest contingency

Read more »
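
As a rough illustration of the test named in the title, the sketch below runs `fisher.test` from base R's stats package on a 2x2 table. The counts and the dimension labels are invented for the example and do not come from the post.

```r
# Hypothetical 2x2 contingency table of counts (invented data)
tab <- matrix(c(3, 1, 1, 3), nrow = 2,
              dimnames = list(treatment = c("A", "B"),
                              outcome = c("yes", "no")))

# Fisher's exact test of independence between rows and columns
res <- fisher.test(tab)
print(res$p.value)
```

With counts this small the Chi-squared approximation is unreliable, which is the usual reason for reaching for the exact test instead.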

Posterior likelihood

March 6, 2010

At the Edinburgh mixture estimation workshop, Murray Aitkin presented his proposal to compare models via the posterior distribution of the likelihood ratio. As already commented in a post last July, the positive aspect of looking at this quantity rather than at the Bayes factor is that the priors are then allowed to be improper if

Read more »