Sachin Tendulkar’s longevity

June 15, 2011

There have been over 3300 cricketers who have played Test and One Day International cricket. The youngest player was 14-year-old Hasan Raza from Pakistan, who played 5 ODIs and 2 Test matches at that age. The oldest player was 52 years old...

sas7bdat database reader update

June 14, 2011

An earlier post (1216) introduced a compatibility study (i.e. reverse engineering) of the sas7bdat database file format. The code and documentation for this are here: http://github.com/biostatmatt/sas7bdat. I've recently restructured the code as an R package and added some functionality. Look for the sas7bdat package on CRAN. Also, the read.sas7bdat code has been ported to...

Embedding a time series with time delay in R — Part II

June 14, 2011

Some months ago, I posted a function that extended the base R function embed() to allow for time delay embedding. Today, David Gonzales alerted me to an inconsistency between embed() and Embed(). The example David used was where Embed() clearly ...
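
A minimal sketch of time-delay embedding, assuming the common convention in which each row holds x[t], x[t - d], ..., x[t - (m - 1)d], in the same newest-first order that base embed() uses. The helper name delay_embed() and its arguments m (embedding dimension) and d (delay) are hypothetical: this is not the post's Embed(), and it makes no claim about what the inconsistency David found actually was. It builds one wide embed() matrix and keeps every d-th lag, so with d = 1 it agrees with embed() by construction.

```r
# Hypothetical helper (not the post's Embed()): time-delay embedding on top of embed().
# Row t holds x[t], x[t - d], ..., x[t - (m - 1) d], newest first, like embed().
delay_embed <- function(x, m, d = 1) {
  stopifnot(m >= 1, d >= 1)
  span <- (m - 1) * d + 1                # window length needed to reach the oldest lag
  full <- embed(x, span)                 # columns are lags 0, 1, ..., span - 1
  full[, seq(1, by = d, length.out = m), drop = FALSE]  # keep lags 0, d, 2d, ...
}

delay_embed(1:10, m = 3, d = 2)          # 6 rows; the first row is 5 3 1
```

Keeping embed()'s column order is a deliberate choice here: two embedding functions that order their lag columns differently are an easy source of disagreement.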

Importing Nanotoxicity Data with SPARQL into R for analysis

June 14, 2011

Not so long ago I wrote about importing RDF input into R for analysis. I am collecting nanotoxicology data in a Semantic MediaWiki with the RDFIO extension installed (by Samuel), which allows me to query that data with SPARQL directly from R. There is not much structure to visualize at this moment, so I'm skipping the Bioclipse...

Hot Job in IT: Data Science

June 14, 2011

CIO Magazine today has an article on the "6 Hottest New Jobs in IT", which features Data Science and R at #2: "There's now an intellectual consensus in business that the only way to run an enterprise is to use analytics with data scientists to find opportunities," says Norman Nie, CEO of Revolution Analytics, which produces the first...

Wilcoxon Champagne test

June 14, 2011

As an appetizer for the Paris triathlon, Jérôme and I ran an adventure race as a team last weekend in the Champagne region (it mainly consists of running, cycling and canoeing, with a flavor of orienteering, and the Champagne is kept for the end). It was organized by Ecole Polytechnique students who, for the first time, divided Saturday's legs...

REIT Momentum in Quantstrat

June 14, 2011

I took a short break from quantstrat to do some REIT analysis in "REITs for Everybody Might Now Mean REITs for Nobody." Now let's link the two by incorporating The Aleph Blog's momentum bucket strategy in quantstrat. From TimelyPortfolio. In ...

RStudio Beta 3 (v0.94)

June 14, 2011

RStudio Beta 3 (v0.94) is available for download today. The goal for this release was to refine and improve our core features based on the feedback we've gotten on our first two betas. Highlights of the new release include source editor enhancements: new editor features include brace/paren/quote matching, more intelligent cursor placement after newlines, function...

Boxplots without boxes

June 14, 2011

Let's say you have several categories, each with multiple data points, that you would like to plot as individual points. Even if you have only a single point, the R graphics package will plot a line (without a box for ...
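
One base-graphics alternative, which may or may not be what the post goes on to do, is stripchart(): it draws every observation as a point in its category's column, so a group with a single value shows up as one point rather than as a flat median line. The data below are hypothetical, and pdf(NULL) is used only so the sketch runs without a screen.

```r
# Hypothetical data in long format: one row per observation.
pts <- data.frame(
  group = factor(c("A", "A", "A", "B", "C", "C")),
  y     = c(1.2, 1.5, 1.1, 2.3, 0.8, 1.9)
)

pdf(NULL)  # null device; swap for png() or an on-screen device to see the plot
# stripchart() draws each observation as a point, one column per group;
# group "B" has a single value and still appears as a point, not a flat line.
stripchart(y ~ group, data = pts, vertical = TRUE, pch = 19,
           method = "jitter", jitter = 0.05)
invisible(dev.off())

# Number of points drawn per category
n_per_group <- table(pts$group)
```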

No lake is an island: PhD Opportunity

June 14, 2011

NERC recently funded the formation of the UK Lake Ecological Observation Network (UKLEON) as part of its Networks of Sensors programme. UKLEON is led by Ian Jones at CEH Lancaster. A fully funded PhD project is associated with the UKLEON ...

Multiple Comparisons for GLMMs using glmer() & glht()

June 14, 2011

...that's an example of how to apply multiple comparisons to a generalised linear mixed model using the function glmer() from package lme4 and glht() from package multcomp. By the way, you will also see a nice example of visualizing data from a nested sampli...

Dependence and Correlation

June 13, 2011

In everyday life I hear the word "correlation" thrown around far more often than "dependence." What's the difference? Correlation, in its most common form, is a measure of linear dependence; the catch is that not all dependencies are linear. The set...
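
A small worked example of the distinction, using made-up data: y = x^2 is completely determined by x, yet on a range symmetric about zero the Pearson correlation is essentially zero, because the parabola has no linear trend. Restricting to x >= 0, where the relationship is monotone, gives a large correlation from the very same function.

```r
x <- seq(-1, 1, length.out = 201)
y <- x^2                      # y is completely determined by x: full dependence

r_all <- cor(x, y)            # Pearson correlation over the symmetric range
r_all                         # essentially 0 (only floating-point noise)

pos <- x >= 0                 # keep only the half where y increases with x
r_pos <- cor(x[pos], y[pos])  # monotone and close to linear, so r is large
```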

Asians Love Their Bandwidth

June 13, 2011

I recently ran across some data from comScore Networks. The data contain survey results from a number of households across the US, with demographic information (ethnicity, household income, household size, state, ZIP code, etc.) along with whether...

Sweave source for poll report

June 13, 2011

Here is the Sweave source for the poll report, for those who expressed some interest. You'll also need this file of R function definitions, utilities.R. I also wrote a little shell script that calls Sweave, xelatex, etc., by hacking the Sweave.sh script that ships with R.

Donor analysis in R – Smith for Congress

June 13, 2011

In a previous post I introduced the Smith for Congress data set. The data comprise 49k contributions made by individuals to a congressional campaign over the 2006-2010 electoral cycles. Smith for Congress is not the name of the actual campaign. Individual contributions are not required to be disclosed by a campaign unless the individual donates...

Example 8.40: Side-by-side histograms

June 13, 2011

It's often useful to compare histograms for some key variable, stratified by levels of some other variable. There are several ways to display something like this. The simplest may be to plot the two histograms in separate panels. SAS: In SAS, the most d...
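
A sketch of the separate-panels approach in base R; it is only one way to do it, and not necessarily how the post's R section proceeds. The data and the grouping factor g are hypothetical. Giving both panels one shared set of breaks keeps the bins identical, which makes the two histograms directly comparable.

```r
set.seed(42)
# Hypothetical data: a key variable x stratified by a two-level factor g
x <- c(rnorm(200, mean = 0), rnorm(200, mean = 1.5))
g <- factor(rep(c("a", "b"), each = 200))

# Shared breaks covering the full range, so both panels use the same bins
brks <- pretty(range(x), n = 20)

pdf(NULL)                    # null device; replace with png() or a screen device to view
op <- par(mfrow = c(1, 2))   # one row, two panels side by side
h <- lapply(levels(g), function(lev)
  hist(x[g == lev], breaks = brks, main = paste("g =", lev), xlab = "x"))
par(op)                      # restore the previous layout
invisible(dev.off())
```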

Using simulation to demonstrate theory: Hardy-Weinberg Equilibrium

June 13, 2011

One of my teaching roles is in an introductory Genetics course, where first-year students are presented with a wide range of new ideas at a relatively fast pace. Students often seem to take a memorization approach to learning the material, rather than taking the chance to think about how and why...
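
In the spirit of the post, a minimal simulation sketch. The function name hw_one_generation() and all of the numbers are assumptions, not taken from the post. It starts from a parent population far from equilibrium (only AA and aa homozygotes), lets parents mate at random with Mendelian segregation, and compares the offspring genotype proportions with the Hardy-Weinberg expectations q^2, 2pq and p^2. After a single generation the observed proportions should sit close to the expected ones, even though the parents included no heterozygotes at all.

```r
hw_one_generation <- function(parents, n_offspring) {
  # parents: copies (0, 1 or 2) of allele A carried by each parent
  mom <- sample(parents, n_offspring, replace = TRUE)   # random mating:
  dad <- sample(parents, n_offspring, replace = TRUE)   # parents paired at random
  # Mendelian segregation: a parent transmits A with probability copies / 2
  kids <- rbinom(n_offspring, 1, mom / 2) + rbinom(n_offspring, 1, dad / 2)

  p <- mean(parents) / 2                  # frequency of allele A among parents
  q <- 1 - p
  observed <- tabulate(kids + 1, nbins = 3) / n_offspring  # aa, Aa, AA
  expected <- c(q^2, 2 * p * q, p^2)                        # Hardy-Weinberg
  names(observed) <- names(expected) <- c("aa", "Aa", "AA")
  rbind(observed = observed, expected = expected)
}

set.seed(2011)
# Parents far from equilibrium: 30% AA, 70% aa, no heterozygotes (so p = 0.3)
parents <- rep(c(2, 0), times = c(300, 700))
res <- hw_one_generation(parents, n_offspring = 1e5)
round(res, 3)
```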

R package DOSE released

June 12, 2011

Disease Ontology (DO) provides an open source ontology for the integration of biomedical data associated with human disease. DO analysis can lead to interesting discoveries that deserve further clinical investigation. DOSE was designed for semantic similarity measurement and enrichment analysis.

REITs for Everybody Might Now Mean REITs for Nobody

June 12, 2011

THIS IS MY OPINION AND ANALYSIS AND IS NOT INVESTMENT ADVICE.  YOU ARE RESPONSIBLE FOR YOUR OWN GAINS AND LOSSES. I think REITs traditionally attract conservative dividend investors (grandparents), but due to their recent behavior, REITs also attr...

Listing of Statistics and Machine Learning Conferences

June 12, 2011

Occasionally, I will query Google with “statistics conferences”, “machine learning conferences” or “pattern recognition conferences” and the like. But often, it is difficult to obtain anything meaningful other than the conferences of which ...

Two Castles Run 2011

June 12, 2011

I did the Two Castles Run today; it's a 10 km race between Warwick and Kenilworth castles. The organizers were very quick to put the results online and even went the extra mile of offering them as a CSV file. It ...

Additive modelling and the HadCRUT3v global mean temperature series

June 12, 2011

Earlier, I looked at the HadCRUT3vgl data set, using generalized least squares to investigate whether the trend in temperature since 1995 was statistically significant. Here I want to follow up one of the points from the earlier posting, namely using a statistical technique that fits a local, and not global, model to the...

Can You Beat the Market with Modern Portfolio Theory? (Part 2)

June 12, 2011

(Obligatory Warning: This post should not be considered investment advice. The author(s) of this blog are not certified financial analysts. Any analysis presented here is meant only as an opinion. Following our opinion could end up losing you a lot of ...

Animated Plots with R

June 12, 2011

Using ImageMagick, it's pretty easy to make an animated GIF from a set of plots. Essentially, the way to do it is to save a plot for each frame of the animation and then convert them into a .gif. Here's a simple example that plots binomial densities for ...

On Crows

June 12, 2011

Today I made the mistake of clicking on the "Next Blog" button, which took me to a rather inane post complaining that crows are (obviously) stupid (because they are sometimes hit by cars). I was reminded that crows are actually quite smart.

A Little R Counter

June 11, 2011

I recently read a great post about environments in R, which featured this little bit of code:

> createCounter <- function(value) { function(i) { value <<- value+i} }
> counter <- createCounter(0)
> counter(1)
> a <- counter(0)
> ...
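
An annotated version of this counter, extended with a few extra calls for illustration (the names b and other are additions, not from the post). Each call to createCounter() creates a fresh environment holding value; the returned function keeps a reference to that environment, and <<- assigns into it rather than creating a local variable.

```r
createCounter <- function(value) {
  # The returned function keeps a reference to the environment of this call,
  # so `value` survives between calls.
  function(i) {
    value <<- value + i   # `<<-` updates `value` in the enclosing environment
  }
}

counter <- createCounter(0)
counter(1)        # prints nothing: the value of an assignment is invisible
a <- counter(0)   # adds 0, so `a` captures the running total
a                 # 1

counter(2)
b <- counter(0)   # 3

other <- createCounter(10)  # an independent counter with its own environment
other(5)                    # updates other's total (15), not counter's
```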

The importance of being unoriginal (and befriending google)

June 11, 2011

In search of bin counts: I look at histograms and density functions of my data in R on a regular basis. I have some idea of the algorithms behind these, but I've never had any reason to go under the hood until now. Lately, I've been looking using the b...
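
For the histogram side, base R will hand over the bin counts directly: hist(..., plot = FALSE) returns an object whose counts component holds them. The sketch below (data and breaks are made up) checks those counts against a hand-rolled cut() plus table(), matching hist()'s defaults of right-closed bins with the lowest bin also closed on the left. It does not cover density(), and it is not necessarily the route the post takes.

```r
x <- c(0.5, 1, 2, 2.5, 3.9, 4, 7, 9.99, 10)   # hypothetical data
brks <- seq(0, 10, by = 2)

# plot = FALSE: compute the histogram object without drawing it
h <- hist(x, breaks = brks, plot = FALSE)
h$counts   # 3 3 0 1 2

# Same counts by hand with hist()'s defaults: bins (a, b] closed on the right,
# and the first bin [0, 2] also closed on the left (include.lowest = TRUE)
by_cut <- as.vector(table(cut(x, brks, right = TRUE, include.lowest = TRUE)))
```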
