Contributions to the R Project

July 4, 2011

The R Project would not exist without the contributions of the R Core Group, the 21 volunteer statisticians and computer scientists from around the world who have donated their time and expertise to create the R language and its core packages, and to manage its regular release and binary distribution process via the CRAN network. Many Core Group members also tirelessly...

Read more »

Questions about quantum computing

July 4, 2011

I read this article by Rivka Galchen on quantum computing. Much of the article was about an eccentric scientist in his fifties named David Deutsch. I’m sure the guy is brilliant but I wasn’t particularly interested in his not particularly interesting life story (apparently he’s thin and lives in Oxford). There was a brief description

Read more »

Reverse Iteration

July 3, 2011

Time to horrify some people. First let's include the code we wrote last time, > source("pretend.R") and the dependency-tracking environment it creates will be used to run all the following examples. Let's look at, I don't know, I'm just trying to demonstrate a language feature so uh... band-pass filtering Gaussian noise. Here's some noise:

Testing for valid variable names

July 3, 2011

I have something of a fondness for ridiculous variable names, so it’s useful to be able to check whether my latest concoction is legitimate. More so if it is automatically generated. Not having an is_valid_variable_name function is one of those odd omissions from R, and the assign function doesn’t check validity. To recap, there are a

Read more »

Best graph ever

July 3, 2011

Best graph ever. LARGEST EVER DIFFERENCE BETWEEN 328 and 327 SPOTTED IN NEW YORK CITY

Read more »

Learning SAS

July 3, 2011

I want to learn the heavyweight of statistical software: SAS. It seems like the default choice for high-end statistics, and I want to understand why. I'm working in the healthcare practice in our firm and want to analyze claims and credit data (Teraby...

Read more »

R performance optimization

July 3, 2011

The Average Investor's Blog posted a nice report about accelerating a default Debian R installation, and the author added some details about his benchmarks in the comment section.

Read more »

Experimental reasoning in social science

July 2, 2011

As a statistician, I was trained to think of randomized experimentation as representing the gold standard of knowledge in the social sciences, and, despite having seen occasional arguments to the contrary, I still hold that view, expressed pithily by Box, Hunter, and Hunter (1978) that “To find out what happens when you change something, it

Read more »

GIS on a shoestring – Getting traveltimes from google

July 2, 2011

The analysis of geospatial information is currently a big trend in medicine and public health. Even though some may want to convince you that this can only be achieved with the latest and most expensive software, I am not convinced. First, analysis of spatial data dates back to at least 1856 when John Snow investigated

Read more »

The R apply function – a tutorial with examples

July 2, 2011

Today I had one of those special moments that is uniquely associated with R. One of my colleagues was trying to solve what I term an 'Excel problem'. That is, one where the problem magically disappears once a programming language is employed. Put simpl...

Read more »

My own programming style convention for most languages

July 1, 2011

I write code mainly in R and, from time to time, in C, C++, SAS, bash, Python, and Perl. There are style guides out there that help make your code more consistent and readable to yourself and others. Here is a style guide for C++, and here is Google’s style guide for R and here... Read more »

Wikipedia for Kaggle Participants

July 1, 2011

Kaggle has released a new data-mining challenge: use data from 10 years of Wikipedia edits in order to predict future edit rates. The dataset has been anonymized in order to obscure editor identity and article identity, simultaneously adding focus to the challenge and robbing the dataset of considerable richness. I have some experience with wikipedia…

Read more »

B*tchin’ six dimensional 6-cube. The rainbow colours and…

July 1, 2011

B*tchin’ six dimensional 6-cube. The rainbow colours and glass panes really help this visualisation.  Examples of 6-dimensional things If it’s hard to envision 6 dimensions, consider this: the possible tunings of a guitar constitute a 6-dimensio...

Read more »

How to find R experts on LinkedIn

July 1, 2011

If you're looking for connections with expertise in R programming, the new Skills and Expertise feature on LinkedIn makes it easy. Just visit the R Skills page for a list of R practitioners on LinkedIn. You can also add "R" to your own list of skills from the same page. You also might want to consider joining the R...

Read more »

Improving data quality with deducorrect

July 1, 2011

Does your raw numerical data suffer from typos? Sign errors? Variable swaps? Rounding errors? You may be able to fix all that with the deducorrect package. Today, we (that is Edwin de Jonge, Sander Scholtus and myself) uploaded the 1.0-0 …

Read more »

Low-hanging R Optimizations on Ubuntu

July 1, 2011

A friend of mine recently brought to my attention the fact that the default R install is way too generic and thus sub-optimal. While I didn’t go all the way to rebuilding everything from scratch, I did find a few cheap steps one can take to help things a little. Simply install the libatlas3gf-base package. That’s

Read more »

The stringr package just turned 0.5

July 1, 2011

(Re-posted from a post made by Hadley Wickham to the mailing list) About the stringr package: Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent...

Read more »

Weighting and prediction in sample surveys

July 1, 2011

A couple of years ago Rod Little was invited to write an article for the diamond jubilee of the Calcutta Statistical Association Bulletin. His article was published with discussions from Danny Pfeffermann, J. N. K. Rao, Don Rubin, and myself. Here it all is. I'll paste my discussion below, but it's worth reading the others' perspectives too. Especially...

Read more »

A third year of entries!

July 1, 2011

Contrary to previous reports, we started blogging after our book was published, with the conceit that we were adding examples to the book. Today marks the second anniversary of the book's appearance and of the blog. To celebrate, we're turning over o...

Read more »

RcppArmadillo 0.2.25

Following a series of pre-releases, Armadillo version 2.0.0 was announced by Conrad Sanderson earlier in the week. As it happens, it contained another minor build regression so version 2.0.1 followed the next day. We created versions 0.2.24 and 0.2...

Read more »

Last Time That Happened

June 30, 2011

I bought a lottery ticket yesterday. I hardly ever buy them. The last time I did I lost a dollar. Actually, every time I've bought a ticket I've lost.  Yesterday I was at the local gas station in line behind a bloke who had a comprehensive folder ...

Read more »

Beating Kenneth French Small – High

June 30, 2011

With 148 pageviews over the last 24 hours, my post Kenneth French Gift to the Finance World has been popular relative to most of my other posts.  I think the popularity is due to Kenneth French’s notoriety and the amazing outperformance of Small...

Read more »

Mapping SNPs to Genes for GWAS Enrichment Analysis

June 30, 2011

There are several tools available for conducting a post-hoc analysis of GWAS data looking for enrichment of significant SNPs using literature or pathway based resources. Examples include GRAIL, ALLIGATOR, and WebGestalt among others (see SNPath R Pac...

Read more »

R 2.13.1 scheduled for July 8

June 30, 2011

The R Core team announced today that the next update to R, version 2.13.1, will be released on July 8. Core team member Peter Dalgaard noted: The 2.13.0 release has been quite solid, but some people expect an x.y.1 to roll out on larger installations for the next academic year. Of course, there have also been a sampling of...

Read more »

Cash Might be Your Tail Risk

June 30, 2011

Just like James Montier's Ode to the Joy of Cash and David Merkel's Got Cash?, I think cash is an extremely powerful tool. Of the 3 ingredients (land, labor, and capital) of the economy, capital (cash) is most scarce at the end of a crisis or recessi...

Read more »

Calculate LCM of ‘n’ consecutive natural numbers using R

Well, I shall hit the nail right on the head and not beat around the bush. I am taking programming lessons on R from my pro bro (Utkarsh Upadhyay), who agreed to teach me only if I would disseminate my learning (a paranoia all the open-source advocates ...

Read more »

Don’t stop being a statistician once the analysis is done

June 30, 2011

I received an email from the Royal Statistical Society asking if I wanted to submit a 400-word discussion to the article, Vignettes and health systems responsiveness in cross-country comparative analyses by Nigel Rice, Silvana Robone and Peter C. Smith. My first thought was No, I can’t do it, I don’t know anything about health systems

Read more »