Top 50 Statistics blogs

October 10, 2011

TheBestColleges.org has just published their list of the "Top 50 Statistics Blogs of 2011", and I'm pleased to say that not only did our own Revolutions blog make the list, but it's in fine company with some truly excellent blogs. Several of my personal favourites made the list, including: Guardian columnist Ben Goldacre's Bad Science blog; The Dataists, a blog...

Read more »

Upgrading R (and packages)

October 10, 2011

I tend not to upgrade R very often—running from 6 months to 1 year behind in version numbers—because I had to reinstall all packages: a real pain. A quick search shows that people have managed to come up with good … Continue reading →

Read more »

An exercise in plyr and ggplot2 using triathlon results

October 10, 2011

I ran my last triathlon for this year a couple of weeks ago, in the beautiful town of Stratford-upon-Avon. The results were online the day after so I decided to have a look at my fellow competitors’ times, which gave … Continue reading →

Read more »

Artist view of crimes in London

October 10, 2011

At first sight, one could think this picture is a scale model of some narrow mountains, like Bryce Canyon… Actually it represents crimes in East London: a cardboard artwork by the London artist Abigail Reynolds, called Mount Fear. Here is what can be read on the artist’s webpage: The terrain of Mount Fear is generated

Read more »

single-column data frame

October 10, 2011

This is a trivial but very useful tip:

> x=data.frame(a=1:4, c=5)
> x
  a c
1 1 5
2 2 5
3 3 5
4 4 5
> x
  a c
1 1 5
> x
1 2 3 4
> x
  a
1 1
2 2
3 3
4 4

where you can see that: to avoid a becoming a vector, rather than a...
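
A minimal sketch of what this tip appears to describe, assuming it concerns the `drop` argument of `[` (the excerpt is truncated, so that reading is an assumption). The data frame mirrors the one in the post; selecting the column by name is an illustrative choice:

```r
# Data frame mirroring the one in the post.
x <- data.frame(a = 1:4, c = 5)

# Selecting a single column with `[` simplifies the result to a plain vector.
as_vector <- x[, "a"]

# drop = FALSE keeps the result as a single-column data frame.
as_frame <- x[, "a", drop = FALSE]

print(class(as_vector))
print(class(as_frame))
print(dim(as_frame))
```

Keeping the data frame structure matters when later code relies on `names()`, `nrow()`, or column-wise functions that expect a data frame rather than a bare vector.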

Read more »

k-mean clustering + heatmap

October 10, 2011

If you want more info about clustering, I have another post about "Clustering analysis and its implementation in R". Here is the link: http://onetipperday.blogspot.com/2012/04/clustering-analysis-2.html

Several R functions in this...

Read more »

Reading HTML pages in R for text processing

October 10, 2011

We were talking with one of my colleagues about doing some text analysis—that, by the way, I have never done before—for which the first issue is to get text in R. Not any text, but files that can be accessed … Continue reading →

Read more »

An R function to determine if you are a data scientist

October 10, 2011

“Data scientist” is one of the buzzwords in the running for rebranding applied statistics mixed with some computing. David Champagne, over at Revolution Analytics, described the skills for being a data scientist with a Venn Diagram. Just for fun, ...

Read more »

Plot Animation with Imported Images

October 10, 2011

I really dig the animation package! So here's the outcome of my first encounters with saveHTML(): I produced an animation with pre-existing images by utilizing the functions readJPEG() and rasterImage() from the R-packages jpeg and ReadImage...

Read more »

How can you do a smart job getting data from internet?

October 9, 2011

I’d like to explore further the capabilities of my statistical packages to get data online and load it into memory instead of downloading each dataset by hand. In the end, I found this task pretty easy, but it kept me out of bed for a night trying to find the most efficient way to loop across

Read more »

The Skills of a Data Miner

October 9, 2011

Data mining is not only statistics, even if statistics is the most recognized academic component of it. It also includes data cleaning, machine learning and data visualization. The scarce factor is the ability to understand that data and extract value ...

Read more »

Equality of Covariances Matrices Test in R (varcomp)

October 9, 2011

This is a piece of code I implemented in 2004, which was supposed to be part of an R-package in multivariate testing (to be named, rather creatively, mvttests). Time has flown, I still haven’t got around to implementing the said package, but people keep asking me for the varcomp function, so here it is, for

Read more »

understanding computational Bayesian statistics

October 9, 2011

I have just finished reading this book by Bill Bolstad (University of Waikato, New Zealand) which a previous ‘Og post pointed out when it appeared, shortly after our Introducing Monte Carlo Methods with R. My family commented that the cover was nicer than those of my own books, which is true. Before I launch into

Read more »

Sphericity Test for Covariance Matrices in R (sphericity.test)

October 9, 2011

This is a piece of code I implemented in 2004, which was supposed to be part of an R-package in multivariate testing (to be named, rather creatively, mvttests). Time has flown, I still haven’t got around to implementing the said package, but people keep asking me for the sphericity.test function, so here it is, for

Read more »

Operating on datasets inside a function

October 9, 2011

There are times when we need to write a function that makes changes to a generic data frame that is passed as an argument. Let’s say, for example, that we want to write a function that converts to factor any … Continue reading →
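
As a rough illustration of the idea, here is a minimal sketch. The excerpt is cut off before it says which columns are converted, so the choice of character columns and the helper name `chars_to_factors` are assumptions, not the post's own code:

```r
# Hypothetical helper: convert every character column of `df` to a factor.
# R passes arguments by value, so the modified copy must be returned.
chars_to_factors <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) factor(col) else col
  })
  df
}

# Illustrative data; stringsAsFactors = FALSE keeps `grp` as character.
d <- data.frame(id = 1:3, grp = c("a", "b", "a"), stringsAsFactors = FALSE)
d2 <- chars_to_factors(d)

str(d2)
```

The caller's `d` is left untouched; only the returned `d2` carries the converted columns, which is why the result has to be assigned.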

Read more »

Open Street maps

October 8, 2011

There have been some exciting developments in the Deducer ecosystem over the summer which should go into CRAN release in the next few months. Today I'm going to give a quick sneak peek at an Open Street Map - R connection with accompanying GUI. This post will just show the non-GUI components. The first part of the

Read more »

Performance difference between Stata and R

October 8, 2011

With respect to the multinomial logit model, the performance difference between the two packages is quite large, based on this post.

Read more »

Some light data munging with R, with an application to ranking NFL Teams

October 8, 2011

I recently submitted this blog to R-bloggers, which aggregates R-related blog posts. It's a fantastic site and has been invaluable to me as I've learned R. One of my favorite kinds of articles is the hands-on, "hello world"-style weekend project t...

Read more »

Visualizing GIS data with R and Open Street Map

October 8, 2011

In this post I want to share with you some code to use OpenStreetMap maps as a backdrop for a data visualization. We will use the RgoogleMaps package for R. In the following I will show you how to make this graph. 1. Download the map: I wanted to take a closer look at an

Read more »

A brief idea of style

October 8, 2011

Once one starts writing more R code the need for consistency increases, as it facilitates managing larger projects and their maintenance. There are several style guides or suggestions for R; for example, Andrew Gelman’s, Hadley Wickham’s, Bioconductor’s and this one. … Continue reading →

Read more »

Using Sweave

October 8, 2011

If you use R and haven’t discovered Sweave then go and find out about it. It enables R code and plots to be incorporated into a document so the analysis and report can be combined together in a single document. … Continue reading →

Read more »

R Graph Gallery widget in R Bloggers

October 8, 2011

The R Bloggers website, maintained by Tal Galili, aggregates blogs (including mine) from many people in the R community. Tal and I have been wondering how to tie R Bloggers to the gallery, supporting each other's websites. To that extent...

Read more »

Risk, Return and Analyst Ratings

October 7, 2011

Today I want to discuss a connection between Risk, Return and Analyst Ratings. Let’s start by defining our universe of stocks: the 30 stocks in the Dow Jones Industrial Average (^DJI) index. For each stock I will compute the number of Upgrades and Downgrades, Risk, and Return in 2010:2011. I will run a linear regression and

Read more »

Because it’s Friday: Reviews of Random Digits

October 7, 2011

If you dig around enough on Amazon.com, you can find some pretty odd products (like the Badonkadonk tank now sadly unavailable). Attached to these products you can often find a new form of comedy: the funny Amazon review. The products that attract such attention can be hard to fathom: this gallon of milk has more than 1,000 reviews. (Sample:...

Read more »

All combinations for levelplot

October 7, 2011

In a previous post I explained how to create all possible combinations of the levels of two factors using expand.grid(). Another use for this function is to create a regular grid for two variables to create a levelplot or a … Continue reading →
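
A minimal sketch of that use of expand.grid(), under stated assumptions: the variable names, grid sizes, and the surface `z` are made up for illustration, and the plot uses levelplot() from the lattice package only if it is installed:

```r
# Regular grid over two variables: every x value paired with every y value.
grid <- expand.grid(x = seq(-1, 1, length.out = 5),
                    y = seq(-1, 1, length.out = 3))

# Hypothetical surface evaluated at each grid point.
grid$z <- grid$x^2 + grid$y^2

# Build the levelplot only when lattice is available.
if (requireNamespace("lattice", quietly = TRUE)) {
  p <- lattice::levelplot(z ~ x * y, data = grid)
  # print(p) would draw the plot in an interactive session
}
```

expand.grid() varies its first argument fastest, so the rows run through all x values for the first y, then the next y, and so on.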

Read more »

In case you missed it: September Roundup

October 7, 2011

In case you missed them, here are some articles from September of particular interest to R users. The deadline to enter the "R Applications" contest with $20,000 in prizes is October 31. The RHadoop Project, a new collection of open-source R packages from Revolution Analytics, makes it possible to write map-reduce jobs in R to analyze huge data sets...

Read more »

R Workshop: Reading in Large Data Frames

October 7, 2011

One question I get a lot is how to read large data frames into R. There are some useful tricks that can save you both time and memory when reading large data frames but I find that many people are not aware of them. Of course, your ability to read...
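
The excerpt does not say which tricks the workshop covers. As a sketch, assuming they include the `colClasses` and `nrows` arguments of read.table() and read.csv(), the pattern might look like this (the file and its columns are made up so the example is self-contained):

```r
# Write a small hypothetical CSV to a temporary file.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:100, value = (1:100) / 2, label = "a"),
          path, row.names = FALSE)

# Read a few rows first to learn each column's class ...
sample_rows <- read.csv(path, nrows = 10, stringsAsFactors = FALSE)
col_classes <- vapply(sample_rows, function(col) class(col)[1], character(1))

# ... then pass the classes and a row-count hint to the full read,
# so R does not have to guess types or repeatedly grow its buffers.
full <- read.csv(path, colClasses = col_classes, nrows = 100)

unlink(path)
```

Classes inferred from the first few rows can be wrong for the rest of the file (for example, a column that looks integer at the top but holds decimals later), so they are worth checking before relying on them for a large read.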

Read more »

When Wellington meets the “animation” package

October 7, 2011

The “animation” package is great for creating .gif files (of course, it also produces video and flash files thanks to Yihui Xie). By using this package, I would like to show you a nice spot in Wellington, NZ. At this … Continue reading →

Read more »

FFT / Power Spectrum Box-and-Whisker Plot with ggplot2

October 6, 2011

I have a bunch of time series whose power spectra (FFT via R's spectrum() function) I've been trying to visualize in an intuitive, aesthetically appealing way. At first, I just used lattice's bwplot, but the spacing of the X-axis here really matters. ...

Read more »