The ‘Big Analytics’ Revolution Starts with R: Webinar June 14

June 7, 2011

On Tuesday next week I'll be teaming up with Revolution Analytics' Mike Minelli to give a 30-minute webinar to introduce executives to R, Big Data, and applications of advanced analytics. If there's someone in your company who needs to know about the impact of R on getting value out of data, they can register here. Here's the agenda: The...

Read more »

R books are now showing up in the dollar bin. That’s a good…

June 7, 2011

R books are now showing up in the dollar bin. That’s a good sign!

Read more »

K-Means Clustering on Big Data

June 7, 2011

In this post, Joseph Rickert demonstrates how to build a clustering model on a large data set with the RevoScaleR package. A script file for use with Revolution R Enterprise to recreate the analysis below is at the end of the post, and can also be downloaded here. -- ed. The k-means (Lloyd) algorithm, an intuitive way to explore...

Read more »

The pros and cons of robust data characterizations

Over the years, I have looked at a lot of data contaminated with outliers, the subject of Chapter 7 of Exploring Data in Engineering, the Sciences, and Medicine. That chapter adopts the definition of an outlier presented by Barnett and Lewis in their book Outliers in Statistical Data, 2nd Edition

Read more »

Fittestmodel.com: A user-friendly way to conduct empirical research together

June 6, 2011

(A guest post by Camiel de Koning) When trying to replicate, verify, or extend the empirical research of others, a researcher generally encounters many time-consuming barriers, and there are often many prerequisites. Fittestmodel aims to overcome many of these problems by presenting a web application that allows users to: use R without having to install it; quickly incorporate...

Read more »

R for Data Mining

June 6, 2011

Statistics and data mining often get bundled together, but (in my opinion), they're generally different practices with different goals. As a language designed for statistics, much of R's core functionality is focused on exploring and understanding data: model design, inference, and visualization. But when your goal is simply to get the best predictions from a big data set (without...

Read more »

In case you missed it: May Roundup

June 6, 2011

In case you missed them, here are some articles from May of particular interest to R users. A review of "R Cookbook", a new how-to book for R programmers. A detailed example of using the RevoScaleR package to analyze a large airline data set. A new guide for R beginners, "How to Learn R", provides links to R resources,...

Read more »

Shared Ecological Modelling References

June 6, 2011

05.06.2011 Today I started to create a list of books and articles about ecological modelling. In this list you will find not only general books about modelling but also books about spatial analysis, image analysis, and other (in my opinion) important techniques useful in the context of ecological modelling. For the collection I use “Zotero”

Read more »

10 R One Liners to Impress Your Friends

June 5, 2011

Following the trend of one-liners for various languages (Haskell, Scala, Python), here are some examples in R.

Multiply Each Item in a List by 2
# lists
lapply(list(1:4), function(n){n*2})
# otherwise
(1:4)*2

Sum a List of Numbers
# lists
lapply(list(1:4), sum)
# oth...

Read more »

Conway’s Game of Life in R with ggplot2 and animation

June 5, 2011

In undergrad I had a computer science professor who piqued my interest in applied mathematics, beginning with Conway’s Game of Life. At first, the Game of Life (not the board game) appears to be quite simple (perhaps too simple), but it has been widely explored and is useful for modeling systems over time. It has been...

Read more »

An application of aggregate() and merge()

June 5, 2011

Today, I encountered an interesting problem while processing a data set of mine. My data have observations on businesses that are repeated over time. My data set also contains information on longitude and latitude of the business location, but unfort...

Read more »

Testing Different Methods for Merging a set of Files into a Dataframe

June 5, 2011

I previously posted a method I used for merging a set of files into a dataframe. It wasn’t long before …

Read more »

Environments in R

June 4, 2011

One interesting thing about R is that you can get down into the insides fairly easily. You're allowed to see more of how things are put together than in most languages. One of the ways R does this is by having first-class environments. At first glance, environments are simple enough. An environment...

Read more »

Don Quijote — Word Statistics

June 4, 2011

Using the Gutenberg Project’s free text of Don Quijote + Unix for Poets, here are the most used (non-short) words in Miguel de Cervantes’ famous work:
2167 Quijote
2145 Sancho
1331 porque
1053 respondió
1027 había
 900 merced
 813 vuestra
  79...

Read more »

searching ITIS and fetching Phylomatic trees

June 3, 2011

I am writing a set of functions to search ITIS for taxonomic information (more databases to come) and functions to fetch plant phylogenetic trees from Phylomatic. Code at GitHub. Also, see the examples in the demos folder on the GitHub site above.

Read more »

Visualizing small-scale paired data – combining boxplots, stripcharts, and confidence-intervals in R

June 3, 2011

Sometimes when working with small paired data sets it is nice to see and show all the data in a structured form. For example, when looking at pre-post comparisons, connected dots are a natural way to visualize which data points belong together. In R this can easily be combined with boxplots expressing the overall distribution of the data. This

Read more »

Using R for Stata to CSV Conversion

June 3, 2011

I recently found myself in the unpleasant situation of needing to read a Stata .dta file, but not having Stata readily available to me. Normally, I’d fire up a text editor and deconstruct the file, except Stata saves its data in a proprietary binary format, so the file's contents appear garbled in a text editor.

Read more »

Example 8.39: calculating Cramer’s V

June 3, 2011

Cramer's V is a measure of association for nominal variables. Effectively it is the Pearson chi-square statistic rescaled to have values between 0 and 1, as follows: V = sqrt(X^2 / (n * (min(rows, cols) - 1))), where X^2 is the Pearson chi-square, n...

Read more »

Simulating CMYK mis-registration printing

June 3, 2011

I recently came across a poster advertising a children's production of Shakespeare's The Tempest where they purposely used an effect to mimic a mis-registration in CMYK printing. You have probably seen this before as a slight offset in one of t...

Read more »

The residuals of crime

June 3, 2011

Real-estate search website Trulia has a new tool to help you in your choice of a new home: crime maps. With local police forces being much better about sharing data, crime maps are nothing new, but Trulia takes it to the next level with a slick user interface for navigating US cities, a beautiful heat-map visualization of crime hot-spots...

Read more »

Always learn and never know

June 3, 2011

I have been using R for about two years, with no previous coding background. So, I feel like the title says, “always learn and never know”. This time, I decided to use R to study a simple, non-statistical problem that came up some time ago. Consider the exponential function 2^x and the parabola x^2. One

Read more »

Merge all files in a directory using R into a single dataframe

June 3, 2011

In this post, I provide a simple script for merging a set of files in a directory into a single, …

Read more »

Optmatch and RItools — New homes and techniques

June 2, 2011

Co-developers Jake Bowers, Ben Hansen and I are happy to announce that our R packages optmatch and RItools have new homes on GitHub. We had previously been managing development on private subversion repositories and managed the projects through an ad-h...

Read more »

A Quantstrat to Build On

June 2, 2011

THIS IS NOT INVESTMENT ADVICE. PLEASE DO NOT TRADE THIS SYSTEM AS IT CAN LOSE SIGNIFICANT AMOUNTS OF MONEY. YOU ARE RESPONSIBLE FOR YOUR OWN GAINS AND LOSSES. Some R finance powerhouses have been banging away on the quantstrat package for q...

Read more »

Highlights from R/Finance 2011 presentations

June 2, 2011

Patrick Burns offers his selections from the presentations at the R/Finance 2011 conference. Check out his post for overviews of some great presentations (and truly, there's some awesome content available to download). I'll add another of my favourites: Bryan Lewis's presentation of his interface from R to the betfair betting market. (But if you use it to automate bets...

Read more »

Australians and Americans, 10 years after 9/11

June 2, 2011

With Lynn Vavreck at UCLA, I ran parallel public opinion surveys in Australia and the United States, measuring attitudes on security, the fight against terrorism, the wars in Afghanistan, etc., some 10 years after the 9/11 attacks. Full report here (gene...

Read more »

Sweave diagram, following Knuth’s original

June 2, 2011

In preparation for a talk, I updated Knuth's original diagram in Donald E. Knuth. Literate programming. The Computer Journal, 27(2):97–111, May 1984. The new diagram is Sweave specific. Click the Sweave diagram for a PDF version, or right-click and select 'save image as' for the PNG version. Permission is granted for any use of the

Read more »