## On the language of Mad Men

May 11, 2012

Turns out that Megan would have never gotten a callback for an audition. (Via Ben Schmidt.)

## An embarrassing admission; Copy pasting tables with text containing spaces from Excel to R

May 11, 2012

I can’t believe I didn’t learn how to do this earlier, but I never knew how to accurately copy tables from Excel whose text contains spaces and paste them into a data frame in R without generating confusion …

## Porting cdplot to ggplot2

May 11, 2012

Last week I published a post on plotting tables in ggplot2. The natural next step is to port cdplot to allow simple visualization of categorical variables against a numerical predictor. The first part of the story covers binary variables. In th...

## My day out at #osddmalaria

May 10, 2012

Finally, I get around to telling you that on Friday 24th February, I took a day out from my regular job to attend a meeting on Open Source Drug Discovery for Malaria. I should state straight away that whilst drug discovery and chem(o)informatics are topics that I find very interesting, I have no professional experience

## 90+ Two-Minute Videos on R

May 10, 2012

I highly recommend Anthony Damico's excellent two-minute videos on programming in R. You can find the full list of 90+ videos here. The first of the series tells you how to download and install R. More generally, Anthony's video collectio...

## In case you missed it: April 2012 Roundup

May 10, 2012

In case you missed them, here are some articles from April of particular interest to R users. Information Age published a feature article on R, describing how new graduates are driving adoption of R in industry. Bob Muenchen has updated his list of R package equivalents to SAS and SPSS procedures. A history of Data Science, including Bill Cleveland's...

## Discovering power laws and removing “shit”

May 10, 2012

Imagine you perform a statistical analysis on a time series of stock market data. After some transformation, averaging, and “renormalization” you find that the resulting quantity, let’s call it , behaves as a function of time like . Since you are a physicist you get excited because you have just discovered a power law. Physicists

## Survey of Data Science / Analytics / Big Data / Applied Stats / Machine Learning etc. Practitioners

May 10, 2012

As I’ve discussed here before, there is a debate raging (ok, maybe not raging) about terms such as “data science”, “analytics”, “data mining”, and “big data”. What do they mean, how do they overlap, and perhaps most importantly, who are the people who work in these fields? Along with two other DC-area Data Scientists, Marck

## Simple Moving Average Strategy with a Volatility Filter: Follow-Up Part 3

May 10, 2012

In part 2, we saw that adding a volatility filter to a single-instrument test did little to improve performance or risk-adjusted returns. How will the volatility filter impact a multiple-instrument portfolio? In part 3 of the follow-up, I will evaluate the impact of the volatility filter on a multiple-instrument test. …

## Should I adjust the Bias?

May 10, 2012

A bias, or systematic error, is quite common when monitoring predictions against reference data. Either way, we need control limits to decide whether the bias is significant. Procedures such as ISO 12099 give details on how to calculate ...

## Criticism 1 of NHST: Good Tools for Individual Researchers are not Good Tools for Research Communities

May 10, 2012

Introduction Over my years as a graduate student, I have built up a long list of complaints about the use of Null Hypothesis Significance Testing (NHST) in the empirical sciences. In the next few weeks, I’m planning to publish a series of blog posts, each of which will articulate one specific weakness of NHST. The

## Photos of the first Milano R net meeting

May 10, 2012

Photos of the first Milano R net meeting. Milano; May 8, 2012

## See R integrated with QlikView, Jaspersoft, Excel, and mobile apps

May 9, 2012

In yesterday's webinar, Revolution Analytics CTO David Champagne demonstrated how to integrate statistical graphics and analytic computations created using R software with a variety of third-party applications. In each case Revolution R Enterprise Server is running as a compute server to the client application, with R scripts launched on each user interaction via the RevoDeployR Web Services API. David...

## Use R! – Part 2

May 9, 2012

Here is a follow-up of my first post about using R. For our yearly KU Leuven Geology PhD Seminar (08-09/05/2012), I quickly pasted this script together from several examples I had run into in the past, as well as some things that I have been doing myse...

## The NFL: Pass or Lose

May 9, 2012

The rushing game is slowly disappearing. Heave That Sucker: when in doubt, chunk the pigskin. Whether you like it or not, NFL (National Football League) teams are relying on passing more and more. Looking at the above chart, the average p...

## Simple Spatial Correlograms for Cross-Country Analysis in R

May 9, 2012

Accounting for temporal dependence in econometric analysis is important, as the presence of temporal dependence violates the assumption that observations are independent units. Historically, much less attention has been paid to correcting for spatial dependence, which, if present, also violates this independence assumption. The comparability of temporal and spatial dependence is useful for illustrating why

## The first version of my “inference from iterative simulation using parallel sequences” paper!

May 9, 2012

From August 1990. It was in the form of a note sent to all the people in the statistics group of Bell Labs, where I’d worked that summer. To all: Here’s the abstract of the work I’ve done this summer. It’s stored in the file, /fs5/gelman/abstract.bell, and copies of the Figures 1-3 are on Trevor’s ...

## Book “R and Data Mining: Examples and Case Studies” on CRAN

May 9, 2012

By Yanchang Zhao, RDataMining.com. My draft book, titled “R and Data Mining: Examples and Case Studies”, is now available on CRAN at http://cran.r-project.org/other-docs.html. It is scheduled to be published by Elsevier in late 2012. Its latest version can be …

## data.table version 1.8.1 – now allowed numeric columns and big-number (via bit64) in keys!

May 9, 2012

This is a guest post written by Branson Owen, an enthusiastic R and data.table user. Wow, a long-desired feature of data.table has finally arrived in version 1.8.1! data.table now allows numeric columns and big numbers (via bit64) in …

## The Epic Search for the Perfect R Text Editor

May 8, 2012

I can never seem to get exactly what I want from an R text editor. Let me correct that: I can never seem to get exactly what I want from an R text editor on a Mac. I used to use Tinn-R, which met most of my needs: free, lightweight, with ...

## Memory Management in R, and SOAR

May 8, 2012

The more I’ve worked with my really large data set, the more cumbersome the work has become for my work computer. Keep in mind I’ve got a quad core with 8 gigs of RAM. With growing irritation at how slow …

## Data Science Books for Computational Journalists

May 8, 2012

There are quite a few books out now on “data science”. I’ve picked out three that I think are the best place to start for computational journalists. First is Machine Learning for Hackers, by Drew Conway and John Myles White. The autho...

## R and Foursquare’s recommendation engine

May 8, 2012

Foursquare, the mobile location-sharing app (of which I'm a big fan), has an excellent recommendation system. Based on your recent check-ins, places your friends found popular, and even the time of day, Foursquare Explore will recommend a great place for a sushi lunch, or the best place to buy new shoes. This presentation from Foursquare engineer Ben Lee shows...

## Mapping US Radiation Levels in R

May 8, 2012

I have posted previously about the open data available on Socrata (https://opendata.socrata.com/), and I was looking at the site again today when I stumbled upon a listing of levels of various radioactive isotopes by US city and state. The data is available at https://opendata.socrata.com/Government/Sorted-RadNet-Laboratory-Analysis/w9fb-tgv6 . You will need to click export, and then download it as a...

## Heartbeat of a Cycling City: Bixi data at Hack/Reduce

May 8, 2012

The recent Hack/Reduce hackathon in Montreal was a tonne of fun. Our team tackled a data set consisting of Bixi (Montreal’s bicycle share system) station states at one-minute temporal resolution. We used Hadoop and MapReduce to pull out some features of user behaviours. One of the things we extracted was the flux at

## chartsnthings !

May 8, 2012

Yair pointed me to this awesome blog of how the NYT people make their graphs. This blows away all other stat graphics blogs (including this one). Lots of examples from mockup to first tries to final version. I recognize a lot of what they’re doing from my own experience. Also from my experience it’s hard ...

## “Introduction to R” public course

May 8, 2012

Milano R net, in collaboration with Quantide, organizes an "Introduction to R" course. Milano; June 7-8, 2012