## Criticism 2 of NHST: NHST Conflates Rare Events with Evidence Against the Null Hypothesis

May 12, 2012
This is my second post in a series describing the weaknesses of the NHST paradigm. In the first post, I argued that NHST is a dangerous tool for a community of researchers because p-values cannot be interpreted properly without perfect knowledge of the research practices of other scientists: knowledge that we cannot hope

## The Foreign Language of ‘Mad Men’

May 12, 2012
The Foreign Language of 'Mad Men': ggplot2 in The Atlantic

## useR! 2012: Call for Late-Breaking Posters; Regular Registration Ends 12 May

May 12, 2012
*** Call for Late-breaking Posters *** Abstracts may be submitted for posters presenting recent developments and late-breaking applications of R, on topics as indicated in the earlier call for abstracts: http://biostat.mc.vanderbilt.edu/UseR-2012#Call_for_Abstracts_and_Tutorial Late-breaking posters will be displayed during the poster session alongside regular posters, and they will appear in the electronically published book of abstracts for the conference. However, these...

## ASA fellows

May 12, 2012
Having just been elected an ASA Fellow (yay!), I received the list of 2012 ASA Fellows. Among them, let me mention Sudipto Banerjee, University of Minnesota, Minneapolis, Minnesota, elected “For theoretical, methodological and applied research in spatiotemporal statistical modeling, especially as applied to problems in environmetrics, ecology, occupational health, agriculture and economics, for professional work at

## R – some introductory material

May 12, 2012
R is a statistical programming language and can be a little scary at first. I learned it during my first statistics class. While others used Stata, I decided to see whether I could do the tasks in R. That was probably one of my best research choices. My main source of knowledge was Quick-R, which is an excellent resource. It...

## Mariano Rivera’s baseball prowess, illustrated with R

May 11, 2012
Kevin Quealy, graphics editor at the New York Times, has published another fascinating behind-the-scenes look at how the Times creates data visualizations for print and online. In his latest post, he looks at how a visualization comparing the Yankees' Mariano Rivera with other Major League Baseball pitchers was created. The...

## On the language of Mad Men

May 11, 2012
It turns out that Megan would never have gotten a callback for an audition. (Via Ben Schmidt.)

## An embarrassing admission; Copy pasting tables with text containing spaces from Excel to R

May 11, 2012
I can’t believe I didn’t learn this earlier, but I never knew how to accurately copy tables from Excel that had text with spaces in them and paste them into a data frame in R without generating confusion …
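The usual source of the confusion is that Excel places copied cells on the clipboard as tab-separated text, so a reader that splits on any whitespace breaks multi-word cells apart. A minimal sketch of the fix, splitting on tabs only, is below. It uses Python's `csv` module purely for illustration; the post itself works in R (where the analogous move is passing `sep = "\t"` to `read.table`), and the hard-coded `clipboard_text` is a hypothetical stand-in for real clipboard contents.

```python
import csv
import io

# Hypothetical stand-in for text copied from an Excel range:
# cells separated by tabs, rows by newlines, some cells containing spaces.
clipboard_text = "name\tcity\tscore\nAnn Lee\tNew York\t12\nBo Chan\tSan Jose\t7\n"

def parse_tab_delimited(text):
    """Split rows on tabs only, so a cell like 'New York' stays one field."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    rows = [row for row in reader if row]  # drop any blank rows
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]

records = parse_tab_delimited(clipboard_text)
print(records[0])  # {'name': 'Ann Lee', 'city': 'New York', 'score': '12'}

# Splitting on arbitrary whitespace instead breaks multi-word cells apart:
print("Ann Lee\tNew York\t12".split())  # ['Ann', 'Lee', 'New', 'York', '12']
```

All values come back as strings; numeric columns would still need converting afterwards.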

## Porting cdplot to ggplot2

May 11, 2012
Last week I published a post on plotting tables in ggplot2, so the natural next step is to port cdplot to allow simple visualization of categorical variables against a numerical predictor. The first part of the story covers binary variables. In th...
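R's `cdplot` computes the conditional densities of a numeric predictor given each level of a categorical response, weighted by the marginal distribution of the response. For a binary response this amounts to estimating P(y = 1 | x). The sketch below shows that idea in plain Python, assuming a Gaussian kernel with a fixed, hand-picked `bandwidth` (R's `density()` chooses its own bandwidth, so numbers will differ); the function names and the toy data are hypothetical, and this is not the post's ggplot2 code.

```python
import math

def gaussian_kde(sample, bandwidth):
    """Return a function estimating the density of `sample` at x (Gaussian kernel)."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)
    return density

def conditional_prob(x_values, y_values, grid, bandwidth=1.0):
    """Estimate P(y = 1 | x) at each point of `grid` for a binary y, by
    weighting each level's density of x by that level's share of the data."""
    x0 = [x for x, y in zip(x_values, y_values) if y == 0]
    x1 = [x for x, y in zip(x_values, y_values) if y == 1]
    p1 = len(x1) / len(x_values)
    f0, f1 = gaussian_kde(x0, bandwidth), gaussian_kde(x1, bandwidth)
    out = []
    for g in grid:
        num = p1 * f1(g)
        den = (1 - p1) * f0(g) + num
        out.append(num / den if den > 0 else float("nan"))
    return out

# Toy data: larger x values are more often labelled 1.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]
probs = conditional_prob(x, y, [1.0, 4.5, 8.0])
```

Plotting `probs` over a fine grid gives the boundary between the two shaded regions of a cdplot-style chart for a binary response.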

## My day out at #osddmalaria

May 10, 2012
Finally, I get around to telling you that on Friday 24th February, I took a day out from my regular job to attend a meeting on Open Source Drug Discovery for Malaria. I should state straight away that whilst drug discovery and chem(o)informatics are topics that I find very interesting, I have no professional experience

## 90+ Two-Minute Videos on R

May 10, 2012
I highly recommend Anthony Damico's excellent two-minute videos on programming in R. You can find the full list of 90+ videos here. This is the first of the series, which tells you how to download and install R. More generally, Anthony's video collectio...

## In case you missed it: April 2012 Roundup

May 10, 2012
In case you missed them, here are some articles from April of particular interest to R users. Information Age published a feature article on R, describing how new graduates are driving adoption of R in industry. Bob Muenchen has updated his list of R package equivalents to SAS and SPSS procedures. A history of Data Science, including Bill Cleveland's...

## Discovering power laws and removing “shit”

May 10, 2012
Imagine you perform a statistical analysis on a time series of stock market data. After some transformation, averaging, and “renormalization”, you find that the resulting quantity behaves as a power of time. Since you are a physicist, you get excited because you have just discovered a power law. Physicists
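The usual first check for a suspected power law is a straight-line fit on log-log axes: if $y = C\,t^{\alpha}$, then $\log y = \alpha \log t + \log C$. Below is a minimal least-squares sketch of that check on synthetic data; the names `y`, `alpha`, and `C`, the exponent of -0.5, and the noise level are all hypothetical choices for illustration. A good straight-line fit over a limited range does not by itself establish a power law, since other curves can look nearly linear on log-log axes over a short span.

```python
import math
import random

def fit_power_law(t, y):
    """Least-squares fit of log(y) = alpha * log(t) + c; returns (alpha, c).
    Requires strictly positive t and y."""
    lx = [math.log(v) for v in t]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    alpha = sxy / sxx
    return alpha, my - alpha * mx

# Synthetic data: y = 3 * t^(-0.5) with small multiplicative (log-normal) noise.
random.seed(0)
t = list(range(1, 201))
y = [3.0 * ti ** -0.5 * math.exp(random.gauss(0, 0.05)) for ti in t]

alpha, c = fit_power_law(t, y)
print(round(alpha, 2), round(math.exp(c), 2))  # should land near -0.5 and 3.0
```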

## Survey of Data Science / Analytics / Big Data / Applied Stats / Machine Learning etc. Practitioners

May 10, 2012
As I’ve discussed here before, there is a debate raging (ok, maybe not raging) about terms such as “data science”, “analytics”, “data mining”, and “big data”. What do they mean, how do they overlap, and perhaps most importantly, who are the people who work in these fields? Along with two other DC-area Data Scientists, Marck

## Simple Moving Average Strategy with a Volatility Filter: Follow-Up Part 3

May 10, 2012
In part 2, we saw that adding a volatility filter to a single-instrument test did little to improve performance or risk-adjusted returns. How will the volatility filter affect a multiple-instrument portfolio? In part 3 of the follow-up, I will evaluate the impact of the volatility filter on a multiple-instrument test.

## Should I adjust the Bias?

May 10, 2012
A bias, or systematic error, is quite common when monitoring predictions against reference data. We therefore need control limits to decide whether the bias is significant. Procedures (for example, ISO 12099) give details about how to calculate ...
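As a hedged illustration of what such a control limit can look like, here is a generic one-sample t-test on the prediction-minus-reference differences: the bias is flagged when its magnitude exceeds $t_{crit} \cdot s / \sqrt{n}$. This is not the ISO 12099 procedure, whose exact formulas are not reproduced here; the function name, the caller-supplied `t_crit`, and the data are hypothetical. The value 2.262 used below is the two-sided 95% t quantile for 9 degrees of freedom (n = 10).

```python
import math

def bias_check(predicted, reference, t_crit):
    """Return (bias, limit, significant) for paired predictions and reference values.
    bias  = mean(predicted - reference)
    limit = t_crit * s / sqrt(n), with s the sample std. dev. of the differences
    The bias is flagged as significant when |bias| exceeds the limit."""
    diffs = [p - r for p, r in zip(predicted, reference)]
    n = len(diffs)
    bias = sum(diffs) / n
    s = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    limit = t_crit * s / math.sqrt(n)
    return bias, limit, abs(bias) > limit

# Hypothetical reference values and predictions that run consistently high.
reference = [10.0, 11.2, 9.8, 12.5, 10.9, 11.7, 10.3, 12.0, 9.5, 11.1]
offsets = [0.5, 0.3, 0.4, 0.6, 0.2, 0.5, 0.4, 0.3, 0.5, 0.3]
predicted = [r + d for r, d in zip(reference, offsets)]

bias, limit, significant = bias_check(predicted, reference, t_crit=2.262)
```

Here the mean offset is 0.4 while the limit is roughly 0.09, so the bias would be flagged and an adjustment considered.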

## Criticism 1 of NHST: Good Tools for Individual Researchers are not Good Tools for Research Communities

May 10, 2012
Over my years as a graduate student, I have built up a long list of complaints about the use of Null Hypothesis Significance Testing (NHST) in the empirical sciences. In the next few weeks, I’m planning to publish a series of blog posts, each of which will articulate one specific weakness of NHST. The

## Photos of the first Milano R net meeting

May 10, 2012
Photos of the first Milano R net meeting: Milano, May 8, 2012.

## See R integrated with QlikView, Jaspersoft, Excel, and mobile apps

May 9, 2012
In yesterday's webinar, Revolution Analytics CTO David Champagne demonstrated how to integrate statistical graphics and analytic computations created using R software with a variety of third-party applications. In each case Revolution R Enterprise Server is running as a compute server to the client application, with R scripts launched on each user interaction via the RevoDeployR Web Services API. David...

## Use R! – Part 2

May 9, 2012
Here is a follow-up to my first post about using R. For our yearly KU Leuven Geology PhD Seminar (08-09/05/2012), I quickly pasted this script together from several examples I had run into in the past, as well as some things that I have been doing myse...

## The NFL: Pass or Lose

May 9, 2012
The rushing game is slowly disappearing. Heave that sucker: when in doubt, chuck the pigskin. Whether you like it or not, NFL (National Football League) teams are relying more and more on passing. Looking at the above chart, the average p...

## Simple Spatial Correlograms for Cross-Country Analysis in R

May 9, 2012
Accounting for temporal dependence in econometric analysis is important, as the presence of temporal dependence violates the assumption that observations are independent units. Historically, much less attention has been paid to correcting for spatial dependence, which, if present, also violates this independence assumption. The comparability of temporal and spatial dependence is useful for illustrating why
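A spatial correlogram typically plots a spatial autocorrelation statistic, such as Moran's I, against distance bands. Below is a minimal pure-Python sketch with binary weights ($w_{ij} = 1$ when units $i$ and $j$ lie within a band), using hypothetical 1-D positions and values for brevity; real cross-country work would use geographic distances between countries, and this is not the post's R code. Under no spatial autocorrelation, the expected value of Moran's I is $-1/(n-1)$.

```python
def morans_i(values, coords, lo, hi):
    """Moran's I with binary weights: w_ij = 1 when lo < |coord_i - coord_j| <= hi.
    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2"""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = 0.0
    w_sum = 0
    for i in range(n):
        for j in range(n):
            if i != j and lo < abs(coords[i] - coords[j]) <= hi:
                num += dev[i] * dev[j]
                w_sum += 1
    denom = sum(d * d for d in dev)
    if w_sum == 0 or denom == 0:
        return float("nan")  # no pairs in this band, or constant values
    return (n / w_sum) * num / denom

def correlogram(values, coords, edges):
    """Moran's I for each distance band (edges[k], edges[k + 1]]."""
    return [morans_i(values, coords, lo, hi) for lo, hi in zip(edges[:-1], edges[1:])]

# Hypothetical units on a line whose values trend smoothly with position.
coords = list(range(10))
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
c = correlogram(values, coords, [0, 1, 3, 6, 9])
```

Positive values in the short-distance bands, as here, indicate that nearby units resemble each other, which is exactly the violation of the independence assumption the post describes.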

## The first version of my “inference from iterative simulation using parallel sequences” paper!

May 9, 2012
From August 1990. It was in the form of a note sent to all the people in the statistics group of Bell Labs, where I’d worked that summer. To all: Here’s the abstract of the work I’ve done this summer. It’s stored in the file, /fs5/gelman/abstract.bell, and copies of the Figures 1-3 are on Trevor’s ...

## Book “R and Data Mining: Examples and Case Studies” on CRAN

May 9, 2012
by Yanchang Zhao, RDataMining.com. A draft of my book, titled “R and Data Mining: Examples and Case Studies”, is now available on CRAN at http://cran.r-project.org/other-docs.html. It is scheduled to be published by Elsevier in late 2012. Its latest version can be …

## data.table version 1.8.1 – now allows numeric columns and big numbers (via bit64) in keys!

May 9, 2012
This is a guest post written by Branson Owen, an enthusiastic R and data.table user. Wow, a long-desired feature of data.table has finally come true in version 1.8.1! data.table now allows numeric columns and big numbers (via bit64) in …

## The Epic Search for the Perfect R Text Editor

May 8, 2012
I can never seem to get exactly what I want from an R text editor. Let me correct that: I can never seem to get exactly what I want from an R text editor on a Mac. I used to use Tinn-R, which met most of my needs: free, lightweight, with ...

## Memory Management in R, and SOAR

May 8, 2012
The more I’ve worked with my really large data set, the more of a burden the work has become for my work computer. Keep in mind that I have a quad-core machine with 8 GB of RAM. With growing irritation at how slow …

## Data Science Books for Computational Journalists

May 8, 2012
There are quite a few books out now on “data science”. I’ve picked out three that I think are the best place to start for computational journalists. First is Machine Learning for Hackers, by Drew Conway and John Myles White. The autho...