Extracting EOD Data from NSE

July 19, 2011

Since my prime interest is the Indian financial markets, the first step is to get data to play around with. NSE India provides end-of-day (EOD) data as bhavcopies, which are stored as zipped files on its servers. Downloading them one by one for a larger t...

Read more »

Geocoding addresses from Missouri Sex Offender Registry

July 19, 2011

Computer Assisted Reporting: This is the second of four articles about analyzing distances between sex offenders and child daycare centers in Missouri as part of a joint project with KSHB NBC Action News in Kansas City. The previous article gave details...

Read more »

Analysis of Missouri Sex Offender Registry Data

July 18, 2011

Computer Assisted Reporting: This is the first of three articles about analyzing distances between sex offenders and child daycare centers in Missouri as part of a joint project with KSHB NBC Action News in Kansas City. The Missouri State Highway Patrol...

Read more »

The foundations of Statistics [reply]

July 18, 2011

Shravan Vasishth has written a response to my review; both are published on the Statistics Forum. His response is quite straightforward and honest. In particular, he acknowledges not being a statistician and that he “should spend more time studying statistics”. I also understand the authors’ frustration at trying “to recruit several statisticians (at different points) to...

Read more »

GigaOm article on R, Big Data and Data Science

July 18, 2011

I'm really pleased that an article I wrote, "5 real-world uses of big data", has been published in the widely-read technology blog GigaOm. In the article, I review five examples of using data science techniques and R to make sense of some large real-world data sets: Drew Conway's analysis of the Afghanistan attacks data released by Wikileaks; Benetech's use...

Read more »

Registration closing for UseR! 2011

July 18, 2011

Friday, July 22 is the last day on which you can register for UseR! 2011 at the University of Warwick. The conference will be held August 16-18, 2011. You can peruse the book of abstracts and view the draft schedule. I am scheduled to give a talk on “Random input testing with R”. The abstract is: …

Read more »

Model Validation: Interpreting Residual Plots

July 18, 2011

When conducting any statistical analysis, it is important to evaluate how well the model fits the data and whether the data meet the assumptions of the model. There are numerous ways to do this and a variety of statistical tests to evaluate deviations from model assumptions. However, there is little general acceptance of any of the statistical tests. Generally...
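As a minimal illustration of the kind of plot discussed here (our own sketch, not taken from the post, using R's built-in `cars` data), the code below fits a simple linear model and draws residuals against fitted values plus a normal Q-Q plot. A patternless band around zero is what one hopes to see; a curve hints at a missing term, and a funnel hints at non-constant variance.

```r
# Fit a simple linear model to R's built-in cars data
# (stopping distance as a function of speed).
fit <- lm(dist ~ speed, data = cars)

# Residuals vs fitted values: look for curvature (missing terms)
# or a funnel shape (non-constant variance).
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# Normal Q-Q plot of the residuals: systematic departures from the
# reference line suggest non-normal errors.
qqnorm(resid(fit))
qqline(resid(fit))
```

Base R's `plot(fit)` also produces a standard set of diagnostic panels for an `lm` object, including these two.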

Read more »

Example 9.3: augmented display of contingency table

July 18, 2011

SAS and R often provide different levels of detail in their output. This is particularly true for the descriptive analysis of contingency tables, where SAS makes it easy to display tables with additional quantities (such as the observed cell count). The m...
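In base R, several of the extra per-cell quantities can be pulled out with standard functions. The sketch below is our own, not the post's code, and the 2 x 2 counts are made up purely for illustration.

```r
# Hypothetical 2 x 2 table of counts (illustrative data only).
tab <- matrix(c(20, 30, 25, 25), nrow = 2,
              dimnames = list(group = c("A", "B"),
                              outcome = c("yes", "no")))

addmargins(tab)                  # observed counts with row/column totals
round(prop.table(tab, 1), 3)     # row proportions

ct <- chisq.test(tab, correct = FALSE)
ct$expected                      # expected counts under independence
round(ct$residuals, 3)           # Pearson residuals (obs - exp) / sqrt(exp)
round(ct$residuals^2, 3)         # each cell's contribution to the chi-square
```

With `correct = FALSE`, the squared Pearson residuals sum to the reported chi-square statistic, which makes it easy to see which cells drive the result.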

Read more »

The Road to Default: Puppy Power!

July 18, 2011

Although Congress can technically dilly-dally until August 2nd to come up with an agreement and raise the debt ceiling, markets have anticipated the inevitable. They haven't sat back and decided to wait until August 2nd to panic; they are already in "oh...

Read more »

Fast logistic regression on Big Data with commodity hardware? No problem.

July 18, 2011

You might think that doing advanced statistical analysis on Big Data is out of reach for those of us without access to expensive hardware and software. For example, back in April SAS was proud to demonstrate being able to run logistic regression on a billion records (and "just a few" variables) in less than 80 seconds. But that feat...

Read more »

Avoiding Loops in R: An Example with Principal Minors

July 18, 2011

Yesterday, I found myself wanting to compute a large subset of the second order principal minors of a matrix (diagonal-preserving minors; the ones for which the rows and columns kept are the same). Don't judge me for wanting to do this, and bear with ...
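One loop-free way to get second-order diagonal-preserving minors (a sketch of our own, not necessarily the approach in the linked post) is to enumerate index pairs with `combn()` and evaluate every 2 x 2 determinant at once through matrix indexing. The helper name `second_order_minors` and the example matrix are hypothetical.

```r
# Hypothetical helper (not from the post): all second-order principal minors
# det(A[c(i, j), c(i, j)]) for i < j, computed without an explicit loop.
second_order_minors <- function(A) {
  stopifnot(is.matrix(A), nrow(A) == ncol(A), nrow(A) >= 2)
  pairs <- combn(nrow(A), 2)   # 2 x choose(n, 2) matrix of index pairs
  i <- pairs[1, ]
  j <- pairs[2, ]
  # 2 x 2 determinant a_ii * a_jj - a_ij * a_ji, vectorised over all pairs
  # by indexing A with two-column (row, col) matrices.
  minors <- A[cbind(i, i)] * A[cbind(j, j)] - A[cbind(i, j)] * A[cbind(j, i)]
  names(minors) <- paste(i, j, sep = ",")
  minors
}

A <- matrix(c(2, 1, 0,
              1, 3, 1,
              0, 1, 4), nrow = 3, byrow = TRUE)
second_order_minors(A)   # named vector: "1,2" = 5, "1,3" = 8, "2,3" = 11
```

For an n x n matrix this returns choose(n, 2) values; restricting to a subset of pairs only requires subsetting the columns of `pairs`.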

Read more »

1st Data Analysis Contest Using R

Emilio Torres Manzanera has just announced the 1st Data Analysis Contest Using R: “Nestoria (http://www.nestoria.com/) is a specialized web search engine platform in house prices. Nestoria and Lokku Labs aim to improve the understanding of the public of the value of its databases. The company aims to engage a few brilliant statisticians in the expectation...

Read more »

On “Stock correlation has been rising”

July 17, 2011

Ticker Sense posted about the mean correlation of the S&P 500. The plot there (similar to Figure 1) shows that correlation has been on the rise after a low in February.

Figure 1: Mean 50-day rolling correlation of S&P 500 constituents to the index.

For me, this post raised a whole lot more …

Read more »

The method in the mirror: reflection in R

July 17, 2011

Reflection is a programming concept that sounds scarier than it is. There are three related concepts that fall under the umbrella of reflection, and I’ll be surprised if you haven’t come across most of these code ideas already, even if you didn’t know it was called reflection. The first concept is examination of your variables.
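The first concept, examining variables at run time, maps onto a handful of base R functions. This is a minimal sketch of our own, not code from the post; the variable `x` is just an illustration.

```r
# An example named numeric vector to inspect.
x <- c(a = 1.5, b = 2.5)

class(x)                 # "numeric"
typeof(x)                # "double"
is.numeric(x)            # TRUE
names(x)                 # "a" "b"
str(x)                   # compact summary of type, length and attributes
exists("x")              # TRUE: is there a variable with this name?
identical(get("x"), x)   # TRUE: fetch a variable from its name as a string
```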

Read more »

Correlation Resources: SPSS, R, Causality, Interpretation, and APA Style Reporting

July 17, 2011

This post provides links to a range of resources related to the use and interpretation of correlations. I wanted to provide a page with links to a number of additional resources that would be useful both for those of my students who might be keen to le...

Read more »

Migrating from SPSS/Excel to R, Part 2: Working with Packages

July 17, 2011

In this post, I cover an important aspect of using R that users of SPSS/Excel won’t be familiar with: working …

Read more »

Accepted lack of confidence

July 17, 2011

I just got the following email from PNAS about our paper “Lack of confidence in ABC model choice”: Editor's Remarks to Author: both referees now find the manuscript acceptable for publication as do I. Each suggests small changes which I encourage the authors to make prior to having the manuscript go into production. Congratulations on an...

Read more »

The Road to Default: Deep DooDoo

July 16, 2011

Okay, so what is the situation at hand? Well, the inevitable default of the United States, of course. The U.S. will default regardless of whether Congress raises the debt ceiling. You may be thinking: But how can he say such a thing? I...

Read more »

Mixture distributions and models: a clarification

In response to my last post, Chris had the following comment: “I am actually trying to better understand the distinction between mixture models and mixture distributions in my own work. You seem to say mixture models apply to a small set of models – namely regression models.” This comment suggests that my caution about the difference between mixed-effect models and mixture distributions...

Read more »

Slopegraphs in R

July 16, 2011

The internet seems abuzz this week with the "discovery" of a long-lost Edward Tufte plot type: the slopegraph. In this post, I'll show you how to create these elegant compact plots using R and ggplot2.

Read more »

Review: Financial Risk Forecasting – The Theory and Practice of Forecasting Market Risk, with Implementation in R and MATLAB by Jon Danielsson

July 16, 2011

Guest post to R-bloggers by Dr Kris Boudt. R has always been my favorite language for forecasting financial risk in my research and consulting. But I have been reluctant to use it in my lectures on financial risk. It is certainly not the absence of appropriate R packages that held me back. On the contrary, there is a large...

Read more »

test_that — A brief review

July 15, 2011

For the last month or so, I have been using the test_that unit testing package for R (a quick note on names: both testthat and test_that are used in the documentation. The library, as available from CRAN, has no underscore, so use install.packages('testthat') to get a copy). My free-time programming is always written in a loosely TDD style,...

Read more »

10 reasons why a grad student should use R

July 15, 2011

Kevin Goulding is working on a Master’s degree in Applied Economics at Montana State University, and offers 10 reasons why grad students should choose R for statistical analyses, homework problems, and thesis research: R is free, and lets grad students escape the burdens of commercial license costs. R has really good online documentation; and the community is unparalleled. The...

Read more »

Using R and Motion Charts to analyze financial data

July 15, 2011

We've noted before that with the googleVis package, it's easy to make motion charts in R and create a web-based interactive chart that reflects the synchronous movements of two or three variables over time. R user Jeffrey Breen has a great new blog post showing exactly how easy it is, which is best summarized in this tweet: I wanted...

Read more »

Using R for multilevel modeling; Calling C from R

July 15, 2011

Los Angeles R Users Group, July 12, 2011 meeting (see meetup info here):
1. Using R for multilevel modeling of salmon habitat, by Yasmin Lucero (download slides from here)
2. Calling C from R, by Rob Zinkov (download slides from …)

Read more »

ICD code – search looping

July 15, 2011

Following on from my earlier post on creating a table of ICD codes in R, here is how I am currently counting these codes and storing the results in a data frame. First, create a data frame to store the results in:

    hosp_count <- as.data.frame(matrix(ncol = length(icd_codes)))
    names(hosp_count) <- names(icd_codes)

Counting occurrences: then start to loop through your dataset with...

Read more »

One-liners which make me love R: Make your data dance (Hans Rosling style) with googleVis #rstats

July 14, 2011

This inaugural post in my "one-liners which make me love R" series highlights the googleVis package which makes it easy to use the Google Visualization API from R. Thanks to googleVis, just one line of R generates the 165 lines of HTML and (mostly) JavaScript required to create a Hans Rosling-style motion chart for some sample data.

Read more »

MCMC and faster Gibbs Sampling using Rcpp

July 14, 2011

Sanjog Misra, who uses Rcpp for Markov chain Monte Carlo (MCMC) analyses in quantitative marketing, kindly sent me a short example of Rcpp use. The example is based on a blog post by Darren Wilkinson which itself discusses and compares the suitabilit...

Read more »

What is your favorite R feature? (part 2)

This week on our blog we started a list of great R (www.r-project.org) code snippets: http://cloudnumbers.com/what-is-your-favorite-r-feature

We are going to extend this list with several more nice R features. Please feel free to add comments with your favorite R code snippets. Descriptive statistics: A huge set of tools to describe and explore data is available...
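A few base R one-liners give a flavour of the descriptive-statistics tools mentioned here. These are our own picks, not snippets from the post, applied to R's built-in `mtcars` data.

```r
# Built-in mtcars data: fuel economy and other specs for 32 cars.
summary(mtcars$mpg)                                # min, quartiles, mean, max
quantile(mtcars$mpg, c(0.1, 0.9))                  # arbitrary quantiles
sd(mtcars$mpg)                                     # standard deviation
table(mtcars$cyl)                                  # frequency table
tapply(mtcars$mpg, mtcars$cyl, mean)               # mean mpg by cylinder count
aggregate(mpg ~ cyl, data = mtcars, FUN = median)  # grouped summary as a data frame
```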

Read more »