# Monthly Archives: August 2013

## Exploratory Data Analysis: The 5-Number Summary – Two Different Methods in R


Introduction: Continuing my recent series on exploratory data analysis (EDA), today’s post focuses on 5-number summaries, which were previously mentioned in the post on descriptive statistics in this series. I will define and calculate the 5-number summary in 2 different ways that are commonly used in R. (It turns out that different methods arise from

## Identifying Potential Customers with Classification Techniques in R Language

August 12, 2013

Data mining techniques and algorithms such as Decision Tree, Naïve Bayes, Support Vector Machine, Random Forest, and Logistic Regression are “most commonly used for predicting a specific outcome such as response / no-response, high / medium / low-value customer, likely to buy / not buy.”[1] In this article, we will demonstrate how to use R

## Time Series Decomposition

August 12, 2013

In the last post on the changepoint package, I concluded with a brief example of time series decomposition with the "decompose" command. After further reading, I discovered the "stl" command, which appears to me to be a superior method. STL stand...

## analyze the national plan and provider enumeration system (nppes) with r and monetdb

August 12, 2013

the national plan and provider enumeration system (nppes) contains information about every provider, insurance plan, and clearinghouse actively operating in the united states healthcare industry.  did i just see the ears of all the health workforce researchers in the room perk up?  it's freely downloadable, courtesy of the department of health and human services' implementation of the...

## Some belated spring cleaning

August 11, 2013

A very busy spring has transitioned into a very busy summer, so let me recap a few topics that probably deserve more time than I’ll give them here. Here are the things I’m overdue on, in no particular order:

Publications: In the March edition of the Journal of Risk, Kris Boudt, Brian Peterson and I

## Twitter Movie Review – Chennai Express

August 11, 2013

In the spirit of my first post (Pappu Vs. Feku), I will continue to explore the use of Twitter as a window onto events of contemporary interest, and movies certainly capture the interest of a large majority of the Indian audience. So I am looking at Chennai Express, which was released last week...

## Modeling Match Results in La Liga Using a Hierarchical Bayesian Poisson Model: Part three.

August 11, 2013

In part one and part two of Modeling Match Results in La Liga Using a Hierarchical Bayesian Poisson Model I developed a model for the number of goals in football matches from five seasons of La Liga, the premier Spanish football league. I’m now reasonably happy with the model and want to use it to rank...

## Software carpentry

August 11, 2013

I would never call myself a programmer, but as an ecologist I manage moderately big and complicated datasets, and that requires interacting with my computer to get the most out of them. I taught myself most of the things I need …

## Finding Correlations in Data with Uncertainty

August 11, 2013

A week or so ago a colleague of mine asked if I knew how to calculate correlations for data with uncertainties. Now, if we are going to be honest, then all data should have some level of experimental or measurement error. However, I suspect that in the majority of cases these uncertainties are ignored when