Articles by Kevin Davenport

Absolute Deviation Around the Median

August 8, 2013 | Kevin Davenport

Median Absolute Deviation (MAD), or Absolute Deviation Around the Median as stated in the title, is a robust measure of dispersion. Robust statistics are statistics that perform well for data drawn from a wide range of probability distributions, especially distributions that are not normal. Unlike the standard mean/standard deviation combo, MAD is ... [Read more...]

The R User Conference 2013: Albacete, Spain

July 23, 2013 | Kevin Davenport

I was fortunate enough to attend the useR! 2013 conference in Albacete, Spain. I had a great time meeting fellow R users and exchanging ideas on R implementations. The conference is also one of the few opportunities to gain exposure to uses of R in other disciplines because there ... [Read more...]

Shiny Server on CentOS

June 29, 2013 | Kevin Davenport

I’ve been enjoying working with Joe Cheng’s Shiny Server and wanted to create a quick step-by-step guide on installing it on an AWS CentOS EC2 instance as the standard Shiny Server instructions assume the typical dependencies are installed: 1. Shiny’s instructions say to install libssl-dev (sudo yum install ... [Read more...]

Data imputation I

June 12, 2013 | Kevin Davenport

I recently entered Kaggle's Titanic learning competition for fun and to see where my out-of-the-box use of random forest would rank me (303 out of 5,882). It was interesting to see that much of the scoring differentiation came from score imputation, that is, filling missing values based on other ... [Read more...]

ggplot2 graphics in a loop

April 29, 2013 | Kevin Davenport

A client has a specific audit they perform quarterly across 200 of their manufacturing plants. The audit has 8 distinct sections examining the different areas of the plant (shipping, receiving, storage, packaging, etc.). Instead of having one cumulative final score, the audit displays a final score for each section. I wanted to ... [Read more...]

Predicting Dichotomous Outcomes I

April 14, 2013 | Kevin Davenport

We are trying to predict a dichotomous dependent variable (male/female, yes/no, like/dislike, etc.) with independent “predictor” variables. Let’s say we want to determine whether or not an employee will quit based on the percentage of their tenure spent traveling. We assemble the data from HR and ... [Read more...]

Gradient Boosting: Analysis of LendingClub’s Data

April 8, 2013 | Kevin Davenport

An old 5.75% CD of mine recently matured, and seeing that those interest rates are gone forever, I figured I’d take a statistical look at LendingClub’s data. LendingClub is the first peer-to-peer lending company to register its offerings as securities with the Securities and Exchange Commission (SEC). Their ... [Read more...]

Data visualization with R and ggplot2

March 28, 2013 | Kevin Davenport

I’m working on a one-hour ggplot2 lecture for the San Diego R users group, which I will post here when I’m done. I think there are many great intro to R data visualization resources out there so I’ll only share working examples on my blog. A retail ... [Read more...]

Samsung Phone Data Analysis Project

March 19, 2013 | Kevin Davenport

Below are my findings from the second data analysis project in Dr. Jeffrey Leek’s Johns Hopkins Coursera class. Introduction: I used the “Human Activity Recognition Using Smartphones Dataset” (UCI, 2013) to build a model. This data was recorded from a Samsung prototype smartphone with a built-in accelerometer. The purpose of ... [Read more...]

Layman’s Random Forests

March 18, 2013 | Kevin Davenport

I’m not a fan of the Top 40 style content on Quora, but a student in Dr. Leek’s Coursera class shared this absolute gem from Edwin Chen. I have not seen a better explanation: How do random forests work in layman’s terms? Suppose you’re very indecisive, so ... [Read more...]

Simple Count Probability

February 24, 2013 | Kevin Davenport

Data can take the form of counts: compliments or complaints received, items returned, number of E. coli cases. Data can also be expressed in rates: percent of web traffic from a user permissions type, percent of businesses in a region passing a safety audit. A random variable X has the ... [Read more...]
