Monthly Archives: June 2013

You Do Not Need to Tell Me I Have A Typo in My Documentation

June 10, 2013

So I just got yet another comment saying "you have a typo in your documentation". While I do appreciate these kind reminders, I think it might be a good exercise for those who want to try Git and GitHub pull requests, which make it possible for y...

Read more »

Using Metadata to find Paul Revere

June 9, 2013

London, 1772. I have been asked by my superiors to give a brief demonstration of the surprising effectiveness of even the simplest techniques of the new-fangled Social Networke Analysis in the pursuit of those who would seek to undermine the liberty enjoyed by His Majesty's subjects. This is in connection with the discussion of the role of "metadata" in

Read more »

Why are Birds Dinosaurs?

June 9, 2013

Month after month, one of the most popular posts on the Paleocave blog is the How to Read a Cladogram post I did some time ago. I always intended to follow it up with more cladistic fun. So, hold onto your butts, we’re going to let the dinosaurs loose. Birds are dinosaurs. We’ve all heard

Read more »

Better Neighborhoods with R: Exploring and Analyzing SeeClickFix Data (part 1)

June 9, 2013

The National Day of Civic Hacking took place …

Read more »

Improve The Efficiency in Joining Data with Index

June 9, 2013

When managing big data with R, many people like to use the sqldf package for its friendly interface or choose the data.table package for its lightning speed. However, very few pay special attention to small details, such as adding an index to the data.frame or data.table, that might significantly boost the efficiency of these packages. In my

Read more »

Mahout for R Users

June 9, 2013

I have a few posts coming up on Apache Mahout so I thought it might be useful to share some notes. I came at it as primarily an R coder with some very rusty Java and C++ somewhere in the back of my head so that will be my point of reference. I’ve also included …

Read more »

How to read quickly large dataset in R?

June 9, 2013

Medal Allocations at the Comrades Marathon

June 9, 2013

Following up on my previous post regarding attrition rates at Comrades Marathon 2013, here are the statistics I have gathered for medal allocations. There is some interesting history behind the Comrades Marathon medals. For reference, the medals are allocated as follows: Gold medals to the first ten finishers in the men’s race and the ladies’ race;

Read more »

Exploratory Data Analysis: Kernel Density Estimation in R on Ozone Pollution Data in New York and Ozonopolis

Introduction: Recently, I began a series on exploratory data analysis; so far, I have written about computing descriptive statistics and creating box plots in R for a univariate data set with missing values. Today, I will continue this series by analyzing the same data set with kernel density estimation, a useful non-parametric technique for visualizing

Read more »

Quartiles, Deciles, and Percentiles

June 9, 2013

Measures of position such as quartiles, deciles, and percentiles are available in the quantile function. This function has a usage, where:
x - the data points
prob - the location to measure
na.rm - if FALSE, NA (Not Available) data points are not ignored
na...

Read more »