Monthly Archives: April 2011

Merging Multiple Data Files into One Data Frame

April 24, 2011

We often encounter situations where we have data in multiple files, at different frequencies and on different subsets of observations, but we would like to match them to one another as completely and systematically as possible. In R, the merge() comma...

Read more »

Of Height and Speed in Tennis, or Fuzziness and Techiness in College

April 24, 2011

I thought of this after reading this post and perhaps also this one, on the Cheap Talk blog. Here's the puzzle: in general, being tall does not make you slow; but among professional tennis players, the tall athletes do tend to be relativel...

Read more »

Chop, Slice and Dice Your Returns in R

April 24, 2011

I have a knife rack on my kitchen wall with all my kitchen knives easily identifiable and accessible. I also have small scars on my hand where each knife can claim to have left a mark. It's not the knives' fault, of course. They hardly like being sudde...

Read more »

RcppArmadillo 0.2.19

April 24, 2011

Last Monday, Conrad Sanderson released version 1.2.10 of his most excellent Armadillo templated C++ library for linear algebra; I followed up the same day with version 0.2.19 of our RcppArmadillo wrapper for R based on our Rcpp library. However, the...

Read more »

Logistic Regression & Factors in R

April 24, 2011

Factors are R's enumerated type. Suppose you define the variable cities -- a vector of strings -- whose possible values are "New York," "Paris," "London" and "Beijing." Instead of representing each city as a string of characters, you might prefer to ...

Read more »

Location Tracking on Android, too!

April 23, 2011

This week it was revealed that the iPhone stores users’ locations, and this immediately caused a huge firestorm of commentary by tech geeks, panic among privacy advocates, and delight among data geeks like myself. Even better/worse, it seems that the iPhone caches location traces long-term, possibly back to the date the phone was activated. I ditched my iPhone this past...

Read more »

Dates in R and the First Day of the Month

April 23, 2011

I spent some time this morning learning about how R thinks about dates. I found this website to be a useful guide. Imagine that your data are dates in a standard format and you want a vector o...

Read more »

Measuring association using odds ratios

In my last two posts, I have used the UCI mushroom dataset to illustrate two things.  The first was the use of interestingness measures to characterize categorical variables, and the second was the use of binary confidence intervals...

Read more »