Posts Tagged ‘Data’

Anonymising data

August 23, 2011

There are only three known jokes about statistics in the whole universe, so to complete the trilogy (see here and here for the other two), listen up: Three statisticians are on a train journey to a conference, and they get chatting to three epidemiologists who are also going to the same place. The epidemiologists are

Merging Two Different Datasets Containing a Common Column With R and R-Studio

August 2, 2011

Another way for the database-challenged (such as myself!) to merge two datasets that share at least one common column… This recipe uses the cross-platform stats analysis package, R. I use R via the R-Studio client, which provides an IDE wrapper around the R environment. So for example, here’s how to merge a couple of
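As a minimal sketch of the kind of merge described here, the following uses base R's merge() on two small data frames. The data frames, their column names and their values are invented for illustration and are not taken from the original post.

```r
# Two toy data frames sharing the key column "id" (invented data)
people <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
scores <- data.frame(id = c(2, 3, 4), score = c(88, 92, 75))

# Inner join: keep only the ids present in both data frames
inner <- merge(people, scores, by = "id")

# Left join: keep every row of `people`; unmatched scores become NA
left <- merge(people, scores, by = "id", all.x = TRUE)

print(inner)
print(left)
```

Setting all.y = TRUE or all = TRUE instead would give a right or full outer join, so the same call covers the usual database-style joins.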

Clustering U.S. Senators using roll call voting data

July 22, 2011

For our forthcoming book on machine learning for hackers, John Myles White and I will discuss clustering, and various methods for doing so. One common method for clustering observations
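The excerpt breaks off before naming its method, so the following is only a generic sketch of clustering voting records. It uses hierarchical clustering from base R, which may not be the technique the post goes on to describe. The vote matrix is invented, and coding votes as 1, -1 and 0 is an assumption.

```r
# Toy roll-call matrix (invented): rows are legislators, columns are votes,
# coded 1 = yea, -1 = nay, 0 = abstained or absent
votes <- rbind(
  A = c( 1,  1, -1, -1,  1),
  B = c( 1,  1, -1, -1,  0),
  C = c(-1, -1,  1,  1, -1),
  D = c(-1, -1,  1,  0, -1)
)

# Euclidean distance between voting records
d <- dist(votes)

# Agglomerative hierarchical clustering, cut into two groups
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)

print(groups)
```

Legislators A and B differ on one vote, as do C and D, so each pair ends up in the same group.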

Converting vectors to numeric in mixed-type dataframe

May 19, 2011

Coercing variables of character and numeric type into a single dataframe causes every vector to be stored as a factor:
all <- data.frame(cbind(site, year, model, x, y, z))
The following converts selected variables from “factor” back to “numeric”:
all$x <- as.numeric(x) …
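When only the factor column is available, not the original numeric vector, a common idiom is to convert through character first. The sketch below uses invented values to show why calling as.numeric() directly on a factor is a trap. Note also that since R 4.0 data.frame() defaults to stringsAsFactors = FALSE, so the coercion to factor described above applies to older versions of R.

```r
# A factor whose labels look like numbers (invented values)
f <- factor(c("30", "10", "20"))

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(f)

# Going through character first recovers the values the labels spell out
values <- as.numeric(as.character(f))

print(codes)
print(values)
```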

Day #38-39 Data-manipulation Part 1

May 10, 2011

Last week I created some plots, each for a single feature. Today I started working on the full script that creates all these plots, one per feature. This means using for loops in R. Let’s see how this is going to work out. Today I mostly worked on data...
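A loop of this kind might look like the sketch below, which writes one PDF per column of a data frame. The data, the column names and the choice of PDF output in a temporary directory are all assumptions made for illustration, not details from the original post.

```r
# Toy data frame (invented); each numeric column is treated as one "feature"
df <- data.frame(height = c(1.2, 1.5, 1.9), weight = c(10, 14, 21))

out_dir <- tempdir()
files <- character(0)

for (feature in names(df)) {
  path <- file.path(out_dir, paste0(feature, ".pdf"))
  pdf(path)                                  # open a PDF graphics device
  plot(df[[feature]], main = feature, ylab = feature)
  dev.off()                                  # close it so the file is written
  files <- c(files, path)
}

print(basename(files))
```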

Some rediscovered R scripts from spring cleaning

May 1, 2011

Gompertz Model Visualization
# Gompertz growth function
gomp <- function(x, a, b, k) a*exp(-b*exp(-k*x))
# Normal model with Gompertz mean function
likelihood <- function(weight, age, sigma, a, b, k) {
  mu <- gomp(age, a, b, k)
  dnorm(weight, mu, sigma)
}
# Visualize the model
visualize <- function(phi=40, theta=-35) {
  weight <- seq(0, 250,
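Since the excerpt cuts off inside visualize(), here is a small self-contained sketch that exercises only the two functions shown in full. The parameter values (a = 200, b = 3, k = 0.1, sigma = 10) are hypothetical and chosen purely for illustration.

```r
# Gompertz growth function, as defined in the excerpt
gomp <- function(x, a, b, k) a * exp(-b * exp(-k * x))

# Normal model with Gompertz mean function, as defined in the excerpt
likelihood <- function(weight, age, sigma, a, b, k) {
  mu <- gomp(age, a, b, k)
  dnorm(weight, mean = mu, sd = sigma)
}

# Hypothetical parameter values (not from the original post)
a <- 200; b <- 3; k <- 0.1; sigma <- 10

# The curve starts at a * exp(-b) and rises towards the asymptote a
ages <- c(0, 10, 50, 1000)
print(gomp(ages, a, b, k))

# The density is highest when the observed weight equals the Gompertz mean
mu20 <- gomp(20, a, b, k)
print(likelihood(mu20, 20, sigma, a, b, k))
print(likelihood(mu20 + 15, 20, sigma, a, b, k))
```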

stalkR: R functions for exploring iPhone and iPad (OS X only)

April 21, 2011

Yesterday Alasdair Allan and Pete Warden shocked the world by revealing that iPhones and iPads have been keeping track of our every move, and saving the data in obfuscated backup files. As my friend Vince Buffalo mentioned on Twitter, part of me was disgusted by the secret stalking Steve Jobs was doing, but my

Progress reading SAS sas7bdat files (natively) in R

April 18, 2011

This post describes some preliminary results from a compatibility study of the SAS sas7bdat file format. The most current results are stored in a GitHub repository here: sas7bdat. The ultimate goal is a native solution to the incompatibility between open-source statistical software (e.g. R) and sas7bdat database files.
Demonstration
There has been significant progress in interpreting

Tumblr Likes

April 11, 2011

Look at just the first digit and the number of digits.
science: 32914, 11566, 4989, 3743, 968, 814, 673, 482, 286, 2811
black and white: 1694, 1167, 1108, 988, 919, 639, 596, 591, 580, 544
lol: 22627, 18100, 17688, 14374, 13459, 12045, 4711, 3779, 36...
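One way to pull out the first digit and the number of digits in R is sketched below, using the “black and white” counts, the only series in the excerpt that is not visibly cut off. The variable names are invented for illustration.

```r
# "black and white" counts copied from the excerpt
counts <- c(1694, 1167, 1108, 988, 919, 639, 596, 591, 580, 544)

# Work on the decimal representation of each count
digits_chr <- as.character(counts)
first_digit <- as.integer(substr(digits_chr, 1, 1))
n_digits <- nchar(digits_chr)

print(data.frame(counts, first_digit, n_digits))

# Tally how often each leading digit occurs
print(table(first_digit))
```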