Posts Tagged ‘Data’

Anonymising data

August 23, 2011

There are only three known jokes about statistics in the whole universe, so to complete the trilogy (see here and here for the other two), listen up: Three statisticians are on a train journey to a conference, and they get chatting to three epidemiologists who are also going to the same place. The epidemiologists are

Read more »

Merging Two Different Datasets Containing a Common Column With R and R-Studio

August 2, 2011

Another way for the database-challenged (such as myself!) to merge two datasets that share at least one common column… This recipe uses the cross-platform stats analysis package, R. I use R via the R-Studio client, which provides an IDE wrapper around the R environment. So for example, here’s how to merge a couple of
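The post's own example is cut off in this excerpt. As a minimal sketch of the general idea, base R's merge() joins two data frames on a shared column. The data frames and column names below are hypothetical and are not taken from the original post.

```r
# Hypothetical data frames that share a "country" column
population <- data.frame(
  country      = c("France", "Spain", "Italy"),
  pop_millions = c(65, 46, 60)
)
capitals <- data.frame(
  country = c("Spain", "France", "Germany"),
  capital = c("Madrid", "Paris", "Berlin")
)

# Keep only the rows whose key appears in both data frames
merged <- merge(population, capitals, by = "country")

# Keep every row from both sides; unmatched cells are filled with NA
merged_all <- merge(population, capitals, by = "country", all = TRUE)

print(merged)
print(merged_all)
```

By default merge() keeps only matching rows; all = TRUE (or all.x / all.y) keeps the unmatched ones as well.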

Read more »

Clustering U.S. Senators using roll call voting data

July 22, 2011

For our forthcoming book on machine learning for hackers, John Myles White and I will discuss clustering, and various methods for doing so. One common method for clustering observations

Read more »

Converting vectors to numeric in mixed-type dataframe

May 19, 2011

Combining variables of character and numeric type into a single dataframe with cbind() causes every column to be stored as a factor:

    all <- data.frame(cbind(site, year, model, x, y, z))

The following converts selected variables from “factor” back to “numeric”:

    all$x <- as.numeric(x)

…
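The excerpt converts the column by re-reading the original vector x. When only the dataframe is at hand, a common approach is to go through as.character() first, because as.numeric() on a factor returns the internal level codes rather than the values. The sketch below uses hypothetical vectors and sets stringsAsFactors = TRUE explicitly, since R 4.0 and later no longer create factors by default.

```r
# Hypothetical example vectors (not from the original post)
site <- c("A", "B", "C")
year <- c(2009, 2010, 2011)
x    <- c(1.5, 2.5, 10.25)

# cbind() on mixed types builds a character matrix, so every column of the
# resulting data frame becomes a factor when stringsAsFactors = TRUE
all_df <- data.frame(cbind(site, year, x), stringsAsFactors = TRUE)
str(all_df)

# Convert via as.character() to recover the values, not the level codes
all_df$x    <- as.numeric(as.character(all_df$x))
all_df$year <- as.numeric(as.character(all_df$year))

# Building the data frame directly, without cbind(), keeps numeric columns numeric
direct <- data.frame(site, year, x)
str(direct)
```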

Read more »

Day #38-39 Data-manipulation Part 1

May 10, 2011

Last week I created some plots, each for a single feature. Today I started working on the full script that creates all these plots, one per feature. This means using for loops in R. Let’s see how this is going to work out. Today I mostly worked on data...
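A rough sketch of what such a loop might look like, writing one plot file per feature. The data frame, the feature names, and the choice of a histogram are all assumptions made for illustration; the post's actual script is not shown in this excerpt.

```r
# Hypothetical data: three numeric features
set.seed(1)
features <- data.frame(
  height = rnorm(50),
  weight = rnorm(50),
  age    = rnorm(50)
)

out_dir    <- tempdir()
plot_files <- character(0)

# One plot per feature, each written to its own PDF file
for (feature in names(features)) {
  file <- file.path(out_dir, paste0(feature, ".pdf"))
  pdf(file)
  hist(features[[feature]], main = feature, xlab = feature)
  dev.off()
  plot_files <- c(plot_files, file)
}

print(plot_files)
```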

Read more »

Some rediscovered R scripts from spring cleaning

May 1, 2011

Gompertz Model Visualization

    # Gompertz growth function
    gomp <- function(x, a, b, k) a*exp(-b*exp(-k*x))

    # Normal model with Gompertz mean function
    likelihood <- function(weight, age, sigma, a, b, k) {
      mu <- gomp(age, a, b, k)
      dnorm(weight, mu, sigma)
    }

    # Visualize the model
    visualize <- function(phi=40, theta=-35) {
      weight <- seq(0, 250,
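The visualization code is cut off in this excerpt. As a small sketch of how the two helper functions behave, the snippet below evaluates the Gompertz mean curve and the normal density at a single point. The parameter values (a, b, k, the standard deviation, and the observed weight) are hypothetical and chosen only for illustration.

```r
# Gompertz growth function, as defined in the excerpt
gomp <- function(x, a, b, k) a * exp(-b * exp(-k * x))

# Hypothetical parameter values, for illustration only
a <- 200   # upper asymptote
b <- 3     # displacement along the age axis
k <- 0.1   # growth rate

# Mean weight at a range of ages; the curve rises towards the asymptote a
age         <- seq(0, 60, by = 10)
mean_weight <- gomp(age, a, b, k)
print(round(mean_weight, 1))

# Density of a hypothetical observed weight of 150 at age 30,
# under a normal model with standard deviation 10
dens <- dnorm(150, mean = gomp(30, a, b, k), sd = 10)
print(dens)
```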

Read more »

stalkR: R functions for exploring iPhone and iPad (OS X only)

April 21, 2011

Yesterday Alasdair Allan and Pete Warden shocked the world by revealing that iPhones and iPads have been keeping track of our every move, and saving the data in obfuscated backup files. As my friend Vince Buffalo mentioned on Twitter, part of me was disgusted by the secret stalking Steve Jobs was doing, but my

Read more »

Progress reading SAS sas7bdat files (natively) in R

April 18, 2011

This post describes some preliminary results from a compatibility study of the SAS sas7bdat file format. The most current results are stored in a GitHub repository here: sas7bdat. The ultimate goal is a native solution to the incompatibility between open-source statistical software (e.g. R) and sas7bdat database files.

Demonstration

There has been significant progress in interpreting

Read more »

Tumblr Likes

April 11, 2011

Look at just the first digit and the number of digits.

science: 32914, 11566, 4989, 3743, 968, 814, 673, 482, 286, 2811
black and white: 1694, 1167, 1108, 988, 919, 639, 596, 591, 580, 544
lol: 22627, 18100, 17688, 14374, 13459, 12045, 4711, 3779, 36...
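A small sketch of pulling out the first digit and the number of digits in R, using the "science" counts exactly as they appear in the excerpt. The variable names are hypothetical, and this is one possible way to do it rather than the post's own method.

```r
# "science" like counts, copied from the excerpt above
science <- c(32914, 11566, 4989, 3743, 968, 814, 673, 482, 286, 2811)

# Work on the decimal string representation of each count
digits <- as.character(science)

# Leading digit and total number of digits for each count
first_digit <- as.integer(substr(digits, 1, 1))
n_digits    <- nchar(digits)

# Tabulate how often each leading digit occurs
print(table(first_digit))
print(n_digits)
```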

Read more »