Assign n Email Addresses to x Cells, Intrinsically (Part II)

March 27, 2014
Part I introduced the concept and general technique for assigning n email addresses to x cells pseudo-randomly, without maintaining a log of each assignment. That post considered the basic case in which each cell is assigned approximately the same number of email addresses. In practice, cell sizes often vary. Below is a technique that...

March 23, 2014
You can use this tutorial in the ThinkToStartR package with: ThinkToStart("SentimentCloud","KEYWORD",# of tweets,"DATUMBOX API KEY") Hey everybody, some days ago I created a wordcloud filled with tweets about a recent German news topic. A lot of people asked me whether I had code showing how I created this cloud. So here it is. …

Seamless analytical environment by WDI, dplyr, and rMaps

March 22, 2014
Recently I found that my R guru @ramnath_vaidya is developing a new visualization package, rMaps. I was so excited when I saw it for the first time, and I think that it's really awesome for plotting any data on a map....

Frequentist German Tank Problem

March 20, 2014
The German Tank Problem: The Frequentist Way. Many things are given a serial number, and often that serial number logically starts at 1 and increases by 1 for each new unit. For example, German tanks in World War II had several parts with serial numbers. By collecting...

Why multiple imputation?

March 20, 2014
Background: In the forthcoming week, I will be giving a presentation on the fundamentals of imputation to my colleagues. One of the most important ideas I would like to present is multiple imputation. In my last post, I have...

Stop using bivariate correlations for variable selection

March 19, 2014
Something I've never understood is the widespread calculation and reporting of univariate and bivariate statistics in applied work, especially when it comes to model selection. Bivariate statistics are, at best, useless for multivariate model selection and, at worst, harmful. Since nearly all...

Use Data Science to help CARE International in Atlanta, March 28

March 17, 2014
CARE International is a humanitarian organization that is leading the charge to fight poverty around the world, with a focus on empowering women and girls. On March 28 in Atlanta, CARE is teaming up with Booz Allen Hamilton, Emory University's Rollins School of Public Health and Revolution Analytics to bring data scientists together to use R to explore some...

How to use Bioconductor to find empirical evidence in support of π being a normal number

March 14, 2014
Happy π day everybody! I wanted to write some simple code (included below) to test the parallelization capabilities of my new cluster. So, in honor of π day, I decided to check for evidence that π is a normal number. A …