## Failed Randomization In A Randomized Trial?

November 4, 2013

We will continue the saga of the three-arm clinical trial that is giving the editors of the prestigious journal The Spleen a run for their money. While the polls are gathering digital dust, let’s see if we can direct this discussion to a more quantitative path. To do so, we will ask (and answer) the

## Enron Email Corpus Topic Model Analysis Part 2 – This Time with Better regex

November 4, 2013
After posting my analysis of the Enron email corpus, I realized that the regex patterns I set up to capture and filter out the cautionary/privacy messages at the bottoms of people's emails were not working. Let’s have a look at …

## Commissions

November 4, 2013
Today, I want to explain the commission functionality built into the Systematic Investor Toolbox (SIT) “share” back-test. At each re-balance time the capital is allocated given the weight such that For example, if weight is 100% (i.e. fully invested) and capital = \$100 and price = \$10 then The period return is equal to The total
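
The allocation example in this excerpt can be sketched in a few lines. This is only a rough illustration: the formula `shares = weight * capital / price` is an assumption inferred from the numbers in the example, the function name `shares_for_weight` is hypothetical and not part of SIT, and commissions are left out because the excerpt cuts off before describing how they are applied.

```python
def shares_for_weight(weight, capital, price):
    """Shares held when a fraction `weight` of `capital` is invested at `price`.

    Assumed relationship (not taken from SIT itself):
        shares = weight * capital / price
    """
    if price <= 0:
        raise ValueError("price must be positive")
    return weight * capital / price


# The excerpt's example: fully invested, capital = $100, price = $10.
shares = shares_for_weight(weight=1.0, capital=100.0, price=10.0)
print(shares)  # 10.0
```

Under this assumption, a 100% weight on \$100 of capital at a price of \$10 corresponds to 10 shares.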

## Archival and analysis of #GI2013 Tweets

November 4, 2013
I archived and analyzed all Tweets containing #GI2013 from the recent Cold Spring Harbor Genome Informatics meeting, using my previously described code. Friday was the most Tweeted day. Perhaps this was due to Lior Pachter's excellent keynote, "Stories ...

## Generating d3js Motion Charts from rCharts

November 4, 2013
Remember Gapminder, the animated motion chart popularised by Hans Rosling in his TED Talks and Joy of Stats TV programme? Well it’s back on TV this week in Don’t Panic – The Truth About Population, a compelling piece of OU/BBC co-produced stats theatre featuring Hans Rosling, and a Pepper’s Ghost illusion brought into the digital

## Spatial Clustering With Equal Sizes

November 4, 2013
This is a problem I have encountered many times, where the goal is to take a sample of spatial locations and apply constraints to the clustering algorithm. In addition to specifying a predetermined number of clusters, K, the number of elements in each cluster must be held at a fixed size. An application of this algorithm is
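
One simple way to meet the equal-size constraint is a greedy assignment with a capacity per cluster. The sketch below is not the method from the post, which the excerpt does not show. It assumes 2D points, cluster centers supplied by the caller, and a point count divisible by K; the name `equal_size_clusters` is hypothetical.

```python
import math


def equal_size_clusters(points, centers):
    """Greedily assign 2D points to centers so every cluster has the same size.

    Every (point, center) pair is visited in order of increasing distance.
    A point joins a center if the point is still unassigned and the cluster
    has room. This is a heuristic, not an optimal assignment.
    """
    n, k = len(points), len(centers)
    if k == 0 or n % k != 0:
        raise ValueError("number of points must be a multiple of K")
    size = n // k  # fixed number of elements per cluster

    # All (distance, point index, cluster index) triples, nearest first.
    pairs = sorted(
        (math.hypot(p[0] - c[0], p[1] - c[1]), i, j)
        for i, p in enumerate(points)
        for j, c in enumerate(centers)
    )

    labels = [None] * n
    counts = [0] * k
    for _, i, j in pairs:
        if labels[i] is None and counts[j] < size:
            labels[i] = j
            counts[j] += 1
    return labels


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers = [(0, 0), (10, 10)]
print(equal_size_clusters(points, centers))  # [0, 0, 1, 1]
```

Because the total capacity equals the number of points, every point ends up in some cluster, though points visited late may land in a cluster that is not their nearest.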

## Species occurrence data to CartoDB

November 4, 2013
We have previously written about creating interactive maps on the web from R, with the interactive maps on Github. See here, here, here, and here. A different approach is to use CartoDB, a freemium service with a SQL interface to your data tables that provides a map to visualize data in those tables. They released...

## analyze the american national election studies (anes) with r

November 4, 2013
on election days in the united states, the news media peppers its coverage with quick, dirty exit polls that allow them to make coarse statements like, "x% of demographic group y voted for candidate z."  the american national election studies are ...

## A Rather Nosy Topic Model Analysis of the Enron Email Corpus

November 3, 2013
Having only ever played with Latent Dirichlet Allocation using gensim in Python, I was very interested to see a nice example of this kind of topic modelling in R. Whenever I see a really cool analysis done, I get the …