Monthly Archives: November 2012

Wikipedia Attention and the US elections

November 3, 2012

One of the most interesting challenges in data science is predicting important events such as national elections. With all those data streams of billions of posts, comments, likes, clicks, etc., there should be a way to identify the most important correlations to make predictions about real-world behavior such as: going to the voting booth

Read more »

Generation of a normal distribution from "scratch" – The Box-Muller method

November 3, 2012

My previous post is about a method to simulate a Brownian motion. A friend of mine emailed me yesterday to tell me that this is useless if we do not know how to simulate a normally distributed variable. My first remark is: use the rnorm() function if t...

Read more »
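
The Box-Muller method named in the post above turns two independent uniform draws into two independent standard normal draws. A minimal sketch in Python, not the post's own R code; the function name `box_muller` and its `seed` parameter are assumptions made for illustration:

```python
import math
import random


def box_muller(n, seed=None):
    """Return n draws from N(0, 1) using the basic Box-Muller transform."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        # 1 - random() lies in (0, 1], which keeps log() finite.
        u1 = 1.0 - rng.random()
        u2 = rng.random()
        radius = math.sqrt(-2.0 * math.log(u1))
        angle = 2.0 * math.pi * u2
        # Each pair of uniforms yields two independent normal draws.
        draws.append(radius * math.cos(angle))
        draws.append(radius * math.sin(angle))
    return draws[:n]
```

Calling `box_muller(5, seed=1)` returns five values; in R itself, `rnorm()` remains the practical choice, as the post says.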

Reordering factor levels in R plots

November 3, 2012

A few days ago a postdoctoral researcher asked me if I could help him reorder the factor levels on a bar chart. The problem is that R automatically alphabetizes factor levels. I thought this would be fairly straightforward but...

Read more »

Project Euler — problem 21

November 3, 2012

It’s been over one month since my last post on Euler problem 20, when I was planning to post at least one on either Project Euler or visualization. So I am four posts behind; I’ll try to catch up. Tonight, I’ll solve the 21st Euler …

Read more »
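
Project Euler problem 21 asks for the sum of all amicable numbers under 10000, where two different numbers are amicable if each equals the sum of the other's proper divisors. The following is a rough Python sketch of one way to compute it, not the post's own solution; the helper names `sum_proper_divisors` and `amicable_sum` are hypothetical:

```python
def sum_proper_divisors(n):
    """Sum of the divisors of n that are strictly smaller than n."""
    if n < 2:
        return 0
    total = 1  # 1 divides every n >= 2
    i = 2
    while i * i <= n:
        if n % i == 0:
            total += i
            # Add the paired divisor unless i is the exact square root.
            if i != n // i:
                total += n // i
        i += 1
    return total


def amicable_sum(limit):
    """Sum of all amicable numbers strictly below limit."""
    total = 0
    for a in range(2, limit):
        b = sum_proper_divisors(a)
        # Amicable: a and b differ, and each maps to the other.
        if b != a and sum_proper_divisors(b) == a:
            total += a
    return total
```

For example, 220 and 284 form the smallest amicable pair, so `amicable_sum(300)` is 504.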

SAP HANA and R (The way of the widget)

November 3, 2012

"A real developer never stops learning": that's a quote I always love to repeat, because it applies to my life. You can know a lot of things, but there's always something new to learn, or to re-learn. That's why a couple of days ago I started reading wxPyt...

Read more »

Breakthroughs in the sas7bdat Reverse Engineering Effort

November 3, 2012

Due largely to the work of Clint Cummins, the sas7bdat file format has become a bit less shrouded. In particular, we now know the following: how to detect files with compressed data (and fail gracefully); more details about the platform that generated the file (e.g., endianness, OS details); how to read files that were generated

Read more »

Using R to Compare Hurricane Sandy and Hurricane Irene

November 3, 2012

Having just lived through two back-to-back hurricanes (Irene in 2011 and Sandy in 2012) that passed through the New York metro area, I was curious how the paths of the hurricanes differed. I worked up a quick graph in R using data from Unisys. The data also includes wind speed and barometric pressure.

Read more »

Unstable parallel simulation, or after finishing testing, test some more

November 2, 2012

Lately I have been working on a trading system based on Support Vector Machine (SVM) regression (and yes, if you wonder, there are a few posts planned to share the results). In this post, however, I want to share an interesting problem I had to deal with. A few days ago, I started running simulations using

Read more »

Simple Bayesian bootstrap

November 2, 2012

Bootstrapping is a very popular statistical technique. However, its Bayesian analogue proposed by Rubin (1981) is not very common. I was looking for an example of its implementation in GNU R and could not find one so I decided to write a snippet presen...

Read more »
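
In Rubin's Bayesian bootstrap, each replicate reweights the observed data with a draw from a flat Dirichlet distribution instead of resampling with replacement. Below is a minimal Python sketch for the posterior of a mean, not the post's GNU R snippet; the function name `bayesian_bootstrap_mean` and its parameters are assumptions made for illustration:

```python
import random


def bayesian_bootstrap_mean(data, draws=1000, seed=None):
    """Return posterior draws of the mean under the Bayesian bootstrap."""
    rng = random.Random(seed)
    means = []
    for _ in range(draws):
        # Normalised Exponential(1) variates form one Dirichlet(1, ..., 1) draw.
        gaps = [rng.expovariate(1.0) for _ in data]
        total = sum(gaps)
        # Weighted mean of the data under the sampled weights.
        means.append(sum(g * x for g, x in zip(gaps, data)) / total)
    return means
```

The returned list can be summarised like any posterior sample, for instance by sorting it and reading off quantiles for a credible interval.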

Which functions in plyr do people use?

November 2, 2012

This is the question that Hadley Wickham recently set out to answer by asking frequent R and plyr users, in an online survey, how they use it. Once a decent number of people had responded, Hadley quickly went ahead and produced a short analysis of the plyr usage survey, and published it on RPubs. With his permission, I am...

Read more »