Blog Archives

Solving easy problems the hard way

March 17, 2012
By
Solving easy problems the hard way

There’s a charming little brain teaser that’s going around the Interwebs. It’s got various forms, but they all look something like this: This problem can be solved by pre-school children in 5-10 minutes, by programer – in 1 hour, by people with higher education … well, check it yourself!  8809=6 7111=0 2172=0 6666=4 1111=0 3213=0 7662=2 9313=1 0000=4 2222=0 3333=0 5555=0 8193=3 8096=5 7777=0 9999=4 7756=1 6855=3 9881=5 5531=0 2581=? SPOILER ALERT… The answer has to do with how many

Read more »

Fitting Distribution X to Data From Distribution Y

May 12, 2011
By
Fitting Distribution X to Data From Distribution Y

I had someone ask me about fitting a beta distribution to data drawn from a gamma distribution and how well the distribution would fit. I’m not a “closed form” kinda guy. I’m more of a “numerical simulation” type of fellow. So I whipped up a little R code to illustrate the process then we changed

Read more »

Shell scripting EC2 for fun and profit

May 6, 2011
By
Shell scripting EC2 for fun and profit

Lately I’ve been doing some work with creating ad-hoc clusters of EC2 machines. My ultimate goal is to create a simple way to spin up a cluster of EC2 machines for use with Bryan Lewis’s very cool doRedis backend for the R foreach package. But that’s a whole other post. What I was scratching my

Read more »

Details of two-way sync between two Ubuntu machines

April 18, 2011
By
Details of two-way sync between two Ubuntu machines

In a previous post I discussed my frustrations with trying to get Dropbox or Spideroak to perform BOTH encrypted remote backup and AND fast two way file syncing. This is the detail of how I set up for two machines, both Ubuntu 10.10, to perform two way sync where a file change on either machine

Read more »

Fast Two Way Sync in Ubuntu!

April 9, 2011
By
Fast Two Way Sync in Ubuntu!

I love the portability of a laptop. I have a 45 min train ride twice a day and I fly a little too, so having my work with me on my laptop is very important. But I hate doing long running analytics on my laptop when I’m in the office because it bogs down my

Read more »

Where the heck has JD been?

March 22, 2011
By
Where the heck has JD been?

It’s been pointed out to me that I haven’t had any blog posts in a while. It’s true. I’m fairly slack. But in the last few months I’ve changed jobs (same firm, new role), written an R abstraction on top of Hadoop, been to China, and managed to stay married. While that sounds pretty awesome,

Read more »

Controlling Amazon Web Services using rJava and the AWS Java SDK

November 30, 2010
By
Controlling Amazon Web Services using rJava and the AWS Java SDK

I’ve been messing around with using Amazon Web Services for a while. I’ve had some projects where I wanted to upload files to S3 or fire off EMR jobs. I’ve been controlling AWS services using a hodgepodge of command line tools and the R system() function to call the tools from the command line.

Read more »

Connecting to SQL Server from R using RJDBC

September 22, 2010
By
Connecting to SQL Server from R using RJDBC

A few months ago I switched my laptop from Windows to Ubuntu Linux. I had been connecting to my corporate SQL Server database using RODBC on Windows so I attempted to get ODBC connectivity up and running on Ubuntu. ODBC on Ubuntu turned into an exercise in futility. I spent many hours over many days

Read more »

Principal Component Analysis (PCA) vs Ordinary Least Squares (OLS): A Visual Explanation

September 16, 2010
By
Principal Component Analysis (PCA) vs Ordinary Least Squares (OLS): A Visual Explanation

Over at stats.stackexchange.com recently, a really interesting question was raised about principal component analysis (PCA). The gist was “Thanks to my college class I can do the math, but what does it MEAN?” I felt like this a number of times in my life. Many of my classes were focused on the technical implementations they kinda

Read more »

Third, and Hopefully Final, Post on Correlated Random Normal Generation (Cholesky Edition)

September 2, 2010
By
Third, and Hopefully Final, Post on Correlated Random Normal Generation (Cholesky Edition)

When I did a brief post three days ago I had no plans on writing two more posts on correlated random number generation. But I’ve gotten a couple of emails, a few comments, and some Twitter feedback. In response to my first post, Gappy, calls me out and says, “the way mensches do multivariate (log)normal

Read more »