Blog Archives

Ack! Duplicates in the Data!

May 3, 2012

As I mentioned in a previous post, I compiled the data set that I’m currently working on in PostgreSQL. To get this massive data set, I had to write a query that was massive by dint of the number of …

Mining for relations between nominal variables

May 1, 2012

The task today was to find which variables had significant relations with an important grouping variable in the big dataset I’ve been working with lately. The grouping variable has 3 levels, and represents different behaviours of interest. At first I …

Guess who wins: apply() versus for loops in R

April 28, 2012

Yesterday I tried to do some data processing on my really big data set in MS Excel. Wow, did it not like handling all that data! Every time I tried to click on a different ribbon, the screen didn’t even …

Projects in RStudio

April 24, 2012

Now that I have one enormous project on the go and one smaller one, I find it helps considerably to keep each one in its own RStudio project. So, each project has its own scripting that I’ve been working …

PostgreSQL, Excel, R, and a Really Big Data Set!

April 19, 2012

At work I’ve started on the biggest data set I’ve ever seen! First, let me qualify my use of the term “Big Data”. The number of rows in the resultant data set (after much transformation and manipulation in …

Fun Editing R Graphs in Inkscape

April 12, 2012

Last week, I read a chapter out of Visualize This by Nathan Yau. I was, of course, delighted to see that he was championing the use of R. One really cool thing that I learned from his book, and was very …

Sampling and the Analysis of Big Data

April 8, 2012

After my last post, I came across a few articles supporting the opinion that if you have a good reason to take random samples from a “big” dataset, you’re not committing some kind of sin: Big Data Blasphemy: Why Sample? …