Blog Archives

Functions ddply and melt make plotting summary stats in R more tolerable

May 15, 2012

The main reason I have usually chosen Excel to make my plots at work is that I had difficulty feeding the summary stats from R into a plotting function. One thing I learned this week is how …


An embarrassing admission; Copy pasting tables with text containing spaces from Excel to R

May 11, 2012

I can’t believe I didn’t learn how to do it earlier, but I never knew how to accurately copy tables from Excel that had text with spaces in them and paste them into a data frame in R without generating confusion …


Memory Management in R, and SOAR

May 8, 2012

The more I’ve worked with my really large data set, the more cumbersome the work has become for my work computer. Keep in mind I’ve got a quad core with 8 GB of RAM. With growing irritation at how slow …


Ack! Duplicates in the Data!

May 3, 2012

As I mentioned in a previous post, I compiled the data set that I’m currently working on in PostgreSQL. To get this massive data set, I had to write a query that was massive by dint of the number of …


Mining for relations between nominal variables

May 1, 2012

The task today was to find which variables had significant relations with an important grouping variable in the big dataset I’ve been working with lately. The grouping variable has 3 levels, and represents different behaviours of interest. At first I …


Guess who wins: apply() versus for loops in R

April 28, 2012

Yesterday I tried to do some data processing on my really big data set in MS Excel. Wow, did it not like handling all those data!! Every time I tried to click on a different ribbon, the screen didn’t even …


Projects in RStudio

April 24, 2012

Now that I have one enormous project on the go and one smaller one, I find it’s helping me considerably to keep each one in its own RStudio project. So, each project has its own scripting that I’ve been working …


PostgreSQL, Excel, R, and a Really Big Data Set!

April 19, 2012

At work I’ve started to work with the biggest data set I’ve ever seen! First, let me qualify my use of the term “Big Data”. The number of rows in the resultant data set (after much transformation and manipulation in …


Fun Editing R Graphs in Inkscape

April 12, 2012

Last week, I read a chapter out of Visualize This by Nathan Yau. I was, of course, delighted to see that he was championing the use of R. One really cool thing that I learned from his book, and was very …


Sampling and the Analysis of Big Data

April 8, 2012

After my last post, I came across a few articles supporting the opinion that if you have a good reason to take random samples from a “big” dataset, you’re not committing some kind of sin: Big Data Blasphemy: Why Sample? …
