Blog Archives

Are scatterplots too complex for lay folks?

May 23, 2012

Usually, I like to write about the solutions to problems I’ve had, but today I only have a problem to write about. This is the second research job I’ve had outside of academia, and in both cases I’ve met with …

Bar Graph Colours That Work Well

May 17, 2012

Ever since I started using ggplot2 more often at work to make graphs, I’ve realized something about the use of colour in bar graphs vs. dot plots: when I’m looking at a graph displayed on the brilliant Viewsonic …

Functions ddply and melt make plotting summary stats in R more tolerable

May 15, 2012

The main reason I have usually chosen Excel to make my plots at work is that I had difficulty feeding the summary stats in R into a plotting function. One thing I learned this week is how …

An embarrassing admission; Copy pasting tables with text containing spaces from Excel to R

May 11, 2012

I can’t believe I didn’t learn how to do it earlier, but I never knew how to accurately copy tables from Excel that had text with spaces in them, and paste them into a data frame in R without generating confusion …

Memory Management in R, and SOAR

May 8, 2012

The more I’ve worked with my really large data set, the more cumbersome the work has become for my work computer. Keep in mind I’ve got a quad core with 8 gigs of RAM. With growing irritation at how slow …

Ack! Duplicates in the Data!

May 3, 2012

As I mentioned in a previous post, I compiled the data set that I’m currently working on in PostgreSQL. To get this massive data set, I had to write a query that was massive by dint of the number of …

Mining for relations between nominal variables

May 1, 2012

The task today was to find which variables had significant relations with an important grouping variable in the big data set I’ve been working with lately. The grouping variable has 3 levels, and represents different behaviours of interest. At first I …

Guess who wins: apply() versus for loops in R

April 28, 2012

Yesterday I tried to do some data processing on my really big data set in MS Excel. Wow, did it not like handling all those data!! Every time I tried to click on a different ribbon, the screen didn’t even …

Projects in RStudio

April 24, 2012

Now that I have one enormous project on the go and one smaller one, I find it helps me considerably to have each one stored in its own RStudio project. So, each project has its own scripting that I’ve been working …

PostgreSQL, Excel, R, and a Really Big Data Set!

April 19, 2012

At work I’ve started to work with the biggest data set I’ve ever seen! First, let me qualify my use of the term “Big Data”. The number of rows in the resultant data set (after much transformation and manipulation in …
