Articles by Andrew Worsley

R Work Areas. Standardize and Automate.

January 10, 2018 | Andrew Worsley

Before beginning work on a new data science project I like to do the following: 1. Get my work area ready by creating an R Project for use with the RStudio IDE. 2. Organize my work area by creating a series of directories to store my project inputs and outputs. I create ‘... [Read more...]

Helpful Data Science Reads

January 2, 2018 | Andrew Worsley

Here are some of the books that I found interesting and useful in 2017. Scrum: The Art of Doing Twice the Work in Half the Time by Jeff Sutherland. Jeff Sutherland, one of the creators of the scrum methodology of project management, lays down the rationale for adopting scrum over more ... [Read more...]

NZ Real GDP htmlwidget

August 5, 2016 | Andrew Worsley

Thought I would try my hand at generating an interactive JavaScript line graph using R. Thankfully the dygraphs package makes this very easy! The code below generates an interactive plot of New Zealand’s real GDP through time. I have added some annotations displaying some of the major financial crises. ...
[Read more...]

Good Parameterisation in R

July 10, 2016 | Andrew Worsley

Imagine you work in a large factory that produces complicated widgets. It is your job to control production line settings which must be reset each day so as to ensure the smooth operation of the factory. However, to change the settings you have to walk around turning dials and pressing ...
[Read more...]

Pretty Data Class Conversion

April 8, 2016 | Andrew Worsley

Load data – check structure – convert – analyse. Data class conversion is essential to gaining the right result… especially if you have left stringsAsFactors = TRUE. The worst thing you can do is feed factor data into a function when you expected it to be characters. If system memory is not a concern, ... [Read more...]

Demystifying the GLM (Part 1)

February 11, 2016 | Andrew Worsley

Upon being thrown a prickly binary classification problem, most data practitioners will have dug deep into their statistical tool box and pulled out the trusty logistic regression model. Essentially, logistic regression can help us predict a binary (yes/no) response with consideration given to other, hopefully related, variables. For example, ... [Read more...]

NZ’s Shifting Makeup

December 17, 2015 | Andrew Worsley

New Zealand is culturally diverse. Even at a regional level, there are big differences in ethnic composition… and with an increasingly inter-connected world, ethnic composition is expected to change substantially in the future, particularly in Auckland. Statistics New Zealand has provided us with sub-national ethnic population projections, by age and ...
[Read more...]

A Matter of Style?

December 3, 2015 | Andrew Worsley

Up until a few weeks ago I would style my code like this: I thought that was the only way… until I witnessed a DBA friend of mine coding. He would write the same function like this: In my opinion, the second style makes the code easier to read. I ... [Read more...]

Trying to Win with R

November 20, 2015 | Andrew Worsley

A common competition run by vendors of fishing equipment is ‘guess the weight and win’: an image of someone holding a fish is posted, and it is up to you to guess its weight, with the closest guess winning a prize. The ‘law of large numbers’ implies ...
[Read more...]

Working with Data Frames in Python and R

November 19, 2015 | Andrew Worsley

Originally posted on Data Hipsters: Data frame objects facilitate most data analysis exercises in both R and Python (perhaps with the exception of time series analysis, where the focus is on R time series and Pandas series objects). Data frames are a tidy and meaningful way to store data. This ...
[Read more...]

Text Mining the NZ Road Network with R

October 2, 2015 | Andrew Worsley

What are the most common words in New Zealand road names? Are there any common themes? Thankfully, New Zealand’s 73,906 current road names have been made available through the LINZ Data Service. To answer the questions above, we can use R’s tm package to conduct basic text mining. The ...
[Read more...]

Set Operations in R and Python. Useful!

September 4, 2015 | Andrew Worsley

Set operations are super useful when cleaning data or testing scripts. They are a must-have in any analyst’s (data scientist’s/statistician’s/data wizard’s) toolbox. Here is a quick rundown in both R and Python. Say we have two vectors x and y… What if we ‘...
[Read more...]
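
As a rough illustration of the kind of set operations the teaser above refers to, here is a minimal Python sketch using only built-in sets. The names x and y come from the excerpt; their values are made up for this example, and the operations shown are an assumption about what a typical rundown covers, not a reproduction of the original post's code.

```python
# Hypothetical values: the excerpt names x and y but does not show their contents.
x = {1, 2, 3, 4, 5}
y = {4, 5, 6, 7}

# Elements that appear in either collection.
union = x | y

# Elements that appear in both collections.
intersection = x & y

# Elements that appear in x but not in y, and vice versa.
only_in_x = x - y
only_in_y = y - x

# Elements that appear in exactly one of the two collections.
symmetric_difference = x ^ y

# Membership test for a single value.
three_in_y = 3 in y

print("union:", sorted(union))
print("intersection:", sorted(intersection))
print("only in x:", sorted(only_in_x))
print("only in y:", sorted(only_in_y))
print("symmetric difference:", sorted(symmetric_difference))
print("3 in y:", three_in_y)
```

When cleaning data, the two difference results are often the most useful: they show which keys are present in one table but missing from the other before a join.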
