What’s all this then?
Hi. My name is Seth, and this is my new blog. I plan to use this blog to noodle around with the statistical programming language R, hopefully in fun and interesting ways, to continue to hone my programming and data analysis skills.
I’ve been thinking about creating an R blog for some time, but the task always seemed a bit daunting. Turns out, it was much easier than I thought, especially with the help of the excellent blogdown book and RStudio’s built-in blogdown package support. Creating a new blog is as easy as File -> New Project -> New Directory -> Website using blogdown once blogdown is installed.
This blog uses the Tuftesque Hugo theme created by Nick Strayer. One of the cool things about the Tufte theme, which Tuftesque is based on, is the ability to create these nifty side notes. (Actually, if you like this theme, the tutorial for it is basically all you need to get started with blogdown, plus a free Netlify account for deployment.)
So my advice to my fellow R users: if you’ve always wanted to create an R blog but have been hesitant to take the plunge, just dive in!
R and me
I’ve been an R user for about 5 years now. Prior to learning R, I was an SPSS user with very little coding experience. What I did have was a decent background in data analysis and applied statistics (mostly regression) thanks to my graduate studies and academic research.
On a tangential note, here is a quick look at how Google search trends for R and SPSS have changed over time in the US since 2004. As you can see in Figure 1, search interest in R has significantly increased over the last 15 years while interest in SPSS has declined.
    # Import data
    df <- readr::read_csv("R_vs_SPSS_google_trends.csv", skip = 2)

    # Load packages and set plot theme
    library(tidyverse)
    library(lubridate)
    library(ggthemr)
    ggthemr("dust")

    # Tidy up dataset and create plot
    df %>%
      mutate(month = ymd(Month, truncated = 1)) %>%
      rename(R = `R: (United States)`, SPSS = `SPSS: (United States)`) %>%
      gather(R, SPSS, key = "Tool", value = "rel_interest") %>%
      ggplot(aes(x = month, y = rel_interest, color = Tool)) +
      geom_line(lwd = 1.25) +
      labs(
        title = "Google Search Trends for R and SPSS",
        subtitle = "2004 to Present",
        x = "Month",
        y = "Relative Interest",
        caption = "sethdobson.netlify.com"
      )
A few things to note about the code used to create this plot:
- The data were downloaded as a CSV file from Google Trends. When reading the file, it is necessary to skip the first 2 lines because the column headings actually start on line 3.
- The Month column in the original file is formatted as year-month, for example “2018-01” for January 2018. To get R to understand that this is a date, I had to call lubridate’s ymd with truncated = 1.
- In a tidy data view of the world, the renamed columns R and SPSS are actually values of a variable, which I call Tool. So I needed to use tidyr::gather to reformat the table to truly make it tidy.
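The date parsing and reshaping steps described above can be tried on a toy data frame without the Google Trends file. This is only a minimal sketch: the `trends` data frame and its values are made up for illustration, and it assumes the lubridate and tidyr packages are installed.

```r
library(lubridate)
library(tidyr)

# Hypothetical stand-in for the Google Trends export (values are invented)
trends <- data.frame(
  Month = c("2018-01", "2018-02"),
  R     = c(50, 55),
  SPSS  = c(20, 18)
)

# truncated = 1 lets ymd() accept strings that are missing the day component
trends$month <- ymd(trends$Month, truncated = 1)

# Reshape from one column per tool to one row per month and tool
tidy_trends <- gather(trends, R, SPSS, key = "Tool", value = "rel_interest")

print(tidy_trends)
```

In the reshaped table, the former column names R and SPSS become values of the Tool column, which is what lets ggplot map color to Tool. Recent versions of tidyr also offer `pivot_longer()` for the same kind of reshaping.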
Anyways, tangent over. After I decided to leave academia, I began teaching myself R in my spare time. (I mostly learned R by blogging about Scottish football data as a hobby, and eventually helped start the website modernfitba.com.) My ability to use R helped me land my first full-time analytics job outside of academia, and I’ve never looked back. Today I use R every day in my work, mostly to build predictive models.
Although I am adept at R, I do not consider myself an expert. I would say I’m somewhere between a beginner and intermediate R user. The main reason is that I generally do not write my own functions, and I’ve certainly never written a package. But I also wonder whether the idea of a continuum from R user to R programmer is useful. I suspect that most R users can do what they need to do in R using the vast array of published functions and packages.
It’s easy to develop an inferiority complex as an R user looking at all the amazing things that others are doing with R, especially the hardcore R programmers. I’m pretty much over that though. So you won’t find any hardcore R programming on this blog, just simple stuff (relatively speaking).
The name Artful Analytics has multiple meanings for me. First, there is an art to data analysis, although it’s mostly a science. Much of the art comes in after the analysis is done, when it’s time to tell a story. Data do not speak for themselves; they must be woven together in a coherent and engaging fashion to have an impact, especially in the business world. I can’t say that I always succeed at this, but I try. I believe blogging helps develop this skill.
There’s certainly an art to data visualization as well. This is one of the areas of analytics that interests me most. Most of my blog posts will revolve around explaining charts or other visual displays of information.
The word Artful also has a personal meaning. It was my dad’s nickname when he was a truck driver, the Artful Dodger, or Dodge for short. (Don’t ask me how he ended up with such a literary nickname. I never knew him to be a Dickens man.) I guess I could have gone with Dodger Analytics instead, but I wouldn’t want anyone thinking this was a sabermetrics blog.
So that’s it for now. I just wanted to give some background about myself and why I started this blog. Thanks for reading and stay tuned.