R Script to retrieve data from Strava

[This article was first published on R, and kindly contributed to R-bloggers].

I wanted a general, standardized data set on which to base the scripts that visualize my sporting activities. The data set should be easy to update and easy to load from other scripts. I took a two-step approach: first, retrieve all available data from Strava using the excellent rStrava package, which calls the Strava API; then, in a small post-processing step, select only the data I want to use and add some things like week numbers and average speed. The resulting data frame is stored locally so it can be used as input for other scripts.
The script only retrieves averages per activity, not the in-activity data. Perhaps I will add this in the future, if I find the time and some use for it. Also, I only use this to retrieve my own data from Strava; I did not look at sites that can do all of this for many users.

Retrieving data from Strava
Post processing

Retrieving data from Strava

You must first tell Strava to allow access to your data. Head to the API part of your Strava account and create the app_name, app_client_id and app_secret. Once all of this is set up, you have to create an authentication token in R. This needs to be done only once; after that you can comment out this line. You’ll get a pop-up in Strava to allow access.
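If you would rather not type the three app values into the script itself, one option is to read them from environment variables. The sketch below is only an illustration: the variable names STRAVA_APP_NAME, STRAVA_CLIENT_ID and STRAVA_CLIENT_SECRET and the helper read_strava_credentials() are made up for this example and are not part of rStrava.

```r
# Hypothetical helper: read the Strava app settings from environment variables.
# The variable names are assumptions; set them in ~/.Renviron or in your shell.
read_strava_credentials <- function() {
  creds <- list(
    app_name      = Sys.getenv("STRAVA_APP_NAME"),
    app_client_id = Sys.getenv("STRAVA_CLIENT_ID"),
    app_secret    = Sys.getenv("STRAVA_CLIENT_SECRET")
  )
  # Sys.getenv() returns "" for unset variables, so report those by name
  missing <- names(creds)[!nzchar(unlist(creds))]
  if (length(missing) > 0) {
    stop("Missing Strava settings: ", paste(missing, collapse = ", "))
  }
  creds
}

# Usage (commented out, because it needs the variables to be set):
# creds <- read_strava_credentials()
# app_name <- creds$app_name
# app_client_id <- creds$app_client_id
# app_secret <- creds$app_secret
```

The values can then be passed to strava_oauth() in the same way as the hard-coded strings.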

I use the following code to open a previously created download (when present) and look at the date of the last downloaded activity. Retrieving all data every time is very time-consuming and usually not needed, except when you have edited, removed or added activities in the past, or if you just want an update of the number of kudos.

library(rStrava)
library(plyr)
setwd("Strava") 

app_name <- "<your app name here>"
app_client_id <- "<app client>"
app_secret <- "<sssh, it's a secret>"

# create the authentication token (needed only once; comment out these two lines afterwards)
stoken <- httr::config(token = strava_oauth(app_name, app_client_id, app_secret, 
                       app_scope="activity:read_all", cache=TRUE))

# retrieve the locally cached token
stoken <- httr::config(token = readRDS('.httr-oauth')[[1]])
filename_raw <- "./data_raw.Rda"
filename_df <- "./data_df.Rda"
if (file.exists(filename_df)) {
   cat("….. download last week\n")
   load(filename_df)
   # create an empty data frame with the same columns as the existing data,
   # otherwise a column mismatch may occur
   df_empty <- df_activities[0,]

   # define last date minus 1 week for corrections
   last_date <- as.Date(max(df_activities$start_date))-7

   # get new activities and place them in a data frame
   new_activities <- get_activity_list(stoken, after = last_date)
   df_new_activities <- compile_activities(new_activities, units="metric")
   df_new_activities <- rbind.fill(df_empty,df_new_activities)

   # drop stored records that were downloaded again, so the fresh copies replace them
   df_activities <- df_activities[!(df_activities$id %in% df_new_activities$id), ]

   # combine data frames
   df_activities <- rbind.fill(df_activities,df_new_activities)
   df_activities <- unique(df_activities)
} else {
   # note: "else" has to follow the closing brace on the same line in a top-level script
   cat("….. Downloading from 2004, this takes some time\n")
   last_date <- as.Date("2004-01-01")
   activities <- get_activity_list(stoken, after = last_date)
   df_activities <- compile_activities(activities, units="metric")
}
# store data frame
save(df_activities, file=filename_df)

Now that your activity data has been downloaded and neatly stored in the file data_df.Rda, it can be used wherever you want. Just add load("data_df.Rda") to your script and you will start with a data frame under the name it had when it was saved, df_activities in this example.
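The update logic (download the last week again and let the fresh records take the place of the stored ones) can be tried without the API. The sketch below uses two small made-up data frames; the helper upsert_by_id() is a name invented for this illustration. It uses base rbind(), which needs identical columns; the script uses plyr's rbind.fill because the downloaded columns can differ between runs.

```r
# Toy stand-ins for the stored data and a fresh download (made-up values)
stored <- data.frame(id = c(1, 2, 3),
                     name = c("Morning run", "Commute", "Long ride"),
                     kudos_count = c(2, 0, 5),
                     stringsAsFactors = FALSE)
fresh <- data.frame(id = c(3, 4),
                    name = c("Long ride", "Evening swim"),
                    kudos_count = c(9, 1),
                    stringsAsFactors = FALSE)

# Hypothetical helper: keep the stored rows that were not downloaded again,
# then append all fresh rows, so every id ends up in the result exactly once
upsert_by_id <- function(old, new) {
  kept <- old[!(old$id %in% new$id), , drop = FALSE]
  rbind(kept, new)
}

merged <- upsert_by_id(stored, fresh)
merged <- merged[order(merged$id), ]
print(merged)
```

In this toy case activity 3 keeps a single row with the updated kudos count, and activity 4 is added as a new row.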

Post processing

I wanted to change the data frame a bit to better suit my needs. Therefore I added this post-processing part.

#load df_activities
load("data_df.Rda")

#prepare data and create new data frame
Sport<-data.frame("id" = df_activities$id,
                  "when" = strptime(df_activities$start_date_local, format="%Y-%m-%dT%H:%M:%SZ"),
                  "date" = as.Date(df_activities$start_date_local, format="%Y-%m-%d"),
                  "week" = as.numeric(strftime(as.Date(df_activities$start_date_local), format = "%V")),
                  "year" = as.numeric(strftime(df_activities$start_date_local, format="%Y")),
                  "type" = df_activities$type,
                  "dist" = df_activities$distance,              #distance in km
                  "duration" = df_activities$moving_time/3600,  #Moving time in hours
                  "duration_s" = df_activities$moving_time,     #Moving time in seconds
                  "commute" = as.factor(df_activities$commute), #TRUE or FALSE
                  "heart" = df_activities$average_heartrate,
                  "cad" = df_activities$average_cadence,
                  "cal" = df_activities$kilojoules * 239.005736, #1 kilojoule = 239.005736 calories
                  "name" = df_activities$name,
                  "kudos" = df_activities$kudos_count)

#Add some columns
Sport$time <- strftime(Sport$when, format="%H:%M:%S")
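# Reordering the columns is not really needed. The index-based reorder Sport[c(1:3,10,4:9)]
# is left out: with the 16 columns present here it would drop heart, cad, cal, name,
# kudos and time, and shift the column numbers used in the output further on.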
Sport$month <- as.integer(format(Sport$date, format = "%m"))
Sport$pace <- (Sport$duration_s/60)/Sport$dist                   #pace in min/km
Sport$swimpace <- (Sport$duration_s/60)/(Sport$dist*10)          #pace in min/100 m
Sport$speed <- Sport$dist/Sport$duration

# transform all inf to NA, this prevents errors and frustration in the future
is.na(Sport) <- do.call(cbind,lapply(Sport, is.infinite))

# Store final data frame
save(Sport, file="Sport_df.Rda")

That’s all. This creates the data frame I have used for most of my scripts so far.
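To see why the infinite values need cleaning up, here is a small sketch on made-up data, using the same formulas as the script. An activity with zero distance produces Inf for the pace columns, and the is.na() assignment turns those into NA. The data frame toy and its values are invented for this illustration.

```r
# Made-up activities: distance in km, moving time in seconds
toy <- data.frame(type = c("Run", "Swim", "Workout"),
                  dist = c(10, 1.5, 0),
                  duration_s = c(3000, 1800, 2400),
                  stringsAsFactors = FALSE)

toy$pace     <- (toy$duration_s / 60) / toy$dist         # min/km
toy$swimpace <- (toy$duration_s / 60) / (toy$dist * 10)  # min/100 m
toy$speed    <- toy$dist / (toy$duration_s / 3600)       # km/h

# Dividing by a zero distance gives Inf for the workout
print(toy$pace)

# Same clean-up as in the script: mark every infinite cell as NA
is.na(toy) <- do.call(cbind, lapply(toy, is.infinite))
print(toy)
```

After the clean-up, functions such as mean(toy$pace, na.rm = TRUE) work as expected instead of returning Inf.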

The output of tail(Sport[,c(6, 7, 20, 8)]) looks like this:

           type    dist     speed   duration
    VirtualRide 32.1013 19.445512 1.65083333
            Run  7.9328 11.446124 0.69305556
           Ride 68.5691 21.581462 3.17722222
           Walk  4.5465  4.990061 0.91111111
    VirtualRide 43.5925 31.550663 1.38166667
            Run 11.0503 12.701494 0.87000000
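As an idea of how such a data frame can feed other scripts, the sketch below sums the distance per week and activity type with base R's aggregate(). It runs on a few made-up rows that only mimic the date, type and dist columns of Sport; the values are not real data.

```r
# Made-up rows with the same column names as in Sport
toy <- data.frame(date = as.Date(c("2021-06-07", "2021-06-09",
                                   "2021-06-13", "2021-06-14")),
                  type = c("Run", "Ride", "Run", "Run"),
                  dist = c(8, 40, 5, 10),          # km
                  stringsAsFactors = FALSE)

# ISO week number, computed the same way as in the post-processing step
toy$week <- as.numeric(strftime(toy$date, format = "%V"))

# Total distance per week and activity type
weekly <- aggregate(dist ~ week + type, data = toy, FUN = sum)
print(weekly)
```

With the real data, load("Sport_df.Rda") would take the place of the toy data frame, and adding the year column to the formula keeps the same week number from different years apart.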
