R Script to retrieve data from Strava
I wanted a general, standardized data set to base my scripts on when visualizing my sporting activities. The data set should be easy to update and easy to load from other scripts. I took a two-step approach: first retrieve all available data from Strava using the excellent rStrava package, which calls the Strava API; then run a small post-processing step that selects only the data I want to use and adds things like week numbers and average speed. The resulting data frame is stored locally so it can be used as input for other scripts.
The script only retrieves per-activity summaries, not the in-activity data. Perhaps I will add that in the future, if I find the time and a use for it. Also, I only use this to retrieve my own data from Strava; I did not look at sites that can do all of this for many users.
Retrieving data from Strava
Post processing
Retrieving data from Strava
You first have to tell Strava to allow access to your data. Head to the API section of your Strava account and create an application, which gives you the app_name, app_client_id and app_secret. Once this is set up, you have to create an authentication token in R. This needs to be done only once; after that you can comment out the line that creates it. You’ll get a pop-up from Strava asking you to allow access.
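Instead of commenting the token line in and out by hand, the "only once" rule can be expressed as a check on the cache file. The following is a minimal sketch, not part of the original script: `load_or_create_token` and its `create_token` argument are hypothetical names, and it assumes the cached `.httr-oauth` file holds a list whose first element is the token, which is how the retrieval code in this post reads it.

```r
# Return the cached token if the cache file exists; otherwise run the
# (interactive) function that creates a new one.
# `create_token` is a hypothetical argument: a function without arguments
# that performs the OAuth flow and returns the token.
load_or_create_token <- function(create_token, cache_file = ".httr-oauth") {
  if (file.exists(cache_file)) {
    # Assumption: the cache is a list with the token as its first element
    readRDS(cache_file)[[1]]
  } else {
    create_token()
  }
}

# Intended use with rStrava (not run here, it needs your own credentials):
# token <- load_or_create_token(function() {
#   strava_oauth(app_name, app_client_id, app_secret,
#                app_scope = "activity:read_all", cache = TRUE)
# })
# stoken <- httr::config(token = token)
```

Passing the token creation in as a function keeps the file check independent of the Strava call, so the browser pop-up only appears when no cached token is found.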
I use the following code to open a previously created download (when present) and look up the date of its last activity. Retrieving all data every time is very time-consuming and rarely needed, except when you have edited, removed or added activities in the past, or if you just want an update of the kudos counts.
library(rStrava)
library(plyr)

setwd("Strava")

app_name      <- "<your app name here>"
app_client_id <- "<app client>"
app_secret    <- "<sssh, it's a secret>"

#create the authentication token (only once)
stoken <- httr::config(token = strava_oauth(app_name, app_client_id, app_secret,
                                            app_scope="activity:read_all", cache=TRUE))

#retrieve local token
stoken <- httr::config(token = readRDS('.httr-oauth')[[1]])

filename_raw <- "./data_raw.Rda"
filename_df  <- "./data_df.Rda"

if (file.exists(filename_df)) {
  cat("….. download last week")
  load("./data_df.Rda")

  # create empty data frame with same amount of columns as existing data,
  # otherwise column mismatch may occur
  df_empty <- df_activities[0,]

  # define last date minus 1 week for corrections
  last_date <- as.Date(max(df_activities$start_date))-7

  # get new activities and place in data frame
  new_activities    <- get_activity_list(stoken, after = last_date)
  df_new_activities <- compile_activities(new_activities, units="metric")
  df_new_activities <- rbind.fill(df_empty,df_new_activities)

  # replace existing records with updated ones, ignore the warnings
  suppressWarnings(df_activities[df_activities$id %in% df_new_activities$id, ] <- df_new_activities)

  # combine dataframes
  df_activities <- rbind.fill(df_activities,df_new_activities)
  df_activities <- unique(df_activities)
} else {
  cat("….. Downloading from 2004, this takes some time")
  last_date     <- as.Date("2004-01-01")
  activities    <- get_activity_list(stoken, after = last_date)
  df_activities <- compile_activities(activities, units="metric")
}

# store dataframe
save(df_activities, file="data_df.Rda")
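The update step boils down to "rows that were downloaded again replace the stored rows with the same id". The sketch below shows that idea on toy data in base R. It is an illustration only: `merge_activities` is a hypothetical helper, the two data frames are made up, and it assumes both have exactly the same columns (the script above uses rbind.fill from plyr precisely because real downloads may not).

```r
# Combine stored and freshly downloaded activities; for ids present in
# both, the freshly downloaded row wins.
merge_activities <- function(old, new) {
  # keep only the stored rows that were NOT downloaded again
  kept <- old[!(old$id %in% new$id), , drop = FALSE]
  combined <- rbind(kept, new)
  rownames(combined) <- NULL   # reset row names after subsetting and binding
  combined
}

# Toy data: activity 3 received more kudos since the last download,
# activity 4 is new.
old <- data.frame(id = c(1, 2, 3), kudos_count = c(5, 2, 0))
new <- data.frame(id = c(3, 4),    kudos_count = c(7, 1))

merged <- merge_activities(old, new)
```

Dropping the outdated rows before binding means no duplicate ids can remain, so a later unique() call is not needed in this variant.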
Now that your activity data has been downloaded and neatly stored as a data frame in data_df.Rda, it can be used wherever you want. Just add load("data_df.Rda") to your script and you will start with a data frame under the name you gave it when saving, df_activities in this example.
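For readers less familiar with save() and load(): the object comes back under the name it had when it was saved, not under a name you pick when loading. A small round trip with a made-up data frame and a temporary file (both are assumptions for the example, not the real Strava data) shows this:

```r
# Made-up stand-in for the real activity data
df_activities <- data.frame(id = c(1, 2), distance = c(10.5, 5.2))

# Save to a temporary file and remove the object from the workspace
path <- file.path(tempdir(), "data_df_example.Rda")
save(df_activities, file = path)
rm(df_activities)

# load() recreates the object under its original name and returns,
# invisibly, the names of everything it restored
restored <- load(path)
```

After this, `restored` holds the character string "df_activities", and the data frame itself is available again as `df_activities`.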
Post processing
I wanted to change the data frame a bit to better suit my needs. Therefore I added this post-processing part.
#load df_activities
load("data_df.Rda")

#prepare data and create new data frame
Sport <- data.frame("id"         = df_activities$id,
                    "when"       = strptime(df_activities$start_date_local, format="%Y-%m-%dT%H:%M:%SZ"),
                    "date"       = as.Date(df_activities$start_date_local, format="%Y-%m-%d"),
                    "week"       = as.numeric(strftime(as.Date(df_activities$start_date_local), format = "%V")),
                    "year"       = as.numeric(strftime(df_activities$start_date_local, format="%Y")),
                    "type"       = df_activities$type,
                    "dist"       = df_activities$distance,           #distance in km
                    "duration"   = df_activities$moving_time/3600,   #Moving time in hours
                    "duration_s" = df_activities$moving_time,        #Moving time in seconds
                    "commute"    = as.factor(df_activities$commute), #TRUE or FALSE
                    "heart"      = df_activities$average_heartrate,
                    "cad"        = df_activities$average_cadence,
                    "cal"        = df_activities$kilojoules * 239.005736, #1 kilojoule = 239.005736 calories
                    "name"       = df_activities$name,
                    "kudos"      = df_activities$kudos_count)

#Add some columns (columns keep the order in which they were created)
Sport$time     <- strftime(Sport$when, format="%H:%M:%S")
Sport$month    <- as.integer(format(Sport$date, format = "%m"))
Sport$pace     <- (Sport$duration_s/60)/Sport$dist       #pace in min/km
Sport$swimpace <- (Sport$duration_s/60)/(Sport$dist*10)  #pace in min/100 m
Sport$speed    <- Sport$dist/Sport$duration              #speed in km/h

# transform all inf to NA, this prevents errors and frustration in the future
is.na(Sport) <- do.call(cbind,lapply(Sport, is.infinite))

# Store final data frame
save(Sport, file="Sport_df.Rda")
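To see what the derived columns and the Inf-to-NA step do, here is a small worked example on invented activities (the values and the `activities` data frame are assumptions for illustration; distances are in km and moving times in seconds, as in the script above).

```r
# Invented activities; the last one has no distance at all
activities <- data.frame(
  type        = c("Run", "Ride", "Swim", "Workout"),
  distance    = c(10, 40, 1.5, 0),          # km
  moving_time = c(3000, 5400, 1800, 1200)   # seconds
)

activities$duration <- activities$moving_time / 3600                               # hours
activities$speed    <- activities$distance / activities$duration                   # km/h
activities$pace     <- (activities$moving_time / 60) / activities$distance         # min/km
activities$swimpace <- (activities$moving_time / 60) / (activities$distance * 10)  # min/100 m

# Dividing by a distance of 0 gives Inf; turn every Inf into NA
is.na(activities) <- do.call(cbind, lapply(activities, is.infinite))
```

The run covers 10 km in 50 minutes, so its pace is 5 min/km and its speed 12 km/h; the swim covers 1.5 km in 30 minutes, which is 2 min/100 m. For the workout without distance, pace and swimpace would be infinite and end up as NA, while its speed is simply 0.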
That’s all. This creates the data frame I have used for most of my scripts so far.
The output of tail(Sport[, c("type", "dist", "speed", "duration")]) looks like this:
type         dist     speed      duration
VirtualRide  32.1013  19.445512  1.65083333
Run           7.9328  11.446124  0.69305556
Ride         68.5691  21.581462  3.17722222
Walk          4.5465   4.990061  0.91111111
VirtualRide  43.5925  31.550663  1.38166667
Run          11.0503  12.701494  0.87000000