Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

# TL;DR

I recently defended my PhD thesis, and it was stressful and a lot of work (though the day itself was brilliant!). In order to take advantage of my putting myself through such a gruelling experience, I bought a heart rate and sleep tracker watch to measure just how bad things would get. While my subjective experience of the stress and emotional despair wasn’t as acute as I thought it might be, my physiological data still tell quite the story…

# Background

I recently successfully defended my PhD thesis, which you can download here. I knew that I could expect the months preceding it to be a gruelling process, and I have colleagues who have described difficulty sleeping, eating, and breaking down in tears unexpectedly. I decided that I might be able to turn any physiological and emotional hardship into something more fun, and I bought an activity/sleep stage/heart rate tracker. I figured that the Withings Steel HR seemed a pretty good match, since all of the data can be exported, it measured all the things that I wanted it to, and it looks very presentable too.

Speaking of: if any companies want to send me free gadgets to play with for fun side projects, I am all ears. Just saying…

# Getting Data

There are two ways for extracting data from Withings devices:

1. By exporting all of one’s data as CSV files
2. Through the API

The first option is a bit annoying because it means having to go through the whole process, waiting for an email, and then downloading everything as a big zip file, and then extracting the right CSV file from in there. And in each CSV file, there are numbers, sometimes arranged quite confusingly, and with no data dictionary. The second option means getting much better-organised data, but it required figuring out how to access the API, as well as not providing the heart rate data. So I’d have to do a bit of both.

Thankfully, I had a PhD thesis to procrastinate from writing, so I put together the rwithings R package, which contains functions for querying the Withings API (with OAuth 2.0, as opposed to the OAuth 1.0 authentification included in the httr demo), as well as for wrestling out the data which isn’t in the API from the CSV files.

In this analysis, I wanted to look at my sleep as well as my heart rate in the weeks preceding my thesis defence. This requires the use of both the API and the CSV files.

# Analysis

## Sleep Depth

First I wanted to look at how deeply I slept in the weeks preceding the defence. The higher the proportion of deep sleep, the more restful one’s sleep is supposed to be.

### Extracting the data

# remotes::install_github("mathesong/rwithings")

library(tidyverse)
library(rwithings)

Now that we have everything loaded, we can begin. First, we must authenticate ourselves and get a token.

client_id <- MY_CLIENT_ID            # Secret!
client_secret <- MY_CLIENT_SECRET    # Secret!

token <- withings_auth(client_id, client_secret)

And now we can use that to download some data using the rwithings package.

sleepdat <- getsleepsummary(token, startdate = "2018-10-01",
enddate = "2018-12-10")

### Cleaning and Analysis

And now we can pull out the bits that we want: how long I spent awake and in light and deep sleep. I also found a nifty function in the chron package to say whether a day was a weekend day or not in case that was a factor.

sleepsummary <- sleepdat$body$series %>%
select(date, ends_with("duration")) %>%
rename(Date=date,
Awake = data.wakeupduration,
LightSleep = data.lightsleepduration,
DeepSleep = data.deepsleepduration) %>%
mutate(Date=as.Date(Date),
Weekend = chron::is.weekend(Date))

I noticed some dates duplicated in this data, and this corresponded with naps.

sleepsummary %>%
filter(Date == "2018-11-11")
##         Date Awake LightSleep DeepSleep Weekend
## 1 2018-11-11  1680      12960     14040    TRUE
## 2 2018-11-11   300       5460      6420    TRUE

I solved this by removing the second duplicate of each day’s sleep.

sleepsummary <- sleepsummary[!duplicated(sleepsummary$Date, fromLast = F),] sleepsummary %>% filter(Date == "2018-11-11") ## Date Awake LightSleep DeepSleep Weekend ## 1 2018-11-11 1680 12960 14040 TRUE I wanted to look at proportional sleep in each of the different states, so I looked at proportions of the total sleep duration for each night. sleepprops <- sleepsummary %>% mutate(TotalSleep = Awake + LightSleep + DeepSleep) %>% gather(SleepType, Minutes, -Date, -TotalSleep, -Weekend) %>% mutate(SleepProportion = Minutes/TotalSleep, SleepType = factor(SleepType, levels = c("Awake", "LightSleep", "DeepSleep"))) ### Plotting I decided to use this oppotunity to play with the hrbrthemes themes that I’ve seen so much talk of, which just look gorgeous! I also picked some colours using the Color Calculator. # remotes::install_github("hrbrmstr/hrbrthemes") library(hrbrthemes) sleep_proportions <- ggplot(sleepprops, aes(x=Date, y=SleepProportion, colour=SleepType)) + geom_point(aes(shape=Weekend), size=3) + geom_line() + geom_vline(xintercept = as.Date("2018-11-30"), linetype="dotted", size=1) + geom_smooth(data=filter(sleepprops, Date<=as.Date("2018-11-30"))) + facet_grid(SleepType ~ ., scales = "free") + scale_shape_manual(values=c(19, 1)) + theme_ipsum_rc() + labs(x="Date", y="Sleep Stage Proportions", title="My Sleeping Patterns Before my PhD Defence", subtitle="Nightly proportional sleep stages in the two months prior", caption="Recordings were made using a Withings Steel HR") + scale_color_manual(values=c("#ffd631", "#31d6ff", "#d12869")) sleep_proportions ### Conclusions So it seems that the proportion of hours spent awake or in light sleep were increasing, while my deep sleep was decreasing over the course of the months preceding my defence. This is exactly what one might have predicted! I must admit that I didn’t expect this to actually come out, as my two-year-old daughter decided to start waking up in the middle of the night from time to time in the weeks preceding the defence, but this didn’t get any better or worse in the last days preceding the defence itself, so I don’t think it explains the change, but might contribute to the trend being a little noisier. ## Heart Rate Data Next I wanted to look at trends in the heart rate data, both during the defence, as well as in the days prior. ### Extracting the data I first extracted the data as a CSV, and read it into R using the rwithings package function for munging this particular data. hr <- read_csv_startdurval("../../static/data/20181213_ThesisDefence/raw_tracker_hr.csv") %>% rename(Start = start, Duration = duration, Freq = value, Date = date, End = end, Mid = mid) ## Date in ISO8601 format; converting timezone from UTC to "Europe/Berlin". ### During the defence During the defence, I set my watch to “workout mode”, which means that it records heart rate data at a higher frequency. For this analysis, I thought a good strategy would be to calculate the mean heart rate for each minute, weighted by the duration of the measurement interval. What this means is that I first want to limit the data set to the time of the defence, and then calculate minute means. #### Cleaning and Analysis I was a bit lazy about calculating minute averages: the hard way would be to break up each measurement that went over a minute boundary, but I decided to just take weighted means of measurements whose middle points were within each minute. Since it was workout mode, each measurement is only a few seconds long in any case. library(lubridate) tz <- Sys.timezone() dday <- hr %>% filter(Date=="2018-11-30") %>% filter(Start > as.POSIXct("2018-11-30 09:15:00", tz = tz)) %>% filter(Start < as.POSIXct("2018-11-30 14:15:00", tz = tz)) %>% mutate(Minute = floor_date(Mid, unit="minute")) minsummary <- dday %>% group_by(Minute) %>% summarise(MinuteFreq = weighted.mean(Freq, w=Duration)) dday <- left_join(dday, minsummary) #### Labels For labels, I used the time signatures on photographs that my mum took during the defence, as well as by carefully examining the raw steps data to see when I was moving about or sitting. The final label set is below (thanks to the datapasta package). labels <- tibble( Time = c("2018-11-30 09:33:00", "2018-11-30 09:37:00", "2018-11-30 09:58:00", "2018-11-30 10:43:00", "2018-11-30 12:03:00", "2018-11-30 12:05:00", "2018-11-30 12:14:00", "2018-11-30 12:48:00", "2018-11-30 13:17:00", "2018-11-30 13:41:00", "2018-11-30 13:56:00", "2018-11-30 13:58:00"), Event = c("Chairperson introduces\nthe defence", "Opponent's presentation\nbegins", "My presentation\nbegins", "Opponent's questions\nbegin", "Opponent's questions\nend", "Short break", "Committee questions\nbegin", "Committee questions\nend", "Walk to\nreception\n(15 min)", "Awaiting\ndecision", "Committee\nreturns", "I passed!")) %>% mutate(Freq = 120, # This is just for the annotation position MinuteFreq = Freq, Time = as.POSIXct(Time), Minute = Time)  #### Plotting And now let’s put it all together! For the labels, I use ggrepel. library(ggrepel) dday_minuteplot <- ggplot(dday, aes(x=Minute, y=MinuteFreq)) + geom_segment(data=labels, aes(x=Minute, xend=Minute, y=120, yend=25), size=0.4) + geom_segment(aes(x=Minute, xend=Minute, y=MinuteFreq, yend=25, colour=MinuteFreq), size=0.8) + scale_colour_viridis_c("Frequency\n(bpm)", option = "A") + scale_x_datetime(breaks = "1 hour", date_labels = "%I:%M %p") + theme_ipsum_rc() + labs(x="Time", y="Heart Rate (bpm)", title="Heart rate during my PhD Defence", subtitle="Weighted mean frequencies recorded for each minute", caption="Recordings were made using a Withings Steel HR") + geom_point(data=labels) + geom_text_repel( angle = 30, segment.size = 0.2, ylim = c(125,NA), data=labels, aes(label=Event), size=3 ) + ylim(25, 140) dday_minuteplot #### Conclusions I was mostly struck by the breaks! Going down to a bpm of 50 and lower during the defence is so unexpected, since this is definitely below my usual resting heart rate, and especially during such a stressful experience! The walk to the reception is understandable, since it was a couple of kilometres. The missing data during my presentation also makes sense, since I was up and gesticulating wildly. By the way, this figure had to be very big to get all the labels as well as the minute lines, and works great on mobile since one can zoom in. But in a desktop browser, the best option is to just open it in a new tab by right-clicking it and selecting View Image. ### Heart Rate Distributions Next, I wanted to look at the daily distributions of my heart rate in the days before and after. There are several complications to this: • Waking and sleeping heart rates are quite different • Heart rates are typically right-skewed distributions, so a mean doesn’t cut it, and the median depends on the amount of time spent exercising etc • The watch does not record heart rate at a constant sampling rate, and this is particularly different between “workout mode” (during the defence, as well as during exercise sessions) and normal tracking, so I can’t just use a summary statistic on the raw data, since highly-sampled periods are also likely to be those with higher heart rate What I aimed to do was to separate waking and sleeping times from the sleep data, show the whole distribution of heart rate (and cut it in the figure at 120 during waking hours because above there is pretty much only exercise) as a ridge plot. I decided to use medians as a summary statistic: while mode would probably be more appropriate physiologically, it’s just not easy to calculate in a robust fashion. Getting a physiologically meaningful summary was not the goal, but rather having a number to make comparison easier, for which the median should do just fine. In order to deal with the different sampling rates, I opted to linearly interpolate heart rate over the day into increments of 5 seconds so that highly-sampled and sparsely-sampled times get as much as importance/coverage as one another. I also decided to only look at a month before in the plot in order not to make it too busy. First, I interpolate the data into 5 second intervals. I do this by calculating the number of seconds, dividing it by 5, and then interpolating into that many points. I’ll first set everything as “Waking”, and then change the sleeping times to “Sleeping” later. hr_ridgedat <- filter(hr, Date >= as.Date("2018-11-01")) n_periods <- abs(as.numeric( difftime( hr_ridgedat$Mid[1],   # First
hr_ridgedat$Mid[nrow(hr_ridgedat)], # Last units = "s")))/5 # no. seconds divided by 5 hr_interp <- tibble(Time = seq(hr_ridgedat$Mid[1],  # First
hr_ridgedat$Mid[nrow(hr_ridgedat)], # Last length.out = n_periods), Waking = "Waking", Date = as.Date(Time)) hr_interp$Freq <- approx(x = hr_ridgedat$Mid, # interpolate y = hr_ridgedat$Freq,
xout = hr_interp$Time, method = "linear")$y

Next, I’ll put together a set of sleep times. Frustratingly, if I went to bed too late (after ~02h00), the watch tended to assume I didn’t sleep at all. I did in fact sleep every night before the defence, and never after 03h00, so I just set my sleeping times for those days in which it didn’t record sleep as being from 03h00. For waking up, however, I can be pretty sure it was around 06h00 each morning, because that’s when my 2 year old daughter wakes up like clockwork!

sleeptimes <- sleepdat$body$series %>%
select(Start=startdate, End=enddate, Date=date) %>%
mutate(Date=as.Date(Date)) %>%
filter(Date >= as.Date("2018-11-01"))

dateseq <- tibble::tibble(Date=seq(sleeptimes$Date[1], Sys.Date(),by = 1) ) sleeptimes <- full_join(dateseq, sleeptimes) ## Joining, by = "Date" measuredsleep <- filter(sleeptimes, !is.na(Start)) missingsleep <- filter(sleeptimes, is.na(Start)) %>% mutate(Start = as.POSIXct(strptime(paste(Date, "3:00"), format = "%Y-%m-%d %H:%M"), tz = Sys.timezone()), End = as.POSIXct(strptime(paste(Date, "6:00"), format = "%Y-%m-%d %H:%M"), tz = Sys.timezone())) sleeptimes <- bind_rows(measuredsleep, missingsleep) Now I can use this data to define the sleeping times. Also, I need to assign sleeping heart rate to the correct days, which does not always correspond to the time recorded: if I was sleeping at 23h00 on day 1, the sleep should be assigned to day 2 along with the rest of my sleep until 06h00 on day 2. This I can also do using the sleeptimes data. I couldn’t readily think of a tidy way of doing this operation, so I used a for loop. Sadface. for(i in 1:nrow(sleeptimes)) { hr_interp$Waking <- ifelse(hr_interp$Time > sleeptimes$Start[i] &
hr_interp$Time < sleeptimes$End[i],
yes = "Sleeping", no = hr_interp$Waking) hr_interp$Date <- ifelse(hr_interp$Time > sleeptimes$Start[i] &
hr_interp$Time < sleeptimes$End[i],
yes = sleeptimes$Date[i], no = hr_interp$Date)

}

saveRDS(hr_interp, file = "../../static/data/20181213_ThesisDefence/hr_interp.rds")

Now, we want to divide the data into waking and sleeping. For some reason, the Date column has now become a double, so we’ll also convert it to date again.

hr_interp <- readRDS("../../static/data/20181213_ThesisDefence/hr_interp.rds")

hr_interp <- hr_interp %>%
mutate(Date = as.Date(Date, origin="1970-01-01", tz = Sys.timezone()))

hr_interp_waking <- filter(hr_interp, Waking=="Waking")

hr_interp_sleep <- filter(hr_interp, Waking=="Sleeping")

Let’s quickly make some median summaries for both.

hr_interp_waking_summary <- hr_interp_waking %>%
group_by(Date) %>%
summarise(MedFreq = median(Freq),
MedFreq_round = formatC(MedFreq,1,format="f"))

hr_interp_sleep_summary <- hr_interp_sleep %>%
group_by(Date) %>%
summarise(MedFreq = median(Freq),
MedFreq_round = formatC(MedFreq,1,format="f"))

### Waking Heart Rate Distributions

So, first, let’s look at my heart rate each day, including the defence day. I’ll make a couple of annotations for the plot. And I’ll also cut everything off at 120 bpm. And we can show the medians on the left.

library(ggridges)
##
## Attaching package: 'ggridges'
## The following object is masked from 'package:ggplot2':
##
##     scale_discrete_manual
annotations_waking <- tribble(
~Date,   ~Label,
"2018-11-30", "Defence day!") %>%
mutate(Date = as.Date(Date))

hr_ridgeplot <- ggplot(hr_interp_waking,
aes(x=Freq, y=Date, group=Date, fill=..x..)) +
geom_density_ridges_gradient(scale = 1.7, rel_min_height = 0.01,
xlim(40, 120) +
scale_fill_viridis_c(name = "Heart\nRate\n(bpm)", option = "A") +
labs(title = 'Waking Heart Rate',
subtitle = "Distributions and medians of interpolated heart rate during waking hours",
caption = "Recordings were made using a Withings Steel HR",
x = "Heart Rate (bpm)") +
theme_ridges(font_size = 13, grid = TRUE) +
theme(axis.title.y = element_blank()) +
geom_point(data = hr_interp_waking_summary, aes(x = MedFreq, y=Date+0.25),
colour="white", size=1.5) +
theme_ipsum_rc() +
geom_text(data=hr_interp_waking_summary,
aes(x = 45, y = Date+0.5, label=MedFreq_round),
size=3, color = "red") +
geom_text(data=annotations_waking, aes(x = 58, y = Date+0.5, label=Label),
size=3, color = "red")

hr_ridgeplot
## Picking joint bandwidth of 1.72

### Conclusions

I must admit, I was quite surprised by how clear the increase for the defence day is compared to the rest! The median is clearly higher than the mode in almost all cases, but as a summary statistic for comparing daily heart rates, it does the job.

### Sleeping Heart Rate Distributions

Next, let’s look at my heart rate during sleep. For each day, the sleep is that which ends on that particular day.

annotations_sleep <- tribble(
~Date,   ~Label,
"2018-11-02", "Thesis sent to printers",
"2018-11-30", "Night before",
"2018-12-01", "It's over!",
"2018-12-02", "Recovery"
) %>%
mutate(Date = as.Date(Date))

hr_ridgeplot_sleep <- ggplot(hr_interp_sleep,
aes(x=Freq, y=Date, group=Date, fill=..x..)) +
geom_density_ridges_gradient(scale = 1.7, rel_min_height = 0.01,
xlim(40, 90) +
scale_fill_viridis_c(name = "Heart\nRate\n(bpm)", option = "A") +
labs(title = 'Sleeping Heart Rate',
subtitle = "Distributions and medians of interpolated heart rate during sleeping hours",
caption = "Recordings were made using a Withings Steel HR") +
theme_ridges(font_size = 13, grid = TRUE) +
theme(axis.title.y = element_blank()) +
geom_point(data = hr_interp_sleep_summary, aes(x = MedFreq, y=Date+0.25),
colour="white", size=1.5) +
theme_ipsum_rc() +
geom_text(data=hr_interp_sleep_summary,
aes(x = 42, y = Date+0.5, label=MedFreq_round),
size=3, color = "red") +
geom_text(data=annotations_sleep, aes(x = 85, y = Date+0.5, label=Label),
size=3, color = "red")

hr_ridgeplot_sleep
## Picking joint bandwidth of 0.816

### Conclusions

I couldn’t believe this when I saw it! Even looking at sleeping heart rate was a bit of an afterthought. The night before can obviously be attributed to nerves, the night of the defence was presumably the result of masses of adrenaline coursing through my body throughout the day having a lasting impact into the night, and I love that the next night is one of the lowest of the nights recorded, in recovery. I was a little surprised about the night on November 2nd, and I had to look it up: it turned out it was the night after I had sent the final version of my thesis to the printers to be printed: I had nightmares all night that I’d forgotten something critical, misspelled a name of a committee member, or that I’d left something out of the main description pages. I must say, I was really surprised by how well the watch was able to pick this all up.

# Summary

I was a little surprised by how well this all worked out. I specifically bought the watch to see the extent to which my body fell apart in the lead-up to the defence, but I didn’t really expect it all to be quite so clear! And if there’s anyone else out there who knows about the biology of all this who has cool ideas for what else I might look at, do hit me up!

Also, big props to Withings for designing a really impressive product! If anyone else wants to start playing with Withings data, do let me know if you run into any issues with rwithings: it’s definitely in need of some work yet.

And finally an enormous thank you to everyone involved in the defence itself, both supervisors and colleagues from throughout the PhD, everyone including friends and family who came on the day itself, as well as the opponent and the members of the examination committee. I had such an amazing and memorable day!