Taking steps (in XML)

February 1, 2017
By

(This article was first published on R – What You're Doing Is Rather Desperate, and kindly contributed to R-bloggers)

So the votes are in:

I thank you, kind readers. So here’s the plan: (1) keep blogging here as frequently as possible (perhaps monthly), (2) on more general “how to do cool stuff with data and R” topics, (3) which may still include biology from time to time. Sounds OK? Good.

So: let’s use R to analyse data from the iOS Health app.

I own an iPhone. It comes with a Health app installed by default. Not being a big user of mobile apps, it was several months before I opened it and realised that it had been collecting data. Which makes me wonder what else the phone does without my knowledge…but back to the topic. It turns out that health data can be exported by tapping at top-right on the overview page, then choosing export.

Click to view slideshow.

This generates a compressed file, ios_health_export.zip. Upload it from your phone to your destination of choice; I went with Google Drive.

Being Apple, I’d assumed that the contents might be some hideous proprietary binary format but in fact unzipping the file reveals a directory, apple_health_export, in which reside two XML files. The larger export.xml contains your health data.

Records in the XML file consist of lines that specify the record type (measurement), source, three timestamps for creation, start and end, and the value of the measurement. Most of my records are step counts, which look like this:


And so to R. In the past I would have used the XML package but in my ongoing effort to convert to the “tidyverse”, I’ll try xml2 instead. We’ll use purrr too for reasons that will become apparent, ggplot2 for plotting and dplyr because it is awesome.

Reading in the file could not be easier:

library(xml2)
library(purrr)
library(ggplot2)
library(dplyr)

health_data <- read_xml("export.xml")

Nor could extracting the records that contain step counts. We use an xpath expression, then pipe the result to purr’s mapping functions to go straight from XML attributes to a data frame, as described here.

steps <- xml_find_all(health_data, ".//Record[@type='HKQuantityTypeIdentifierStepCount']") %>% map(xml_attrs) %>% map_df(as.list)

glimpse(steps)
Observations: 188,677
Variables: 9
$ type           "HKQuantityTypeIdentifierStepCount", "HKQuantityTypeIdentifierStepCount", "HKQuantityTypeIdentifierStepCount", "HKQuantityTypeIdentifierStepCount", ...
$ sourceName     "Health", "Health", "Health", "Health", "Health", "Health", "Health", "Health", "Health", "Health", "Health", "Health", "Health", "Health", "Health"...
$ unit           "count", "count", "count", "count", "count", "count", "count", "count", "count", "count", "count", "count", "count", "count", "count", "count", "cou...
$ creationDate   "2014-09-24 09:25:06 +1100", "2014-09-24 09:25:06 +1100", "2014-09-24 09:25:06 +1100", "2014-09-24 09:25:06 +1100", "2014-09-24 09:25:06 +1100", "20...
$ startDate      "2014-09-23 17:58:58 +1100", "2014-09-23 17:59:08 +1100", "2014-09-23 17:59:18 +1100", "2014-09-23 17:59:28 +1100", "2014-09-23 17:59:58 +1100", "20...
$ endDate        "2014-09-23 17:59:03 +1100", "2014-09-23 17:59:13 +1100", "2014-09-23 17:59:23 +1100", "2014-09-23 17:59:33 +1100", "2014-09-23 18:00:03 +1100", "20...
$ value          "12", "5", "17", "1", "14", "4", "10", "2", "4", "2", "9", "7", "4", "9", "7", "6", "11", "13", "6", "8", "5", "8", "6", "9", "1", "7", "13", "6", "...
$ sourceVersion  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ device         NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...

To illustrate an example analysis, let’s aggregate steps to a monthly count and plot counts by month. We’ll assume that startDate is a proxy for day (i.e. I’m not walking at midnight so steps don’t straddle day boundaries). We’ll also assign the monthly count to the first day of the month, to avoid having to figure out what number day ends the month 🙂

So, to recode step count as an integer, convert the start date to a date object, summarise by month and plot, let’s see dplyr in action:

steps %>% select(startDate, value) %>%
group_by(Date = as.Date(paste(substr(startDate, 1, 7), "01", sep = "-")))
%>% summarise(count = sum(as.numeric(value))) %>%
ggplot(aes(Date, count)) + geom_col(fill = "skyblue3") + theme_bw() + labs(y = "monthly step count", title = "Steps by month September 2014 - January 2017 as measured by iOS")

Result:

ios_steps

As to how accurate the counts are: that’s for another day.

Filed under: personal, R, statistics, this blog Tagged: health, iOS, parsing, xml

To leave a comment for the author, please follow the link and comment on their blog: R – What You're Doing Is Rather Desperate.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)