In this post, we will see how to download personal Fitbit data histories for
step counts, heart rate, and sleep via the Fitbit API. We will use a
combination of existing R packages and custom calls to the Fitbit API to get
all of the data we are interested in.
This post won’t focus on data analysis per se, but rather data collection. As
I was going about the exercise of retrieving my own Fitbit data, I noticed
that there were no good examples of collecting one’s entire data history (but
a number of descriptions of getting a single day’s worth of data). Because data gathering is such a fundamental part
of the data science exercise, I think it’s worth it to go into detail about
how to access one’s personal Fitbit data history via API!
Step 1: Getting Set Up
Create a Developer Account
The first thing we need to do is create a developer account with Fitbit, in
order to get a key and secret to access the API. I won’t go into the details
here, but you can check out these
two very clear and detailed
descriptions of how to go through this process – the guides make it very easy.
Make sure to select “Personal” for the OAuth 2.0 Application Type, or you
won’t be able to access intra-day time series information (e.g. step or heart
rate data at the minute-level).
Set Up the R Environment
Next, we will make sure we have everything loaded in our R environment. Below,
I specify the Fitbit key and secret I obtained in the above procedure. I also
specify the directory to which I will save the data I’ve downloaded, and
create a vector containing the dates for which I’ve had my Fitbit. We will use
this vector of dates in our calls to the API.
The workhorse package for most of this exercise is the excellent
fitbitr package. This package
comes with commands to easily obtain a token to access the Fitbit API, and to
easily compose queries to request data. When you get the token, a separate
window from your internet browser will open and you will have to specify which
data you want to access via the API (see the guides above for more details).
I’ve created a vector of dates (called fitbit_dates) that correspond to the
period of time that I have had the Fitbit (ending at the time this blog post
was prepared - December 2018). The vector contains 274 dates. We will use this
as input for our code below, extracting step count and heart rate data for
every date in the fitbit_dates vector.
Note that you will need to replace the key and secret with the ones you obtain
from the Fitbit developer platform!
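A minimal sketch of this setup in base R is shown here. The start and end dates are assumptions (the post does not state them); they were picked so that the vector has 274 entries and ends in December 2018. The variable names for the key, secret, and output directory are also hypothetical.

```r
# Placeholders: replace with the key and secret from the Fitbit developer site.
fitbit_key    <- "<your-fitbit-key>"
fitbit_secret <- "<your-fitbit-secret>"

# Assumed start and end dates (not stated in the post); chosen so the
# vector has 274 entries and ends in December 2018.
start_date <- as.Date("2018-03-20")
end_date   <- as.Date("2018-12-18")

# One entry per calendar day, stored as "YYYY-MM-DD" strings for the API calls.
fitbit_dates <- as.character(seq(start_date, end_date, by = "day"))

# Hypothetical output directory for the downloaded data.
out_dir <- file.path(tempdir(), "fitbit_data")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

length(fitbit_dates)
```

Storing the dates as character strings keeps them in the "YYYY-MM-DD" form that the Fitbit API expects in its URLs.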
Step 2: Obtaining Step Count Data
Testing with a Single Day
We will use the “get_activity_intraday_time_series” function included in
the fitbitr package to download the intraday time series (at the
minute-level) for the step data. The function in the fitbitr package is simply a
wrapper that takes an activity (e.g. steps, calories, etc.), a date, a level
of granularity for the requested information (1 or 15 minutes) and executes an
API call, returning a cleaned data frame with the requested data. For more
information about the data available through the intraday time series API, you
can check out the fitbitr and Fitbit API documentation.
Let’s test a single day with the get_activity_intraday_time_series function
in the fitbitr package. We will adapt the test code described in the
fitbitr documentation. The code looks like this:
And the resulting data frame looks like this:
The data set contains information on the date, the total number of steps
walked on that day (15,562 for 2018-03-20), and the number of steps (called
value in the data set) for the given minute. We also have meta-data
describing the granularity of the measurement - we are recording steps in
1-minute intervals. Finally, the last line in the code (taken directly from
the fitbitr documentation) creates an R date object called time with the
day, hour, minute and second level information all combined in one variable.
There are 1,440 lines in our dataset - 1 line for each minute of the day.
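To illustrate the last step, here is a small sketch of how a date column and a clock-time column can be combined into one date-time variable. The data frame is a hypothetical stand-in for a single day of results; the column names are illustrative and not necessarily those returned by fitbitr.

```r
# Hypothetical stand-in for one day of minute-level step data; the column
# names here are illustrative, not necessarily those returned by fitbitr.
day <- "2018-03-20"
minute_of_day <- 0:1439
steps_day <- data.frame(
  date  = day,
  clock = sprintf("%02d:%02d:00", minute_of_day %/% 60, minute_of_day %% 60),
  value = 0L,
  stringsAsFactors = FALSE
)

# Combine the date and clock strings into a single date-time column.
steps_day$time <- as.POSIXct(
  paste(steps_day$date, steps_day$clock),
  format = "%Y-%m-%d %H:%M:%S",
  tz = "UTC"
)

nrow(steps_day)        # one row per minute of the day
range(steps_day$time)  # first and last minute of the day
```

The time zone is fixed to UTC here only to keep the example reproducible; in practice you would probably use your local time zone.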
Getting Data for All Days
We now have working code that obtains data for a single day. We simply need a
way to programmatically execute this procedure for the 274 dates in our
“fitbit_dates” vector.
Below, I accomplish this via a function which will be applied to our vector of
dates. For each date in the vector, we execute the
get_activity_intraday_time_series command, obtaining the data in the format shown above.
Note that I include a Sys.sleep command in the function.
This causes the program to pause for 30 seconds before continuing. I included
this because the Fitbit API has a rate limit of 150 calls per hour.
With the above function, we should make around 2 calls per minute, or 120
calls per hour. We are guaranteed not to go above the limit!
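The arithmetic behind the 30-second pause can be checked with a few lines of R. This is a back-of-the-envelope sketch that assumes the limit of 150 calls per hour and ignores the time each download itself takes, so the real call rate will be slightly lower and the real run time slightly longer.

```r
pause_seconds   <- 30    # pause between calls, as in the post
rate_limit_hour <- 150   # Fitbit limit on calls per hour
n_dates         <- 274   # length of fitbit_dates

# Upper bound on calls per hour: download time itself is ignored, so the
# real rate is slightly lower.
calls_per_hour <- 3600 / pause_seconds

# Lower bound on the total run time, in hours.
total_hours <- n_dates * pause_seconds / 3600

calls_per_hour <= rate_limit_hour
round(total_hours, 2)
```

In other words, downloading all 274 days this way takes a bit over two and a quarter hours. The margin below the limit only holds if nothing else is calling the API with the same credentials during that time.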
I also include some print statements in the function. As the function
executes, we get an update on where the function is in the list of dates to
download, and the time at which the last download occurred. If we encounter an
error, this information will help us debug the problem.
The function returns a list of data frames (a consequence of the command
used to apply it to the dates), one data frame for each date in our fitbit_dates
vector. I then make a single data frame from the list of data frames, and save
the merged step data (called intraday_steps_df) to the directory specified above.
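The overall pattern can be sketched as follows. This is not the original function: it uses lapply (one way to get a list back), and it takes the download step as a hypothetical argument called fetch_day, so that a stub can stand in for the real fitbitr call and the example runs without API access. The pause is set to 0 in the demo only; against the real API it would stay at 30 seconds.

```r
# Apply `fetch_day` to each date, pausing between calls and printing progress.
# `fetch_day` is a hypothetical argument: any function that takes one date
# string and returns a data frame (in the post, a fitbitr call for steps).
download_all_days <- function(dates, fetch_day, pause_seconds = 30) {
  lapply(seq_along(dates), function(i) {
    message("Downloading ", dates[i], " (", i, " of ", length(dates),
            ") at ", format(Sys.time(), "%H:%M:%S"))
    day_df <- fetch_day(dates[i])
    Sys.sleep(pause_seconds)
    day_df
  })
}

# Stub used only to make the sketch runnable without API access.
fake_fetch <- function(date) {
  data.frame(date = date, minute = 0:1439, value = 0L, stringsAsFactors = FALSE)
}

demo_dates <- c("2018-03-20", "2018-03-21", "2018-03-22")
steps_list <- download_all_days(demo_dates, fake_fetch, pause_seconds = 0)

# Merge the list of per-day data frames into one data frame and save it.
intraday_steps_df <- do.call(rbind, steps_list)
saveRDS(intraday_steps_df, file.path(tempdir(), "intraday_steps_df.rds"))
```

Passing the download step in as an argument also makes it easy to reuse the same loop for other activities.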
The output returned to the console during the execution of the function looks like this:
Some Basic Checks
Below, I do some basic checks on the data we have obtained via the API. Our
intraday_steps_df has 274 unique dates, with 1,440 observations for each day
(this matches the number of observations from our test case above). The final
check confirms that our master data frame contains data for each day in our fitbit_dates vector.
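Checks of this kind can be written in a few lines of base R. The sketch below runs on a hypothetical three-day miniature of the data (with illustrative column names) rather than the real download, but the same expressions would apply to the full data frame.

```r
# Hypothetical miniature versions of the objects in the post.
requested_dates <- c("2018-03-20", "2018-03-21", "2018-03-22")
steps_df <- data.frame(
  date  = rep(requested_dates, each = 1440),
  value = 0L,
  stringsAsFactors = FALSE
)

# Check 1: number of unique dates in the merged data.
n_unique_dates <- length(unique(steps_df$date))

# Check 2: every date should have 1,440 rows (one per minute).
rows_per_day  <- table(steps_df$date)
all_full_days <- all(rows_per_day == 1440)

# Check 3: dates requested but absent from the data (should be none).
missing_dates <- setdiff(requested_dates, unique(steps_df$date))
```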
Step 3: Obtaining Heart Rate Data
Testing with a Single Day
In order to obtain minute-level heart rate data, we will use the built-in
function from the
fitbitr package called “get_heart_rate_intraday_time_series”.
Unfortunately, this function returns a data set that is formatted much less
nicely than the comparable function for steps used above. Specifically, the
basic call using this command for our test date looks like this:
And it returns a data set that is formatted like this:
We are missing a lot of the great meta-data we have in the steps data. We
don’t even have the date on which the measurements were taken!
Getting Data for All Days
I wrote a simple function to cycle through each date in our fitbit_dates
vector, and download and add important meta-data. There was one date
(“2018-06-05”) that was problematic. On that date, there was no heart rate
data, and so the call returned just a string with the date.
The function below contains an exception to handle this error, and returns a
list of data frames (or date strings in the case of errors):
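The idea can be sketched like this. It is an illustration rather than the original function: the download step is passed in as a hypothetical argument (fetch_heart), and a stub that fails for “2018-06-05” stands in for the real fitbitr call so that the example runs without API access.

```r
# Fetch heart rate for one date and attach the date as meta-data.
# `fetch_heart` is a hypothetical argument standing in for the fitbitr call.
get_heart_day <- function(date, fetch_heart) {
  result <- tryCatch(fetch_heart(date), error = function(e) NULL)
  # No usable data for this date: return the date string instead.
  if (!is.data.frame(result) || nrow(result) == 0) {
    return(date)
  }
  result$date <- date
  result
}

# Stub that mimics one day without heart rate data.
fake_heart <- function(date) {
  if (date == "2018-06-05") stop("no heart rate data")
  data.frame(time = c("00:00:00", "00:01:00"), value = c(61L, 60L),
             stringsAsFactors = FALSE)
}

demo_dates <- c("2018-06-04", "2018-06-05", "2018-06-06")
heart_list <- lapply(demo_dates, get_heart_day, fetch_heart = fake_heart)

# Keep only the data frames when building the master data frame;
# the remaining character entries are the problematic dates.
intraday_heart_df <- do.call(rbind, Filter(is.data.frame, heart_list))
problem_dates     <- unlist(Filter(is.character, heart_list))
```

Returning the date string for failed days keeps the list the same length as the input, which makes it easy to see afterwards which dates had no data.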
The head of our master data frame (called intraday_heart_df) looks like this:
It matches much more closely the format of the step count data we downloaded above.
Some Basic Checks
As we did above, let’s do some basic checks on the data we downloaded.
I first look and see how many of the dates in the fitbit_dates vector were
problematic. Only one date contained no data: the problematic date
identified above (“2018-06-05”). There is no data for it in the master data
frame, and it is the only date from the fitbit_dates vector that is missing
from our master data, which is exactly what we would expect.
Unlike the step count data above, we do not have observations for each minute
of each day. We seem only to have data for the minutes for which there was a
measurement recorded by the Fitbit. When examining the number of observations
per day, the lowest is just under 500, and the highest is 1437. It seems clear
that, even when I wear the Fitbit all day, there are some moments of the day
where my heart rate is not recorded (perhaps because the placement of the
Fitbit on my wrist was not optimal during those points).
This does not seem particularly problematic. In further data analysis, we
should simply keep this difference between the data structures in mind.
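A quick way to see this difference is to count observations per day. The sketch below uses a hypothetical miniature data set with illustrative column names; on the real data, the same expressions would give the range of daily counts and the dates with no heart rate data at all.

```r
# Hypothetical miniature heart rate data: days with differing numbers of rows.
requested_dates <- c("2018-06-04", "2018-06-05", "2018-06-06")
heart_df <- data.frame(
  date  = c(rep("2018-06-04", 5), rep("2018-06-06", 3)),
  value = c(60L, 61L, 63L, 62L, 60L, 58L, 59L, 57L),
  stringsAsFactors = FALSE
)

# Observations per day: unlike the step data, these counts vary.
obs_per_day <- table(heart_df$date)
obs_range   <- range(obs_per_day)

# Requested dates with no heart rate data at all.
dates_without_data <- setdiff(requested_dates, names(obs_per_day))
```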
Step 4: Obtaining Sleep Data
The final piece of information which we will obtain in this post is the sleep
data. These data were the least straightforward to obtain, for a number of reasons:
In the time since the fitbitr package was released, the Fitbit API has
been updated (to version 1.2 as of the end of December 2018). The fitbitr package
calls Version 1.0 of the API, which could be discontinued at any moment.
This change in the API coincides with a change in the sleep measurements
calculated by Fitbit. In Version 1.0 of the API, the data returned are in the
“classic” format, which contains three values: minutes restless, minutes
asleep, and minutes awake. In Version 1.2 of the API, the data are mostly
returned in the “stages” format, which contains four values: wake, light,
deep, and rem.
However, the “stages” data are not calculated for periods of sleep that
are shorter than 3 hours. Therefore, Version 1.2 of the API returns “stages”
data for periods of sleep that are longer than 3 hours, and “classic” data for
periods of sleep that are shorter than 3 hours.
It is not possible to manually harmonize the data between the “classic”
and “stages” formats.
In sum, it’s a bit of a mess (as of the writing of this blog post, at the end of
December 2018). There’s lots of discussion on the Fitbit developer forums, but
I didn’t see any great solutions and I also noticed that other people are
struggling with this same issue.
Therefore, we will simply request summary totals per night of the few
variables that are measured consistently between the “classic” and “stages”
data formats. We will build the API calls ourselves, without the interface of
the fitbitr package, and will use version 1.2 of the API (the most recent
version at the time of this writing).
We can request up to 100 days of data with each API request. I constructed the API calls
using the Fitbit sleep API documentation, and
created two chunks of code to download the data. I didn’t start wearing my
Fitbit at night until “2018-06-19” and so we will use that as the start date
for our requests.
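The splitting into requests of at most 100 days can be sketched as below. The helper names (chunk_date_range, sleep_url) are hypothetical, and the end date is an assumption, since the post only says the data run to December 2018. The URL follows the version 1.2 date-range form of the sleep endpoint, where “-” stands for the currently authorized user; sending the request with the OAuth token is left out here.

```r
# Split an inclusive date range into consecutive chunks of at most `max_days`.
chunk_date_range <- function(start, end, max_days = 100) {
  starts <- seq(as.Date(start), as.Date(end), by = max_days)
  ends   <- pmin(starts + (max_days - 1), as.Date(end))
  data.frame(start = starts, end = ends)
}

# Build the version 1.2 sleep date-range URL for one chunk ("-" = current user).
sleep_url <- function(start, end) {
  sprintf("https://api.fitbit.com/1.2/user/-/sleep/date/%s/%s.json",
          format(start), format(end))
}

# End date is an assumption (not stated in the post).
chunks <- chunk_date_range("2018-06-19", "2018-12-18")
chunks$url <- sleep_url(chunks$start, chunks$end)
chunks
```

With these dates the range splits into two requests, which matches the two chunks of code described in the post.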
The following code downloads the sleep data in two chunks (so we don’t exceed
the API limits), makes a selection of the data which is consistent across
“classic” and “stages” data formats, binds the data frames together, and saves
the master file.
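The selection and binding steps might look roughly like this. The two lists are hypothetical parsed responses with made-up values, used so the example runs without API access. The field names (dateOfSleep, minutesAsleep, minutesAwake, timeInBed) follow the version 1.2 sleep log format, but which fields the original code kept is an assumption.

```r
# Hypothetical parsed responses, one list of sleep records per API chunk.
# Field names follow the version 1.2 sleep log format; the values are made up.
chunk_1 <- list(
  list(dateOfSleep = "2018-06-19", type = "stages",
       minutesAsleep = 412, minutesAwake = 55, timeInBed = 467),
  list(dateOfSleep = "2018-06-20", type = "classic",
       minutesAsleep = 95, minutesAwake = 10, timeInBed = 105)
)
chunk_2 <- list(
  list(dateOfSleep = "2018-09-27", type = "stages",
       minutesAsleep = 398, minutesAwake = 61, timeInBed = 459)
)

# Keep only the fields reported for both "classic" and "stages" records.
common_fields <- c("dateOfSleep", "minutesAsleep", "minutesAwake", "timeInBed")
records_to_df <- function(records) {
  rows <- lapply(records, function(r) {
    as.data.frame(r[common_fields], stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Bind the chunks together and save the master file.
sleep_df <- rbind(records_to_df(chunk_1), records_to_df(chunk_2))
saveRDS(sleep_df, file.path(tempdir(), "sleep_df.rds"))
sleep_df
```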
The head of our master sleep data frame looks like this:
We have some basic information about each night’s sleep - the date, number of
minutes asleep, the number of minutes awake, and the time spent in bed. The
information is not very granular in comparison with the step count and heart
rate data, but given the circumstances, this is the most consistent
information we can extract across all sleep episodes as of the writing of this
blog post. (Any suggestions or improvements are welcome - please let me know
in the comments section below!)
Some Basic Checks
Finally, let’s do some basic checks on the sleep data that we’ve downloaded.
There are 193 lines in the data set, corresponding to 193 sleep episodes
across the time I’ve been wearing the Fitbit at night.
Below, I check the number of sleep observations across the days. There were
138 days with 1 recorded sleep episode, 23 with 2 sleep episodes (naps,
clearly), and 3 days where I apparently slept 3 different times!
I also checked to see how many of the dates were missing data. There were 183
days in the time window we used to retrieve the data, but 19 of these dates
were missing from the data we got back from the API. This doesn’t seem crazy -
when the battery on my Fitbit is low, I usually charge it overnight, meaning
that it cannot record any information about my sleeping patterns.
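These numbers are consistent with one another, which can be confirmed with a little arithmetic. The sketch below uses only the counts reported above, plus a tiny hypothetical vector of dates to show one way of tabulating episodes per day.

```r
# Counts reported in the post: days with 1, 2 and 3 recorded sleep episodes.
days_by_episodes <- c(`1` = 138, `2` = 23, `3` = 3)

# Total episodes implied by those counts (should equal the 193 rows).
total_episodes <- sum(as.integer(names(days_by_episodes)) * days_by_episodes)

# Days with at least one episode, plus the 19 missing days, should span
# the 183-day request window.
days_with_data <- sum(days_by_episodes)
window_days    <- days_with_data + 19

# Idiom for tabulating episodes per day, on a hypothetical vector of dates:
# the inner table counts episodes per date, the outer counts dates per count.
episode_dates     <- c("2018-06-19", "2018-06-19", "2018-06-20")
episodes_per_date <- table(table(episode_dates))
```

Here total_episodes comes out at 193 and window_days at 183, so the episode counts, the number of rows, and the number of missing dates all agree.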
All in all, it looks like the retrieval of the summary sleep data was successful.
Summary and Conclusion
In this blog post, we saw how to download data from a Fitbit step counter. The
first step was to register a developer account at the Fitbit website. With the
key and secret from this process, it is possible to request one’s own
individual data from Fitbit via their API.
We used the fitbitr package to extract the step count and heart rate data
at the most granular level possible. We used a custom function to download the
data for each day, pausing after downloading each day’s data to avoid
exceeding the API limits. We created one master data set for the step count
data, and one for the heart rate data. The sleep data were more complicated to
gather correctly, given the current formats of the data and their availability
from the API. Nevertheless, we were able to extract summary statistics for
each night by making calls directly to the API.
We’ll save the analysis of these data for a future post. However, I’m glad to
have taken the time to talk in detail about the data retrieval process, as
this is a critical but under-appreciated aspect of data science. It is my hope
that this post will be helpful to anyone who is looking to extract and analyze
the complete history of the data from their Fitbits.
Coming Up Next
In the next post, we will focus on data munging in Python. Specifically, we
will return to the data on Pitchfork music reviews that I have analyzed in previous posts on this blog. We will go through the process of extracting, cleaning, and merging the raw data (contained in separate tables in an SQL database) to produce a clean, tidy data set for analysis.