[This article was first published on R on Sastibe's Data Science Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I have a personal Google account, complete with gmail, gdrive and everything else. I first opened it up as a sort of spam email for all kinds of logins, but started to it use more and more due to its convenience. I was always slightly worried about the magnitude of data collected by Google on me, yet I never found a way to pinpoint exactly the extent of my slight worrying.

Recently, I discovered Google Takeout. Everybody with a Google Account can simply click here, follow the instructions and Google Takeout will send all the data it (supposedly) has in a nice little zip folder. Within this zip-folder is a file called “locationhistory.json” (or “standortverlauf.json” for all you German users out there), with entries such as this:

{
"timestampMs" : "1523378268382",
"latitudeE7" : 494290669,
"longitudeE7" : 86872541,
"accuracy" : 34
}

Each of these entries encodes a location measurement taken by Google, with GPS coordinates (latitude/longitude) and a timestamp, which can be converted to a “normal date” by dividing the number by 1000 and using, e.g., this handy site.

The “location history” file is rather large and unwieldy (about 18 MB in my case). There is a very simple and free tool that visualizes your location history data in an interactive heatmap. That is the tool I used to create the intro picture to this entry. The heatmap allows you to gauge the precision with which Google matches your movements. For instance, my skiing trip in March last year to Serfaus-Fiss-Ladis shows up like this:

There are some mistakes in this map, i.e., places that I have surely never visited. I was never in “Gasthaus zum weißen Lamm”, I know that for a fact. However, the detail is quite astonishing, leading me to the next question: How often does Google measure and store my location data? My “locationhistory.json” contains 59293 observation over the course of 465 days, so that, on average, there are more than 5 measurements per hour.

I decided to look a little closer at the distributions of the timestamps, using some wonderful ggplot magic (the R code can be found here):

The lines in the plot show the average number of location measurements taken by Google each hour, separated by weekdays. The dashed line indicates the aveerage over all weekdays. The plot highlights several information:

• Between noon and 8 pm, Google takes on average more than one location measurement every 10 minutes
• In the nighttime, the average number of measurements is only once every 20 minutes
• Monday and Tuesday mornings are closely watched with many measurements, especially Tuesday mornings
• Afternoons and evenings are always of interest, but especially on Fridday and Saturday.

The fact that Monday and Tuesday morning are such exceptions could be explained by my specific calendar in 2017: I worked as a consultant and usually left home on Tuesday morning to travel troughout Germany. I am not entirely sure why this should lead to more measurements, as this activity was rarely accompanied by Google services (I travel by Deutsche Bahn). However, my travel time back home, usually late on Thursday evening, can also be seen quite nicely in the plot.

In total, I am a bit shocked by the sheer magnitude of measurements Google has on me, even (and especially) at times at which I am positively certain that I have never used Google Location Services, (see, e.g., 4 am). I am glad that services like Google Maps exist and that they are so extremely convenient, but the drawback should also be made abundantly clear to anyone who uses these services: many machine-readable aspects of your life are available to a for-profit company.