Does less sleep today lead to more calories tomorrow?

[This article was first published on Dan Garmat's Blog -- R, and kindly contributed to R-bloggers.]

Jun-Dec C v. D

Introduction

Over the last few months I’ve gained some weight, so I’m curious whether analytics can offer insight and show opportunities to get my BMI back into the normal range. As a first step, here is an inquiry into a hypothesis about calories vs. sleep.

Bottom line up front

On days when I’m running on less sleep, do I eat more to make up for it? Fortunately I have Fitbit data. Bottom line up front: I found no association between hours of sleep the night before and calories recorded the next day (p = 0.241 after filtering out missing data; note how a horizontal line would fit within the error ribbon above). Along the way I did find a possible lever, though. If I make 2500 calories my daily maximum going forward, I can monitor an interesting personal KPI: the percentage of days on which calories are above 2500. I hope to keep that to two or fewer days a month.

Analysis

Getting data

Fitbit data currently can be downloaded from one’s user account. This is limited to daily-level data and downloads of a month at a time. I recommend the .xls format for easier processing in R.

fitbit's download site

I renamed each file after its month.

fitbit's downloaded data

So here I’ve got six months of data. Now, my key starting hypothesis is my daily calorie consumption depends on my hours of sleep the night before. In particular, I expect a negative association: the less I slept, the more I ate.

So I’m going to need a handful of packages to efficiently get to this answer and pursue my sub-queries along the way. These packages, to be specific:

<span class="n">require</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span><span class="w">
</span><span class="n">require</span><span class="p">(</span><span class="n">readxl</span><span class="p">)</span><span class="w">
</span><span class="n">require</span><span class="p">(</span><span class="n">lubridate</span><span class="p">)</span><span class="w">
</span><span class="n">require</span><span class="p">(</span><span class="n">ggthemes</span><span class="p">)</span><span class="w">
</span><span class="n">require</span><span class="p">(</span><span class="n">stringr</span><span class="p">)</span><span class="w">
</span><span class="n">require</span><span class="p">(</span><span class="n">purrr</span><span class="p">)</span><span class="w">
</span>

December data

Once these packages are loaded, I can explore December, my most recent month. Note Calories has a comma, so needs to be converted to a number. Similarly, Date needs to be converted to a date. Fixing that and plotting…

Dec C v. D 1

Wow! I’ve really gone on a diet in late December! My self-discipline is awesome!

Oh, wait… I see some zeros. And I know I wasn’t fasting all of those days. In fact, I happen to know I didn’t record calories some days and usually those non-recorded days are above average calorie days, if anything. So I can’t just impute with December’s median or mean. I’d bet 2500 is a fair guess. That might be a charitable gift to myself around Christmas given how many calories I bet I actually consumed…

In fact the day below 1000 calories is also probably fair to set at 2500 too. So let’s fix those and replot.
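
To see how much that guess can move the summary, here is a minimal sketch on made-up numbers (not the Fitbit export) comparing two ways of handling blank days: filling in the assumed 2500, or treating them as missing. The vector `calories` and the name `as_missing` are illustrative only.

```r
# Toy values, not the Fitbit export: 0 means "nothing logged that day"
calories <- c(2100, 0, 2650, 2300, 0, 900, 2450)

assumed_calories_on_blank_days <- 2500  # same guess as in the text

# Option 1: fill in the guess for anything at or below 1000
imputed <- ifelse(calories <= 1000, assumed_calories_on_blank_days, calories)

# Option 2: treat those days as missing and leave them out
as_missing <- ifelse(calories <= 1000, NA, calories)

median(imputed)                   # median with the guess filled in
median(as_missing, na.rm = TRUE)  # median of the logged days only
```

With enough imputed days, the median gets pulled toward whatever value was assumed, which is worth keeping in mind when reading the plots that follow.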

Dec C v. D 2

A less impressive decrease in calories. Let’s take a look at a boxplot.

Dec C v. D 3

The median’s actually kind of high (gasp!). This range looks believable based on my memory of the month – I mean it’s December. Let’s check out a histogram.

Dec C v. D 4

This actually doesn’t look too normally distributed.

Pulling in all six months

To reduce keystrokes, I can use map from the purrr package to read all six files into one data object.
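
As a small aside, the file names themselves can be built in one step. This sketch assumes the files follow the `fitbit_export_YYYYMM.xls` naming used in the code at the end of the post, and uses base R's `sprintf` to zero-pad the month instead of `str_pad`.

```r
# Build the six monthly file names (July through December 2017).
# Assumes files were renamed to the fitbit_export_YYYYMM.xls pattern.
files_to_load <- sprintf("fitbit_export_2017%02d.xls", 7:12)
files_to_load
```

The resulting character vector can then be handed to `map(files_to_load, read_excel, sheet = "Foods")` exactly as before.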

Jun-Dec C v. D 1

There seem to be a lot of zeros. How many are there?

Jun-Dec C v. D 2

So we have a little problem here. Apparently in August and September I took a break from counting calories. November doesn’t look that complete either. I think we can do something with July, October and December, but time series analysis is looking less and less plausible.

I’ll take those three more complete months and impute everything at 1000 calories or fewer to 2500.

Jun-Dec C v. D 3

Sigh, a less impressive decrease in calories. Let’s see the boxplot to get a sense of the distribution.

Jun-Dec C v. D 4

Median’s actually kind of high and quite a bit of bunching near 2500, the imputed value on my missing days. Histogram…

Jun-Dec C v. D

We see a ton at the imputed value. It’s really kind of uniform otherwise. Actually let’s take a closer look at all non-imputed values.

Jun-Dec C v. D

I guess this is why I’ve been gaining weight the last six months, as my doctor has noted. This isn’t really normally distributed but it is actually close to symmetric. Both the median and mean are close to 2300.

Jun-Dec C v. D

In fact it so happens this evening I’m at 2284 right now.

So what is happening on these days above the median?

The original theory is that sleep is a big factor: a low-sleep day is a high-calorie day. Let’s see.

I know there are some zeros for amount of sleep as well. How many? Using anti-join…
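
For readers without dplyr loaded, here is a rough base R stand-in for that anti-join idea, on invented dates: keep the rows of one table whose `Date` has no match in the other. The data frames `foods` and `sleep` below are toy examples, not the real export.

```r
# Invented dates for illustration only
foods <- data.frame(Date = as.Date("2017-07-20") + 0:5,
                    calories = c(2300, 2450, 2100, 2600, 2250, 2400))
sleep <- data.frame(Date = as.Date("2017-07-20") + c(0, 1, 4, 5),
                    hours = c(7.0, 6.5, 7.2, 6.8))

# Rows of foods with no matching Date in sleep
# (the same rows dplyr::anti_join(foods, sleep, by = "Date") would keep)
no_sleep_record <- foods[!(foods$Date %in% sleep$Date), ]
no_sleep_record
nrow(no_sleep_record)
```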

Jun-Dec C v. D

Fifteen days. On most of these days the watch band was broken, as it broke twice. You can tell those by the fact that I “didn’t sleep” multiple days in a row: 7/22-7/29 and 12/15-12/18.
The other days could be ones when I was traveling and slept sitting up, so the Fitbit didn’t record sleep. And 12/30 hasn’t happened yet.
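
Spotting those multi-day gaps by eye works for six months, but it can also be sketched in code. This is one possible approach using base R's `rle` on toy data; the two-day cutoff and the column name `likely_band_broken` are assumptions made for illustration.

```r
# Toy data (made up): hours of sleep recorded per day
sleep <- data.frame(
  Date  = as.Date("2017-07-20") + 0:9,
  hours = c(7.1, 6.4, 0, 0, 0, 6.9, 0, 7.3, 8.0, 6.2)
)

# Run-length encode the "no sleep recorded" flag
runs <- rle(sleep$hours == 0)

# Expand each run's length back out to one value per day
run_length <- rep(runs$lengths, runs$lengths)

# Zero days inside a run of 2+ days look like a broken band;
# isolated zero days may be travel or something else
sleep$likely_band_broken <- sleep$hours == 0 & run_length >= 2
sleep[sleep$hours == 0, ]
```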

This tells me the December data is also suspect, unfortunately. Let’s just join everything and remove every suspect day, defined as a day with 1200 calories or fewer or with 0 hours of sleep.

<span class="n">foods</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">full_join</span><span class="p">(</span><span class="n">sleep_processed</span><span class="p">,</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Date"</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">filter</span><span class="p">(</span><span class="n">`Calories In`</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="m">1200</span><span class="p">,</span><span class="w"> </span><span class="n">`Hours Sleep`</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="o">-></span><span class="w">
  </span><span class="n">Calories_vs_Sleep</span><span class="w">
</span>

Jun-Dec C v. D

Oh dear, a positive slope. Maybe something shows up if considering day of week.

Jun-Dec C v. D

My hypothesis was that these would be negative slopes. Are any even significant?
We can run an ANOVA or fit a linear model and see, but this isn’t looking good.
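
A minimal sketch of that kind of check, on simulated data rather than the Fitbit export: fit calories against hours of sleep and look at the slope's p-value and confidence interval. The column names `hours_sleep` and `calories` are placeholders, and the data are generated with no sleep effect at all.

```r
set.seed(42)  # simulated data, not the Fitbit export

n <- 60
sim <- data.frame(hours_sleep = runif(n, min = 4, max = 9))
# Calories generated with no dependence on sleep
sim$calories <- rnorm(n, mean = 2300, sd = 300)

fit <- lm(calories ~ hours_sleep, data = sim)

# Estimate, standard error, t value and p-value for the slope
slope_row <- summary(fit)$coefficients["hours_sleep", ]
slope_row

# 95% confidence interval for the slope; if it contains 0,
# a flat line is consistent with the data
confint(fit)["hours_sleep", ]
```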

Jun-Dec C v. D

Yeah it’s non-significant.

Though I wouldn’t expect anything, let’s just see if there is a day that does show up as significant.

Jun-Dec C v. D

Look at that adjusted R-squared of 0.01! Seriously, there’s no evidence for anything going on. Even removing interaction terms presents nothing – no evidence for a different average calorie total per day of the week.
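
One way to frame the "with and without interaction terms" comparison is as a sequence of nested models tested against each other. The sketch below does that on simulated data; it is not the model output shown in this post, and the variable names are illustrative.

```r
set.seed(1)  # simulated data for illustration only

n <- 84
sim <- data.frame(
  hours_sleep = runif(n, 4, 9),
  weekday     = factor(rep(c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"),
                           length.out = n))
)
sim$calories <- rnorm(n, mean = 2300, sd = 300)

# Nested models: sleep only, plus weekday, plus sleep-by-weekday slopes
fit_sleep       <- lm(calories ~ hours_sleep, data = sim)
fit_additive    <- lm(calories ~ hours_sleep + weekday, data = sim)
fit_interaction <- lm(calories ~ hours_sleep * weekday, data = sim)

# F-tests for whether each added set of terms improves the fit
comparison <- anova(fit_sleep, fit_additive, fit_interaction)
comparison

# Adjusted R-squared of the largest model
summary(fit_interaction)$adj.r.squared
```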

Jun-Dec C v. D

What have I learned? And what can I do?

I’d hoped more sleep meant fewer next day calories.
There’s no good evidence for that in the last six months for me.
This does assume these data are reliable, which could be problematic.
But I’d hope to see at least something going on.
So more sleep won’t save me.

But my takeaway is the big surprise of my median calories over the last six months.
2300 is a dang lot, and that’s ignoring the zero days, which were probably
higher-calorie days than usual, so my true median is even higher.
That’s why I’ve gained the last six months.
I think I need to get the median and average down.

That is where I see a good lever, possibly.
The obvious approach would be to try to set some kind of limit
like either I stop before 2300, or I have to do more exercise if
I pass it, or something like that.

I wonder whether days above 2300 have been happening more or less frequently recently?

Jun-Dec C v. D

It does look like I’m (recording) going above the median more often recently
so maybe a slight upward trend if anything.
Makes sense with Holidays and Winter.

How many days did I record more than 2500?

Jun-Dec C v. D

This is astonishing.
Nearly 1/3 of the data points I have are days I ate more than 2500.
Wow.
This is a number I can get.
I can work on trying to reduce those days to two or fewer times a month.
I just don’t need that much, ever, especially when wanting to lose weight. Basically, I have a KPI I care about: a metric that drives, as Gwendolyn Galsworth would say. That KPI is the percentage of days over 2500 calories. I want to get that number to 0%, or at most 2 out of 30 days, which is about 7%. Pretty cool.
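
Computing that KPI is a one-liner. Here is a sketch on made-up daily totals (not the logged data), where `calorie_limit` and `target_pct` are names chosen for illustration.

```r
# Made-up daily totals for illustration, not the logged data
foods <- data.frame(
  Date     = as.Date("2018-01-01") + 0:9,
  calories = c(2200, 2650, 2400, 2510, 1900, 2500, 2300, 2800, 2100, 2450)
)

calorie_limit <- 2500

# Share of days strictly above the limit, as a percentage
pct_over <- 100 * mean(foods$calories > calorie_limit)
pct_over

# The target from the text: at most 2 days out of 30
target_pct <- 100 * 2 / 30
pct_over <= target_pct
```

Note the strict `>`: a day logged at exactly 2500 does not count against the goal in this version.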

And finally, since we’re looking at sleep, how does average sleep per day look?

Jun-Dec C v. D

Monday’s the tough one.
Saturday’s the good one.
I wish all of these were 9.
Or at least 8. A future KPI I think.
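
For reference, a per-weekday average like this can be sketched in base R as follows. The nightly totals are invented, and the weekday labels are built from `as.POSIXlt()$wday` so the result does not depend on the system locale.

```r
# Toy nightly totals (invented), two weeks starting on a Sunday
sleep <- data.frame(
  Date  = as.Date("2017-12-03") + 0:13,
  hours = c(7.5, 5.5, 6.5, 7.0, 6.0, 6.5, 8.5,
            7.0, 6.0, 7.0, 6.5, 6.5, 7.0, 9.0)
)

# wday is 0 for Sunday through 6 for Saturday
day_labels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
sleep$weekday <- factor(day_labels[as.POSIXlt(sleep$Date)$wday + 1],
                        levels = day_labels)

# Mean hours of sleep for each day of the week
avg_sleep <- tapply(sleep$hours, sleep$weekday, mean)
avg_sleep

names(which.min(avg_sleep))  # toughest day in the toy data
names(which.max(avg_sleep))  # best day in the toy data
```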

Analysis next steps

So what makes a lower calorie day different than a higher calorie day?
That is the regression I want to know the answer to.

Feature engineering to follow for a future analysis.

R code used above

<span class="n">require</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span><span class="w">
</span><span class="n">require</span><span class="p">(</span><span class="n">readxl</span><span class="p">)</span><span class="w">
</span><span class="n">require</span><span class="p">(</span><span class="n">lubridate</span><span class="p">)</span><span class="w">
</span><span class="n">require</span><span class="p">(</span><span class="n">ggthemes</span><span class="p">)</span><span class="w">
</span><span class="n">require</span><span class="p">(</span><span class="n">stringr</span><span class="p">)</span><span class="w">
</span><span class="n">require</span><span class="p">(</span><span class="n">purrr</span><span class="p">)</span><span class="w">


</span><span class="n">December_foods</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">read_excel</span><span class="p">(</span><span class="w">
  </span><span class="n">path</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"fitbit_export_201712.xls"</span><span class="p">,</span><span class="w">
  </span><span class="n">sheet</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Foods"</span><span class="p">)</span><span class="w">

</span><span class="c1"># doesn't see Calories as a number</span><span class="w">
</span><span class="n">glimpse</span><span class="p">(</span><span class="n">December_foods</span><span class="p">)</span><span class="w">
</span><span class="c1"># both are characters</span><span class="w">
</span><span class="n">December_foods</span><span class="o">$</span><span class="n">Date</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ymd</span><span class="p">(</span><span class="n">December_foods</span><span class="o">$</span><span class="n">Date</span><span class="p">)</span><span class="w">
</span><span class="n">December_foods</span><span class="o">$</span><span class="n">`Calories In`</span><span class="w"> </span><span class="o"><-</span><span class="w">  
  </span><span class="nf">as.numeric</span><span class="p">(</span><span class="n">gsub</span><span class="p">(</span><span class="s2">","</span><span class="p">,</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="n">December_foods</span><span class="o">$</span><span class="n">`Calories In`</span><span class="p">))</span><span class="w">

</span><span class="n">ggplot</span><span class="p">(</span><span class="n">December_foods</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Date</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">`Calories In`</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_point</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_smooth</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_fivethirtyeight</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Calories vs. Date"</span><span class="p">)</span><span class="w">


</span><span class="n">assumed_calories_on_blank_days</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">2500</span><span class="w">
</span><span class="n">December_foods</span><span class="o">$</span><span class="n">`Calories In`</span><span class="p">[</span><span class="n">December_foods</span><span class="o">$</span><span class="n">`Calories In`</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="m">1000</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> 
  </span><span class="n">assumed_calories_on_blank_days</span><span class="w">

</span><span class="n">ggplot</span><span class="p">(</span><span class="n">December_foods</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Date</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">`Calories In`</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_point</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_smooth</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_fivethirtyeight</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Calories vs. Date With Imputed Zeros"</span><span class="p">)</span><span class="w">


</span><span class="n">ggplot</span><span class="p">(</span><span class="n">December_foods</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">`Calories In`</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_boxplot</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_fivethirtyeight</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Calories in December With Imputed Zeros"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">coord_flip</span><span class="p">()</span><span class="w">


</span><span class="n">ggplot</span><span class="p">(</span><span class="n">December_foods</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">`Calories In`</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">binwidth</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_fivethirtyeight</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Calories in December With Imputed Zeros"</span><span class="p">)</span><span class="w">



</span><span class="c1">### Let's pull in all six months now</span><span class="w">
</span><span class="m">7</span><span class="o">:</span><span class="m">12</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="nf">as.character</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">str_pad</span><span class="p">(</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">pad</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"0"</span><span class="p">)</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">data_months</span><span class="w">
</span><span class="n">files_to_load</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">paste0</span><span class="p">(</span><span class="s2">"fitbit_export_2017"</span><span class="w"> 
                        </span><span class="p">,</span><span class="w"> </span><span class="n">data_months</span><span class="p">,</span><span class="w"> </span><span class="s2">".xls"</span><span class="p">)</span><span class="w">
</span><span class="n">glimpse</span><span class="p">(</span><span class="n">files_to_load</span><span class="p">)</span><span class="w">

</span><span class="n">foods</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">map</span><span class="p">(</span><span class="n">files_to_load</span><span class="p">,</span><span class="w"> </span><span class="n">read_excel</span><span class="p">,</span><span class="w"> </span><span class="n">sheet</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Foods"</span><span class="p">)</span><span class="w">

</span><span class="n">glimpse</span><span class="p">(</span><span class="n">foods</span><span class="p">)</span><span class="w">
</span><span class="c1"># we have a list of 6 tables, let's combine them all</span><span class="w">

</span><span class="n">foods</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">bind_rows</span><span class="p">(</span><span class="n">foods</span><span class="p">)</span><span class="w">
</span><span class="n">glimpse</span><span class="p">(</span><span class="n">foods</span><span class="p">)</span><span class="w">
 
</span><span class="c1"># OK have to process same as before</span><span class="w">
</span><span class="n">foods</span><span class="o">$</span><span class="n">Date</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ymd</span><span class="p">(</span><span class="n">foods</span><span class="o">$</span><span class="n">Date</span><span class="p">)</span><span class="w">
</span><span class="n">foods</span><span class="o">$</span><span class="n">`Calories In`</span><span class="w"> </span><span class="o"><-</span><span class="w">  
  </span><span class="nf">as.numeric</span><span class="p">(</span><span class="n">gsub</span><span class="p">(</span><span class="s2">","</span><span class="p">,</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="n">foods</span><span class="o">$</span><span class="n">`Calories In`</span><span class="p">))</span><span class="w">

</span><span class="c1"># how many zero days are there this time?</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">foods</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Date</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">`Calories In`</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_point</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_smooth</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_fivethirtyeight</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Calories vs. Date"</span><span class="p">)</span><span class="w">
  
  
</span><span class="n">foods</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">data_month</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">month</span><span class="p">(</span><span class="n">Date</span><span class="p">,</span><span class="w"> </span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">group_by</span><span class="p">(</span><span class="n">data_month</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">summarize</span><span class="p">(</span><span class="n">zero_days</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">sum</span><span class="p">(</span><span class="n">`Calories In`</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">0</span><span class="p">))</span><span class="w">



</span><span class="n">foods</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">data_month</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">month</span><span class="p">(</span><span class="n">Date</span><span class="p">,</span><span class="w"> </span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">filter</span><span class="p">(</span><span class="n">data_month</span><span class="w"> </span><span class="o">%in%</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"Jul"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Oct"</span><span class="p">,</span><span class="w"> </span><span class="s2">"Dec"</span><span class="p">))</span><span class="w"> </span><span class="o">-></span><span class="w">
  </span><span class="n">foods_full_months</span><span class="w">

</span><span class="n">assumed_calories_on_blank_days</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">2500</span><span class="w">
</span><span class="n">foods_full_months</span><span class="o">$</span><span class="n">`Calories In`</span><span class="p">[</span><span class="n">foods_full_months</span><span class="o">$</span><span class="n">`Calories In`</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="m">1000</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> 
  </span><span class="n">assumed_calories_on_blank_days</span><span class="w">

</span><span class="c1"># replot</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">foods_full_months</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Date</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">`Calories In`</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_point</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_smooth</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_fivethirtyeight</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Calories vs. Date With Imputed Zeros"</span><span class="p">)</span><span class="w">



</span><span class="n">ggplot</span><span class="p">(</span><span class="n">foods_full_months</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">`Calories In`</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_boxplot</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_fivethirtyeight</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Calories With Imputed Zeros"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">coord_flip</span><span class="p">()</span><span class="w">



</span><span class="n">ggplot</span><span class="p">(</span><span class="n">foods_full_months</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">`Calories In`</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">binwidth</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_fivethirtyeight</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Calories With Imputed Zeros"</span><span class="p">)</span><span class="w">


</span><span class="n">foods</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="n">`Calories In`</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="m">1200</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">ggplot</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">`Calories In`</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">binwidth</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_fivethirtyeight</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Calories Logged"</span><span class="p">)</span><span class="w">


</span><span class="c1">### So what is happening on these days I eat above the median?</span><span class="w">

</span><span class="n">sleep</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">map</span><span class="p">(</span><span class="n">files_to_load</span><span class="p">,</span><span class="w"> </span><span class="n">read_excel</span><span class="p">,</span><span class="w"> </span><span class="n">sheet</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Sleep"</span><span class="p">)</span><span class="w">

</span><span class="n">sleep</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">bind_rows</span><span class="p">(</span><span class="n">sleep</span><span class="p">)</span><span class="w">
</span><span class="n">glimpse</span><span class="p">(</span><span class="n">sleep</span><span class="p">)</span><span class="w">

</span><span class="c1"># some processing</span><span class="w">

</span><span class="n">sleep</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">mutate</span><span class="p">(</span><span class="n">Date</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">as.Date</span><span class="p">(</span><span class="n">ymd_hm</span><span class="p">(</span><span class="n">`End Time`</span><span class="p">)),</span><span class="w"> 
                 </span><span class="n">Minutes_Asleep</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">as.numeric</span><span class="p">(</span><span class="n">`Minutes Asleep`</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">group_by</span><span class="p">(</span><span class="n">Date</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">summarize</span><span class="p">(</span><span class="n">`Hours Sleep`</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">sum</span><span class="p">(</span><span class="n">Minutes_Asleep</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="m">60</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">Weekday</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">wday</span><span class="p">(</span><span class="n">Date</span><span class="p">,</span><span class="w"> </span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">),</span><span class="w"> 
         </span><span class="n">Weekday2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">Weekday</span><span class="p">,</span><span class="w"> </span><span class="n">ordered</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">))</span><span class="w"> </span><span class="o">-></span><span class="w">
  </span><span class="n">sleep_processed</span><span class="w">
  
</span><span class="n">glimpse</span><span class="p">(</span><span class="n">sleep_processed</span><span class="p">)</span><span class="w">

</span><span class="c1"># food-log days that have no matching sleep record</span><span class="w">
</span><span class="n">anti_join</span><span class="p">(</span><span class="n">foods</span><span class="p">,</span><span class="w"> </span><span class="n">sleep_processed</span><span class="p">,</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Date"</span><span class="p">)</span><span class="w">



</span><span class="c1"># filter missing data: drop days with 1200 or fewer logged calories or no recorded sleep</span><span class="w">
</span><span class="n">foods</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">full_join</span><span class="p">(</span><span class="n">sleep_processed</span><span class="p">,</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Date"</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">filter</span><span class="p">(</span><span class="n">`Calories In`</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="m">1200</span><span class="p">,</span><span class="w"> </span><span class="n">`Hours Sleep`</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="o">-></span><span class="w">
  </span><span class="n">Calories_vs_Sleep</span><span class="w">

</span><span class="c1"># drumroll</span><span class="w">
</span><span class="n">Calories_vs_Sleep</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">ggplot</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">`Hours Sleep`</span><span class="p">,</span><span class="w"> 
                                 </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">`Calories In`</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_point</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">geom_smooth</span><span class="p">(</span><span class="n">se</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lm"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_fivethirtyeight</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Calories Vs. Sleep"</span><span class="p">)</span><span class="w">



</span><span class="n">Calories_vs_Sleep</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">ggplot</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">`Hours Sleep`</span><span class="p">,</span><span class="w"> 
                                 </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">`Calories In`</span><span class="p">,</span><span class="w"> 
                                 </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">as.factor</span><span class="p">(</span><span class="n">Weekday</span><span class="p">)))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_point</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">geom_smooth</span><span class="p">(</span><span class="n">se</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lm"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_fivethirtyeight</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Calories Vs. Sleep by Day of Week"</span><span class="p">)</span><span class="w">
  
</span><span class="n">calories_sleep_lm</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">lm</span><span class="p">(</span><span class="n">`Calories In`</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">`Hours Sleep`</span><span class="p">,</span><span class="w"> 
                        </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Calories_vs_Sleep</span><span class="p">)</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">calories_sleep_lm</span><span class="p">)</span><span class="w">

</span><span class="n">calories_sleep_lm2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">lm</span><span class="p">(</span><span class="n">`Calories In`</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">`Hours Sleep`</span><span class="o">*</span><span class="n">Weekday</span><span class="p">,</span><span class="w"> 
                        </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Calories_vs_Sleep</span><span class="p">)</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">calories_sleep_lm2</span><span class="p">)</span><span class="w">
</span><span class="c1"># what is Weekday.Q?</span><span class="w">
</span><span class="c1"># it's the quadratic contrast R fits for an ordered factor; instead I just want the day itself (unordered Weekday2)</span><span class="w">
</span><span class="n">calories_sleep_lm2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">lm</span><span class="p">(</span><span class="n">`Calories In`</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">`Hours Sleep`</span><span class="o">*</span><span class="n">Weekday2</span><span class="p">,</span><span class="w"> 
                         </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Calories_vs_Sleep</span><span class="p">)</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">calories_sleep_lm2</span><span class="p">)</span><span class="w">
</span><span class="c1"># there's no evidence for anything going on</span><span class="w">

</span><span class="c1"># just want to check without the interaction terms</span><span class="w">
</span><span class="n">calories_sleep_lm3</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">lm</span><span class="p">(</span><span class="n">`Calories In`</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">`Hours Sleep`</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">Weekday2</span><span class="p">,</span><span class="w"> 
                         </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Calories_vs_Sleep</span><span class="p">)</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">calories_sleep_lm3</span><span class="p">)</span><span class="w">
</span><span class="c1"># there isn't even evidence for a different calorie level per day on average</span><span class="w">



</span><span class="c1">### What have I learned? ----</span><span class="w">
</span><span class="c1"># There's room for further questions.</span><span class="w">
</span><span class="c1"># For example, another possibility is to look at the day before:</span><span class="w">
</span><span class="c1"># that is, does more food mean less sleep the next day?</span><span class="w">
</span><span class="c1"># Could also consider the last two days' sleep vs. calories</span><span class="w">
</span><span class="c1"># Going into the details of what I ate after less sleep could also</span><span class="w">
</span><span class="c1"># be interesting - maybe a TF-IDF-like idea</span><span class="w">
</span><span class="c1"># may not have enough data</span><span class="w">
</span><span class="c1"># sleep vs. exercise would be interesting</span><span class="w">


</span><span class="n">Calories_vs_Sleep</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">data_month</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">month</span><span class="p">(</span><span class="n">Date</span><span class="p">,</span><span class="w"> </span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">group_by</span><span class="p">(</span><span class="n">data_month</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">summarize</span><span class="p">(</span><span class="n">total_days</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">n</span><span class="p">(),</span><span class="w"> 
            </span><span class="n">calories_above_median</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">sum</span><span class="p">(</span><span class="n">`Calories In`</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="m">2315</span><span class="p">),</span><span class="w">
            </span><span class="n">pct_days_above</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">calories_above_median</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">total_days</span><span class="p">)</span><span class="w">


</span><span class="c1"># another good question would be to consider data from last summer when</span><span class="w">
</span><span class="c1"># I was losing weight</span><span class="w">
</span><span class="c1"># how does it look then?</span><span class="w">
</span><span class="c1"># I also wasn't working full time, so may be different then</span><span class="w">
</span><span class="c1"># with sleep connection</span><span class="w">

</span><span class="n">Calories_vs_Sleep</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">data_month</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">month</span><span class="p">(</span><span class="n">Date</span><span class="p">,</span><span class="w"> </span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">group_by</span><span class="p">(</span><span class="n">data_month</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">summarize</span><span class="p">(</span><span class="n">total_days</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">n</span><span class="p">(),</span><span class="w"> 
            </span><span class="n">calories_above_2500</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">sum</span><span class="p">(</span><span class="n">`Calories In`</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="m">2500</span><span class="p">),</span><span class="w">
            </span><span class="n">pct_days_above</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">calories_above_2500</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">total_days</span><span class="p">)</span><span class="w">


</span><span class="c1"># a bagplot of this would also be cool, but maybe only</span><span class="w">
</span><span class="c1"># if there were a relationship between the two</span><span class="w">

</span><span class="c1"># some other next steps:</span><span class="w">
</span><span class="c1"># build up other variables per day like:</span><span class="w">
</span><span class="c1">#  step count</span><span class="w">
</span><span class="c1">#  minutes very/fairly active</span><span class="w">
</span><span class="c1">#  activity calories</span><span class="w">
</span><span class="c1">#  daily totals of fat, fiber, carbs, sodium, protein</span><span class="w">
</span><span class="c1">#  day before information</span><span class="w">
</span><span class="c1"># then run a multiple regression</span><span class="w">
</span><span class="c1"># maybe some regularization</span><span class="w">
</span><span class="c1"># maybe include more data</span><span class="w">
</span><span class="c1"># maybe a month variable (if not too many already)</span><span class="w">
</span><span class="c1"># maybe a weekend/weekday variable</span><span class="w">
</span><span class="c1"># maybe a breakfast variable</span><span class="w">
</span><span class="c1"># dinner vs. sleep?</span><span class="w">
</span><span class="c1"># exercise vs. sleep?</span><span class="w">
</span><span class="c1"># previous day's calories vs. sleep? # this could be the real story</span><span class="w">


</span><span class="c1"># average sleep per day of the week</span><span class="w">
</span><span class="n">sleep_processed</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">group_by</span><span class="p">(</span><span class="n">Weekday</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">summarize</span><span class="p">(</span><span class="n">mean</span><span class="p">(</span><span class="n">`Hours Sleep`</span><span class="p">),</span><span class="w"> </span><span class="n">median</span><span class="p">(</span><span class="n">`Hours Sleep`</span><span class="p">))</span><span class="w">
</span>
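
The two monthly summaries in the script hard-code their thresholds (2315 and 2500), and `pct_days_above` there is a proportion between 0 and 1 rather than a percentage. A minimal sketch of the same KPI with the threshold as a parameter might look like the following. The helper name `share_days_above` and the toy data are hypothetical and only for illustration; the column names `Date` and `Calories In` follow the script.

```r
library(dplyr)

# Hypothetical helper: per month, the share of logged days above a calorie threshold.
share_days_above <- function(df, threshold) {
  df %>%
    mutate(data_month = format(Date, "%Y-%m")) %>%
    group_by(data_month) %>%
    summarize(total_days = n(),
              days_above = sum(`Calories In` > threshold),
              share_above = days_above / total_days)
}

# Made-up toy data, standing in for Calories_vs_Sleep.
toy_kpi <- tibble(
  Date = as.Date(c("2017-06-01", "2017-06-02", "2017-07-01", "2017-07-02")),
  `Calories In` = c(2000, 2600, 2700, 2800)
)

kpi <- share_days_above(toy_kpi, threshold = 2500)
kpi
```

Passing `threshold = median(df$"Calories In")` would presumably reproduce the above-median version without typing the number in by hand, and multiplying `share_above` by 100 gives an actual percentage.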

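The "previous day's calories vs. sleep" idea from the notes could be sketched roughly as below. This is an illustration under assumptions, not the analysis from the post: the data are randomly generated stand-ins, and `calories_prev_day` is a name invented here. Because the filtering step drops some days, the sketch pairs rows by joining on `Date + 1` instead of using `lag()`, which would silently pair non-consecutive days.

```r
library(dplyr)

# Made-up toy data, one row per day, standing in for Calories_vs_Sleep.
set.seed(1)
toy <- tibble(
  Date = as.Date("2017-06-01") + 0:29,
  `Calories In` = round(rnorm(30, mean = 2300, sd = 300)),
  `Hours Sleep` = round(rnorm(30, mean = 7, sd = 1), 1)
)

# Shift the calorie log forward one day so it lines up with the following
# day's sleep record, then keep only dates present in both tables.
lagged <- toy %>%
  select(Date, `Hours Sleep`) %>%
  inner_join(
    toy %>% transmute(Date = Date + 1, calories_prev_day = `Calories In`),
    by = "Date"
  )

# Does the previous day's intake predict hours of sleep?
lagged_lm <- lm(`Hours Sleep` ~ calories_prev_day, data = lagged)
summary(lagged_lm)$coefficients
```

With real data, any day whose predecessor was filtered out simply drops from `lagged`, so the regression only uses genuinely consecutive pairs.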