# ggmail + forecast = how many emails I will get tomorrow?

This article was first published on **SmarterPoland.pl » English** and kindly contributed to R-bloggers.

During eRum 2016, Adam Zagdański gave a very good tutorial about time series modeling. Among other things, I learned that the `forecast` package (created by Rob Hyndman) now has cool new plots based on the `ggplot2` package.

Let’s use it to play with mailbox statistics for my Gmail account!

**1. Get the data**

Follow this link to download the data from your Gmail account as a single mbox file.

It may be large (15GB in my case), but for the further steps it’s enough to keep only the headers. `grep` + `cat` will do the job.
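The same filtering can also be sketched in R itself, streaming the file in chunks so that the whole mailbox never has to fit in memory. The helper name `keep_date_headers()`, the chunk size and the tiny demo mailbox are made up for illustration. The sketch assumes that only the `Date:` header is needed, and a `Date:` line quoted inside a message body would be kept as well.

```r
# Hypothetical helper (not from the original post): stream an mbox file
# and keep only the lines that start with "Date: ".
keep_date_headers <- function(mbox_path, out_path, chunk_size = 100000L) {
  con_in  <- file(mbox_path, open = "r")
  con_out <- file(out_path, open = "w")
  on.exit({
    close(con_in)
    close(con_out)
  })
  n_kept <- 0L
  repeat {
    # Read the next chunk; skipNul guards against binary attachments.
    chunk <- readLines(con_in, n = chunk_size, warn = FALSE, skipNul = TRUE)
    if (length(chunk) == 0L) break
    # useBytes = TRUE avoids errors on lines with invalid encodings.
    hits <- chunk[grepl("^Date: ", chunk, useBytes = TRUE)]
    writeLines(hits, con_out, useBytes = TRUE)
    n_kept <- n_kept + length(hits)
  }
  invisible(n_kept)
}

# Tiny made-up mailbox, only to show the call.
demo_mbox <- tempfile(fileext = ".mbox")
writeLines(c(
  "From alice@example.com Tue Oct 18 10:15:00 2016",
  "Date: Tue, 18 Oct 2016 10:15:00 +0200",
  "Subject: hello",
  "",
  "Body text",
  "From bob@example.com Wed Oct 19 08:00:00 2016",
  "Date: Wed, 19 Oct 2016 08:00:00 +0200",
  "Subject: re: hello",
  "",
  "More body text"
), demo_mbox)

demo_out <- tempfile(fileext = ".txt")
n_kept <- keep_date_headers(demo_mbox, demo_out)
kept_lines <- readLines(demo_out)
```

For a real 15GB mailbox, command-line `grep` will most likely be faster; the R version is mainly convenient when everything should stay in one script.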

**2. Read headers**

The `readLines()` function can read the headers. The `lubridate` package is then useful to extract the dates and convert them to R’s date-time format.
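As a rough sketch of this step, the snippet below turns `Date:` header lines into R date-times. It deliberately uses base R’s `strptime()` formats instead of `lubridate`, so that it runs without extra packages; this is a swapped-in technique, not necessarily what was done for this post. The example header lines are invented, and the format string assumes the common `Tue, 18 Oct 2016 10:15:00 +0200` layout with English month abbreviations.

```r
# Invented header lines; in practice they would come from readLines().
header_lines <- c(
  "Date: Tue, 18 Oct 2016 10:15:00 +0200",
  "Date: Wed, 19 Oct 2016 08:00:00 +0000",
  "Date: not a real date"
)

# Drop the "Date: " prefix and the optional weekday name ("Tue, ").
stamp <- sub("^Date:\\s*([A-Za-z]{3},\\s*)?", "", header_lines)

# %b assumes English month abbreviations (C or English locale);
# %z reads the numeric UTC offset, so results are comparable in UTC.
sent_at <- as.POSIXct(stamp, format = "%d %b %Y %H:%M:%S %z", tz = "UTC")

# Lines that do not match the assumed layout become NA and are dropped.
sent_at <- sent_at[!is.na(sent_at)]

# Calendar day (in UTC) for later daily aggregation.
sent_day <- as.Date(sent_at, tz = "UTC")
```

Real mailboxes contain other date layouts too (for example a trailing `(CEST)` or a missing weekday), so it is worth checking how many lines end up as `NA` before trusting the counts.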

**3. Basic gg-exploration**

I started with daily aggregates: the number of emails per day.

The `ts()` function converts the vector of aggregates to a time series object.
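A minimal sketch of the aggregation, assuming the parsed dates are available as a `Date` vector. The dates here are simulated, and `frequency = 7` (a weekly cycle for daily data) is an assumption, not a setting taken from the post.

```r
set.seed(1)

# Simulated send dates standing in for the dates parsed from the mailbox.
all_days <- seq(as.Date("2016-01-04"), by = "day", length.out = 28)
sent_day <- sample(all_days, size = 500, replace = TRUE)

# Count emails per day. Using a factor with all days as levels keeps
# days with zero emails in the result instead of silently dropping them.
daily <- table(factor(format(sent_day), levels = format(all_days)))
emails_per_day <- as.vector(daily)

# frequency = 7 (assumed) tells R that one seasonal cycle is 7 observations.
emails_ts <- ts(emails_per_day, frequency = 7)
```

With `frequency = 7` the time axis is measured in weeks, which matters later for the seasonal decomposition.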

Then I used the `autoplot()` function to plot the time series. Since the result is a `ggplot2` plot, you can easily add a smooth trend to it with the `geom_smooth()` function.
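Something along these lines should reproduce the idea, assuming the `forecast` and `ggplot2` packages are installed. The series is simulated (Poisson counts with a slowly rising mean), and the loess smoother and axis labels are choices made for this sketch.

```r
library(ggplot2)
library(forecast)

set.seed(2)

# Simulated daily counts with a mild upward drift (not real mailbox data).
n_days <- 140
counts <- rpois(n_days, lambda = 80 + 0.2 * seq_len(n_days))
emails_ts <- ts(counts, frequency = 7)

# autoplot() on a ts object returns a ggplot, so layers can be added with `+`.
p <- autoplot(emails_ts) +
  geom_smooth(method = "loess", se = FALSE) +
  labs(x = "Week", y = "Emails per day")

print(p)
```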

There is some trend, but what about seasonality?

The `geom_boxplot()` function is useful to check whether there are differences among days of the week or months.

It turns out that the number of emails per day is very different for weekdays and weekends.

August is also the lightest month for email: on average, only 60 emails per day.
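The boxplots can be sketched as below, assuming one row per day with the daily count. The data are simulated, with weekend counts set lower on purpose so that the pattern is visible; the rates 100 and 30 are invented and say nothing about the real mailbox.

```r
library(ggplot2)

set.seed(3)

days <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "day")
lt <- as.POSIXlt(days)

# lt$wday is 0 for Sunday ... 6 for Saturday, independent of the locale.
is_weekend <- lt$wday %in% c(0, 6)

mails <- data.frame(
  day = days,
  weekday = factor(lt$wday, levels = c(1:6, 0),
                   labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")),
  month = factor(lt$mon + 1, levels = 1:12, labels = month.abb),
  # Simulated counts: invented rates for working days and weekends.
  n = rpois(length(days), lambda = ifelse(is_weekend, 30, 100))
)

p_wday <- ggplot(mails, aes(x = weekday, y = n)) +
  geom_boxplot() +
  labs(x = "Day of week", y = "Emails per day")

p_month <- ggplot(mails, aes(x = month, y = n)) +
  geom_boxplot() +
  labs(x = "Month", y = "Emails per day")

print(p_wday)
print(p_month)
```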

**4. Time Series**

The `decompose()` + `autoplot()` functions extract the trend and seasonal components from the time series. A multiplicative seasonal component is probably more appropriate here, but the additive one is presented below since it’s easier to read values off the y-axis.
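To illustrate the difference between the two variants, here is a small sketch on simulated data, again assuming `forecast` and `ggplot2` are installed and a weekly cycle (`frequency = 7`). The weekly shape and trend values are invented. In the additive case the seasonal effects are expressed in emails per day and sum to zero over a week; in the multiplicative case they are factors that average to one.

```r
library(ggplot2)
library(forecast)

set.seed(4)

# Simulated series: rising level times an invented weekly pattern (Mon..Sun).
n_weeks <- 20
weekly_shape <- c(1.2, 1.25, 1.2, 1.15, 1.0, 0.6, 0.6)
level <- seq(80, 120, length.out = 7 * n_weeks)
counts <- rpois(7 * n_weeks, lambda = level * rep(weekly_shape, n_weeks))
emails_ts <- ts(counts, frequency = 7)

# Additive: observed = trend + seasonal + random.
dec_add <- decompose(emails_ts, type = "additive")

# Multiplicative: observed = trend * seasonal * random.
dec_mult <- decompose(emails_ts, type = "multiplicative")

# Seasonal effects for one cycle: offsets (additive) vs factors (multiplicative).
round(dec_add$figure, 1)
round(dec_mult$figure, 2)

# forecast provides an autoplot() method for decomposed series.
p_dec <- autoplot(dec_add)
print(p_dec)
```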

A lot of models can be fitted with the `forecast` package. Of the different choices, the scariest forecast comes from the Holt method. It is scary because of the trend.
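A minimal sketch of such a forecast, assuming the `forecast` package is installed. The series is simulated with a steady upward drift, and the 14-day horizon is an arbitrary choice for this example. Holt's linear method (without damping) extends the most recent estimated level and slope as a straight line, which is why a rising series produces an ever-growing forecast.

```r
library(ggplot2)
library(forecast)

set.seed(5)

# Simulated daily counts with an upward drift (not real mailbox data).
n_days <- 120
counts <- rpois(n_days, lambda = 60 + 0.5 * seq_len(n_days))
emails_ts <- ts(counts, frequency = 7)

# Holt's linear trend method; h is the forecast horizon in days (assumed).
fit <- holt(emails_ts, h = 14)

# Point forecasts: they change by the same amount at every step.
fit$mean

# Plot the history, the forecast and its prediction intervals.
p_holt <- autoplot(fit)
print(p_holt)
```

Passing `damped = TRUE` to `holt()` flattens the trend over the horizon, which usually gives a less alarming picture.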
