Add logging to your functions using my newest package `{loud}`

[This article was first published on Econometrics and Free Software, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This is a short blog post to announce the early alpha, hyper unstable, use at your own peril, package I’ve been working on for the past 6 hours or so (actually longer if I add all the research/study time). This package provides the function loudly() which allows you to do cool stuff like:

# First two lines install the package
# install.packages("devtools")
# devtools::install_github("b-rodrigues/loud")
library(loud)
## Loading required package: rlang
loud_sqrt <- loudly(sqrt)

loud_sqrt(1:10)
## $result
##  [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
##  [9] 3.000000 3.162278
## 
## $log
## [1] "Log start..."                                                              
## [2] "sqrt(1:10) started at 2022-02-18 21:17:22 and ended at 2022-02-18 21:17:22"

As you can see, I start by applying loudly() to a function, and then I can use this function as usual. Not only do I get the result, but also a logging message telling me which function and which arguments got used, and when the computation started and ended.

It is also possible to chain operations:

loud_sqrt <- loudly(sqrt)
loud_exp <- loudly(exp)
loud_mean <- loudly(mean)

1:10 |>
  loud_sqrt() |>
  bind_loudly(loud_exp) |>
  bind_loudly(loud_mean)
## $result
## [1] 11.55345
## 
## $log
## [1] "Log start..."                                                                   
## [2] "sqrt(1:10) started at 2022-02-18 21:17:22 and ended at 2022-02-18 21:17:22"     
## [3] "exp(.l$result) started at 2022-02-18 21:17:22 and ended at 2022-02-18 21:17:22" 
## [4] "mean(.l$result) started at 2022-02-18 21:17:22 and ended at 2022-02-18 21:17:22"

You’ll notice that here I have to use another function called bind_loudly(). The reason is because loud functions return a list. The first element of that list is the result of the function applied to the inputs, and the second element is the log message. So bind_loudly() passes the first element of the output of loud_sqrt() to the actual function exp() and also passes the second element, this time the log message, to the part of the function that concatenates the log messages.

This works with any function:

library(dplyr)
loud_group_by <- loudly(group_by)
loud_select <- loudly(select)
loud_summarise <- loudly(summarise)
loud_filter <- loudly(filter)

starwars %>%
  loud_select(height, mass, species, sex) %>%
  bind_loudly(loud_group_by, species, sex) %>%
  bind_loudly(loud_filter, sex != "male") %>%
  bind_loudly(loud_summarise,
              mass = mean(mass, na.rm = TRUE)
              )
## $result
## # A tibble: 9 × 3
## # Groups:   species [9]
##   species    sex              mass
##   <chr>      <chr>           <dbl>
## 1 Clawdite   female           55  
## 2 Droid      none             69.8
## 3 Human      female           56.3
## 4 Hutt       hermaphroditic 1358  
## 5 Kaminoan   female          NaN  
## 6 Mirialan   female           53.1
## 7 Tholothian female           50  
## 8 Togruta    female           57  
## 9 Twi'lek    female           55  
## 
## $log
## [1] "Log start..."                                                                                                 
## [2] "select(.,height,mass,species,sex) started at 2022-02-18 21:17:23 and ended at 2022-02-18 21:17:23"            
## [3] "group_by(.l$result,species,sex) started at 2022-02-18 21:17:23 and ended at 2022-02-18 21:17:23"              
## [4] "filter(.l$result,sex != \"male\") started at 2022-02-18 21:17:23 and ended at 2022-02-18 21:17:23"            
## [5] "summarise(.l$result,mean(mass, na.rm = TRUE)) started at 2022-02-18 21:17:23 and ended at 2022-02-18 21:17:23"

This is not perfect however. You’ll notice that the last log message states:

summarise(.l$result,mean(mass, na.rm = TRUE)) ....

ideally I would like for it to say:

summarise(.l$result,mass = mean(mass, na.rm = TRUE)) ....

Also, I’ve added a pipe operator so you don’t need to use bind_loudly() if you don’t want to:

1:10 |>
  loud_sqrt() %>=%
  loud_exp() %>=%
  loud_mean()
## $result
## [1] 11.55345
## 
## $log
## [1] "Log start..."                                                                   
## [2] "sqrt(1:10) started at 2022-02-18 21:17:23 and ended at 2022-02-18 21:17:23"     
## [3] "exp(.l$result) started at 2022-02-18 21:17:23 and ended at 2022-02-18 21:17:23" 
## [4] "mean(.l$result) started at 2022-02-18 21:17:23 and ended at 2022-02-18 21:17:23"

However, this operator does not work well with {dplyr} functions. See here:

starwars %>%
  loud_select(height, mass, species, sex) %>=%
  loud_group_by(species, sex) %>=%
  loud_filter(sex != "male") %>=%
  loud_summarise(mass = mean(mass, na.rm = TRUE))
## $result
## # A tibble: 23 × 4
## # Groups:   species, sex [9]
##    species  sex     mass na.rm
##    <chr>    <chr>  <dbl> <lgl>
##  1 Clawdite female    55 TRUE 
##  2 Droid    none      75 TRUE 
##  3 Droid    none      32 TRUE 
##  4 Droid    none      32 TRUE 
##  5 Droid    none     140 TRUE 
##  6 Droid    none      NA TRUE 
##  7 Droid    none      NA TRUE 
##  8 Human    female    49 TRUE 
##  9 Human    female    75 TRUE 
## 10 Human    female    NA TRUE 
## # … with 13 more rows
## 
## $log
## [1] "Log start..."                                                                                     
## [2] "select(.,height,mass,species,sex) started at 2022-02-18 21:17:23 and ended at 2022-02-18 21:17:23"
## [3] "group_by(.l$result,species,sex) started at 2022-02-18 21:17:23 and ended at 2022-02-18 21:17:23"  
## [4] "filter(.l$result,sex != \"male\") started at 2022-02-18 21:17:23 and ended at 2022-02-18 21:17:23"
## [5] "summarise(.l$result,mass,TRUE) started at 2022-02-18 21:17:23 and ended at 2022-02-18 21:17:23"

If you look at the result, you’ll see that it is not equal to the obtained with bind_loudly(), and if you look at the last logging message you’ll see why. Instead of

summarise(.l$result,mean(mass, na.rm = TRUE)) ....

the message states:

summarise(.l$result,mass,TRUE) started at

I know where the problem is (it’s due to some regex fuckery) so I think that I should be able to correct this in the coming days. Ideally, in the future, I would also like for the users to provide their own log messages.

The package has a website with a vignette that shows another interesting example here. Source code can be found here.

It is almost certain that function names will change, maybe even the package name itself. Contributions, bug reports, suggestions, etc, welcome of course.

A final word: this is the result of me exploring more advanced functional programming concepts and discussing with really nice people like Andrew R Mcneil, Laszlo Kupcsik. Andrew wrote a cool package called maybe and Laszlo a super cool blog post explaining what monads are here.

I’ll be writing a blog post on monads, in particular the maybe monad soonish.

Hope you enjoyed! If you found this blog post useful, you might want to follow me on twitter for blog post updates and buy me an espresso or paypal.me, or buy my ebook on Leanpub. You can also watch my videos on youtube. So much content for you to consoom!

Buy me an EspressoBuy me an Espresso

To leave a comment for the author, please follow the link and comment on their blog: Econometrics and Free Software.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)