Get out of my way! Dunk thru #rstats errors like the Big Shaq-istician

[This article was first published on rstats – MikeJackTzen, and kindly contributed to R-bloggers.]

Ahh, leaves falling, parents crying, collegicians biking uphill with a bag of in-n-out in between their teeth. Must be the new academic school year!

I figured it’s a good time to introduce my work-in-progress datzen package of miscellaneous #rstats functions. You can beeline straight to the github readme for more examples.

Or stick around and I’ll highlight the Shaq example showcasing datzen::itersave().

In #rstats, if you want to iterate, you can go about it in many different ways: a for loop, lapply, or purrr::map. They work pretty well for “homogeneous” iterations.

As good as they are, the standard approaches hit snags for “non-homogeneous” iterations, e.g. data from the web.

Go ahead, try them. I dare you.

You, in 5 hours:

“Aw shit, my brute force for loop crapped the bed during iteration 69. Now I have to manually restart it. I hope it doesn’t do it again. I’m running out of patience, and linen.”

Let’s take a look. The Big Aristotle, Dr. Shaq, was a notorious brute on the hardwood. Here he is, contemplating how he should score in the paint:

shaq = function(meatbag){
  if(meatbag %in% 'scrub'){return('dunk on em')}
  if(meatbag %in% 'sabonis'){return('elbow his face')}
  # anybody else and Shaq throws an error
  if(!(meatbag %in% c('scrub','sabonis'))){
    stop('shaq is confused')}
}

meatbags = c('scrub','sabonis','scrub','kobe')
names(meatbags) = paste0('arg_',seq_along(meatbags))

lapply(meatbags,FUN=shaq)
#> Error in FUN(X[[i]], ...): shaq is confused

Uh, some error confused Shaq.

enter, stage trap door
“Meet itersave()”

front row faints
“It’s… hideously beautiful”

In a nutshell, itersave works like lapply, but when it meets an ugly, unskilled, unqualified, and ungraceful error it keeps trucking along, like Shaquille “The Diesel” O’Neal hitchhiking a ride on Chris Dudley’s back.

library(datzen)

mainDir=paste0(getwd(),'/tests/proto/')
subDir='/temp/'

itersave(func_user=shaq,
         vec_arg_func=meatbags,
         mainDir,subDir)
#> [1] "1 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_1"
#> [1] "2 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_2"
#> [1] "3 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_3"
#> [1] "4 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_4"

The meatbags that Shaq successfully put into bodybags.

print('the successes')
#> [1] "the successes"
list.files(paste0(mainDir,subDir))
#> [1] "arg_1.rds" "arg_2.rds" "arg_3.rds" "failed"

It’ll also bookkeep any errors along the way via purrr::safely() and R.utils::withTimeout().

print('the failures')
#> [1] "the failures"
list.files(paste0(mainDir,subDir,'/failed/'))
#> [1] "arg_4.rds"
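If you're curious how that kind of bookkeeping could work, here's a minimal sketch in base R. It is not datzen's actual source: itersave_sketch is a made-up name, tryCatch stands in for purrr::safely(), the timeout part is left out entirely, and it assumes the input vector is named. The shape of the failure record (ind_fail, input_bad, result_bad) just mirrors what the post's output shows.

```r
# Hypothetical sketch of the itersave idea, NOT the datzen implementation.
# Assumes vec_arg_func is a named vector; the names become the file names.
itersave_sketch = function(func_user, vec_arg_func, dir_out){
  dir_fail = file.path(dir_out, 'failed')
  dir.create(dir_fail, recursive = TRUE, showWarnings = FALSE)

  for(i in seq_along(vec_arg_func)){
    nm = names(vec_arg_func)[i]
    # trap the error as a value instead of letting it kill the loop
    res = tryCatch(func_user(vec_arg_func[[i]]), error = function(e) e)

    if(inherits(res, 'error')){
      # bookkeep the failure: which index, which input, which error
      saveRDS(list(ind_fail = i, input_bad = vec_arg_func[[i]], result_bad = res),
              file.path(dir_fail, paste0(nm, '.rds')))
    } else {
      # write the success to disk right away, so a later crash can't lose it
      saveRDS(res, file.path(dir_out, paste0(nm, '.rds')))
    }
  }
  invisible(NULL)
}

shaq = function(meatbag){
  if(meatbag %in% 'scrub'){return('dunk on em')}
  if(meatbag %in% 'sabonis'){return('elbow his face')}
  stop('shaq is confused')
}

meatbags = c('scrub','sabonis','scrub','kobe')
names(meatbags) = paste0('arg_',seq_along(meatbags))

# a throwaway folder, so the sketch doesn't touch your project
dir_demo = file.path(tempdir(), 'itersave_sketch')
itersave_sketch(shaq, meatbags, dir_demo)

done = list.files(dir_demo, pattern = '\\.rds$')
flopped = list.files(file.path(dir_demo, 'failed'))
```

The point of saving inside the loop is that iteration 3's result is already on disk by the time iteration 4 blows up.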

Along with the out, itersave has an in companion.

enter, zipline from balcony
“Meet iterload()”

audience faints

iterload(paste0(mainDir,subDir,'/failed'))
#> $arg_4
#> $arg_4$ind_fail
#> [1] 4
#> 
#> $arg_4$input_bad
#> [1] "kobe"
#> 
#> $arg_4$result_bad
#> <simpleError in (function (meatbag) {    if (meatbag %in% "scrub") {        return("dunk on em")    }    if (meatbag %in% "sabonis") {        return("elbow his face")    }    if (!(meatbag %in% c("scrub", "sabonis"))) {        stop("shaq is confused")    }})("kobe"): shaq is confused>

Ah, it was the 4th argument, Kobe, that boggled Shaq’s mind.
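For the loading side, a rough base R sketch of the idea might look like the following. Again, this is an assumption about the general approach, not datzen's code: iterload_sketch is a hypothetical name, and all it does is read every .rds file in a folder into a list named after the files.

```r
# Hypothetical sketch of the iterload idea, NOT the datzen implementation.
iterload_sketch = function(dir_in){
  # only the .rds files sitting directly in dir_in, so subfolders are skipped
  files = list.files(dir_in, pattern = '\\.rds$', full.names = TRUE)
  out = lapply(files, readRDS)
  # name each element after its file, minus the extension
  names(out) = tools::file_path_sans_ext(basename(files))
  out
}

# a throwaway folder with two saved results to load back
dir_demo = file.path(tempdir(), 'iterload_sketch')
dir.create(dir_demo, showWarnings = FALSE)
saveRDS('dunk on em', file.path(dir_demo, 'arg_1.rds'))
saveRDS('elbow his face', file.path(dir_demo, 'arg_2.rds'))

out_sketch = iterload_sketch(dir_demo)
```

One thing to watch in a sketch like this: list.files() sorts alphabetically, so with ten or more files arg_10 would land before arg_2 unless you pad the names.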

“Jigga man [was] Diesel, when he [used to] lift the 8 Up” – Jay-Z

*Wiping away my sad Laker tear from my face while I type this*

“What could have been man, what could have been.”

R.I.P Frank Hamblen

Anyways, Shaq wised up in Miami. He also fattened up in Phoenix, Cleveland, Boston, Hawaii, Catalina, etc.

shaq_wiser = function(meatbag){
  if(meatbag %in% 'scrub'){return('dunk on em')}
  if(meatbag %in% 'sabonis'){return('elbow his face')}
  if(meatbag %in% 'kobe'){return('breakup & makeup')}

  if(!(meatbag %in% c('scrub','sabonis','kobe'))){
    stop('shaq is confused')}
}

itersave(func_user=shaq_wiser,
         vec_arg_func=meatbags,
         mainDir,subDir)
#> [1] "1 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_1"
#> [1] "2 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_2"
#> [1] "3 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_3"
#> [1] "4 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_4"

So, give me the whole shebang. What was the whole story of Shaq’s road trip?

out_il = iterload(paste0(mainDir,subDir))
cbind(meatbags,out_il)
#>       meatbags  out_il            
#> arg_1 "scrub"   "dunk on em"      
#> arg_2 "sabonis" "elbow his face"  
#> arg_3 "scrub"   "dunk on em"      
#> arg_4 "kobe"    "breakup & makeup"
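The rerun above sends all four meatbags back through. When each iteration is expensive (say, a slow web request), one alternative is to retry only the inputs that failed. The sketch below is an assumption on my part, not a datzen feature: it fakes the failure record by hand, using the same ind_fail / input_bad / result_bad fields that iterload printed earlier, and pulls the bad inputs back out for a second attempt.

```r
# Stand-in for the list iterload() returned from the failed folder.
# Built by hand here so the sketch runs on its own.
failed = list(
  arg_4 = list(ind_fail = 4,
               input_bad = 'kobe',
               result_bad = simpleError('shaq is confused'))
)

shaq_wiser = function(meatbag){
  if(meatbag %in% 'scrub'){return('dunk on em')}
  if(meatbag %in% 'sabonis'){return('elbow his face')}
  if(meatbag %in% 'kobe'){return('breakup & makeup')}
  stop('shaq is confused')
}

# pull the offending inputs back out, keeping their arg_ names
retry = vapply(failed, function(x) x$input_bad, character(1))

# second attempt, only on what flopped the first time
patched = lapply(retry, shaq_wiser)
```

Whether itersave itself skips work that already succeeded isn't covered in this post, so check the readme before leaning on that.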

So, if you use bare-bones for loops or lapply, you’ll crap out immediately when you hit an error.

On the other hand, even purrr::map with purrr::safely will, by design, do everything in one shot (e.g. batch results): you get nothing back until the whole run finishes. This is not ideal when working with stuff online. When you backtrack to resolve unforeseen edge cases, it’ll feel like a Cantor set.
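To make the one-shot point concrete, here's a small sketch of the purrr route using the same Shaq setup (it needs purrr installed). safely() keeps the loop alive, but everything lives in one in-memory list that only exists once map() returns.

```r
shaq = function(meatbag){
  if(meatbag %in% 'scrub'){return('dunk on em')}
  if(meatbag %in% 'sabonis'){return('elbow his face')}
  stop('shaq is confused')
}

meatbags = c('scrub','sabonis','scrub','kobe')
names(meatbags) = paste0('arg_',seq_along(meatbags))

# safely() wraps shaq so every call returns list(result = , error = )
out_batch = purrr::map(meatbags, purrr::safely(shaq))

# TRUE wherever an error was captured instead of a result
flopped = purrr::map_lgl(out_batch, function(x) !is.null(x$error))
names(which(flopped))
```

Nothing here touches the disk, so if the R session dies at iteration 69, the first 68 results go with it.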

For web data in the wild, expect the unexpected. You have non-homogeneous edge cases aplenty. That’s why I baked up itersave.

These Chris Dudley looking edge cases are just waiting in the bushes for you.

Dunk thru them.

