Site icon R-bloggers

A {swiftr} Brief Interlude While Awaiting {cdcfluview} CRAN Checks

[This article was first published on R – rud.is, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

My {cdcfluview} package started tossing erros on CRAN just over a week ago when the CDC added an extra parameter to one of the hidden API endpoints that the package wraps. After a fairly hectic set of days since said NOTE came, I had time this morning to poke at a fix. There are alot of tests, so after successful debugging session I was awaiting CRAN checks on various remotes as well as README builds and figured I’d keep up some practice with another, nascent, package of mine, {swiftr}, which makes it dead simple to build R functions from Swift code, in similar fashion to what Rcpp::cppFunction() does for C/C++ code.

macOS comes with a full set of machine learning/AI libraries/frameworks that definitely have “batteries included” (i.e. you can almost just make one function call to get 90-95% what you want without even training new models). One of which is text extraction from Apple’s computer Vision framework. I thought it’d be a fun and quick “wait mode” distraction to wrap the VNRecognizeTextRequest() function and use it from R.

To show how capable the default model is, I pulled a semi-complex random image from DDG’s image search:

Let’s build the function (you need to be on macOS for this; exposition inine):

library(swiftr) # github.com/hrbrmstr/swiftr

swift_function(
  code = '
import Foundation
import CoreImage
import Cocoa
import Vision

@_cdecl ("detect_text")
public func detect_text(path: SEXP) -> SEXP {

   // turn R string into Swift String so we can use it
   let fileName = String(cString: R_CHAR(STRING_ELT(path, 0)))

   var res: String = ""
   var out: SEXP = R_NilValue

  // get image into the right format
  if let ciImage = CIImage(contentsOf: URL(fileURLWithPath:fileName)) {

    let context = CIContext(options: nil)
    if let img = context.createCGImage(ciImage, from: ciImage.extent) {

      // setup comptuer vision request
      let requestHandler = VNImageRequestHandler(cgImage: img)

      // start recognition
      let request = VNRecognizeTextRequest()
      do {
        try requestHandler.perform([request])

        // if we have results
        if let observations = request.results as? [VNRecognizedTextObservation] {

          // paste them together
          let recognizedStrings = observations.compactMap { observation in
            observation.topCandidates(1).first?.string
          }
          res = recognizedStrings.joined(separator: "\\n")
        }
      } catch {
        debugPrint("\\(error)")
      }
    }
  }

  res.withCString { cstr in out = Rf_mkString(cstr) }

  return(out)
}
')

The detect_text() is now available in R, so let’s see how it performs on that image of signs:

detect_text(path.expand("~/Data/signs.jpeg")) %>% 
  stringi::stri_split_lines() %>% 
  unlist()
##  [1] "BEWILDERED" "UNCLEAR"    "nAZEU"      "UNCERTAIN"  "VISA"       "INSURE"    
##  [7] "ATED"       "MUDDLED"    "LOsT"       "DISTRACTED" "PERPLEXED"  "CONFUSED"  
## [13] "PUZZLED" 

It works super-fast and gets far more correct than I would have expected.

Toy examples aside, it also works pretty well (as one would expect) on “real” text images, such as this example from the Tesseract test suite:

detect_text(path.expand("~/Data/tesseract/news.3B/0/8200_006.3B.tif")) %>% 
  stringi::stri_split_lines() %>% 
  unlist()
##  [1] "Tobacco chiefs still refuse to see the truth abou"                           
##  [2] "even of America's least conscionable"                                        
##  [3] "The tobacco industry would like to promote"                                  
##  [4] "men sat together in Washington last"                                         
##  [5] "under the conditions they are used.'"                                        
##  [6] "week to do what they do best: blow"                                          
##  [7] "the specter of prohibition."                                                 
##  [8] "panel\" of toxicologists as \"not hazardous"                                 
##  [9] "smoke at the truth about cigarettes."                                        
## [10] "'If cigarettes are too dangerous to be sold,"                                
## [11] "then ban them. Some smokers will obey the"                                   
## [12] "People not paid by the tobacco companies"                                    
## [13] "aren't so sure. The list includes several"                                   
## [14] "The CEOs of the nation's largest tobacco"                                    
## [15] "firms told congressional panel that nicotine"                                
## [16] "law, but many will not. People will be selling"                              
## [17] "iS not addictive, that they are unconvinced"                                 
## [18] "cigarettes out of the trunks of cars, cigarettes"                            
## [19] "substances the government does not allow in"                                 
## [20] "foods or classifies as potentially toxic. They"                              
## [21] "that smoking causes lung cancer or any other"                                
## [22] "made by who knows who, made of who knows include ammonia, a pesticide called"
## [23] "illness, and that smoking is no more harmful"                                
## [24] "what,\" said James Johnston of R.J. Reynolds."                               
## [25] "than drinking coffee or eating Twinkies."                                    
## [26] "It's a ruse. He knows cigarettes are not"                                    
## [27] "methoprene, and ethyl furoate, which has"                                    
## [28] "They said these things with straight taces."                                 
## [29] "going to be banned, at leasi not in his lifetime."                           
## [30] "caused liver damage in rats."                                                
## [31] "The list \"begs a number of important"                                       
## [32] "They said them in the face of massive"                                       
## [33] "STEVE WILSON"                                                                
## [34] "What he really fears are new taxes, stronger"                                
## [35] "questions about the safety of these additives,\""                            
## [36] "scientific evidence that smoking is responsible"                             
## [37] "anti-smoking campaigns, further smoking"                                     
## [38] "said a joint statement from the American"                                    
## [39] "for more than 400,000 deaths every year."                                    
## [40] "restrictions, limits on secondhand smoke and"                                
## [41] "Rep. Henry Waxman, D-Calif., put that"                                       
## [42] "Republic Columnist"                                                          
## [43] "Lung, Cancer and Heart associations. The"                                    
## [44] "limits on tar and nicotine."                                                 
## [45] "statement added that substances safe to eat"                                 
## [46] "frightful statistic another way:"                                            
## [47] "Collectively, these steps can accelerate the"                                
## [48] "\"Imagine our nation's outrage if two fully"                                 
## [49] "He and the others played dumb for the"                                       
## [50] "current 5 percent annual decline in cigarette"                               
## [51] "aren't necessarily safe to inhale."                                          
## [52] "The 50-page list can be obtained free by"                                    
## [53] "loaded jumbo jets crashed each day, killing all"                             
## [54] "entire six hours, but really didn't matter."                                 
## [55] "use and turn the tobacco business from highly"                               
## [56] "calling 1-800-852-8749."                                                     
## [57] "aboard. That's the same number of Americans"                                 
## [58] "The game i nearly over, and the tobacco"                                     
## [59] "profitable to depressed."                                                    
## [60] "Johnson's comment about cigarettes \"made"                                   
## [61] "Here are just the 44 ingredients that start"                                 
## [62] "that cigarettes kill every 24 hours.'"                                       
## [63] "executives know it."                                                         
## [64] "with the letter \"A\":"                                                      
## [65] "The CEOs were not impressed."                                                
## [66] "The hearing marked a turning point in the"                                   
## [67] "of who knows what\" was comical."                                            
## [68] "Acetanisole, acetic acid, acetoin,"                                          
## [69] "\"We have looked at the data."                                               
## [70] "It does"                                                                     
## [71] "nation's growing aversion to cigarettes. No"                                 
## [72] "The day before the hearing, the tobacco"                                     
## [73] "acetophenone,6-acetoxydihydrotheaspirane,"                                   
## [74] "not convince me that smoking causes death,\""                                
## [75] "2-acetyl-3-ethylpyrazine, 2-acetyl-5-"                                       
## [76] "said Andrew Tisch of the Lorillard Tobacco"                                  
## [77] "longer hamstrung by tobacco-state seniority"                                 
## [78] "companies released a long-secret list of 599"                                
## [79] "Co."                                                                         
## [80] "and the deep-pocketed tobacco lobby,"                                        
## [81] "methylfuran, acetylpyrazine, 2-acetylpyridine,"                              
## [82] "Congress is taking aim at cigarette makers."                                 
## [83] "additives used in cigarettes. The companies"                                 
## [84] "said all are certified by an \"independent"                                  
## [85] "3-acetylpyridine, 2-acetylthiazole, aconitic"   

(You can compare that on your own with the Tesseract results.)

FIN

{cdcfluview} checks are done, and the fixed functions are back on CRAN! Just in time to close out this post.

If you’re on macOS, definitely check out the various ML/AI frameworks Apple has to offer via Swift and have some fun playing with integrating them into R (or build some small, command line utilities if you want to keep Swift and R apart).

To leave a comment for the author, please follow the link and comment on their blog: R – rud.is.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.