R⁶ — Exploring macOS Applications with codesign, Gatekeeper & R

[This article was first published on R – rud.is, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

(General reminder abt “R⁶” posts in that they are heavy on code-examples, minimal on expository. I try to design them with 2-3 “nuggets” embedded for those who take the time to walk through the code examples on their systems. I’ll always provide further expository if requested in a comment, so don’t hesitate to ask if something is confusing.)

I had to check something on the macOS systems across the abode today and — on a lark — decided to do all the “shell” scripting in R vs bash for a change. After performing the tasks, it occurred to me that not all R users on macOS realize there are hidden gems of information spread across the “boring” parts of the filesystem in SQLite databases. So, I put together a small example that:

  • identifies all the top-level apps in /Applications
  • extracts code signing information from them into a tibble
  • grabs the Gatekeeper whitelist database
  • uses an internal SQLite transformation function inside a dplyr chain
  • merges the info and gives you a basis to explore your apps
  • shows off the wicked cool processx package by @gaborcsardi that you should be using in place of system() / system2()

A quick note about this Gatekeeper database: If an app is not already recognized by macOS and you allow it to run your local system is updated to “whitelist” this app so you don’t get the notification in the future. Apple maintains a large signature list that gets updated throughout the year on your system and your own list is merged with it.

A second quick note (I guess I’m doing more expository than promised 🙂 about QUOTE(): In certain dplyr / dbplyr contexts, what look like local function calls are actually passed over to the SQL side vs used locally. Even though quote() is an internal R function, the use of the lowercase version of it would still go over to the SQL side vs be eval’d locally. However, to avoid confusing others, I try to uppercase conflicts like this when they occur.

library(processx)
library(stringi)
library(docxtractr) # install_github("hrbrmstr/docxtractr")
library(tidyverse)

list.files("/Applications", pattern = "app$", full.names = TRUE) %>% 
  map_df(~{
    message(.x)
    res <- run("codesign", args = c("-dvvv", .x), error_on_status = FALSE)
    if (any(grepl("not signed at all", res$stderr))) {
      list(Executable = .x)
    } else {
      stri_split_lines(res$stderr)[[1]] %>% 
        keep(~grepl("=", .)) %>% 
        stri_split_fixed("=", 2, simplify = TRUE) -> res
      as.list(set_names(res[,2], res[,1]))
    }
  }) %>% 
  mcga() %>% 
  mutate(short_name = stri_replace_last_fixed(basename(executable), ".app", "")) -> my_apps

glimpse(my_apps)
## Observations: 102
## Variables: 20
## $ executable                  <chr> "/Applications/1Password 6.app/Con...
## $ identifier                  <chr> "com.agilebits.onepassword4", "com...
## $ format                      <chr> "app bundle with Mach-O thin (x86_...
## $ codedirectory_v             <chr> "20200 size=29594 flags=0x0(none) ...
## $ hash_type                   <chr> "sha256 size=32", "sha256 size=32"...
## $ candidatecdhash_sha1        <chr> "e21ac2a66473feec6276b448fc518678d...
## $ candidatecdhash_sha256      <chr> "8cf4cb4bdbea3b4d4f9e293e1aee1edb7...
## $ hash_choices                <chr> "sha1,sha256", "sha1,sha256", "sha...
## $ cdhash                      <chr> "8cf4cb4bdbea3b4d4f9e293e1aee1edb7...
## $ signature_size              <chr> "8915", "8936", "4610", "8528", "4...
## $ authority                   <chr> "Apple Root CA", "Apple Root CA", ...
## $ timestamp                   <chr> "Jul 17, 2017, 10:18:00 AM", "Jul ...
## $ info_plist_entries          <chr> "32", "31", "30", "17", "30", "18"...
## $ teamidentifier              <chr> "2BUA8C4S2C", "XZZXE9SED4", "94KV3...
## $ sealed_resources_version    <chr> "2 rules=12 files=2440", "2 rules=...
## $ internal_requirements_count <chr> "1 size=220", "1 size=192", "1 siz...
## $ signed_time                 <chr> NA, NA, "Jul 13, 2017, 6:40:13 PM"...
## $ library_validation_warning  <chr> NA, NA, NA, "OS X SDK version befo...
## $ platform_identifier         <chr> NA, NA, NA, NA, "2", NA, NA, "2", ...
## $ short_name                  <chr> "1Password 6", "Alfred 3", "Amazon...

# this is macOS atekeeper whitelist db
db <- src_sqlite("/private/var/db/gkopaque.bundle/Contents/Resources/gkopaque.db")

db
## src:  sqlite 3.19.3 [/private/var/db/gkopaque.bundle/Contents/Resources/gkopaque.db]
## tbls: conditions, merged, whitelist
whitelist <- tbl(db, "whitelist")

# they are binary blobs, but we need them as text, so use SQLite's "QUOTE()" function
mutate(whitelist, current = QUOTE(current)) %>% 
  select(current) %>% 
  collect() %>% 
  mutate(
    cdhash = stri_replace_first_fixed(current, "X'", "") %>% 
      stri_replace_last_fixed("'", "") %>% 
      stri_trans_tolower()
  ) -> gatekeeper_whitelist_hashes

my_apps <- left_join(my_apps, gatekeeper_whitelist_hashes)

# Unsigned apps
filter(my_apps, is.na(cdhash)) %>% 
  select(short_name)

# you can see your own output here ;-)

# App organization
select(my_apps, identifier) %>% 
  mutate(
    identifier = stri_split_fixed(identifier, ".", simplify = TRUE)[,2],
    identifier = ifelse(identifier == "", "UNSPECIFIED", identifier)
  ) %>% 
  count(identifier, sort=TRUE)
## # A tibble: 47 x 2
##        identifier     n
##             <chr> <int>
##  1          apple    37
##  2    UNSPECIFIED    10
##  3      microsoft     6
##  4  eclecticlight     3
##  5         google     3
##  6         amazon     2
##  7      agilebits     1
##  8 antonycourtney     1
##  9       appscape     1
## 10   audacityteam     1
## # ... with 37 more rows


# Which apps have code-signing validation warnings
filter(my_apps, !is.na(library_validation_warning)) %>% 
  select(short_name, library_validation_warning)
## # A tibble: 7 x 2
##        short_name
##             <chr>
## 1    Amazon Music
## 2        Audacity
## 3         firefox
## 4         RStudio
## 5             VLC
## 6 Webcam Settings
## 7     WordService
## # ... with 1 more variables: library_validation_warning <chr>

NOTE: One could easily extend this example to look for apps across the filesystem.

To leave a comment for the author, please follow the link and comment on their blog: R – rud.is.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)