Quick Hit: Processing macOS Application Metadata Weirdly Fast with mdls and R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
(reminder: Quick Hits have minimal explanatory blathering, but I can elaborate on anything if folks submit a comment).
I’m playing around with Screen Time on xOS again and noticed that mdls (the macOS command line utility for getting file metadata) has a -plist option (it has probably had it for a while and I just never noticed). I also noticed there’s a kMDItemExecutableArchitectures key (which may have been “a thing” for a while, too). Application metadata would be useful for the utility functions I’m putting together for Rmd-based Screen Time reports, so I threw together some quick code to show how to work with it in R.
Running mdls -plist /some/file.plist ...path-to-apps...
will generate a giant property list file with all metadata for all the apps specified. It’s a wicked fast command even when grabbing and outputting metadata for all apps on a system.
Each entry looks like this:
<dict>
  <key>_kMDItemDisplayNameWithExtensions</key> <string>RStudio — tycho.app</string>
  <key>kMDItemAlternateNames</key>
  <array>
    <string>RStudio — tycho.app</string>
  </array>
  <key>kMDItemCFBundleIdentifier</key> <string>com.RStudio_—_tycho</string>
  <key>kMDItemContentCreationDate</key> <date>2021-01-31T17:56:46Z</date>
  <key>kMDItemContentCreationDate_Ranking</key> <date>2021-01-31T00:00:00Z</date>
  <key>kMDItemContentModificationDate</key> <date>2021-01-31T17:56:46Z</date>
  <key>kMDItemContentModificationDate_Ranking</key> <date>2021-01-31T00:00:00Z</date>
  <key>kMDItemContentType</key> <string>com.apple.application-bundle</string>
  <key>kMDItemContentTypeTree</key>
  <array>
    <string>com.apple.application-bundle</string>
    <string>com.apple.application</string>
    <string>public.executable</string>
    <string>com.apple.localizable-name-bundle</string>
    <string>com.apple.bundle</string>
    <string>public.directory</string>
    <string>public.item</string>
    <string>com.apple.package</string>
  </array>
  <key>kMDItemCopyright</key> <string>Copyright © 2017-2020 BZG Inc. 
All rights reserved.</string>
  <key>kMDItemDateAdded</key> <date>2021-04-09T18:29:52Z</date>
  <key>kMDItemDateAdded_Ranking</key> <date>2021-04-09T00:00:00Z</date>
  <key>kMDItemDisplayName</key> <string>RStudio — tycho.app</string>
  <key>kMDItemDocumentIdentifier</key> <integer>0</integer>
  <key>kMDItemExecutableArchitectures</key>
  <array>
    <string>x86_64</string>
  </array>
  <key>kMDItemFSContentChangeDate</key> <date>2021-01-31T17:56:46Z</date>
  <key>kMDItemFSCreationDate</key> <date>2021-01-31T17:56:46Z</date>
  <key>kMDItemFSCreatorCode</key> <integer>0</integer>
  <key>kMDItemFSFinderFlags</key> <integer>0</integer>
  <key>kMDItemFSInvisible</key> <false/>
  <key>kMDItemFSIsExtensionHidden</key> <true/>
  <key>kMDItemFSLabel</key> <integer>0</integer>
  <key>kMDItemFSName</key> <string>RStudio — tycho.app</string>
  <key>kMDItemFSNodeCount</key> <integer>1</integer>
  <key>kMDItemFSOwnerGroupID</key> <integer>20</integer>
  <key>kMDItemFSOwnerUserID</key> <integer>501</integer>
  <key>kMDItemFSSize</key> <integer>37451395</integer>
  <key>kMDItemFSTypeCode</key> <integer>0</integer>
  <key>kMDItemInterestingDate_Ranking</key> <date>2021-04-13T00:00:00Z</date>
  <key>kMDItemKind</key> <string>Application</string>
  <key>kMDItemLastUsedDate</key> <date>2021-04-13T12:47:12Z</date>
  <key>kMDItemLastUsedDate_Ranking</key> <date>2021-04-13T00:00:00Z</date>
  <key>kMDItemLogicalSize</key> <integer>37451395</integer>
  <key>kMDItemPhysicalSize</key> <integer>38092800</integer>
  <key>kMDItemUseCount</key> <integer>20</integer>
  <key>kMDItemUsedDates</key>
  <array>
    <date>2021-03-15T04:00:00Z</date>
    <date>2021-03-17T04:00:00Z</date>
    <date>2021-03-18T04:00:00Z</date>
    <date>2021-03-19T04:00:00Z</date>
    <date>2021-03-22T04:00:00Z</date>
    <date>2021-03-25T04:00:00Z</date>
    <date>2021-03-30T04:00:00Z</date>
    <date>2021-04-01T04:00:00Z</date>
    <date>2021-04-03T04:00:00Z</date>
    <date>2021-04-05T04:00:00Z</date>
    <date>2021-04-07T04:00:00Z</date>
    <date>2021-04-08T04:00:00Z</date>
    <date>2021-04-12T04:00:00Z</date>
    <date>2021-04-13T04:00:00Z</date>
  </array>
  <key>kMDItemVersion</key> <string>4.0.1</string>
</dict>
We can get all the metadata for all installed apps in R via:
library(sys)
library(xml2)
library(tidyverse)

# get full paths to all the apps
list.files(
  c("/Applications", "/System/Library/CoreServices", "/Applications/Utilities", "/System/Applications"),
  pattern = "\\.app$",
  full.names = TRUE
) -> apps

# generate a giant property list with all the app attributes
tf <- tempfile(fileext = ".plist")
sys::exec_internal("mdls", c("-plist", tf, apps))
Unfortunately, some companies (COUGH Logitech COUGH) stick illegal characters in some values, so we have to take care of those (I used xmllint to see which one(s) were bad):
# read it in and clean up CDATA error (Logitech has a bad value in one field)
fil <- readr::read_file_raw(tf)
fil[fil == as.raw(0x03)] <- charToRaw(" ")
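The fix above targets the one byte (0x03) that turned up on this particular system. If other apps smuggle in different control characters, a slightly broader scrub may help. The sketch below is an assumption-laden generalization, not what the post actually ran: scrub_control_bytes() and the dirty sample vector are made up for illustration. It replaces every byte below 0x20 other than tab, newline, and carriage return, which are the only control characters XML 1.0 permits. Working at the byte level should be safe for UTF-8 input, since bytes in that range never occur inside a multi-byte sequence.

```r
# Bytes that XML 1.0 does not allow in character data:
# everything below 0x20 except tab (0x09), newline (0x0A), and CR (0x0D).
illegal_bytes <- as.raw(setdiff(0x00:0x1F, c(0x09, 0x0A, 0x0D)))

# Hypothetical helper: swap every illegal control byte for a space.
scrub_control_bytes <- function(raw_vec) {
  bad <- raw_vec %in% illegal_bytes
  raw_vec[bad] <- charToRaw(" ")
  raw_vec
}

# Made-up sample input with an embedded 0x03 byte.
dirty <- c(
  charToRaw("<string>Logi"),
  as.raw(0x03),
  charToRaw("Options</string>")
)

clean <- scrub_control_bytes(dirty)
rawToChar(clean)
```

With the real data, this would be used as fil <- scrub_control_bytes(fil) in place of the single-byte assignment.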
Now, we can read in the XML without errors:
# now parse it and get the top of each app entry
applist <- xml2::read_xml(fil)
(applist <- xml_find_all(applist, "//array/dict"))
## {xml_nodeset (196)}
##  [1] <dict>\n <key>_kMDItemDisplayNameWithExtensions</key>\n <string>1Blocker (Old).app</string>\n <key>kMDItemAlternateNames</key>\n ...
##  [2] <dict>\n <key>_kMDItemDisplayNameWithExtensions</key>\n <string>1Password 7.app</string>\n <key>_kMDItemEngagementData</key>\n ...
##  [3] <dict>\n <key>_kMDItemDisplayNameWithExtensions</key>\n <string>Adblock Plus.app</string>\n <key>kMDItemAlternateNames</key>\n ...
##  [4] <dict>\n <key>_kMDItemDisplayNameWithExtensions</key>\n <string>AdBlock.app</string>\n <key>kMDItemAlternateNames</key>\n <arra ...
##  [5] <dict>\n <key>_kMDItemDisplayNameWithExtensions</key>\n <string>AdGuard for Safari.app</string>\n <key>kMDItemAlternateNames</ke ...
##  [6] <dict>\n <key>_kMDItemDisplayNameWithExtensions</key>\n <string>Agenda.app</string>\n <key>kMDItemAlternateNames</key>\n <array ...
##  [7] <dict>\n <key>_kMDItemDisplayNameWithExtensions</key>\n <string>Alfred 4.app</string>\n <key>kMDItemAlternateNames</key>\n <arr ...
##  [8] <dict>\n <key>_kMDItemDisplayNameWithExtensions</key>\n <string>Android File Transfer.app</string>\n <key>kMDItemAlternateNames< ...
##  [9] <dict>\n <key>_kMDItemDisplayNameWithExtensions</key>\n <string>Asset Catalog Creator Pro.app</string>\n <key>kMDItemAlternateNa ...
## [10] <dict>\n <key>_kMDItemDisplayNameWithExtensions</key>\n <string>Awsaml.app</string>\n <key>kMDItemAlternateNames</key>\n <array ...
## [11] <dict>\n <key>_kMDItemDisplayNameWithExtensions</key>\n <string>Boop.app</string>\n <key>kMDItemAlternateNames</key>\n <array>\ ...
## [12] <dict>\n <key>_kMDItemDisplayNameWithExtensions</key>\n <string>Buffer.app</string>\n <key>kMDItemAlternateNames</key>\n <array ...
## [13] <dict>\n <key>_kMDItemDisplayNameWithExtensions</key>\n <string>Burp Suite Community Edition.app</string>\n <key>kMDItemAlternat ...
## [14] <dict>\n <key>_kMDItemDisplayNameWithExtensions</key>\n <string>Camera Settings.app</string>\n <key>kMDItemAlternateNames</key>\ ...
## [15] <dict>\n <key>_kMDItemDisplayNameWithExtensions</key>\n <string>Cisco Webex Meetings.app</string>\n <key>kMDItemAlternateNames</ ...
## [16] <dict>\n <key>_kMDItemDisplayNameWithExtensions</key>\n <string>Claquette.app</string>\n <key>kMDItemAlternateNames</key>\n <ar ...
## [17] <dict>\n <key>_kMDItemDisplayNameWithExtensions</key>\n <string>Discord.app</string>\n <key>kMDItemAlternateNames</key>\n <arra ...
## [18] <dict>\n <key>_kMDItemDisplayNameWithExtensions</key>\n <string>Elgato Control Center.app</string>\n <key>kMDItemAlternateNames< ...
## [19] <dict>\n <key>_kMDItemDisplayNameWithExtensions</key>\n <string>F5 Weather.app</string>\n <key>kMDItemAlternateNames</key>\n <a ...
## [20] <dict>\n <key>_kMDItemDisplayNameWithExtensions</key>\n <string>Fantastical.app</string>\n <key>kMDItemAlternateNames</key>\n < ...
## ...
I really dislike property lists as I’m not a fan of position-dependent records in XML files. To get values for keys, we have to find the key, then go to the next sibling, figure out its type, and handle it accordingly. This is a verbose enough process to warrant creating a small helper function:
# helper function to retrieve the values for a given key
kval <- function(doc, key) {

  val <- xml_find_first(doc, sprintf(".//key[contains(., '%s')]/following-sibling::*", key))

  switch(
    unique(na.omit(xml_name(val))),
    "array" = as_list(val) |> map(unlist, use.names = FALSE) |> map(unique),
    "integer" = xml_integer(val),
    "true" = TRUE,
    "false" = FALSE,
    "string" = xml_text(val, trim = TRUE)
  )

}
This is nowhere near as robust as XML::readKeyValueDB()
but it doesn’t have to be for this particular use case.
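To make the key-then-next-sibling idea concrete without needing a Mac, here is a small sketch that runs against a made-up, two-app property list. Everything in it (the com.example.* bundle IDs, the value_of() helper) is hypothetical and only for illustration. One deliberate difference from kval(): it matches the key text exactly rather than with contains(). contains() is a substring test, so asking for kMDItemDisplayName can also hit _kMDItemDisplayNameWithExtensions, which appears to be harmless for the fields used here but may matter for others.

```r
library(xml2)

# Made-up property list in the same shape `mdls -plist` emits.
plist <- read_xml('
<plist version="1.0">
<array>
  <dict>
    <key>kMDItemCFBundleIdentifier</key>
    <string>com.example.one</string>
    <key>kMDItemExecutableArchitectures</key>
    <array><string>arm64</string><string>x86_64</string></array>
  </dict>
  <dict>
    <key>kMDItemCFBundleIdentifier</key>
    <string>com.example.two</string>
    <key>kMDItemExecutableArchitectures</key>
    <array><string>x86_64</string></array>
  </dict>
</array>
</plist>')

# One <dict> node per app.
apps <- xml_find_all(plist, "//array/dict")

# Hypothetical helper: exact-match the key text, then step to the
# element that immediately follows it (the value for that key).
value_of <- function(nodes, key) {
  xml_find_first(nodes, sprintf("./key[. = '%s']/following-sibling::*[1]", key))
}

# Scalar values come back as one string per app.
bundle_ids <- xml_text(value_of(apps, "kMDItemCFBundleIdentifier"))

# Array values need one more hop to reach their <string> children.
archs <- lapply(
  value_of(apps, "kMDItemExecutableArchitectures"),
  function(node) xml_text(xml_find_all(node, "./string"))
)
```

The type of the value node still has to be inspected (via xml_name()) before deciding how to convert it, which is what the switch() in kval() handles.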
We can build up a data frame with certain fields (I wanted to know how many apps still aren’t Universal):
tibble(
  category = kval(applist, "kMDItemAppStoreCategory"),
  bundle_id = kval(applist, "kMDItemCFBundleIdentifier"),
  display_name = kval(applist, "kMDItemDisplayName"),
  arch = kval(applist, "kMDItemExecutableArchitectures"),
) |>
  print() -> app_info
## # A tibble: 196 x 4
##    category        bundle_id                            display_name                  arch
##    <chr>           <chr>                                <chr>                         <list>
##  1 Productivity    com.khanov.BlockerMac                1Blocker (Old).app            <chr [2]>
##  2 Productivity    com.agilebits.onepassword7           1Password 7.app               <chr [2]>
##  3 Productivity    org.adblockplus.adblockplussafarimac Adblock Plus.app              <chr [2]>
##  4 Productivity    com.betafish.adblock-mac             AdBlock.app                   <chr [1]>
##  5 Utilities       com.adguard.safari.AdGuard           AdGuard for Safari.app        <chr [1]>
##  6 Productivity    com.momenta.agenda.macos             Agenda.app                    <chr [2]>
##  7 Productivity    com.runningwithcrayons.Alfred        Alfred 4.app                  <chr [2]>
##  8 NA              com.google.android.mtpviewer         Android File Transfer.app     <chr [1]>
##  9 Developer Tools com.bridgetech.asset-catalog         Asset Catalog Creator Pro.app <chr [2]>
## 10 Developer Tools com.rapid7.awsaml                    Awsaml.app                    <chr [1]>
## # … with 186 more rows
Finally, we can expand the arch
column and see how many apps support Apple Silicon:
app_info |>
  unnest(arch) |>
  spread(arch, arch) |>
  mutate_at(
    vars(arm64, x86_64),
    ~!is.na(.x)
  ) |>
  count(arm64)
## # A tibble: 2 x 2
##   arm64     n
##   <lgl> <int>
## 1 FALSE    33
## 2 TRUE    163
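If only the Apple Silicon question matters, the reshape can be skipped by testing each element of the list-column directly. This is an alternative sketch rather than the approach used above, and the arch list here is made-up stand-in data, not the real app_info$arch column.

```r
# Made-up stand-in for the arch list-column (one character vector per app).
arch <- list(
  c("arm64", "x86_64"),
  "x86_64",
  c("x86_64", "arm64"),
  "x86_64"
)

# TRUE when an app ships a native arm64 slice.
has_arm64 <- vapply(arch, function(a) "arm64" %in% a, logical(1))

# Tally native vs. Intel-only apps.
table(has_arm64)
```

Against the real data the same test would be vapply(app_info$arch, function(a) "arm64" %in% a, logical(1)), and filtering app_info on the negated result would list the stragglers by name.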
Alas, there are still some stragglers stuck in Rosetta 2.
FIN
Drop comments if anything requires more blathering and have some fun with your macOS filesystem!