Homebrew 2.0.0 Released == homebrewanalytics package updated

[This article was first published on R – rud.is, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A major new release of Homebrew has landed and now includes support for Linux as well as Windows! via the Windows Subsystem for Linux. There are overall stability and speed improvements baked in as well. The aforelinked notification has all the info you need to see the minutiae. Unless you’ve been super-lax in updating, brew update will get you the latest release.

There are extra formulae analytics endpoints and the homebrewanalytics???? R package has been updated to handle them. A change worth noting in the package is that all the API calls are memoised to avoid hammering the Homebrew servers (though the “API” is really just file endpoints and they aren’t big files but bandwidth is bandwidth). Use the facilities in the memoise package to invalidate the cache if you have long running scripts.

Use your favorite social coding site to install it (If I don’t maintain mirrors on your open social coding platform of choice just drop a note in the comments and I’ll start mirroring there as well):

# or
# or

The README and in-package manual pages provide basic examples of retrieving data. But we can improve upon those here, such as finding out the dependency distribution of Homebrew formulae:

library(homebrewanalytics) # git.sr.hr/~hrbrmstr ; git[la|hu]b/hrbrmstr

f <- brew_formulae()

mutate(f, n_dependencies = lengths(build_dependencies)) %>% 
  count(n_dependencies) %>% 
  mutate(n_dependencies = factor(n_dependencies)) %>% 
  ggplot() +
  geom_col(aes(n_dependencies, n), fill = ft_cols$slate, width = 0.65) +
  scale_y_comma("# formulae") +
    x = "# Dependencies",
    title = "Dependency distribution for Homebrew formulae"
  ) +

Given how long it sometimes takes to upgrade my various Homebrew installations I was surprised to see 0 be so prevalent, but one of the major changes in 2.0.0 is going to be more binary installs (unless you really need custom builds) so that is likely part of my experience, especially with the formulae I need to support cybersecurity and spatial operations.

We can also see which formuale are in the top 50%:

unlist(f$dependencies) %>% 
  table(dnn = "library") %>% 
  broom::tidy() %>% 
  arrange(desc(n)) %>% 
  mutate(pct = n/sum(n), cpct = cumsum(pct)) %>% 
  filter(cpct <= 0.5) %>% 
  mutate(pct = scales::percent(pct)) %>% 
  mutate(library = factor(library, levels = rev(library))) %>% 
  ggplot(aes(n, library)) +
  geom_segment(aes(xend=0, yend=library), color = ft_cols$slate, size=3.5) +
    aes(x = (n+max(n)*0.005), label = sprintf("%s (%s)", n, pct)), 
    hjust = 0, size = 3, family = font_rc, color = ft_cols$gray
  ) +
  scale_x_comma(position = "top", limits=c(0, 500)) +
    x = "# package using the library", y = NULL,
    title = "Top 50% of libraries used across Homebrew formulae"
  ) +
  theme_ft_rc(grid="X") +
  theme(axis.text.y = element_text(family = "mono"))

It seems openssl is pretty popular (not surprising but always good to see cybersecurity things at the top of good lists for a change)! macOS ships with an even more dreadful (I know that’s hard to imagine) default Python setup than usual so it being number 2 is not unexpected.

And, finally, we can also check on how frequently formulae are installed. Let’s look back on the last 90 days:

ggplot() +
    aes(x = installs$count, y = stat(count)),
    color = ft_cols$slate, fill = alpha(ft_cols$slate, 1/2)
  ) +
  scale_x_comma("# install events", trans = "log10") +
  scale_y_comma("# formulae") +
    title = "Homebrew Formulate 'Install Events' Distribution (Past 90 days)"
  ) +

I’ll let you play with the package to find out who the heavy hitters are and explore more about the Homebrew ecosystem.


Kick the tyres. File issues & PRs and a hearty “Welcome!” to the Homebrew ecosystem for Linux and Windows users. My hope is that the WSL availability will eventually make it easier to develop for Windows systems and avoid the “download the kinda sketchy compiled windows libraries from github on package install” practice we have today.

If you crank out some analytics using the packages don’t forget to blog about it and drop a link in the comments!

To leave a comment for the author, please follow the link and comment on their blog: R – rud.is.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)