WebR WASM R Package Load/Library Benchmarking Rabbit Hole

[This article was first published on rud.is, and kindly contributed to R-bloggers.]

I have a post coming on using base and {ggplot2} plots in VanillaJS WebR, but after posting some bits on social media regarding how slow {ggplot2} is to deal with, I had some “performance”-related inquiries, which led me down a rabbit hole that I’m now dragging y’all into as well.

First, a preview of the aforementioned plot/graphics:

I encourage you to load both of them before continuing to see why I was curious about package load times.

Getting A Package Into WebR: A Look At {ggplot2}

If we strip away all the cruft, this is the core way to install a package into WebR and make it available to a freshly minted WebR context:

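// Load the WebR module and create a fresh WebR context.
import { WebR } from '/webr/webr.mjs';
globalThis.webR = new WebR({ WEBR_URL: "/webr/", SW_URL: "/w/bench/" });
await globalThis.webR.init();

// Install the package (PACKAGE is a placeholder), then attach it.
await globalThis.webR.installPackages(['PACKAGE']);
await globalThis.webR.evalRVoid('library(PACKAGE)');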

Let’s look at what happens in the browser during the call to installPackages() when PACKAGE is ggplot2:

Dependent libraries are sequentially loaded until we finally get to ggplot2 (forgoing the {} from now on). There are 28 packages for ggplot2 (including itself), and they have a really skewed package size distribution:

Min.   :   6K
1st Qu.: 108K
Median : 481K
Mean   : 950K
3rd Qu.: 1.2M
Max.   : 5.4M

The good thing is, though, that the browser will cache them (for some period of time) so they aren’t re-downloaded every time you need them. Because of this, we’re going to leave download time out of consideration since they’re all, as we’ll see below, yanked from cache in single-digit milliseconds.

When you call library(PACKAGE), R code gets executed, and that takes time. On modern desktops with local R installs, you almost never notice the time passage for this. This is not the case for WebR:

The Matrix, mgcv, and farver packages grind things to a halt. You felt that if you hit up the example at the beginning of the post. Brutal. Painful. Terrible.

This got me curious about all the other packages that are available to WebR (93 as of the date on this post).

Approaching R Package Load/library Benchmarking In A Browser

Much like the skewed package file size distribution of presently available R WASM packages, the per-package dependency distribution is also pretty skewed:

Min.   :  1
1st Qu.:  1
Median :  1
Mean   :  2
3rd Qu.:  2
Max.   : 15

This is good! It means you’re mostly safe to have fun with WebR and do not have to focus on working around an initial slowdown. Still, this did not deter me from a time sink.

I had to figure out a way to individually test the install/library of each WASM R package independently, in a fresh WebR context.

One obvious way is to make 93 HTML files and load them all by hand.

O_O

There had to be a better way, and I immediately turned to “iframes” as a solution.

While I could have scripted the creation of proper HTML for 93 iframes to be put into a page, that’s not a great idea for a number of reasons:

  • that’ll crash every modern browser: far too many child iframes, all with their own DOM contexts sounds horrible
  • 93 “simultaneous” WebR initializations would consume all browser resources and DoS the tab
  • the “simultaneous” loading would skew timing results, even when the package files are cached

The solution was to use dynamically created iframes. One potential “gotcha” for this could have been the modern browser security model. Thanks to some dangerous hardware-level weaknesses that were discovered and exploited a few years back, Chrome and other browsers shored up the safety contracts between iframes and parent pages. Not doing so could have allowed attackers to have some fun at your expense.

If you’ve been following along the past week or so, you know that to get the best performance with WebR, you need to make sure certain HTTP headers are in place so the browser can trust what you’re doing enough to relax some restrictions. Dynamically created iframes have no “headers”, per se, but the clever folks who make browser bits for a living came up with a way to handle this. We just need to mark the frame as credentialless and we’ll get good performance (please read the link to get more context).
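For illustration, here is a minimal sketch of creating such a frame. The helper name, the use of srcdoc, and the hidden styling are assumptions on my part rather than the benchmark page's actual code; credentialless itself is a real iframe attribute, though browser support for it is not universal.

```javascript
// Hypothetical helper: build one throwaway benchmark iframe.
// `doc` is the parent page's `document`; `html` is the markup (including the
// WebR module script) that should run inside the frame.
function createBenchFrame(doc, html) {
  const frame = doc.createElement('iframe');
  // Set the attribute before the frame is attached and starts loading.
  frame.setAttribute('credentialless', '');
  frame.style.display = 'none'; // nothing to look at, it only runs code
  frame.srcdoc = html;
  return frame;
}

// In a browser (benchPageHtml is a placeholder string):
//   const frame = createBenchFrame(document, benchPageHtml);
//   document.body.appendChild(frame); // loading starts once attached
```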

So, we can run a slightly expanded version of the (way) above JavaScript code to get timer stats, but how do we collect them?

Well, the parent of the iframe can talk to the iframe and vice-versa via postMessage(), so all we need to do is have the iframe send data back to the parent when it is done. This is also a signal we can kill the child iframe, freeing up resources, and then move on to the next one.
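The parent side of that conversation could be sketched roughly as follows. Everything here is an assumption for illustration: the function names are mine, startFrame and stopFrame are hypothetical callbacks that attach and remove the iframe, and the frame is assumed to call parent.postMessage() exactly once when it finishes.

```javascript
// Resolve with the data of the first `message` event sent by `frameWindow`.
// `target` is normally the parent page's `window`.
function waitForResult(target, frameWindow) {
  return new Promise((resolve) => {
    function onMessage(event) {
      if (event.source !== frameWindow) return; // ignore unrelated messages
      target.removeEventListener('message', onMessage);
      resolve(event.data);
    }
    target.addEventListener('message', onMessage);
  });
}

// Run one package at a time: start a frame, wait for its report, tear it down.
// `startFrame(pkg)` returns the attached iframe; `stopFrame(frame)` removes it.
async function runSequentially(packages, target, startFrame, stopFrame) {
  const results = [];
  for (const pkg of packages) {
    const frame = startFrame(pkg);
    results.push(await waitForResult(target, frame.contentWindow));
    stopFrame(frame); // free the frame's resources before the next run
  }
  return results;
}
```

Running the frames strictly one after another keeps only a single WebR instance alive at a time, which is the point of avoiding the "simultaneous" loading described earlier.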

An Unexpected Twist

It turns out that some WASM-ified R packages are busted. Specifically:

  • fs
  • Hmisc
  • latticeExtra
  • pkgLoad

Some functions in each of them are needed by one or more other packages, but — as you’ll see if you run the benchmark site — they fail to library() after installation.

This was a “gotcha” I just had to wrap a try/catch block around, and also pass back information about.
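A minimal sketch of what that wrapping might look like inside the iframe. The function name and the shape of the result object are hypothetical; installPackages() and evalRVoid() are the same WebR calls shown earlier, and the sketch assumes a failed library() surfaces as a rejected promise.

```javascript
// Time `installPackages()` and `library()` separately for one package.
// `webR` is an initialized WebR instance; `now` is injectable for testing.
async function benchmarkPackage(webR, pkg, now = () => performance.now()) {
  const result = { pkg, installMs: null, libraryMs: null, ok: true, error: null };
  try {
    let t0 = now();
    await webR.installPackages([pkg]);
    result.installMs = now() - t0;

    t0 = now();
    await webR.evalRVoid(`library(${pkg})`);
    result.libraryMs = now() - t0;
  } catch (err) {
    // A broken package ends up here; report it instead of stalling the run.
    result.ok = false;
    result.error = String(err && err.message ? err.message : err);
  }
  return result;
}

// Inside the iframe, after `await webR.init()` (parentOrigin is a placeholder
// for the benchmark page's origin):
//   parent.postMessage(await benchmarkPackage(webR, 'ggplot2'), parentOrigin);
```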

Putting It All Together

You can run your own benchmarks at this playground page. View-source on the page to see the code (there’s just index.html and style.css). You can also see it at the WebR Experiments repo.

When the page loads, it fetches the last produced copy of https://rud.is/data/webr-packages.json. This is a JSON file I’m generating every night that contains all the packages available in “WASM notCRAN”. It just steals PACKAGES.rds every day and serializes it to JSON. Feel free to use it (if you get a CORS error lemme know; you shouldn’t but it’s an odd year).

Controls and sample output for the benchmark site.

The first thing your eyes will likely be drawn to is: “✅ Context is cross-origin isolated!”. When I was debugging WebR performance issues early on, George (the Godfather of WebR) noted that we needed certain headers to get those aforementioned safety restrictions loosened up a bit. You can test the global crossOriginIsolated variable to see if you’ve set up the headers correctly and read more about it when you have time. While it’s not needed on that page, I left it in so I could write this paragraph.
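Such a check can be as small as the sketch below. The function name and the wording of the failure message are my own assumptions; crossOriginIsolated is the real browser global, and the headers usually involved are Cross-Origin-Opener-Policy and Cross-Origin-Embedder-Policy.

```javascript
// Report whether the current context is cross-origin isolated.
// `scope` defaults to the global object (`window` in a page).
function isolationStatus(scope = globalThis) {
  return scope.crossOriginIsolated === true
    ? '✅ Context is cross-origin isolated!'
    : '❌ Context is not cross-origin isolated; check the COOP/COEP headers.';
}

// In a browser:
//   document.querySelector('#status').textContent = isolationStatus();
// ('#status' is a placeholder selector.)
```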

You’ll also see a “download results?” checkbox that is unchecked by default. If checked, you’ll get a JSON file with all the results in the table that is dynamically constructed.
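One common way to offer results as a file from the browser is a Blob plus a temporary link, sketched below. This is not necessarily how the benchmark page does it; the helper name, the default filename, and the injected doc and urlApi parameters are assumptions made so the sketch stays self-contained.

```javascript
// Hypothetical helper: offer `results` as a JSON download.
// `doc` is the page's `document`; `urlApi` is the browser's `URL`.
function downloadResults(results, doc, urlApi, filename = 'webr-bench.json') {
  const blob = new Blob([JSON.stringify(results, null, 2)], { type: 'application/json' });
  const href = urlApi.createObjectURL(blob);
  const link = doc.createElement('a');
  link.href = href;
  link.download = filename; // ask the browser to save rather than navigate
  link.click();
  // Release the object URL shortly after the click has been dispatched.
  setTimeout(() => urlApi.revokeObjectURL(href), 0);
  return blob;
}

// In a browser:
//   if (checkbox.checked) downloadResults(results, document, URL);
// (`checkbox` and `results` are placeholders.)
```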

After you tap “Begin Benchmark”, you can go get a matcha and come back.

You’ll see the results in a table and a surprise Observable Plot histogram (the post’s featured image).

I disable the controls after the run since you really should close the tab and start a fresh one (not just a reload) to get a clean context.

If you use the site and download the JSON, you can hit up this Observable notebook and put the JSON in a fork of it. I would also not mind it if you could post your JSON to the WebR Experiments repo as an issue and include the browser and system config you were using at the time.

FIN

This was a fun distraction, and shows you can use most of the presently available WebR packages without concern.

Make sure to check back for those WebR graphics posts!
