Blog Archives

Access the Internet Archive Advanced Search/Scrape API with wayback (+ links to a new vignette & pkgdown site)

September 17, 2018

The wayback package has had an update to retrieve mementos more efficiently and now supports the Internet Archive’s advanced search+scrape API. Search/Scrape: the search/scrape interface lets you examine the IA collections and (programmatically) download what you are after. The main function is ia_scrape(), but you can also paginate through results with the... Continue reading →

The Evolution of Data Literacy at the U.S. Department of Energy + Finding Power Grid Cyber Attacks in a Data Haystack

September 12, 2018

I was chatting with some cyber-mates at a recent event and the topic of cyber attacks on the U.S. power grid came up (as it often does these days). The conversation was brief, but the topic made its way into active memory and resurfaced when I saw today’s Data Is Plural newsletter, which noted that “Utility... Continue reading →

Driving Drill Dynamically with Docker and Updating Storage Configurations On-the-fly with sergeant

September 9, 2018

The sergeant package has a minor update that adds REST API coverage for two “new” storage endpoints, making it possible to add, update and remove storage configurations on-the-fly without using the GUI or manually updating a config file. This is an especially handy feature when paired with Drill’s new, official Docker container since that... Continue reading →

Simplifying World Tile Grid Creation with geom_wtg()

August 27, 2018

Nowadays (I’ve seen that word used so much in journal articles lately that I could not resist using it), I’m using world tile grids more frequently as the need arises to convey the state of exposure of various services at a global (country) scale. Given that necessity fosters invention, it seemed that having a ggplot2... Continue reading →

Friday #rstats twofer: Finding macOS 32-bit apps & Processing Data from System Commands

August 24, 2018

Apple has rung the death knell for 32-bit macOS apps and, if you’re running a recent macOS version on your Mac (which you should, so you can get security updates), you likely see a 32-bit app alert from time to time. If you’re like me, you click through that and keep working but later ponder just how many of... Continue reading →

Introducing ‘gepetto’ — a Splash-like REST API to Headless Chrome

August 23, 2018

It’s been over a year since Headless Chrome was introduced; it has matured greatly over that time and acquired a pretty large user base. The TL;DR is that you can now use Chrome as you would any command-line interface (CLI) program and generate PDFs or images or render JavaScript-interpreted HTML by supplying... Continue reading →

In-brief: splashr update + High Performance Scraping with splashr, furrr & TeamHG-Memex’s Aquarium

August 13, 2018

The development version of splashr now supports authenticated connections to Splash API instances. Just specify user and pass on the initial splashr::splash() call to use your scraping setup a bit more safely. For those not familiar with splashr and/or Splash: the latter is a lightweight alternative to tools like Selenium and the former is an... Continue reading →

Digging into mbox details: A tale of tm & reticulate

August 4, 2018

✨ I had to process a bunch of emails for a $DAYJOB task this week, and my “default setting” is to use R for pretty much everything (this should come as no surprise). Treating mail as data is not an uncommon task, and many R packages exist that can reach out and grab mail from... Continue reading →

ggplot “Doodling” with HIBP Breaches

July 29, 2018

After reading this interesting analysis of “How Often Are Americans’ Accounts Breached?” by Gaurav Sood (which we need more of in cyber-land), I gave in to the impulse to do some gg-doodling with the “Have I Been Pwned” JSON data he used. It’s just some basic data manipulation with some heavy ggplot2 styling customization, so... Continue reading →

Two new Apache Drill UDFs for Processing UR[IL]s and Internet Domain Names

July 26, 2018

Continuing the blog’s UDF theme of late, there are two new UDF kids in town: drill-url-tools, for slicing & dicing URI/URLs (just going to use ‘URL’ from now on in the post), and drill-domain-tools, for slicing & dicing internet domain names (IDNs). Now, if you’re an Apache Drill fanatic, you’re likely thinking “Hey hrbrmstr: don’t you... Continue reading →
