Access the Internet Archive Advanced Search/Scrape API with wayback (+ a links to a new vignette & pkgdown site)

[This article was first published on R – rud.is, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The wayback🔗 package has had an update to more efficiently retrieve mementos and added support for working with the Internet Archive’s advanced search+scrape API.

Search/Scrape

The search/scrape interface lets you examine the IA collections and download what you are after (programmatically). The main function is ia_scrape() but you can also paginate through results with the helper functions provided.

To demonstrate, let’s peruse the IA NASA collection and then grab one of the images. First, we need to search the collection then choose a target URL to retrieve and finally download it. The identifier is the key element to ensure we can retrieve the information about a particular collection.

library(wayback)

nasa 

I just happened to know this would take me to an image. You can add the media type to the result (along with a host of other fields) to help with programmatic filtering.

The API is still not sealed in stone, so you’re encouraged to submit questions/suggestions.

FIN

The vignette is embedded below and frame-busted here. It covers a very helpful and practical use-case identified recently by an OP on StackOverflow.

There’s also a new pkgdown-gen’d site for the package.

Issues & PRs welcome at your community coding site of choice.

To leave a comment for the author, please follow the link and comment on their blog: R – rud.is.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)