That is where webshot comes in: an R package that helps R programmers take web screenshots programmatically, with PhantomJS running in the backend.
|Take Screenshot from R|
What is PhantomJS?
PhantomJS is a headless, scriptable web browser with a JavaScript API. It is well suited to the following:
- Headless website testing
- Screen Capture
- Page Automation
- Network Monitoring
Webshot: R Package
The webshot package allows users to take screenshots of web pages from R with the help of PhantomJS. It can also take screenshots of Shiny apps and R Markdown documents (both static and interactive).
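As a rough sketch of the Shiny and R Markdown case, webshot exports appshot() and rmdshot() for this purpose. The app directory and document name below are hypothetical placeholders, and the sketch assumes that webshot and PhantomJS have already been set up as described in the following sections.

```r
library(webshot)

# Hypothetical paths: replace with a real Shiny app directory and .Rmd file
app_dir  <- "myapp"
rmd_file <- "document.Rmd"

# Screenshot of a Shiny app (only attempted if the directory exists)
if (dir.exists(app_dir)) {
  appshot(app_dir, "myapp.png")
}

# Screenshot of an R Markdown document (only attempted if the file exists)
if (file.exists(rmd_file)) {
  rmdshot(rmd_file, "document.png")
}
```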
Install and Load Package
The stable version of webshot is available on CRAN and can be installed using the code below:
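The listing itself is not shown here; assuming the standard CRAN workflow, the call would look like this:

```r
# Install the stable release of webshot from CRAN
install.packages("webshot")
```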
The latest development version of webshot is hosted on GitHub and can be installed using the code below:
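The listing is again missing; a minimal sketch, assuming the remotes package is used for the GitHub install (devtools::install_github() would work the same way):

```r
# Install the helper package used for GitHub installs (an assumption of this sketch)
install.packages("remotes")

# Install the development version of webshot from its GitHub repository
remotes::install_github("wch/webshot")
```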
As we saw above, webshot works with PhantomJS in the backend, so PhantomJS must be installed on the machine where webshot is used. To help with that, webshot itself provides a simple function, install_phantomjs(), that installs PhantomJS on your machine.
The above function automatically downloads PhantomJS from its website and installs it. Note that this is a one-time setup: once both webshot and PhantomJS are installed, the two steps above can be skipped and the package used as described in the sections below.
The webshot package is now installed, set up, and ready to use. To start with, let us take a PDF copy of a web page.
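A minimal sketch of that one-time setup step, assuming webshot is already installed:

```r
library(webshot)

# One-time download and installation of PhantomJS
install_phantomjs()
```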
The webshot package provides one simple function, webshot(), which takes a web page URL as its first argument and saves the screenshot to the file name given as its second argument. Note that the file name must include an extension such as '.jpg', '.png', or '.pdf', which determines the format in which the output file is rendered. Below is the basic structure of the call:
If no folder path is specified along with the file name, the file is saved in the current working directory, which can be checked with getwd().
Now that we understand the basics of the webshot() function, it is time to begin with our cases, starting with downloading/converting a web page as a PDF copy.
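The original listing is not shown; as an illustration of the general form webshot(url, file), here is a sketch that uses the R project home page as a stand-in URL and an arbitrary file name:

```r
library(webshot)

# General form: webshot(url, file)
# The URL and file name here are illustrative choices only
webshot("https://www.r-project.org/", "r-project.png")
```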
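To save somewhere other than the working directory, the folder can be made part of the file name. A minimal sketch, where the "screenshots" folder and the URL are assumptions made for illustration:

```r
library(webshot)

# Build an output path inside a (hypothetical) "screenshots" folder
out_dir <- file.path(getwd(), "screenshots")
dir.create(out_dir, showWarnings = FALSE)
out_file <- file.path(out_dir, "r-project.png")

# The screenshot is written to out_file instead of the working directory
webshot("https://www.r-project.org/", out_file)
```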
Case #1: PDF Copy of WebPage
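Let us assume we would like to download Bill Gates' notes on the Best Books of 2017 as a PDF copy.
# Load the required library
library(webshot)
# PDF copy of a web page / article (URL and file name below are placeholders; the originals are missing from this listing)
article_url <- "<URL of the Best Books of 2017 post>"
webshot(article_url, "best_books_2017.pdf",
        delay = 2)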
The above code generates a PDF whose (partial) screenshot is below:
|Snapshot of PDF Copy|
Dissecting the above code, we can see that the webshot() function is called with three arguments:
- The URL from which the screenshot is to be taken.
- The output file name, including its file extension.
- The time to wait before taking the screenshot, in seconds. Sometimes a longer delay is needed for all assets to display properly.
Thus, a webpage can be converted/downloaded as a PDF programmatically in R.
Case #2: Webpage Screenshot (Viewport Size)
Now, I'd like to have an automation script that takes a screenshot of a news website and perhaps sends it to my inbox, so that I can see the headlines without opening the browser. Here we will see how to take a simple screenshot of livemint.com, an Indian news website.
# Screenshot of the viewport
webshot("https://www.livemint.com/", "livemint.png", cliprect = "viewport")
If cliprect is unspecified, a screenshot of the complete web page is taken (as in Case #1). Since we are interested only in the latest news, which is usually at the top of the website, we set cliprect to "viewport", which clips the screenshot to the viewport part of the browser, as below.
|Screenshot of Viewport of Browser|
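The size of the viewport itself can be changed with webshot()'s vwidth and vheight arguments (in pixels; the defaults are 992 and 744). A minimal sketch, where the 1280 x 800 size and the output file name are arbitrary choices:

```r
library(webshot)

# Clip to a larger viewport than the default 992 x 744
webshot("https://www.livemint.com/", "livemint_wide.png",
        vwidth = 1280, vheight = 800, cliprect = "viewport")
```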
Case #3: Multiple Selector Based Screenshots
So far we have taken simple screenshots of whole pages, dealing with one screenshot and one file at a time. That is rarely the situation in automation or other programmatic work, where we usually perform more than one action, so this case deals with taking multiple screenshots and saving multiple files. Instead of taking screenshots of several different URLs (which is quite straightforward), we will take screenshots of different sections of the same web page, each identified by its own CSS selector, and save them in their respective files.
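# Multiple selector-based screenshots
# (the page URL is missing from this listing; page_url is a placeholder)
page_url <- "<URL of the page to capture>"
webshot(page_url,
        file = c("organizations.png", "contributions.png"),
        selector = list("div.border-top.py-3.clearfix", "div.js-contribution-graph"))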
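For the more straightforward case of several different URLs, webshot() also accepts vectors of URLs and file names. A minimal sketch, assuming these two sites as examples (the file names are arbitrary):

```r
library(webshot)

# One output file per URL, matched by position
urls  <- c("https://www.r-project.org/", "https://www.livemint.com/")
files <- c("r-project_home.png", "livemint_home.png")

# Take a viewport-sized screenshot of each page
webshot(urls, files, cliprect = "viewport")
```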
Thus, we have seen how to use the R package webshot to take screenshots programmatically in R. I hope this post helps fuel your automation needs and helps your organisation improve its efficiency.