Webscraping with R using a Raspberry Pi

December 7, 2016

(This article was first published on databait, and kindly contributed to R-bloggers)

Setting up the Raspberry Pi

After the basic setup, i.e.

  • bought a Raspberry Pi Starter Kit
  • flashed the SD Card with Raspbian
  • ran raspi-config
  • installed R with sudo apt-get install r-base, which installed R 3.1.1

I then started installing the R packages I usually need for my cron-job tasks (mostly webscraping). I ran into problems with the rvest package because several of its dependencies could not be installed. There may be a more efficient way, but I took the following steps:

Install packages for webscraping

To install xml and related R packages (rvest), I needed libxml2 on the system. Although apt-get offers it, I ended up installing it manually from source:

wget ftp://xmlsoft.org/libxml2/libxml2-2.9.2.tar.gz
tar -xzvf libxml2-2.9.2.tar.gz
cd libxml2-2.9.2/

I also needed python-dev to make libxml2 compile.

sudo apt-get update
sudo apt-get install python-dev

Then I built libxml2:

./configure --prefix=/usr --disable-static --with-history && make
sudo make install

I also had problems with the curl package. The failed installation suggested installing libcurl4-openssl-dev, so:

sudo apt-get install libcurl4-openssl-dev

The last problem was the openssl package. Again, I followed the suggestion from the failed R package installation and installed libssl-dev:

sudo apt-get install libssl-dev

After that, rvest installed nicely. However, it took quite a while for the Pi to install all dependencies.
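
Before kicking off another long build on the Pi, it can help to check which packages are still missing. The following is a minimal sketch, assuming the package list matches what your scripts use; `missing_packages` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: which of the wanted packages are not installed yet?
# `installed` defaults to the packages found in the current library paths.
missing_packages <- function(wanted,
                             installed = rownames(installed.packages())) {
  setdiff(wanted, installed)
}

# Packages used in this post (adjust to your own scripts).
wanted <- c("rvest", "jsonlite", "sendmailR", "xtable")
todo <- missing_packages(wanted)
message("Still to install: ", paste(todo, collapse = ", "))

# Uncomment to install only what is missing (slow on a Pi):
# install.packages(todo)
```

Installing only the missing packages avoids recompiling dependencies that already built successfully.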

Webscraping Example – A simple frost warning for my plants

One simple task my Raspberry Pi does for me is emailing a frost warning at 6 pm if the weather forecast for the coming night drops below 3 °C. For this I got an API key at openweathermap.org. Mind that openweathermap.org does not like frequent requests (keep it to no more than one every 10 minutes). At the beginning I got blocked.

You can then request some JSON for your city ID using your APPID (API Key):

library(jsonlite)
wd_json <- fromJSON("http://api.openweathermap.org/data/2.5/forecast/city?id=CITY_ID_GOES_HERE&APPID=YOUR_API_KEY_GOES_HERE")
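
Because the API does not like frequent requests, one way to avoid getting blocked while testing is to cache the last response on disk. This is only a sketch under assumptions: `cached_fetch`, the cache file path, and the 600-second limit are hypothetical choices, not something the API or jsonlite prescribes.

```r
# Hypothetical helper: reuse a saved response if it is younger than
# `max_age` seconds, otherwise fetch again and overwrite the cache.
cached_fetch <- function(url, cache_file, max_age = 600,
                         fetch = jsonlite::fromJSON) {
  if (file.exists(cache_file)) {
    age <- as.numeric(difftime(Sys.time(), file.info(cache_file)$mtime,
                               units = "secs"))
    if (age < max_age) {
      return(readRDS(cache_file))
    }
  }
  result <- fetch(url)
  saveRDS(result, cache_file)
  result
}

# Usage (not run here, as it needs a valid API key):
# wd_json <- cached_fetch(your_request_url, "~/forecast_cache.rds")
```

With the daily cron schedule the cache will always be stale, so it only matters when you rerun the script repeatedly during development.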

Then tidy and extract the values needed. Temperatures are given in kelvin, so we need to convert them to degrees Celsius. The Unix timestamp I convert to POSIXct and store as a character string.

wd <- wd_json$list
wd$Datum <- as.character(as.POSIXct(wd$dt, origin="1970-01-01", tz="Europe/Berlin"))
wd$Celsius_min <- wd$main$temp_min-273.15
wd$Celsius_max <- wd$main$temp_max-273.15
wd$Celsius_mean <- wd$main$temp-273.15
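
To see what these conversions do without calling the API, here is a small sketch on made-up data. The toy data frame only imitates the shape of `wd_json$list` (a `dt` column plus a nested `main` data frame); the values and the helper `kelvin_to_celsius` are assumptions for illustration.

```r
# Hypothetical helper: convert kelvin to degrees Celsius.
kelvin_to_celsius <- function(kelvin) kelvin - 273.15

# Toy data standing in for wd_json$list (values are made up).
toy <- data.frame(dt = c(1481133600, 1481144400))
toy$main <- data.frame(temp_min = c(275.15, 271.15))

# Unix timestamps (seconds since 1970-01-01 UTC) to local date-times.
toy$Datum <- as.POSIXct(toy$dt, origin = "1970-01-01", tz = "Europe/Berlin")
toy$Celsius_min <- kelvin_to_celsius(toy$main$temp_min)

print(toy[, c("Datum", "Celsius_min")])
```

The second row ends up below 0 °C, so it would count as frost under the 3 °C threshold used later.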

Sending results via email

Now for the part that sends the mail:

library(sendmailR)
library(xtable)
wd <- wd[as.POSIXct(Sys.time()+86400)>wd$Datum,]
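if(any(wd$Celsius_min < 3)) {
  ## Render the rows below 3 degrees C as an HTML table.
  dispatch <- print(xtable(wd[wd$Celsius_min<3,c("Datum","Celsius_min","Celsius_mean","Celsius_max")]),type="html")
  ## Minimal XHTML page around the table.
  msg <- mime_part(paste0('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>HTML demo</title></head>
<body>
<h1>Frostwarnung</h1>
',
  dispatch,
  '
</body>
</html>'))
  ## Override content type.
  msg[["headers"]][["Content-Type"]] <- "text/html"
  ## Placeholder addresses: replace them with your own.
  from <- sprintf("<sendmailR@%s>", Sys.info()[4])
  to <- "<YOUR_EMAIL_GOES_HERE>"
  subject <- paste("Frostwarnung",date())
  body <- list(msg)
  sendmail(from, to, subject, body,control=list(smtpServer="ASPMX.L.GOOGLE.COM"))
}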
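
The decision of whether to send a warning can be separated from the mailing itself, which makes it easy to try out on made-up data. This is a sketch, not the script's exact logic: `frost_rows` is a hypothetical helper, and it assumes `Datum` is kept as POSIXct rather than the character string used above.

```r
# Hypothetical helper: rows that fall before now + horizon_secs and whose
# minimum temperature is below the threshold (in degrees Celsius).
frost_rows <- function(forecast, now = Sys.time(),
                       horizon_secs = 86400, threshold = 3) {
  in_window <- forecast$Datum < now + horizon_secs
  forecast[in_window & forecast$Celsius_min < threshold, , drop = FALSE]
}

# Toy forecast (values are made up): 3, 12 and 30 hours from "now".
now <- as.POSIXct("2016-12-07 18:00:00", tz = "UTC")
toy <- data.frame(
  Datum = now + c(3, 12, 30) * 3600,
  Celsius_min = c(4.2, -1.5, -3.0)
)

hits <- frost_rows(toy, now = now)
print(hits)
```

Only the 12-hour entry qualifies here: the first is too warm and the last lies outside the 24-hour window. A mail would be sent whenever `nrow(hits) > 0`.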

Finally, we have to tell the Raspberry Pi to run the script daily in the early evening. Save the .R file and add it to your crontab:

crontab -e

The first time you use crontab, you are asked to choose an editor. The easiest to use (at least for me) is nano.
Add the following line:

00 18 * * * Rscript ~/path_to_your/script.R

This adds the script to your cron jobs and schedules it to run at 18:00 every day of every month.
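
To make the five schedule fields of the crontab line explicit, here is a small sketch that splits the entry and names each field. `cron_fields` is a hypothetical helper written for illustration only; cron itself does not need it.

```r
# Hypothetical helper: split a crontab entry into its five schedule
# fields and the command that follows them.
cron_fields <- function(entry) {
  parts <- strsplit(entry, "[[:space:]]+")[[1]]
  schedule <- parts[1:5]
  names(schedule) <- c("minute", "hour", "day_of_month",
                       "month", "day_of_week")
  list(schedule = schedule,
       command = paste(parts[-(1:5)], collapse = " "))
}

entry <- cron_fields("00 18 * * * Rscript ~/path_to_your/script.R")
print(entry$schedule)
print(entry$command)
```

Minute 00 and hour 18 fix the time of day, while the three asterisks leave day of month, month, and day of week unrestricted.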
