An introduction to web scraping: locating Spanish schools

[This article was first published on R on Coding Club UC3M, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

by Jorge Cimentada


Whenever a new paper is released using some type of scraped data, most of my peers in the social science community are baffled at how the researchers managed to collect it. In fact, many social scientists can’t even think of research questions that can be addressed with this type of data, simply because they don’t know it’s even possible. As the old saying goes, when you have a hammer, every problem looks like a nail.

With the increasing amount of data being collected on a daily basis, it is imperative that scientists start getting familiar with new technologies that can help answer old questions. Moreover, we need to be adventurous about cutting-edge data sources, as they can also allow us to ask new questions which weren’t even thought of in the past.

In this tutorial I’ll be guiding you through the basics of web scraping using R and the xml2 package. I’ll begin with a simple example using fake data and elaborate further by trying to scrape the location of a sample of schools in Spain.

Basic steps

For web scraping in R, you can fulfill almost all of your needs with the xml2 package. As you wander through the web, you’ll see many examples using the rvest package. xml2 and rvest are very similar, so don’t feel you’re lagging behind for learning one and not the other. In addition to these two packages, we’ll need some other libraries for plotting locations on a map (ggplot2, sf, rnaturalearth), identifying who we are when we scrape (httr) and wrangling data (tidyverse).

Additionally, we’ll also need the package scrapex. In the real-world example below, we’ll be scraping data from the website to locate a sample of schools in Spain. However, throughout the tutorial we won’t be scraping the data directly from the live website. What would happen to this tutorial if, six months from now, the website updated its design? Everything from our real-world example would be lost.

Web scraping tutorials are usually very unstable precisely because of this. To circumvent that problem, I’ve saved a random sample of school web pages from that website into an R package called scrapex. Although the links we’ll be working with are hosted locally on your machine, their HTML should be very similar to that of the live website (except for some images/icons, which were deleted on purpose to keep the package lightweight).

You can install the package with:

# install.packages("devtools")

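Now, let’s move on to the fake data example and load all of our packages with:

library(xml2)
library(httr)
library(tidyverse)
library(sf)
library(rnaturalearth)
library(ggplot2)
library(scrapex)
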
Let’s begin with a simple example. Below we define an XML string and look at its structure:

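xml_test <- "<people>
<jason><person type='fictional'>
    <first_name><married>
        Jason
      </married></first_name>
    <last_name>Bourne</last_name>
    <occupation>
      Spy
    </occupation>
</person></jason>
<carol><person type='real'>
    <first_name><married>
        Carol
      </married></first_name>
    <last_name>Kalp</last_name>
    <occupation>
      Scientist
    </occupation>
</person></carol>
</people>"
cat(xml_test)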

In XML and HTML the basic building blocks are something called tags. For example, the first tag in the structure shown above is <people>. This tag is matched by </people> at the end of the string.

If you pay close attention, you’ll see that each tag in the XML structure has a beginning (signaled by <>) and an end (signaled by </>). For example, the next tag after <people> is <jason>, and right before the <carol> tag is the end of the jason tag, </jason>.

Similarly, you’ll find that the <carol> tag is also matched by a </carol> finishing tag.
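To see this matching rule in action, here is a minimal sketch. It assumes the xml2 package is installed, and the two tiny documents are made up for illustration. read_xml() parses a string in which every opening tag has its closing partner, but stops with an error when one is missing:

```r
library(xml2)

# Every opening tag has a matching closing tag, so this parses
good <- read_xml("<people><jason>Spy</jason></people>")
xml_name(xml_root(good))  # "people"

# <jason> is opened but never closed, so the parser raises an error;
# tryCatch() turns that error into NULL so we can inspect it
bad <- tryCatch(
  read_xml("<people><jason>Spy</people>"),
  error = function(e) NULL
)
is.null(bad)  # TRUE
```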

In theory, tags can have whatever meaning you attach to them (such as <people> or <occupation>). However, in practice there are hundreds of tags which are standard in websites (for example, here). If you’re just getting started, there’s no need to learn them, but as you progress in web scraping you’ll start to recognize them (one brief example is <b>, which simply bolds text in a website).

The xml2 package was designed to read XML strings and to navigate the tree structure to extract information. For example, let’s read in the XML data from our fake example and look at its general structure:

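xml_raw <- read_xml(xml_test)
xml_structure(xml_raw)
## <people>
##   <jason>
##     <person [type]>
##       <first_name>
##         <married>
##           {text}
##       <last_name>
##         {text}
##       <occupation>
##         {text}
##   <carol>
##     <person [type]>
##       <first_name>
##         <married>
##           {text}
##       <last_name>
##         {text}
##       <occupation>
##         {text}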

You can see that the structure is tree-based, meaning that tags such as <jason> and <carol> are nested within the <people> tag. In XML jargon, <people> is the root node, whereas <jason> and <carol> are the child nodes of <people>.

In more detail, the structure is as follows:

  • The root node is <people>
  • The child nodes are <jason> and <carol>
  • Each child node has a <person> node nested within it, which in turn contains <first_name> (with <married> inside it), <last_name> and <occupation>

Put another way, if something is nested within a node, then the nested node is a child of the upper-level node. In our example, the root node is <people>, so we can check which nodes are its children:

# xml_child returns only one child (specified in search)
# Here, jason is the first child
xml_child(xml_raw, search = 1)
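## {xml_node}
## <jason>
## [1] <person type="fictional">\n  <first_name>\n    <married>\n        Ja ...
# Here, carol is the second child
xml_child(xml_raw, search = 2)
## {xml_node}
## <carol>
## [1] <person type="real">\n  <first_name>\n    <married>\n        Carol\n ...
# Use xml_children to extract **all** children
child_xml <- xml_children(xml_raw)
child_xml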
## {xml_nodeset (2)}
## [1] \n  \n    \n      \n  \n    \n      \n ...
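The parent/child vocabulary maps directly onto a few xml2 helpers. The following self-contained sketch rebuilds a trimmed-down, hypothetical copy of the example tree in doc instead of reusing xml_test. It uses xml_root(), xml_children(), xml_parent() and xml_name() to walk down the tree and back up:

```r
library(xml2)

# A trimmed-down, hypothetical copy of the example tree
doc <- read_xml("<people>
  <jason><person type='fictional'><occupation>Spy</occupation></person></jason>
  <carol><person type='real'><occupation>Scientist</occupation></person></carol>
</people>")

root <- xml_root(doc)
xml_name(root)                # "people": the root node
xml_name(xml_children(root))  # "jason" "carol": the children of the root

# Going down two levels and then back up one
person <- xml_child(xml_child(root, 1), 1)
xml_name(person)              # "person"
xml_name(xml_parent(person))  # "jason"
```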

Tags can also have attributes, which are specified inside the opening tag, as in <tag attribute='value'>, and closed as usual with </tag>. If you look at the XML structure of our example, you’ll notice that each <person> tag has an attribute called type. As you’ll see in our real-world example, extracting these attributes is often the aim of our scraping adventure. Using xml2, we can extract the value of an attribute with a specific name with xml_attr (its sibling, xml_attrs, returns all the attributes of each node).

# Extract the attribute type from all nodes
xml_attr(child_xml, "type")
## [1] NA NA

Wait, why didn’t this work? Well, if you look at the output of child_xml, we have two nodes: <jason> and <carol>.

## {xml_nodeset (2)}
## [1] \n  \n    \n      \n  \n    \n      \n ...

Do these tags have an attribute? No, because if they did, they would look something like <jason type='fake'>. What we need is to look down at the <person> tag within <jason> and <carol> and extract the attribute from <person>.

Does this sound familiar? Both <jason> and <carol> have a <person> tag directly below them, which makes <person> their child. We can go down one level by running xml_children on these tags and extract the attribute from there.

# We go down one level of children
person_nodes <- xml_children(child_xml)

# <person> is now the main node, so we can extract attributes
person_nodes
## {xml_nodeset (2)}
## [1] <person type="fictional">\n  <first_name>\n    <married>\n        Ja ...
## [2] <person type="real">\n  <first_name>\n    <married>\n        Carol\n ...
# Both type attributes
xml_attr(person_nodes, "type")
## [1] "fictional" "real"
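The next sketch makes the difference between the two attribute functions concrete. It uses a made-up document in which the second <person> has no type attribute. xml_attrs() returns every attribute of each node as a named vector. xml_attr() returns one named attribute per node and fills in NA, or whatever you pass to its default argument, when the attribute is missing:

```r
library(xml2)

# Hypothetical document: the second <person> has no 'type' attribute
doc <- read_xml("<people>
  <person type='fictional' id='1'/>
  <person id='2'/>
</people>")
persons <- xml_find_all(doc, "//person")

# Every attribute of each node, as a list of named character vectors
xml_attrs(persons)

# A single attribute per node: NA where it is missing...
xml_attr(persons, "type")                       # "fictional" NA
# ...or a fallback value of your choice
xml_attr(persons, "type", default = "unknown")  # "fictional" "unknown"
```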

Using the xml_path function, you can even find the ‘address’ of these nodes to retrieve specific tags without having to write xml_children many times. For example:

# Specific address of each person tag for the whole xml tree
# only using the `person_nodes`
xml_path(person_nodes)
## [1] "/people/jason/person" "/people/carol/person"

We have the ‘address’ of specific tags in the tree, but how do we extract them automatically? The main function we’ll use to extract specific ‘addresses’ of an XML tree is xml_find_all. This function accepts the XML tree and an ‘address’ string. We can use very simple strings, such as the ones given by xml_path:

# You can use results from xml_path like directories
xml_find_all(xml_raw, "/people/jason/person")
## {xml_nodeset (1)}
## [1] <person type="fictional">\n  <first_name>\n    <married>\n        Ja ...

The expression above asks for the node "/people/jason/person", which is the same node you would get with xml_raw %>% xml_child(search = 1) %>% xml_child(search = 1) (first <jason>, then its <person> child). For deeply nested trees, xml_find_all is much cleaner than chaining many xml_child calls.
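Here is a quick check that an XPath ‘address’ and a chain of xml_child() calls land on the same node. The sketch uses a hypothetical, stripped-down version of the example. xml_find_first() is the xml2 function that returns only the first match:

```r
library(xml2)

# Hypothetical, stripped-down version of the example
doc <- read_xml("<people>
  <jason><person type='fictional'/></jason>
  <carol><person type='real'/></carol>
</people>")

# people -> jason -> person, one xml_child() call per level
chained <- xml_child(xml_child(doc, 1), 1)

# The same node, reached with a single XPath 'address'
found <- xml_find_first(doc, "/people/jason/person")

xml_path(chained)  # "/people/jason/person"
xml_path(found)    # "/people/jason/person"
```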

However, in most cases the ‘addresses’ used in xml_find_all come from a separate language called XPath (in fact, the ‘address’ we’ve been looking at is XPath). XPath is a complex language (much like regular expressions are for strings), and mastering it is beyond this brief tutorial. However, with the examples we’ve seen so far, we can use some basic XPath which we’ll need later on.

To extract all tags with a given name, wherever they are in the document, we can use //name_of_tag.

# Search for all 'married' nodes
xml_find_all(xml_raw, "//married")
## {xml_nodeset (2)}
## [1] <married>\n        Jason\n      </married>
## [2] <married>\n        Carol\n      </married>

With the previous XPath, we’re searching for all married tags within the complete XML tree. The result returns all married nodes (I use the words tags and nodes interchangeably) in the complete tree structure. Another example would be finding all <occupation> tags:

xml_find_all(xml_raw, "//occupation")
## {xml_nodeset (2)}
## [1] <occupation>\n      Spy\n    </occupation>
## [2] <occupation>\n      Scientist\n    </occupation>

If you want to find any other tag you can replace "//occupation" with your tag of interest and xml_find_all will find all of them.

If you want to find tags only below your current node, you just need to add a . at the beginning: ".//occupation". For example, suppose we dived into the <jason> tag and wanted his <occupation> tag. "//occupation" would return all <occupation> tags in the document. Instead, ".//occupation" returns only the <occupation> tags below the current node. For example:

xml_raw %>%
  # Dive only into Jason's tag
  xml_child(search = 1) %>%
  xml_find_all(".//occupation")
## {xml_nodeset (1)}
## [1] <occupation>\n      Spy\n    </occupation>
# Instead, the wrong way would have been:
xml_raw %>%
  # Dive only into Jason's tag
  xml_child(search = 1) %>%
  # Here we get both occupation tags
  xml_find_all("//occupation")
## {xml_nodeset (2)}
## [1] <occupation>\n      Spy\n    </occupation>
## [2] <occupation>\n      Scientist\n    </occupation>

The first example only returns <jason>’s occupation, whereas the second returns all occupations, regardless of where you are in the tree.
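The leading . matters most when you extract several fields per record. A common pattern is to first select one node per record (here each <person>) and then run a relative XPath from each of them. The sketch below applies it to a made-up version of the example. When xml_find_first() is given a node set, it searches separately from every node in it. The last line shows what goes wrong when the . is dropped:

```r
library(xml2)

# Hypothetical version of the example
doc <- read_xml("<people>
  <jason><person type='fictional'>
    <first_name>Jason</first_name><occupation>Spy</occupation>
  </person></jason>
  <carol><person type='real'>
    <first_name>Carol</first_name><occupation>Scientist</occupation>
  </person></carol>
</people>")

# One node per record
persons <- xml_find_all(doc, "//person")

# ".//" searches below each <person> separately, giving one value per record
people_df <- data.frame(
  name = xml_text(xml_find_first(persons, ".//first_name")),
  occupation = xml_text(xml_find_first(persons, ".//occupation")),
  type = xml_attr(persons, "type"),
  stringsAsFactors = FALSE
)
people_df

# Without the leading ".", every search starts again from the top of the
# document, so both records get Jason's first name
xml_text(xml_find_first(persons, "//first_name"))  # "Jason" "Jason"
```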

XPath also allows you to identify tags that have a specific attribute value, such as the ones we saw earlier. For example, to filter all <person> tags with the attribute type set to fictional, we could do it with:

# Give me all the tags 'person' that have an attribute type='fictional'
xml_raw %>%
  xml_find_all("//person[@type='fictional']")
## {xml_nodeset (1)}
## [1] <person type="fictional">\n  <first_name>\n    <married>\n        Ja ...

If you wanted to do the same but for the tags below your current nodes, the same trick we learned earlier would work: ".//person[@type='fictional']". These are just some primers that can help you jump easily to using XPath, but I encourage you to look at other examples on the web, as complex websites often require complex XPath expressions.
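Attribute filters can be combined with further path steps. The sketch below uses a hypothetical document with a made-up class attribute. It also shows a detail worth knowing before matching long class attributes like the ones in the real-world example: [@class='...'] compares the whole attribute value, while the XPath function contains() matches any part of it.

```r
library(xml2)

# Hypothetical document with a made-up, multi-word class attribute
doc <- read_xml("<people>
  <person type='fictional' class='agent spy'><occupation>Spy</occupation></person>
  <person type='real' class='researcher'><occupation>Scientist</occupation></person>
</people>")

# Filter on an attribute, then step down to a child tag
xml_text(xml_find_all(doc, "//person[@type='real']/occupation"))  # "Scientist"

# [@class='spy'] compares the *whole* value ('agent spy'), so nothing matches
length(xml_find_all(doc, "//person[@class='spy']"))  # 0

# contains() matches part of the value (note it would also match 'spyglass')
xml_attr(xml_find_all(doc, "//person[contains(@class, 'spy')]"), "type")  # "fictional"
```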

Before we begin our real-world example, you might be asking yourself how you can actually extract the text/numeric data from these nodes. Well, that’s easy: xml_text.

xml_raw %>%
  xml_find_all(".//occupation") %>%
  xml_text()
## [1] "\n      Spy\n    "       "\n      Scientist\n    "

Once you’ve narrowed down your tree-based search to a single piece of text or a number, xml_text() will extract it for you (there’s also xml_double and xml_integer for extracting numbers). As I said, XPath is a huge language. If you’re interested, these XPath cheat sheets have helped me a lot to learn tricks for easy scraping.
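The whitespace in the output above comes straight from the XML string. The small sketch below uses an invented <school> document with illustrative values. It shows that xml_text() accepts trim = TRUE to strip leading and trailing whitespace, and that xml_double() converts a node’s text to a number:

```r
library(xml2)

# Invented document; the values are for illustration only
doc <- read_xml("<school>
  <name>
    CEIP Example
  </name>
  <latitude>38.8274492</latitude>
</school>")
name_node <- xml_find_first(doc, "//name")

# By default the surrounding newlines and spaces are kept...
xml_text(name_node)               # "\n    CEIP Example\n  "
# ...while trim = TRUE removes them
xml_text(name_node, trim = TRUE)  # "CEIP Example"

# xml_double() parses the text of a node as a number
xml_double(xml_find_first(doc, "//latitude"))
```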

Real-world example

We’re interested in making a list of many schools in Spain and visualizing their location. This can be useful for many things, such as matching the population density of children across different regions to school locations. The website contains a database of schools similar to what we’re looking for. As described at the beginning, we’re instead going to use scrapex. Its function spanish_schools_ex() returns the links to a sample of school websites saved locally on your computer.

Let’s look at an example for one school.

school_links <- spanish_schools_ex()

# Keep only the HTML file of one particular school.
school_url <- school_links[13]
school_url
## [1] "/usr/local/lib/R/site-library/scrapex/extdata/spanish_schools_ex/school_3006839.html"

If you’re interested in looking at the website interactively in your browser, you can do it with browseURL(prep_browser(school_url)). Let’s read the HTML. Real-world HTML is often not strictly valid XML (some tags, for example, are never closed), so here we use read_html, which is more forgiving than read_xml.
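The sketch below shows why read_html is the safer choice for web pages. It uses a tiny, made-up page with unclosed tags such as <meta> and <br>, which are legal in HTML but not in XML. It also shows what xml_child() returns for an HTML document, and why searches starting with // still reach the whole page from there:

```r
library(xml2)

# A tiny, made-up page: <meta> and <br> are never closed, which is valid HTML
page <- "<html><head><meta charset='utf-8'></head>
<body><p class='intro'>Hola<br>mundo</p></body></html>"

# read_xml() expects strict XML, so it stops with an error here
strict <- tryCatch(read_xml(page), error = function(e) NULL)
is.null(strict)  # TRUE

# read_html() uses a forgiving HTML parser and builds the tree anyway
doc <- read_html(page)
xml_text(xml_find_first(doc, "//p[@class='intro']"))  # "Holamundo"

# For an HTML document, xml_child() returns the first child of <html>: <head>
head_node <- xml_child(doc)
xml_name(head_node)  # "head"

# "//" always searches the whole document, whatever node you start from,
# while ".//" only looks inside the current node
length(xml_find_all(head_node, "//p"))   # 1
length(xml_find_all(head_node, ".//p"))  # 0
```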

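# Here we use `read_html` because `read_xml` throws an error
# when attempting to read the page. However, everything we've
# discussed should work the same way.
# Note that xml_child() keeps only the first child of <html>, which
# is <head>; XPath searches starting with "//" still cover the whole page.
school_raw <- read_html(school_url) %>% xml_child()
school_raw
## {html_node}
## <head>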
##  [1] Aquí encontrarás toda la información necesaria sobre CEIP SA ...
##  [2] <meta charset="utf-8">\n
##  [3] <meta name="viewport" content="width=device-width, initial-scale=1, ...
##  [4] <meta http-equiv="x-ua-compatible" content="ie=edge">\n
##  [5] <meta name="author" content="BuscoColegio">\n
##  [6] <meta name="description" content="Encuentra toda la información nec ...
##  [7] <meta name="keywords" content="opiniones SANCHIS GUARNER, contacto  ...
##  [8] <link rel="shortcut icon" href="/favicon.ico">\n
##  [9] <link rel="stylesheet" href="// ...
## [10] <link rel="stylesheet" href=" ...
## [11] <link rel="stylesheet" href="/assets/vendor/icon-awesome/css/font-a ...
## [12] <link rel="stylesheet" href="/assets/vendor/icon-line/css/simple-li ...
## [13] <link rel="stylesheet" href="/assets/vendor/icon-line-pro/style.css ...
## [14] <link rel="stylesheet" href="/assets/vendor/icon-hs/style.css">\n
## [15] <link rel="stylesheet" href=" ...
## [16] <link rel="stylesheet" href=" ...
## [17] <link rel="stylesheet" href=" ...
## [18] <link rel="stylesheet" href=" ...
## [19] <link rel="stylesheet" href=" ...
## [20] <link rel="stylesheet" href=" ...
## ...</code></pre><p>Web scraping strategies are very specific to the website you’re after. You have to get very familiar with the website you’re interested in to be able to match precisely the information you’re looking for. In many cases, scraping two websites will require vastly different strategies. For this particular example, we’re only interested in the <strong>location</strong> of each school, so that is all we have to extract.</p><p><img src="" style="display: block; margin: auto;" data-recalc-dims="1" /></p><p>In the image above you’ll find a typical school’s website in <code></code>. The website has a lot of information, but we’re only interested in the button circled by the orange rectangle. If you can’t find it easily, it’s below the Google Map on the right and says “Buscar colegio cercano” (“Find a nearby school”).</p><p>Clicking this button takes you to a link that contains the coordinates of the school, so we just have to find a way to get at the information behind this button. Most browsers let you inspect the code behind any part of a page if you press CTRL + SHIFT + C at the same time (Firefox and Chrome support this hotkey). If a window full of code pops up on the right, then you’re on the right track:</p><p><img src="" style="display: block; margin: auto;" data-recalc-dims="1" /></p><p>Here we can search the source code of the website. If you place your mouse pointer over the lines of code in this right-most window, you’ll see sections of the website highlighted in blue. This indicates which parts of the code refer to which parts of the website. Luckily for us, we don’t have to search the complete source code to find that specific location. 
We can narrow our search by typing the text we’re looking for in the search bar at the top of the right window:</p><p><img src="" style="display: block; margin: auto;" data-recalc-dims="1" /></p><p>After pressing Enter, we’ll be taken directly to the tag that has the information we want.</p><p><img src="" style="display: block; margin: auto;" data-recalc-dims="1" /></p><p>More specifically, we can see that the latitude and longitude of the school are found in an attribute called <code>href</code> in an <code><a></code> tag:</p><p><img src="" style="display: block; margin: auto;" data-recalc-dims="1" /></p><p>Can you see the latitude and longitude fields in the text highlighted in blue? They’re hidden in-between words. That is precisely the type of information we’re after. Extracting all <code><a></code> tags from the website (hint: XPath similar to <code>"//a"</code>) will yield hundreds of matches because <code><a></code> is a very common tag. Moreover, refining the search to <code><a></code> tags which have an <code>href</code> attribute will also yield hundreds of matches, because <code>href</code> is the standard attribute for attaching links within websites. We need to narrow down our search.</p><p>One strategy is to find the ‘father’ or ‘grandfather’ node of this particular <code><a></code> tag and then match a node which has that same grandfather -> father -> child sequence. By looking at the structure of this small HTML snippet in the right-most window, we see that the ‘grandfather’ of this <code><a></code> tag is <code><p class="d-flex align-items-baseline g-mt-5"></code>, which has a particularly long attribute named <code>class</code>.</p><p><img src="" style="display: block; margin: auto;" data-recalc-dims="1" /></p><p>Don’t be intimidated by these tag names and long attributes. I also don’t know what any of these attributes mean. 
But what I do know is that this is the ‘grandfather’ of the <code><a></code> tag I’m interested in. So using our XPath skills, let’s search for that <code><p></code> tag and see if we get only one match.</p><pre class="r"><code># Search for all <p> tags with that class in the document
school_raw %>%
  xml_find_all("//p[@class='d-flex align-items-baseline g-mt-5']")</code></pre><pre><code>## {xml_nodeset (1)}
## [1] <p class="d-flex align-items-baseline g-mt-5">\r\n\t                 ...</code></pre><p>Only one match, so this is good news: we can uniquely identify this particular <code><p></code> tag. Let’s refine the search to say: find all <code><a></code> tags nested below that specific <code><p></code> tag. This only means adding <code>"//a"</code> to the end of the previous expression. Since there is only one <code><p></code> tag with that class, we’re interested in checking whether there is more than one <code><a></code> tag below it.</p><pre class="r"><code>school_raw %>%
  xml_find_all("//p[@class='d-flex align-items-baseline g-mt-5']//a")</code></pre><pre><code>## {xml_nodeset (1)}
## [1] &lt;a href="/Colegio/buscar-colegios-cercanos.action?colegio.latitud=38 ...</code></pre><p>There we go! We can see the specific <code>href</code> that contains the latitude and longitude data we’re interested in. How do we extract the <code>href</code> attribute? With <code>xml_attr</code>, which returns the value of a single named attribute for each node:</p><pre class="r"><code>location_str <-
  school_raw %>%
  xml_find_all("//p[@class='d-flex align-items-baseline g-mt-5']//a") %>%
  xml_attr(attr = "href")

location_str</code></pre><pre><code>## [1] "/Colegio/buscar-colegios-cercanos.action?colegio.latitud=38.8274492&colegio.longitud=0.0221681"</code></pre><p>Ok, now we need some regex skills to keep only the latitude and longitude (regular expressions are used to search for patterns inside a string, such as a date. See <a href="" rel="nofollow" target="_blank">here</a> for some examples):</p><pre class="r"><code>location <-
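  location_str %>%
  # Keep everything from the first '=' onwards (the two coordinates)
  str_extract_all("=.+$") %>%
  # Drop the '=' signs and the 'colegio.longitud' parameter name
  str_replace_all("=|colegio\\.longitud", "") %>%
  # Split latitude and longitude on the '&' separator
  str_split("&") %>%
  # str_split() returns a list; keep its only element
  .[[1]]
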
location</code></pre><pre><code>## [1] "38.8274492" "0.0221681"</code></pre><p>Ok, so we got the information we needed for one single school. Let’s turn that into a function so we can pass only the school’s link and get the coordinates back.</p><p>Before we do that, I will set something called my <code>User-Agent</code>. In short, the <code>User-Agent</code> is <strong>who</strong> you are. It is good practice to identify the person who is scraping the website: if you’re causing any trouble, the website can see directly who is responsible. You can figure out your user agent <a href="" rel="nofollow" target="_blank">here</a> and paste it in the string below. In addition, I will add a time sleep of 5 seconds to the function, because we want to make sure we don’t cause any trouble to the website we’re scraping by overloading it with requests.</p><pre class="r"><code># This sets your `User-Agent` globally so that requests made through
# httr are identified with this `User-Agent`. (xml2's read_html()
# downloads URLs without httr, so it does not pick up this setting;
# for live pages you could download them with httr::GET() first.)
set_config(
  user_agent("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:70.0) Gecko/20100101 Firefox/70.0")
)

# Collapse all of the code from above into one function called
# school_grabber

school_grabber <- function(school_url) {
  # We add a time sleep of 5 seconds to avoid
  # sending too many quick requests to the website
  Sys.sleep(5)

  school_raw <- read_html(school_url) %>% xml_child()

  location_str <-
    school_raw %>%
    xml_find_all("//p[@class='d-flex align-items-baseline g-mt-5']//a") %>%
    xml_attr(attr = "href")

  location <-
    location_str %>%
    str_extract_all("=.+$") %>%
    str_replace_all("=|colegio\\.longitud", "") %>%
    str_split("&") %>%
    .[[1]]

  # Turn into a data frame
  data.frame(
    latitude = location[1],
    longitude = location[2],
    stringsAsFactors = FALSE
  )
}

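# A possible alternative (a sketch, not the approach used above): pull each
# coordinate out by its parameter name, so the result does not depend on the
# order in which latitud and longitud appear in the link. Base R only; the
# helper extract_coord() and the variable href are hypothetical names.
href <- "/Colegio/buscar-colegios-cercanos.action?colegio.latitud=38.8274492&colegio.longitud=0.0221681"

extract_coord <- function(href, param) {
  # Capture the sign, digits and decimal point that follow
  # colegio.latitud= (or colegio.longitud=)
  pattern <- paste0(".*[?&]colegio\\.", param, "=([-0-9.]+).*")
  # Return NA when the parameter is not present in the link
  ifelse(grepl(pattern, href), sub(pattern, "\\1", href), NA_character_)
}

extract_coord(href, "latitud")   # "38.8274492"
extract_coord(href, "longitud")  # "0.0221681"
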
school_grabber(school_url)</code></pre><pre><code>##     latitude longitude
## 1 38.8274492 0.0221681</code></pre><p>Ok, so it’s working. The only thing left is to extract this for many schools. The sample in <code>scrapex</code> contains 27 school links that we can scrape automatically. Let’s loop over them, get the coordinates of each school and collapse all of them into a data frame (with the 5-second pause per page, this takes a little over two minutes).</p><pre class="r"><code>res <- map_dfr(school_links, school_grabber)
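# A more defensive variant (a sketch, not what the tutorial ran): if one page
# fails to parse, return a row of NAs instead of stopping the whole loop.
# safe_grabber() is a hypothetical helper built on school_grabber().
safe_grabber <- function(url) {
  tryCatch(
    school_grabber(url),
    error = function(e) {
      message("Skipping ", url, ": ", conditionMessage(e))
      data.frame(latitude = NA_character_, longitude = NA_character_,
                 stringsAsFactors = FALSE)
    }
  )
}
# res <- map_dfr(school_links, safe_grabber)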
res</code></pre><pre><code>##    latitude  longitude
## 1  42.72779 -8.6567935
## 2  43.24439 -8.8921645
## 3  38.95592 -1.2255769
## 4  39.18657 -1.6225903
## 5  40.38245 -3.6410388
## 6  40.22929 -3.1106322
## 7  40.43860 -3.6970366
## 8  40.33514 -3.5155669
## 9  40.50546 -3.3738441
## 10 40.63826 -3.4537107
## 11 40.38543 -3.6639500
## 12 37.76485 -1.5030467
## 13 38.82745  0.0221681
## 14 40.99434 -5.6224391
## 15 40.99434 -5.6224391
## 16 40.56037 -5.6703725
## 17 40.99434 -5.6224391
## 18 40.99434 -5.6224391
## 19 41.13593  0.9901905
## 20 41.26155  1.1670507
## 21 41.22851  0.5461471
## 22 41.14580  0.8199749
## 23 41.18341  0.5680564
## 24 42.07820  1.8203155
## 25 42.25245  1.8621546
## 26 41.73767  1.8383666
## 27 41.62345  2.0013628</code></pre><p>So now that we have the locations of these schools, let’s plot them:</p><pre class="r"><code># The coordinates were extracted as text, so convert them to numbers
res <- mutate_all(res, as.numeric)

sp_sf <-
  ne_countries(scale = "large", country = "Spain", returnclass = "sf") %>%
  st_transform(crs = 4326)

ggplot(sp_sf) +
  geom_sf() +
  geom_point(data = res, aes(x = longitude, y = latitude)) +
  coord_sf(xlim = c(-20, 10), ylim = c(25, 45)) +
  theme_minimal() +
  ggtitle("Sample of schools in Spain")</code></pre><p><img src="" style="display: block; margin: auto;" data-recalc-dims="1" /></p><p>There we go! We went from literally no information at the beginning of this tutorial to interpretable, summarized information using only web data. We can see some schools in Madrid (center) as well as in other regions of Spain, including Catalonia and Galicia.</p><p>This marks the end of our scraping adventure, but before we finish, I want to mention some ethical guidelines for web scraping. Scraping is extremely useful for us but can give headaches to the people maintaining the website of interest. Here’s a list of ethical guidelines you should always follow:</p><ul><li><p>Read the terms of service: many websites prohibit web scraping, and you could be in breach of their terms (or of privacy) by scraping the data. <a href="" rel="nofollow" target="_blank">One</a> famous example.</p></li><li><p>Check the <code>robots.txt</code> file. This is a file that most websites have (<code></code> does <strong>not</strong>) which tells you which specific paths inside the website are scrapable and which are not. See <a href="" rel="nofollow" target="_blank">here</a> for an explanation of what robots.txt files look like and where to find them.</p></li><li><p>Some websites are supported by very big servers, which may let you send 4-5 requests per second. Others, such as <code></code>, are not. It’s good practice to always put a time sleep between your requests. In our example, I set it to 5 seconds because this is a small website and we don’t want to crash their servers.</p></li><li><p>When making requests, there are computational ways of identifying yourself. For example, every request (such as the ones we make) can have something called a <code>User-Agent</code>. 
It is good practice to identify yourself in the <code>User-Agent</code> (as we did in our code), because the server’s administrator can then tell directly who is causing problems through their web scraping.</p></li><li><p>Limit your scraping to off-peak hours, such as overnight. This helps reduce the chances of overloading the website, since fewer people visit websites at night.</p></li></ul><p>You can read more about these ethical issues <a href="" rel="nofollow" target="_blank">here</a>.</p></div><div id="wrap-up" class="section level2"><h2>Wrap up</h2><p>This tutorial introduced you to basic concepts in web scraping and applied them in a real-world setting. Web scraping is a vast field in computer science (you can find entire books on the subject, such as <a href="" rel="nofollow" target="_blank">this</a> one). We covered some basic techniques which I think can take you a long way, but there’s definitely more to learn. For those curious about where to turn next, I’m looking forward to the upcoming book <a href="" rel="nofollow" target="_blank">“A Field Guide for Web Scraping and Accessing APIs with R”</a> by Bob Rudis, which should be released in the near future. Now go scrape some websites ethically!</p></div>
</div></article></div><aside class="mh-sidebar sb-right"><div id="rss-9" class="sb-widget widget_rss"><h4 class="widget-title"><a class="rsswidget" href=""><img 
class="rss-widget-icon" style="border:0" width="14" height="14" src="" alt="RSS" /></a> <a class="rsswidget" href=""> (python/data-science news)</a></h4><ul><li><a class='rsswidget' href=''>100 Python pandas tips and tricks</a></li><li><a class='rsswidget' href=''>How to become a data scientist in 30 days?</a></li><li><a class='rsswidget' href=''>Performance anxiety</a></li><li><a class='rsswidget' href=''>Python Musings #1: Reading raw input from Hackkerank Challenges</a></li><li><a class='rsswidget' href=''>Data Science Application in Manufacturing</a></li><li><a class='rsswidget' href=''>Parallel AdaOpt classification on MNIST handwritten digits (without preprocessing)</a></li><li><a class='rsswidget' href=''>Portfolio simulations</a></li></ul></div><div id="text-16" class="sb-widget widget_text"><div class="textwidget"><strong><a href="">Full list of contributing R-bloggers</a></strong></div></div><div id="archives-3" class="sb-widget widget_archive"><h4 class="widget-title">Archives</h4> <label class="screen-reader-text" for="archives-dropdown-3">Archives</label> <select id="archives-dropdown-3" name="archive-dropdown"><option value="">Select Month</option><option value=''> July 2020  (6)</option><option value=''> June 2020  (188)</option><option value=''> May 2020  (274)</option><option value=''> April 2020  (281)</option><option value=''> March 2020  (237)</option><option value=''> February 2020  (206)</option><option value=''> January 2020  (208)</option><option value=''> December 2019  (207)</option><option value=''> November 2019  (188)</option><option value=''> October 2019  (213)</option><option value=''> September 2019  (209)</option><option value=''> August 2019  (254)</option><option value=''> July 2019  (227)</option><option value=''> June 2019  (213)</option><option value=''> May 2019  (245)</option><option value=''> April 2019  (270)</option><option value=''> March 2019  (290)</option><option value=''> February 2019  (248)</option><option 
value=''> January 2019  (274)</option><option value=''> December 2018  (249)</option><option value=''> November 2018  (284)</option><option value=''> October 2018  (307)</option><option value=''> September 2018  (282)</option><option value=''> August 2018  (269)</option><option value=''> July 2018  (330)</option><option value=''> June 2018  (297)</option><option value=''> May 2018  (317)</option><option value=''> April 2018  (296)</option><option value=''> March 2018  (284)</option><option value=''> February 2018  (239)</option><option value=''> January 2018  (322)</option><option value=''> December 2017  (250)</option><option value=''> November 2017  (265)</option><option value=''> October 2017  (284)</option><option value=''> September 2017  (285)</option><option value=''> August 2017  (334)</option><option value=''> July 2017  (278)</option><option value=''> June 2017  (312)</option><option value=''> May 2017  (344)</option><option value=''> April 2017  (321)</option><option value=''> March 2017  (363)</option><option value=''> February 2017  (313)</option><option value=''> January 2017  (364)</option><option value=''> December 2016  (341)</option><option value=''> November 2016  (289)</option><option value=''> October 2016  (306)</option><option value=''> September 2016  (254)</option><option value=''> August 2016  (287)</option><option value=''> July 2016  (326)</option><option value=''> June 2016  (263)</option><option value=''> May 2016  (292)</option><option value=''> April 2016  (260)</option><option value=''> March 2016  (302)</option><option value=''> February 2016  (268)</option><option value=''> January 2016  (337)</option><option value=''> December 2015  (304)</option><option value=''> November 2015  (234)</option><option value=''> October 2015  (259)</option><option value=''> September 2015  (238)</option><option value=''> August 2015  (264)</option><option value=''> July 2015  (243)</option><option value=''> June 2015  (213)</option><option 
value=''> May 2015  (235)</option><option value=''> April 2015  (211)</option><option value=''> March 2015  (259)</option><option value=''> February 2015  (212)</option><option value=''> January 2015  (245)</option><option value=''> December 2014  (237)</option><option value=''> November 2014  (221)</option><option value=''> October 2014  (218)</option><option value=''> September 2014  (259)</option><option value=''> August 2014  (217)</option><option value=''> July 2014  (235)</option><option value=''> June 2014  (241)</option><option value=''> May 2014  (243)</option><option value=''> April 2014  (260)</option><option value=''> March 2014  (289)</option><option value=''> February 2014  (269)</option><option value=''> January 2014  (263)</option><option value=''> December 2013  (264)</option><option value=''> November 2013  (241)</option><option value=''> October 2013  (234)</option><option value=''> September 2013  (215)</option><option value=''> August 2013  (224)</option><option value=''> July 2013  (254)</option><option value=''> June 2013  (272)</option><option value=''> May 2013  (260)</option><option value=''> April 2013  (279)</option><option value=''> March 2013  (277)</option><option value=''> February 2013  (294)</option><option value=''> January 2013  (347)</option><option value=''> December 2012  (309)</option><option value=''> November 2012  (277)</option><option value=''> October 2012  (308)</option><option value=''> September 2012  (270)</option><option value=''> August 2012  (263)</option><option value=''> July 2012  (247)</option><option value=''> June 2012  (301)</option><option value=''> May 2012  (287)</option><option value=''> April 2012  (297)</option><option value=''> March 2012  (304)</option><option value=''> February 2012  (264)</option><option value=''> January 2012  (280)</option><option value=''> December 2011  (251)</option><option value=''> November 2011  (261)</option><option value=''> October 2011  (281)</option><option value=''> 
September 2011  (187)</option><option value=''> August 2011  (258)</option><option value=''> July 2011  (219)</option><option value=''> June 2011  (225)</option><option value=''> May 2011  (239)</option><option value=''> April 2011  (268)</option><option value=''> March 2011  (249)</option><option value=''> February 2011  (206)</option><option value=''> January 2011  (209)</option><option value=''> December 2010  (188)</option><option value=''> November 2010  (172)</option><option value=''> October 2010  (219)</option><option value=''> September 2010  (186)</option><option value=''> August 2010  (204)</option><option value=''> July 2010  (175)</option><option value=''> June 2010  (167)</option><option value=''> May 2010  (164)</option><option value=''> April 2010  (152)</option><option value=''> March 2010  (165)</option><option value=''> February 2010  (135)</option><option value=''> January 2010  (121)</option><option value=''> December 2009  (126)</option><option value=''> November 2009  (66)</option><option value=''> October 2009  (87)</option><option value=''> September 2009  (65)</option><option value=''> August 2009  (57)</option><option value=''> July 2009  (64)</option><option value=''> June 2009  (54)</option><option value=''> May 2009  (35)</option><option value=''> April 2009  (39)</option><option value=''> March 2009  (43)</option><option value=''> February 2009  (37)</option><option value=''> January 2009  (42)</option><option value=''> December 2008  (16)</option><option value=''> November 2008  (14)</option><option value=''> October 2008  (10)</option><option value=''> September 2008  (8)</option><option value=''> August 2008  (11)</option><option value=''> July 2008  (7)</option><option value=''> June 2008  (8)</option><option value=''> May 2008  (8)</option><option value=''> April 2008  (4)</option><option value=''> March 2008  (5)</option><option value=''> February 2008  (6)</option><option value=''> January 2008  (10)</option><option value=''> 
December 2007  (3)</option><option value=''> November 2007  (5)</option><option value=''> October 2007  (9)</option><option value=''> September 2007  (7)</option><option value=''> August 2007  (21)</option><option value=''> July 2007  (9)</option><option value=''> June 2007  (3)</option><option value=''> May 2007  (3)</option><option value=''> April 2007  (1)</option><option value=''> March 2007  (5)</option><option value=''> February 2007  (4)</option><option value=''> November 2006  (1)</option><option value=''> October 2006  (2)</option><option value=''> August 2006  (3)</option><option value=''> July 2006  (1)</option><option value=''> June 2006  (1)</option><option value=''> May 2006  (3)</option><option value=''> April 2006  (1)</option><option value=''> March 2006  (1)</option><option value=''> February 2006  (5)</option><option value=''> January 2006  (1)</option><option value=''> October 2005  (1)</option><option value=''> September 2005  (3)</option><option value=''> May 2005  (1)</option> </select> <script type='text/javascript'>(function() {
	var dropdown = document.getElementById( "archives-dropdown-3" );
	function onSelectChange() {
		if ( dropdown.options[ dropdown.selectedIndex ].value !== '' ) {
			document.location.href = this.options[ this.selectedIndex ].value;
	dropdown.onchange = onSelectChange;
})();</script> </div><div id="linkcat-3349" class="sb-widget widget_links"><h4 class="widget-title">Other sites</h4><ul class='xoxo blogroll'><li><a href="" title="SAS news gathered from bloggers">SAS blogs</a></li><li><a href="">Jobs for R-users</a></li></ul></div></aside></div></div><div class="copyright-wrap"><p class="copyright">Copyright © 2020 | <a href="" rel="nofollow">MH Corporate basic by MH Themes</a></p></div></div><div class="wpusb wpusb-buttons wpusb-fixed-right   wpusb-fixed wpusb-layout-buttons-content wpusb-fixed-position_fixed"
 data-disabled-share-counts="1" data-wpusb-component="counter-social-share"><div data-element="buttons" class="wpusb-fixed-right-container "><div class="wpusb-item wpusb-facebook "> <a href="" target="_blank"
 class="wpusb-layout-buttons wpusb-button wpusb-btn "
 title="Share on Facebook" 
 > <svg class="wpusb-svg wpusb-facebook-buttons "> <use xlink:href="#wpusb-facebook" /> </svg> </a></div><div class="wpusb-item wpusb-twitter "> <a href=" #rstats #datascience&via=rbloggers" target="_blank"
 class="wpusb-layout-buttons wpusb-button wpusb-btn "
 > <svg class="wpusb-svg wpusb-twitter-buttons "> <use xlink:href="#wpusb-twitter" /> </svg> </a></div><div class="wpusb-item wpusb-linkedin "> <a href="" target="_blank"
 class="wpusb-layout-buttons wpusb-button wpusb-btn "
 title="Share on Linkedin" 
 > <svg class="wpusb-svg wpusb-linkedin-buttons "> <use xlink:href="#wpusb-linkedin" /> </svg> </a></div><div class="wpusb-item wpusb-email "> <a href="mailto:?subject=An%20introduction%20to%20web%20scraping%3A%20locating%20Spanish%20schools&
%0A%0AAn%20introduction%20to%20web%20scraping%3A%20locating%20Spanish%20schools%0A%0Aby%20Jorge%20Cimentada%0A%20%20%20%20%20%20%20%20%0A%0A%0A%0AIntroduction%0AWhenever%20a%20new%20paper%20is%20released%20using%20some%20type%20of%20scraped%20data%2C%20most%20of%20my%20peers%20in%20the%20social%20science%20community%20get%20baffled%20at%20how%20researchers%20can%20do%20this.%20In%20fact%2C%20many%20social%20scientists%20can%E2%80%99t%20even%20think%20of%20research%20questions%20that%20can%20be%20addressed%20with%20this%20type%20of%20data%20simply%20because%20they%20don%E2%80%99t%20know%20it%E2%80%99s%20even%20possible.%20As%20the%20old%20saying%20goes%2C%20when%20you%20have%20a%20hammer%2C%20every%20problem%20looks%20like%20a%20nail.%0AWith%20the%20increasing%20amount%20of%20data%20being%20collected%20on%20a%20daily%20basis%2C%20it%20is%20eminent%20that%20scientists%20start%20getting%20familiar%20with%20new%20technologies%20that%20can%20help%20answer%20old%20questions.%20Moreover%2C%20we%20need%20to%20be%20adventurous%20about%20cutting%20edge%20data%20sources%20as%20they%20can%20also%20allow%20us%20to%20ask%20new%20questions%20which%20weren%E2%80%99t%20even%20thought%20of%20in%20the%20past.%0AIn%20this%20tutorial%20I%E2%80%99ll%20be%20guiding%20you%20through%20the%20basics%20of%20web%20scraping%20using%20R%20and%20the%20xml2%20package.%20I%E2%80%99ll%20begin%20with%20a%20simple%20example%20using%20fake%20data%20and%20elaborate%20further%20by%20trying%20to%20scrape%20the%20location%20of%20a%20sample%20of%20schools%20in%20Spain.%0A%0A%0ABasic%20steps%0AFor%20web%20scraping%20in%20R%2C%20you%20can%20fulfill%20almost%20all%20of%20your%20needs%20with%20the%20xml2%20package.%20As%20you%20wander%20through%20the%20web%2C%20you%E2%80%99ll%20see%20many%20examples%20using%20the%20rvest%20package.%20xml2%20and%20rvest%20are%20very%20similar%20so%20don%E2%80%99t%20feel%20you%E2%80%99re%20lacking%20behind%20for%20learning%20one%20and%20not%20the%20other.%20In%20addition%20to%20th
ese%20two%20packages%2C%20we%E2%80%99ll%20need%20some%20other%20libraries%20for%20plotting%20locations%20on%20a%20map%20%28ggplot2%2C%20sf%2C%20rnaturalearth%29%2C%20identifying%20who%20we%20are%20when%20we%20scrape%20%28httr%29%20and%20wrangling%20data" target="_self" 
 class="wpusb-layout-buttons wpusb-button wpusb-btn "
 title="Send by email" 
 > <svg class="wpusb-svg wpusb-email-buttons "> <use xlink:href="#wpusb-email" /> </svg> </a></div><div class="wpusb-item wpusb-gmail "> <a href="
%0A%0AAn%20introduction%20to%20web%20scraping%3A%20locating%20Spanish%20schools%0A%0Aby%20Jorge%20Cimentada%0A%20%20%20%20%20%20%20%20%0A%0A%0A%0AIntroduction%0AWhenever%20a%20new%20paper%20is%20released%20using%20some%20type%20of%20scraped%20data%2C%20most%20of%20my%20peers%20in%20the%20social%20science%20community%20get%20baffled%20at%20how%20researchers%20can%20do%20this.%20In%20fact%2C%20many%20social%20scientists%20can%E2%80%99t%20even%20think%20of%20research%20questions%20that%20can%20be%20addressed%20with%20this%20type%20of%20data%20simply%20because%20they%20don%E2%80%99t%20know%20it%E2%80%99s%20even%20possible.%20As%20the%20old%20saying%20goes%2C%20when%20you%20have%20a%20hammer%2C%20every%20problem%20looks%20like%20a%20nail.%0AWith%20the%20increasing%20amount%20of%20data%20being%20collected%20on%20a%20daily%20basis%2C%20it%20is%20eminent%20that%20scientists%20start%20getting%20familiar%20with%20new%20technologies%20that%20can%20help%20answer%20old%20questions.%20Moreover%2C%20we%20need%20to%20be%20adventurous%20about%20cutting%20edge%20data%20sources%20as%20they%20can%20also%20allow%20us%20to%20ask%20new%20questions%20which%20weren%E2%80%99t%20even%20thought%20of%20in%20the%20past.%0AIn%20this%20tutorial%20I%E2%80%99ll%20be%20guiding%20you%20through%20the%20basics%20of%20web%20scraping%20using%20R%20and%20the%20xml2%20package.%20I%E2%80%99ll%20begin%20with%20a%20simple%20example%20using%20fake%20data%20and%20elaborate%20further%20by%20trying%20to%20scrape%20the%20location%20of%20a%20sample%20of%20schools%20in%20Spain.%0A%0A%0ABasic%20steps%0AFor%20web%20scraping%20in%20R%2C%20you%20can%20fulfill%20almost%20all%20of%20your%20needs%20with%20the%20xml2%20package.%20As%20you%20wander%20through%20the%20web%2C%20you%E2%80%99ll%20see%20many%20examples%20using%20the%20rvest%20package.%20xml2%20and%20rvest%20are%20very%20similar%20so%20don%E2%80%99t%20feel%20you%E2%80%99re%20lacking%20behind%20for%20learning%20one%20and%20not%20the%20other.%20In%20addition%20to%20th
ese%20two%20packages%2C%20we%E2%80%99ll%20need%20some%20other%20libraries%20for%20plotting%20locations%20on%20a%20map%20%28ggplot2%2C%20sf%2C%20rnaturalearth%29%2C%20identifying%20who%20we%20are%20when%20we%20scrape%20%28httr%29%20and%20wrangling%20data" target="_blank"
 class="wpusb-layout-buttons wpusb-button wpusb-btn "
 title="Send by Gmail" 
 > <svg class="wpusb-svg wpusb-gmail-buttons "> <use xlink:href="#wpusb-gmail" /> </svg> </a></div></div> <span class="wpusb-toggle" data-action="close-buttons"> <svg class="wpusb-svg wpusb-angle-double-left "> <use xlink:href="#wpusb-angle-double-left" /> </svg> <svg class="wpusb-svg wpusb-angle-double-right "> <use xlink:href="#wpusb-angle-double-right" /> </svg> </span></div> <script>var snp_f = [];
        var snp_hostname = new RegExp(;
        var snp_http = new RegExp("^(http|https)://", "i");
        var snp_cookie_prefix = '';
        var snp_separate_cookies = false;
        var snp_ajax_url = '';
		var snp_ajax_nonce = '6f64896679';
        var snp_ignore_cookies = false;
        var snp_enable_analytics_events = false;
        var snp_enable_mobile = false;
        var snp_use_in_all = false;
        var snp_excluded_urls = [];
        snp_excluded_urls.push('');</script> <div class="snp-root"> <input type="hidden" id="snp_popup" value="" /> <input type="hidden" id="snp_popup_id" value="" /> <input type="hidden" id="snp_popup_theme" value="" /> <input type="hidden" id="snp_exithref" value="" /> <input type="hidden" id="snp_exittarget" value="" /><div id="snppopup-welcome" class="snp-pop-109583 snppopup"><input type="hidden" class="snp_open" value="scroll" /><input type="hidden" class="snp_show_on_exit" value="2" /><input type="hidden" class="snp_exit_js_alert_text" value="" /><input type="hidden" class="snp_exit_scroll_down" value="" /><input type="hidden" class="snp_exit_scroll_up" value="" /><input type="hidden" class="snp_open_scroll" value="50" /><input type="hidden" class="snp_optin_redirect_url" value="" /><input type="hidden" class="snp_show_cb_button" value="yes" /><input type="hidden" class="snp_popup_id" value="109583" /><input type="hidden" class="snp_popup_theme" value="theme6" /><input type="hidden" class="snp_overlay" value="disabled" /><input type="hidden" class="snp_cookie_conversion" value="30" /><input type="hidden" class="snp_cookie_close" value="180" /><div class="snp-fb snp-theme6"><div class="snp-subscribe-inner"><h1 class="snp-header"><i>Never miss an update! </i> <br/> <strong>Subscribe to R-bloggers</strong> to receive <br/>e-mails with the latest R posts.<br/> <small>(You will not see this message again.)</small></h1><div class="snp-form"><form action="" method="post" class="snp-subscribeform snp_subscribeform"><fieldset><div class="snp-field"> <input type="text" name="email" id="snp_email" placeholder="Your E-mail..." class="snp-field snp-field-email" /></div> <button type="submit" class="snp-submit">Submit</button></fieldset></form></div> <a href="#" class="snp_nothanks snp-close">Click here to close (This popup will not appear again)</a></div></div><style>.snp-pop-109583 .snp-theme6 { max-width: 700px;}
.snp-pop-109583 .snp-theme6 h1 {font-size: 17px;}
.snp-pop-109583 .snp-theme6 { color: #a0a4a9;}
.snp-pop-109583 .snp-theme6 .snp-field ::-webkit-input-placeholder { color: #a0a4a9;}
.snp-pop-109583 .snp-theme6 .snp-field :-moz-placeholder { color: #a0a4a9;}
.snp-pop-109583 .snp-theme6 .snp-field :-ms-input-placeholder { color: #a0a4a9;}
.snp-pop-109583  .snp-theme6 .snp-field input { border: 1px solid #a0a4a9;}
.snp-pop-109583 .snp-theme6 .snp-field { color: #000000;}
.snp-pop-109583 .snp-theme6 { background: #f2f2f2;}</style><script>jQuery(document).ready(function() {
});</script> </div> <script type="text/javascript">var CaptchaCallback = function() {
                jQuery('.g-recaptcha').each(function(index, el) {
                    grecaptcha.render(el, {
                        'sitekey' : ''
            };</script> </div> <script type='text/javascript'>window.FPConfig= {
	delay: 0,
	ignoreKeywords: ["\/wp-admin","\/wp-login.php","\/cart","add-to-cart","logout","#","?",".png",".jpeg",".jpg",".gif",".svg"],
	maxRPS: 3,
    hoverDelay: 50
};</script> <script type='text/javascript' src=''></script> <script type='text/javascript' src='' async='async' defer='defer'></script> <script type='text/javascript'>_stq = window._stq || [];
	_stq.push([ 'view', {v:'ext',j:'1:7.3.2',blog:'11524731',post:'193096',tz:'-6',srv:''} ]);
	_stq.push([ 'clickTrackerInit', '11524731', '193096' ]);</script> <script defer src=""></script><!--noptimize--><!-- Autoptimize found a problem with the HTML in your Theme, tag `/body` missing --><!--/noptimize-->	<script type="text/javascript">