
Step-by-Step Guide to Use R and Selenium to Scrape Empleos Publicos

[This article was first published on https://pacha.dev/blog, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

    Mauricio “Pachá” Vargas Sepúlveda

    Blog with notes about R, Shiny, SQL, Python, Linux and C++. This blog is listed on R-Bloggers.



    Using R, Selenium and purrr to organize hundreds of HTML sections into one table.
    Author

    Mauricio “Pachá” Vargas S.

    Published

    August 20, 2025

    Because of delays with my scholarship payment, if this post is useful to you, I kindly ask for a small donation on Buy Me a Coffee. It will be used to continue my Open Source efforts. The full explanation is here: A Personal Message from an Open Source Contributor.


    Motivation

    My friend Nicolas Didier asked me about reading Empleos Publicos with R or Python. Here is a short example for him and anybody who may benefit from reading this.

    The following steps were adapted from a tutorial I taught at the University of Michigan (GO BLUE!) in 2023.


    Required R packages

    • RSelenium: R-Selenium integration
    • rvest: HTML processing
    • dplyr: provides the pipe operator (%>%), tibble() and glimpse(), and can be used later for data cleaning
    • purrr: iteration (i.e., repeated operations)

    I installed RSelenium from the R console:

    if (!require(RSelenium)) install.packages("RSelenium")
    
    # or, for the development version from GitHub (requires the remotes package):
    
    remotes::install_github("ropensci/RSelenium")

    For the rest of the packages:

    if (!require(rvest)) install.packages("rvest")
    if (!require(dplyr)) install.packages("dplyr")
    if (!require(purrr)) install.packages("purrr")
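The four `if (!require(...))` lines can also be collapsed into a single check. A minimal sketch: `missing_packages()` is a hypothetical helper (not part of any package) built on base R's `requireNamespace()`, which tests whether a package can be loaded without attaching it.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
# requireNamespace() checks for a package without attaching it.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Usage with the packages from this post:
# to_install <- missing_packages(c("RSelenium", "rvest", "dplyr", "purrr"))
# if (length(to_install) > 0) install.packages(to_install)
```

Installing only the missing packages avoids reinstalling ones that are already present.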

    Installing Selenium and Chrome/Chromium

    Note for Ubuntu/Debian users: we need to make sure that Chrome or Chromium is installed on our system. One of several options is to install it from the bash console:

    sudo add-apt-repository ppa:savoury1/chromium
    sudo apt update
    sudo apt install chromium-browser
    sudo apt install chromium-chromedriver

    Not using the PPA will install the snap version of Chromium, which is not compatible with Selenium.

    I tried to start Selenium as described in the official guide, and it did not work.

    I had to install Chromium. I am on Manjaro and I ran sudo pacman -S chromium. Windows/Mac users can use Google Chrome.
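Selenium drives Chrome/Chromium through chromedriver, whose major version has to match the browser, and the Selenium Server used below is a Java program. A quick check from R can show which of these programs are on the PATH. This is a sketch, and the program names are assumptions that vary across distributions and operating systems.

```r
# Program names vary by system (assumption): chromium on Manjaro,
# chromium-browser on Ubuntu, google-chrome for Google Chrome on Linux
programs <- c("chromium", "chromium-browser", "google-chrome", "chromedriver", "java")

# Sys.which() returns the full path of each program, or "" if it is not on the PATH
paths <- Sys.which(programs)

status <- data.frame(program = programs, found = nzchar(paths), path = unname(paths))
print(status)
```

It is enough for one browser to be found, as long as chromedriver and java are also present.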

    An extra requirement was to download the Selenium Server. I started by creating a directory to store the files for this post, typing this in the VS Code terminal:

    mkdir -p /tmp/didier-example
    cd /tmp/didier-example

    Then I opened R from the same terminal and downloaded the JAR file:

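    # Run this from /tmp/didier-example so the JAR is saved there
    url_jar <- "https://github.com/SeleniumHQ/selenium/releases/download/selenium-3.9.1/selenium-server-standalone-3.9.1.jar"
    sel_jar <- "selenium-server-standalone-3.9.1.jar"
    
    # Download only once; mode = "wb" keeps the binary file intact on Windows
    if (!file.exists(sel_jar)) {
      download.file(url_jar, sel_jar, mode = "wb")
    }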
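If the download is interrupted or the URL returns an error page, the file on disk will not be a valid JAR, and `java -jar` will fail with a less obvious message. Since a JAR is a ZIP archive, a rough check is to look for the ZIP signature `PK` at the start of the file. `looks_like_jar()` is a hypothetical helper for illustration, not a Selenium or RSelenium function.

```r
# Hypothetical helper: a JAR file is a ZIP archive, so a complete download
# should start with the two bytes "PK" (an HTML error page would not)
looks_like_jar <- function(path) {
  if (!file.exists(path) || file.size(path) == 0) return(FALSE)
  con <- file(path, open = "rb")
  on.exit(close(con))
  magic <- readBin(con, what = "raw", n = 2)
  identical(magic, charToRaw("PK"))
}

looks_like_jar("selenium-server-standalone-3.9.1.jar")
```

A `FALSE` here suggests deleting the file and downloading it again.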

    I had to run Selenium from a new terminal:

    cd /tmp/didier-example
    java -jar selenium-server-standalone-3.9.1.jar
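Selenium Server listens on port 4444 by default, which is the port passed to `remoteDriver()` below. A minimal sketch to confirm from R that something is accepting connections there before opening a session. `port_is_open()` is a hypothetical helper based on base R's `socketConnection()`, and a `TRUE` result only means the port is open, not that the server behind it is Selenium.

```r
# Hypothetical helper: TRUE if a server accepts TCP connections on host:port.
# This only shows that something is listening, not that it is Selenium.
port_is_open <- function(host = "localhost", port = 4444L, timeout = 2) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, open = "r+", timeout = timeout)
    ),
    error = function(e) NULL
  )
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}

port_is_open()
```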

    Back in the R terminal, I was finally able to control the browser from R:

    library(RSelenium)
    library(rvest)
    library(dplyr)
    library(purrr)
    
    rmDr <- remoteDriver(port = 4444L, browserName = "chrome")
    
    rmDr$open(silent = TRUE)
    
    url <- "https://www.empleospublicos.cl"
    
    rmDr$navigate(url)

    This should display a new Chrome/Chromium window that says “Chrome is being controlled by automated test software”.
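The code above leaves the browser open. When scraping is done, `rmDr$close()` closes the session. One way to make sure that always happens, even if an error stops the script, is to wrap the session in a function that uses `on.exit()`. This is a sketch that assumes the Selenium Server from the previous step is running; `with_browser()` is a hypothetical helper, not part of RSelenium.

```r
# Hypothetical helper: open a session, run `f` on it, and always close the browser
with_browser <- function(f, port = 4444L, browser = "chrome") {
  driver <- RSelenium::remoteDriver(port = port, browserName = browser)
  driver$open(silent = TRUE)
  on.exit(driver$close(), add = TRUE)
  f(driver)
}

# Usage (requires the Selenium Server started above):
# page_source <- with_browser(function(driver) {
#   driver$navigate("https://www.empleospublicos.cl")
#   driver$getPageSource()[[1]]
# })
```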


    Scraping the data

    Using the browser’s inspector (Ctrl + Shift + I), I explored the page and found that the search bar corresponds to this element:

    <input class="buscador-principal search form-control buscador-movil" name="q" type="search" autocomplete="off" placeholder="Ingresa el cargo, comuna o institución" id="buscadorprincipal">

    For example, I can search for “Ministerio de Salud” (the Ministry of Health), because many of the offers on the landing page were posted by that organization:

    search_box <- rmDr$findElement(using = "id", value = "buscadorprincipal")
    search_box$sendKeysToElement(list("Ministerio de Salud", key = "enter"))

    That typed “Ministerio de Salud” in the search bar and pressed Enter on my behalf. Inspecting the results, I saw that each job offer starts with

    <div class="items col-md-4 col-lg-4 postulacion ...
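The result cards appear only after the search is submitted, so reading the page source too early can return fewer offers than expected. A minimal sketch of a polling helper that waits until a condition holds or a timeout expires. `wait_until()` is hypothetical, and the usage in the comments assumes the `rmDr` session from above and RSelenium's `findElements()`.

```r
# Hypothetical helper: poll `condition` until it returns TRUE or time runs out.
# Returns TRUE if the condition was met, FALSE on timeout.
wait_until <- function(condition, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(condition())) return(invisible(TRUE))
    if (Sys.time() >= deadline) return(invisible(FALSE))
    Sys.sleep(interval)
  }
}

# Usage with the session from this post (requires a running Selenium Server):
# wait_until(function() {
#   length(rmDr$findElements(using = "css selector", value = "div.items")) > 0
# })
```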

    The first offer listed is this:

    <div class="items col-md-4 col-lg-4 postulacion otro otro eepp region7renta3calidad2 busqueda "><div class="item"><div class="top"><div class="label label-estado"><i class="fa fa-circle circulo-status1" aria-hidden="true"></i> Postulación hasta 30/09/2025 23:59:00</div><h3><a target="_blank" href="https://www.empleospublicos.cl/pub/convocatorias/convpostularavisoTrabajo.aspx?i=130648&amp;c=0&amp;j=0&amp;tipo=convpostularavisoTrabajo" onclick="ga('send', 'event', 'convocatorias', 'Medico (a) especialista en Anestesiología 44 horas | Servicio de Salud Maule / Hospital de Constitución', 'eepp');">Medico (a) especialista en Anestesiología 44 horas</a></h3><p>Servicio de Salud Maule / Hospital de Constitución</p></div><hr><div class="cnt"><p>Ministerio de Salud</p><p>Constitución</p><br><div class="alert alert-primer"><i class="fa fa-address-card" aria-hidden="true"></i>  No pide experiencia</div><div class="row card-footer"><div class="col-xs-9 col-md-8 text-left"><a class="cronograma btn " url="https://www.empleospublicos.cl/pub/convocatorias/convpostularavisoTrabajo.aspx?i=130648&amp;c=0&amp;j=0&amp;tipo=convpostularavisoTrabajo" onclick="return false;" href="#" title="Ver Cronograma de la Convocatoria"><i class="fa fa-calendar-days"></i> Calendarización</a>
            <div class="compartir-social">      
                <div class="row">
                    <div class="col-xs-3 col-md-4">
                        <a class="btn" onclick="enviarRS('t', 'https://www.empleospublicos.cl/pub/convocatorias/convpostularavisoTrabajo.aspx?i=130648&amp;c=0&amp;j=0&amp;tipo=convpostularavisoTrabajo', 'Medico (a) especialista en Anestesiología 44 horas Servicio de Salud Maule / Hospital de Constitución'); return false;" href="#" target="_blank" title="Compartir en Twitter"><i class="fa-brands fa-square-x-twitter fa-xl" aria-hidden="true"></i></a>
                    </div>
                    <div class="col-xs-3 col-md-4">
                        <a class="btn" onclick="enviarRS('f', 'https://www.empleospublicos.cl/pub/convocatorias/convpostularavisoTrabajo.aspx?i=130648&amp;c=0&amp;j=0&amp;tipo=convpostularavisoTrabajo', 'Medico (a) especialista en Anestesiología 44 horas Servicio de Salud Maule / Hospital de Constitución'); return false;" href="#" target="_blank" title="Compartir en Facebook"><i class="fa-brands fa-square-facebook fa-xl" aria-hidden="true"></i></a>
                    </div>
                    <div class="col-xs-3 col-md-4">
                        <a class="btn" onclick="enviarRS('l', 'https://www.empleospublicos.cl/pub/convocatorias/convpostularavisoTrabajo.aspx?i=130648&amp;c=0&amp;j=0&amp;tipo=convpostularavisoTrabajo', 'Medico (a) especialista en Anestesiología 44 horas Servicio de Salud Maule / Hospital de Constitución'); return false;" href="#" target="_blank" title="Compartir en Linkedin"><i class="fa-brands fa-linkedin fa-xl" aria-hidden="true"></i></a>
                    </div>
                    <div class="col-xs-3 col-md-4">
                        <a class="btn whatsapp-link visible-xs visible-sm" title="Compartir en Whatsapp" onclick="enviarRS('w', 'https://www.empleospublicos.cl/pub/convocatorias/convpostularavisoTrabajo.aspx?i=130648&amp;c=0&amp;j=0&amp;tipo=convpostularavisoTrabajo', 'Medico (a) especialista en Anestesiología 44 horas Servicio de Salud Maule / Hospital de Constitución'); return false;" href="#" data-action="share/whatsapp/share"><i class="fa-brands fa-square-whatsapp fa-xl" aria-hidden="true"></i></a>
                    </div>
                </div>
            </div>
        <div class="row"><div class="col-md-12 card-footer-contenido "></div></div></div></div></div></div></div>
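
    With that structure in mind, I read the page source with rvest, selected every div.items card, and used purrr to turn each card into one row with the job title, the organization, and the city:

    html <- read_html(rmDr$getPageSource()[[1]])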
    
    offers <- html %>%
      html_nodes("div.items")
    
    offers_tbl <- map_df(offers, function(offer) {
      # Extract position (job title)
      position <- offer %>%
        html_node("h3 a") %>%
        html_text(trim = TRUE)
      
      # Extract organization (usually the first <p> inside .top)
      organization <- offer %>%
        html_node(".top p") %>%
        html_text(trim = TRUE)
      
      # Extract city (the second <p> inside .cnt); indexing the text vector
      # returns NA instead of failing when an offer has no second <p>
      city <- offer %>%
        html_nodes(".cnt p") %>%
        html_text(trim = TRUE) %>%
        .[2]
      
      tibble(
        position = position,
        organization = organization,
        city = city
      )
    })
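Because each offer is parsed independently, the selectors can be checked without a browser by running the same logic on a small piece of HTML. The sketch below wraps the extraction in a hypothetical `parse_offer()` function. It applies that function to a trimmed copy of the first offer shown above and to a made-up card without a city, so you can see how missing fields come out (`NA`).

```r
library(rvest)
library(dplyr)
library(purrr)

# Two trimmed cards in the same layout as the offer shown above: the first
# copies the fields of the first offer, the second is made up and has no city
cards <- '
<div class="items postulacion"><div class="item">
  <div class="top"><h3><a href="#">Medico (a) especialista en Anestesiología 44 horas</a></h3>
  <p>Servicio de Salud Maule / Hospital de Constitución</p></div>
  <div class="cnt"><p>Ministerio de Salud</p><p>Constitución</p></div>
</div></div>
<div class="items postulacion"><div class="item">
  <div class="top"><h3><a href="#">Example position</a></h3><p>Example service</p></div>
  <div class="cnt"><p>Example ministry</p></div>
</div></div>'

# Same selectors as in the scraper above, wrapped in a reusable function
parse_offer <- function(offer) {
  tibble(
    position = offer %>% html_node("h3 a") %>% html_text(trim = TRUE),
    organization = offer %>% html_node(".top p") %>% html_text(trim = TRUE),
    # Indexing the text vector (not the node set) gives NA when there is no 2nd <p>
    city = offer %>% html_nodes(".cnt p") %>% html_text(trim = TRUE) %>% .[2]
  )
}

sample_tbl <- read_html(cards) %>%
  html_nodes("div.items") %>%
  map_df(parse_offer)

sample_tbl
```

Keeping the per-card logic in a named function also makes it easy to reuse `map_df(offers, parse_offer)` on the live page.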

    The result has the following structure:

    offers_tbl
    # A tibble: 552 × 3
       position                                                   organization city 
       <chr>                                                      <chr>        <chr>
     1 Medico (a) especialista en Anestesiología 44 horas         Servicio de… Cons…
     2 Titulares de la Planta Profesional Ley 18.834              Servicio de… Valp…
     3 ENFERMERA-O, JORNADA DIURNA, GRADO 12, PARA SERVICIO CLÍN… Servicio de… Reco…
     4 Psiquiatra infanto-juvenil sistema de atención intersecto… Servicio de… La P…
     5 Neurólogo(a) adulto GES Alzheimer y otras demencias        Servicio de… Puen…
     6 Médico(a) especialista en Neurología Infantil Hospital de… Servicio de… Cast…
     7 Arquitecto de Software                                     Central de … Ñuñoa
     8 TENS OPERADOR DE EQUIPOS DE ESTERILIZACIÓN                 Servicio de… Peña…
     9 (850-2892) Médico Especialista Broncopulmonar o Internist… Servicio de… Talc…
    10 Enfermero(a) Clínico(a) Atención Abierta y Cerrada         Servicio de… Huas…
    glimpse(offers_tbl)
    Rows: 552
    Columns: 3
    $ position     <chr> "Medico (a) especialista en Anestesiología 44 horas", "Ti…
    $ organization <chr> "Servicio de Salud Maule / Hospital de Constitución", "Se…
    $ city         <chr> "Constitución", "Valparaíso", "Recoleta", "La Pintana", "…

    I know this is a simple example, but it should serve as a starting point for other kinds of exploration and data extraction. I hope it helps.
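As one example of such exploration, dplyr can count offers per city or filter job titles by keyword. In the sketch below, `offers_sample` holds three rows taken from the output above (rows 1, 2 and 7) so that it runs on its own. With a live session, `offers_tbl` would take its place.

```r
library(dplyr)

# Three rows from the output above (rows 1, 2 and 7); replace with offers_tbl
offers_sample <- tibble(
  position = c(
    "Medico (a) especialista en Anestesiología 44 horas",
    "Titulares de la Planta Profesional Ley 18.834",
    "Arquitecto de Software"
  ),
  city = c("Constitución", "Valparaíso", "Ñuñoa")
)

# Number of offers per city, most frequent first
city_counts <- offers_sample %>%
  count(city, sort = TRUE)

# Case-insensitive keyword search in the job titles
software_offers <- offers_sample %>%
  filter(grepl("software", position, ignore.case = TRUE))

city_counts
software_offers
```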

    To leave a comment for the author, please follow the link and comment on their blog: https://pacha.dev/blog.
