A simple static website generator with purrr

July 26, 2017

(This article was first published on Quantitative Consulting - R analytics blog, and kindly contributed to R-bloggers)

Today, I want to present a simple way to use purrr to create a static website generator. Of course, there are Jekyll, Hugo and the blogdown package, but in many cases you may find yourself, like I did, with your own bits and pieces of html, no time to learn yet another language, and a need to turn all this into a structured and consistent static website.

The approach I took is

  • keep all the website elements in one large html file, the “master page” (a collection of files works too)
  • use html comments to define the start and end of the different sections, of which the different pages of the website will be composed
  • define all links using hrefs on #ids

and write an R/purrr script to

  • manage the recombination of the sections onto different pages
  • replace all href-attributes with correct link addresses

To show how this works, assume our “master page” looks like this


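<!-- START HEADER -->
<!DOCTYPE html>
<html>
<head>
<style type="text/css">
body, html {height: 100%;}
.A { border: 2px solid orange; height: 30%; width: 60%;}
.B { background-color: darkorange ; height: 80%; width: 60%;}
.C { border: 2px solid lightblue; height: 90%; width: 60%;}
.D { background-color: steelblue ; height: 70%; width: 60%;}

.navbar {width: 60%; background-color: grey; position: fixed; top: 0;}

a { display: inline-block; padding: 0.5rem;}
</style>
</head>
<!-- END HEADER -->
<!-- START TOP -->

<div class="navbar">
<a href="#HOME"> Home </a>
<a href="#BB"> Link to BB</a>
<a href="#CC"> Link to CC</a>
</div>

<h1>My website</h1>


<!-- END TOP -->
<!-- START AA -->
<div class="A">
<h2>AA - Intro material</h2>
<a href="#DD"> LINK to DD</a>
</div>
<!-- END AA -->
<!-- START BB -->
<a id="BB"></a>

<div class="B">
<h2>Section BB</h2>
<a href="#CC"> LINK to CC</a>
</div>

<!-- END BB -->
<!-- START CC -->
<a id="CC"></a>

<div class="C">
<h2>Section CC</h2>
<a href="#BB"> LINK to BB</a>
</div>

<!-- END CC -->
<!-- START DD -->
<a id="DD"></a>

<div class = "D">
<h2>DD - Some other content</h2>
<a href="#BB"> LINK to BB</a>
<a href="#CC"> LINK to CC</a>
</div>

<!-- END DD -->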

This html contains all the parts we need to piece together our website. All we do here is mark the start and end of each section with html comments. For simplicity, we also place and name our id-attributes accordingly. Usually, we would also keep the CSS in a separate file.
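
To make the marker idea concrete, here is a minimal sketch on a toy page. The marker syntax `<!-- START NAME -->` / `<!-- END NAME -->` and the names `toy`, `m` and `markers` are assumptions made for illustration; any comment convention that a regular expression can pick up would do.

```r
library(stringr)

# Hypothetical five-line master page; the marker syntax
# <!-- START NAME --> / <!-- END NAME --> is an assumption.
toy <- c(
  "<!-- START AA -->",
  "<div class=\"A\">AA - Intro material</div>",
  "<!-- END AA -->",
  "<!-- START BB -->",
  "<!-- END BB -->"
)

# Capture marker type and section name from every marker line;
# lines that are not markers yield NA.
m <- str_match(toy, "<!--\\s*(START|END)\\s+(\\w+)\\s*-->")
is_marker <- !is.na(m[, 1])

markers <- data.frame(
  markerline = which(is_marker),
  markertype = m[is_marker, 2],
  markername = m[is_marker, 3],
  stringsAsFactors = FALSE
)
markers
```

Each section then corresponds to one START line and one END line, which is all the later steps rely on.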

Now, let’s say we want a front page with intro material A and one page each for the material in sections B and C+D.

We start by reading the file into a nested data frame, one nest for each of the sections we have defined

library(tidyverse)
library(stringr)

PATHOUT = "./Site/"
fildat <- readLines("masterpage.html")

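## marker for html blocks
## (assumes comment markers of the form <!-- START NAME --> and <!-- END NAME -->)
markerdat <- tibble(
  markerline = fildat %>% str_which("<!--\\s*(START|END)\\s"),
  orig = fildat[markerline] %>% str_replace_all("<!--|-->|\\t", "")) %>%
  mutate(orig = str_trim(orig)) %>%
  separate(orig, c("markertype", "markername"), sep = "\\s+", remove = FALSE)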

markerall <- markerdat %>% select( -orig ) %>%
  spread(key = markertype, value = markerline) %>%
  mutate(origlines = map2( START, END, ~`[`(fildat, .x:.y))) %>%
  arrange(START)

markerall
## # A tibble: 6 x 4
##   markername   END START  origlines
##        <chr> <int> <int>     <list>
## 1     HEADER    17     1 <chr [17]>
## 2        TOP    29    18 <chr [12]>
## 3         AA    35    30  <chr [6]>
## 4         BB    44    36  <chr [9]>
## 5         CC    53    45  <chr [9]>
## 6         DD    63    54 <chr [10]>
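
The list column holds, for each section, the lines between its START and END marker. A small sketch of that slicing step, with hypothetical stand-ins (`lines`, `starts`, `ends`) for the file contents and marker positions:

```r
library(purrr)

# Hypothetical stand-ins for the file contents and marker positions.
lines  <- c("<!-- START AA -->", "aa", "<!-- END AA -->",
            "<!-- START BB -->", "bb1", "bb2", "<!-- END BB -->")
starts <- c(AA = 1, BB = 4)
ends   <- c(AA = 3, BB = 7)

# For every section, pull the lines from its START to its END marker.
sections <- map2(starts, ends, ~ lines[.x:.y])

lengths(sections)
```

Because the slice is inclusive, the marker comments themselves travel with each section, which is harmless in the published html.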

We then define how our website should be built by declaring the order of the building blocks for each page.

pagedef <- list(
  main = tibble(
    filename = "index.html",
    blocks = c("HEADER",
               "<body>",
               "<a id=\"HOME\"></a>",
               "TOP", "AA",
               "<p>A few additional remarks</p>",
               "</body> </html>")),
  sectB = tibble(
    filename = "sectB.html",
    blocks = c("HEADER", "<body>", "TOP", "BB", "</body> </html>")),
  sectCD = tibble(
    filename = "sectCD.html",
    blocks = c("HEADER", "<body>", "TOP", "CC", "DD", "</body> </html>"))
) %>% bind_rows()

Identifying the corresponding lines of html for each block then really becomes no more than a join-operation:

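## blocks that match a marker name are replaced by the stored html lines;
## all other blocks are kept as literal html
pagecompile <- pagedef  %>% 
  left_join(markerall, by = c("blocks" = "markername")) %>%
  mutate(publishlines =
    map2(blocks, origlines, ~if(is.null(.y)) {.x} else {.y})) %>% 
  select(filename, publishlines) %>% 
  unnest(publishlines) %>%
  nest(-filename)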

pagecompile
## # A tibble: 3 x 2
##      filename              data
##         <chr>            <list>
## 1  index.html <tibble [39 x 1]>
## 2  sectB.html <tibble [40 x 1]>
## 3 sectCD.html <tibble [50 x 1]>
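
The join-with-fallback step is easiest to see on a miniature example. The sketch below uses hypothetical tables (`sections`, `page`); it is only meant to illustrate that blocks without a matching marker name come out of the join as `NULL` and are then passed through as literal html.

```r
library(dplyr)
library(tibble)
library(purrr)

# Hypothetical miniature versions of the two tables.
sections <- tibble(
  markername = c("HEADER", "AA"),
  origlines  = list(c("<head>", "</head>"), c("<div>", "AA", "</div>"))
)
page <- tibble(blocks = c("HEADER", "<body>", "AA", "</body> </html>"))

# Blocks without a match in `sections` get NULL in the list column
# after the join and are then passed through as literal html.
compiled <- page %>%
  left_join(sections, by = c("blocks" = "markername")) %>%
  mutate(publishlines = map2(blocks, origlines,
                             ~ if (is.null(.y)) .x else .y))

unlist(compiled$publishlines)
```

One consequence of this design is that a literal block must not accidentally share its text with a marker name.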

This defines the three html files we want for our website. It remains to make sure that the original href-attributes are adapted to this structure. We start by collecting the info on all (types of) link sources and link targets in our three website files

## href targets used on each page (first match per line)
linksrc <- pagecompile %>%
  mutate(href = map(data, ~str_match(.$publishlines,
      pattern = "href\\s*=\\s*\"(#[A-Z0-9_]*)\"" )[,2]) %>%
    map(~unique(.[!is.na(.)]))) %>% 
  select(srcfile = filename, href) %>%
  unnest(href)

## ids defined on each page, prefixed with "#" to match the hrefs
linktgt <- pagecompile %>%
  mutate(id = map(data, ~str_match(.$publishlines,
      pattern = "id\\s*=\\s*\"([A-Z0-9_]*)\"" )[,2]) %>%
    map(~paste0("#", unique(.[!is.na(.)])))) %>% 
  select(tgtfile = filename, id) %>%
  unnest(id)

linktgt
## # A tibble: 4 x 2
##       tgtfile    id
##         <chr> <chr>
## 1  index.html #HOME
## 2  sectB.html   #BB
## 3 sectCD.html   #CC
## 4 sectCD.html   #DD
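
Both tables rest on the same extraction idiom: `str_match()` returns a matrix whose second column is the first capture group. A short sketch on three hypothetical lines of html (`html` is made up for illustration):

```r
library(stringr)

# Hypothetical lines of html.
html <- c("<a id=\"BB\"></a>",
          "<a href=\"#CC\"> LINK to CC</a>",
          "<h2>Section BB</h2>")

# Column 2 of the str_match() result holds the first capture group;
# lines without a match yield NA.
hrefs <- str_match(html, "href\\s*=\\s*\"(#[A-Z0-9_]*)\"")[, 2]
ids   <- str_match(html, "id\\s*=\\s*\"([A-Z0-9_]*)\"")[, 2]

hrefs[!is.na(hrefs)]
ids[!is.na(ids)]
```

Note that `str_match()` only reports the first match per line, so a line with several links contributes only its first href here; `str_match_all()` would be the alternative if that matters.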

We would be in trouble if an id-attribute appeared more than once in the linktgt-table. This could happen if we wanted to reuse some of our html-blocks on several pages. In that case, we would have to introduce an additional rule specifying which of the copies should be considered the true target for links.
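
A quick way to detect that situation is to count how often each id occurs. The sketch below uses a hypothetical table `linktgt_demo` in which block BB is reused; the tie-breaking rule at the end (keep the first page that lists the id) is just one possible choice, not something the approach prescribes.

```r
library(dplyr)
library(tibble)

# Hypothetical target table in which block BB is reused on two pages.
linktgt_demo <- tibble(
  tgtfile = c("index.html", "sectB.html", "sectCD.html", "sectCD.html"),
  id      = c("#BB", "#BB", "#CC", "#DD")
)

# Ids that occur on more than one page need a tie-breaking rule.
dupes <- linktgt_demo %>%
  count(id, name = "n_pages") %>%
  filter(n_pages > 1)

# One possible rule (an assumption): keep the first page listing the id.
linktgt_unique <- linktgt_demo %>% distinct(id, .keep_all = TRUE)
```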

For our little demonstration here, I have avoided this and other complexities. One additional rule, however, already needs taking care of in our example: On pages like sectB.html, whose content is exclusively from our block B, we would not want the link to jump to the id in the middle of the page. Rather, in that case, it seems more appropriate to link to the top of the page. Let’s have a look at the resulting link-structure:

link2top <- c("#BB", "#HOME")

linkInOut <- linksrc %>% 
  full_join(linktgt, by = c("href" = "id")) %>%
  mutate(finalLinkName = paste0(ifelse(srcfile == tgtfile, "", tgtfile),
ifelse(href %in% link2top,"", href )))%>%
  select(srcfile, href, finalLinkName)

linkInOut 
## # A tibble: 10 x 3
##        srcfile  href  finalLinkName
##           <chr> <chr>          <chr>
##  1  index.html #HOME
##  2  index.html   #BB     sectB.html
##  3  index.html   #CC sectCD.html#CC
##  4  index.html   #DD sectCD.html#DD
##  5  sectB.html #HOME     index.html
##  6  sectB.html   #BB
##  7  sectB.html   #CC sectCD.html#CC
##  8 sectCD.html #HOME     index.html
##  9 sectCD.html   #BB     sectB.html
## 10 sectCD.html   #CC            #CC
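
The rule behind the last column can be restated as a small function. `final_link()` is a hypothetical helper written only to spell out the logic; it is not part of the script above.

```r
# Hypothetical helper restating the rule used for finalLinkName.
final_link <- function(srcfile, tgtfile, href,
                       link2top = c("#BB", "#HOME")) {
  paste0(
    ifelse(srcfile == tgtfile, "", tgtfile),  # same page: no file name
    ifelse(href %in% link2top, "", href)      # top-of-page targets: no #id
  )
}

final_link("index.html", "sectB.html", "#BB")    # "sectB.html"
final_link("index.html", "sectCD.html", "#DD")   # "sectCD.html#DD"
final_link("sectCD.html", "sectCD.html", "#CC")  # "#CC"
```

The three calls correspond to rows 2, 4 and 10 of the table.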

Now, all that remains to be done is to replace the href-attributes and to write out the resulting html-files. In a real-life application you may have to work a little more to avoid mismatches. Other features, such as navigation menus that update automatically when you add a blog entry, also require a little extra work.
But the overall procedure seems sound, and a version of this program produces the (small) website you are looking at right now in a matter of seconds.

## prepare replacement pattern
## (the closing quote keeps "#BB" from also matching a longer id such as "#BB2")
linkInOut <- linkInOut %>% 
  group_by(srcfile) %>% nest() %>%
  ungroup() %>%
  mutate(repPattern =
    map(data, ~structure(paste0("href = \"", .$finalLinkName, "\""),
      .Names = paste0("href\\s*=\\s*\"", .$href, "\"")))) %>%
  select(-data)

## run replacements on data
pagecompile <- pagecompile %>% 
  left_join(linkInOut, by = c("filename" = "srcfile")) %>%
  mutate(dataTrans = map2(data, repPattern,
    ~str_replace_all(.x$publishlines, .y)))

## write out pages (create the output folder first if it does not exist)
dir.create(PATHOUT, showWarnings = FALSE)
pagecompile <- pagecompile %>%
  mutate( fullname = paste0(PATHOUT, filename),
    written = map2(fullname, dataTrans, ~writeLines(.y, .x) ))
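
The replacement step relies on `str_replace_all()` accepting a named character vector, where the names are regular expressions and the values are their replacements. A minimal sketch on one hypothetical line of html (`line` and `rep_pattern` are made up for illustration):

```r
library(stringr)

# Hypothetical line of html with two links.
line <- "<a href=\"#BB\"> LINK to BB</a> <a href=\"#CC\"> LINK to CC</a>"

# Names are regex patterns, values are replacements.
rep_pattern <- c(
  "href\\s*=\\s*\"#BB\"" = "href = \"sectB.html\"",
  "href\\s*=\\s*\"#CC\"" = "href = \"sectCD.html#CC\""
)

out <- str_replace_all(line, rep_pattern)
out
```

All matches on a line are replaced, so lines carrying several links are handled in one pass.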
