Highlight the Pipe. Highlight.js

[This article was first published on QuestionFlow, and kindly contributed to R-bloggers].

Practical advice on customizing code highlighting on web pages with highlight.js.

Prologue

While creating this site I ran into the topic of highlighting code on web pages. I decided to do that with the help of highlight.js. After picking a style with R in mind, I arrived at the following question: is there an easy way to highlight the pipe operator %>% separately? As it turned out, the answer is “Yes”, but the journey through the unexplored world of JavaScript was bumpy, with a pleasant moment of meeting a familiar name.

So this post is about adding custom rules for code highlighting in highlight.js, using the string %>% as an example.

Overview

The “Getting Started” part of the Usage page says that to start using highlight.js on a web page, the following code should be executed:

<link rel="stylesheet" href="/path/to/styles/default.css">
<script src="/path/to/highlight.pack.js"></script>
<script>hljs.initHighlightingOnLoad();</script>

The description is: “This will find and highlight code inside of <pre><code> tags; it tries to detect the language automatically. If automatic detection doesn’t work for you, you can specify the language in the class attribute:”

<pre><code class="html">...</code></pre>

So basically the process of highlighting the text inside <pre><code>...</code></pre> is the following:

  • Detect the language (either automatically or from the class attribute of the <pre> or <code> tag).
  • Apply some complicated parsing with functionality sourced from “/path/to/highlight.pack.js”. Based on predefined rules, this wraps some parts of the text in <span></span> tags with an appropriate class.
  • Apply CSS customization based on the “/path/to/styles/default.css” file and the classes of the <span> tags created in the previous step.

To be more specific, at the time of writing this site uses (with the help of Hugo and the Minimal theme) the following code:

<link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/idea.min.css">
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/languages/yaml.min.js"></script>
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/languages/html.min.js"></script>
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/languages/javascript.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>

The first line loads the CSS code for the “Idea” style, the second loads the JavaScript code for general highlight.js functionality, the next three load the parsing rules for specific languages (YAML, HTML and JavaScript), and the last one initializes highlight.js. Basically, the files yaml.min.js, html.min.js and javascript.min.js contain the actual rules of code parsing.
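As a rough illustration of the first step in the process described above (picking the language from the class attribute), here is a minimal sketch. It is a toy model, not the library's actual code: the names pickLanguage and knownLanguages are hypothetical, and the accepted prefixes "language-" and "lang-" are an assumption about common class naming.

```javascript
// Toy sketch of step 1: pick a language from class attributes.
// `knownLanguages` and `pickLanguage` are hypothetical names, not highlight.js API.
const knownLanguages = new Set(["r", "yaml", "html", "javascript"]);

function pickLanguage(codeClass, preClass = "") {
  // Look at the classes of <code> first, then at those of the parent <pre>.
  const classes = `${codeClass} ${preClass}`.split(/\s+/).filter(Boolean);
  for (const cls of classes) {
    // Accept bare names ("r") and prefixed ones ("language-r", "lang-r").
    const name = cls.replace(/^lang(uage)?-/, "").toLowerCase();
    if (knownLanguages.has(name)) return name;
  }
  // No usable class: a real highlighter would fall back to auto-detection here.
  return null;
}

console.log(pickLanguage("html"));           // html
console.log(pickLanguage("", "language-r")); // r
console.log(pickLanguage("fancy-box"));      // null
```

Returning null instead of guessing keeps the sketch honest about where automatic detection would have to take over.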

Custom parsing rules

A similar file for R, with my custom indentation, looks like this:

hljs.registerLanguage("r",
  function(e){
    var r="([a-zA-Z]|\\.[a-zA-Z.])[a-zA-Z0-9._]*";
    return{
      c:[e.HCM,
        {b:r,l:r, k:
          {keyword:"function if in break next repeat else for return switch while try tryCatch stop warning require library attach detach source setMethod setGeneric setGroupGeneric setClass ...",
          literal:"NULL NA TRUE FALSE T F Inf NaN NA_integer_|10 NA_real_|10 NA_character_|10 NA_complex_|10"},
        r:0},
        {cN:"number",b:"0[xX][0-9a-fA-F]+[Li]?\\b",r:0},
        {cN:"number",b:"\\d+(?:[eE][+\\-]?\\d*)?L\\b",r:0},
        {cN:"number",b:"\\d+\\.(?!\\d)(?:i\\b)?",r:0},
        {cN:"number",b:"\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",r:0},
        {cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",r:0},
        {b:"`",e:"`",r:0},
        {cN:"string",c:[e.BE],v:[{b:'"',e:'"'},{b:"'",e:"'"}]}
      ]
    }
  }
);

After first seeing this without indentation, as one string, I was a little bit intimidated. Fortunately, after some internet searching I found the highlight.js GitHub repository with its very useful src directory. It contains the subdirectories languages (for JavaScript rules like the ones mentioned above) and styles (for the styles’ CSS code).

The file for parsing R code is src/languages/r.js. Its core was written in the spring of 2012 by Joe Cheng, creator of the Shiny framework. Seeing a familiar name during a rough JavaScript journey cheered me up somewhat. After studying the code, many questions were answered:

  • By default the following pieces of code can be highlighted separately: comment, string, number, keyword, literal (TRUE, FALSE, NULL, NA, etc.).
  • Those one- and two-letter variables in the code are just short versions of the more understandable className, begin, end, relevance, etc.
  • To add a custom piece of code to highlight, one should add an appropriate class to the parsing rules. There is thorough highlight.js documentation if you want to master the logic behind these rules. The easiest and most understandable way of creating a rule is to specify regular expressions for the beginning and ending of the desired class. Note that if the ending is omitted, the begin regex alone defines the whole match. For example:
{className: "pipe", begin: "%>%", relevance: 0}

This code finds the string %>% and wraps it as <span class="hljs-pipe">%>%</span> (note the prefix “hljs-”). The relevance argument is not very important for the current topic; one can read about it in the highlight.js documentation.
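The effect of such a begin-only rule can be imitated with a plain regular expression replacement. The sketch below is a toy model of the wrapping, not the real highlight.js parser: wrapMatches is a hypothetical helper, and HTML escaping of the matched text is left out for brevity.

```javascript
// Toy model of a begin-only rule; NOT the real highlight.js parser.
// `wrapMatches` is a hypothetical helper. HTML escaping is omitted for brevity.
function wrapMatches(rule, text) {
  const re = new RegExp(rule.begin, "g");
  // Wrap every match in a <span> whose class is "hljs-" + className.
  return text.replace(re, (m) => `<span class="hljs-${rule.className}">${m}</span>`);
}

const pipeRule = { className: "pipe", begin: "%>%", relevance: 0 };
console.log(wrapMatches(pipeRule, "iris %>% head()"));
// iris <span class="hljs-pipe">%>%</span> head()
```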

With this knowledge one can create other interesting rules:

// Function parameters with good style as 'variable' + 'space' + '=' + 'space'
{className: "fun-param", begin: "([a-zA-Z]|\\.[a-zA-Z.])[a-zA-Z0-9._]*\\s+=\\s+", relevance: 0},

// Assign operator with good style
{className: "assign", begin: " <- ", relevance: 0},

// Adding to class 'keyword' the explicit use of function's package
{className: "keyword", begin: "([a-zA-Z]|\\.[a-zA-Z.])[a-zA-Z0-9._]*::", relevance: 0},

// Class for basic dplyr words with their scoped variants
// Not included in this site highlighting rules
{className: "dplyr", begin: "tibble|mutate|select|filter|summari[sz]e|arrange|group_by", end: "[a-zA-Z0-9._]*", relevance: 0}
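To get a feel for what these patterns match, they can be tried directly as JavaScript regular expressions. This is only a sanity check on sample strings chosen for illustration; the variable names are hypothetical and nothing here runs the actual highlight.js parser.

```javascript
// Quick checks of the `begin` patterns from the rules above, using plain RegExp.
// Variable names and sample strings are illustrative only.
const ident = "([a-zA-Z]|\\.[a-zA-Z.])[a-zA-Z0-9._]*";
const funParam = new RegExp(ident + "\\s+=\\s+");
const assign = new RegExp(" <- ");
const pkgPrefix = new RegExp(ident + "::");

console.log("mean(x, na.rm = TRUE)".match(funParam)[0]);     // "na.rm = " (with trailing space)
console.log(funParam.test("mean(x, na.rm=TRUE)"));           // false: no spaces around "="
console.log(assign.test("y <- 1"));                          // true
console.log(assign.test("y<-1"));                            // false: no spaces around "<-"
console.log("dplyr::group_by(Species)".match(pkgPrefix)[0]); // "dplyr::"
```

The two false results show the design choice behind these rules: only code written with spaces around = and <- gets matched.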

It is important to add these rules in the appropriate places, because they are processed sequentially, so order matters. The final version of this site’s rules for R looks like this:

custom.r.min.js
hljs.registerLanguage("r",
  function(e){
    var r="([a-zA-Z]|\\.[a-zA-Z.])[a-zA-Z0-9._]*";
    return{
      c:[e.HCM,
        {cN:"fun-param",b:"([a-zA-Z]|\\.[a-zA-Z.])[a-zA-Z0-9._]*\\s+=\\s+",r:0},
        {cN:"pipe",b:"%>%",r:0},
        {cN:"assign",b:" <- ",r:0},
        {cN:"keyword",b:"([a-zA-Z]|\\.[a-zA-Z.])[a-zA-Z0-9._]*::",r:0},
  //      {cN:"dplyr",b:"tibble|mutate|select|filter|summari[sz]e|arrange|group_by",e:"[a-zA-Z0-9._]*",r:0},
        {b:r,l:r, k:
          {keyword:"function if in break next repeat else for return switch while try tryCatch stop warning require library attach detach source setMethod setGeneric setGroupGeneric setClass ...",
          literal:"NULL NA TRUE FALSE T F Inf NaN NA_integer_|10 NA_real_|10 NA_character_|10 NA_complex_|10"},
        r:0},
        {cN:"number",b:"0[xX][0-9a-fA-F]+[Li]?\\b",r:0},
        {cN:"number",b:"\\d+(?:[eE][+\\-]?\\d*)?L\\b",r:0},
        {cN:"number",b:"\\d+\\.(?!\\d)(?:i\\b)?",r:0},
        {cN:"number",b:"\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",r:0},
        {cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",r:0},
        {b:"`",e:"`",r:0},
        {cN:"string",c:[e.BE],v:[{b:'"',e:'"'},{b:"'",e:"'"}]}
      ]
    }
  }
);
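Why the order of rules matters can be shown with a small experiment. The sketch below assumes a simplified model in which the rule matching earliest in the text wins and ties go to the rule listed first; firstRule is a hypothetical helper and the real parser is more involved. With the package-prefix rule listed before the generic identifier rule, dplyr:: is caught as a keyword; with the order reversed, the identifier rule claims dplyr first.

```javascript
// Toy model of sequential rule processing; a simplified assumption,
// not the actual highlight.js algorithm. `firstRule` is a hypothetical helper.
const ident = "([a-zA-Z]|\\.[a-zA-Z.])[a-zA-Z0-9._]*";

function firstRule(rules, text) {
  let best = null;
  for (const rule of rules) {
    const m = new RegExp(rule.begin).exec(text);
    // Keep the earliest match; on a tie the rule listed first wins.
    if (m && (best === null || m.index < best.index)) {
      best = { name: rule.name, index: m.index, text: m[0] };
    }
  }
  return best;
}

const pkg = { name: "keyword", begin: ident + "::" };
const plain = { name: "identifier", begin: ident };
const code = "dplyr::group_by(Species)";

console.log(firstRule([pkg, plain], code).name); // keyword
console.log(firstRule([plain, pkg], code).name); // identifier
```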

The custom.r.min.js code should be sourced on every page with highlighting. To do this with Hugo:

  • Save this code into the file static/js/custom.r.min.js.
  • Add the following line to the head of every web page (usually by modifying the partial template for the page’s header):
<script src="/js/custom.r.min.js"></script> 

Custom style

Styling of the parsed code is done with CSS, so some knowledge of it is needed. The CSS rules have to be added to every page with highlighting. For example, this site’s CSS rules specifically for R code highlighting look like this:

/* Set colour for function parameters */
.hljs-fun-param {
    color: #ff4000;
}

/* Make the pipe and assign operator bold */
.hljs-pipe, .hljs-assign {
    font-weight: bold;
}

The result looks like this:

# R comment with %>% and <- .
iris_summary <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::summarise(meanSepalLength = mean(Sepal.Length))

starts_with_str <-function(x=c(" %>% ", " <- ")) {
  paste0("Starts with", x)
}

Notice the following:

  • The strings %>% and <- are not specially highlighted inside a comment or a string.
  • The use of dplyr:: is highlighted the same way as the keyword function.
  • The strings = (in function parameters) and <- should be surrounded by spaces (to which the styling is also applied) to be correctly highlighted. This encourages following the tidyverse style guide.

Conclusions

  • Asking questions about a seemingly simple task can lead to a long journey of code exploration.
  • Meeting familiar names during times of trouble can be inspiring.
  • Creating custom rules for code highlighting with highlight.js is pretty straightforward for R people (after some adjusting to JavaScript and CSS).
sessionInfo()
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/openblas-base/libblas.so.3
## LAPACK: /usr/lib/libopenblasp-r0.2.18.so
## 
## locale:
##  [1] LC_CTYPE=ru_UA.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=ru_UA.UTF-8        LC_COLLATE=ru_UA.UTF-8    
##  [5] LC_MONETARY=ru_UA.UTF-8    LC_MESSAGES=ru_UA.UTF-8   
##  [7] LC_PAPER=ru_UA.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=ru_UA.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] methods   stats     graphics  grDevices utils     datasets  base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.4.2  backports_1.1.1 bookdown_0.5    magrittr_1.5   
##  [5] rprojroot_1.2   tools_3.4.2     htmltools_0.3.6 yaml_2.1.14    
##  [9] Rcpp_0.12.13    stringi_1.1.5   rmarkdown_1.7   blogdown_0.2   
## [13] knitr_1.17      stringr_1.2.0   digest_0.6.12   evaluate_0.10.1
