Preventing escaping in HTML

[This article was first published on MATHEMATICS IN MEDICINE, and kindly contributed to R-bloggers.]

library(xtable)

## 
## Attaching package: 'xtable'
## 
## The following objects are masked from 'package:Hmisc':
## 
##     label, label<-

library(stringr)
library(whisker)

Problem statement

As a novice in R, the problem I faced may well be a novice one, but I spent hours working on it.
I was building an HTML report from a PostgreSQL database: the report gathers text from the database and places it in an HTML document. I was using Rhtml in RStudio and inserting text at specific positions in the Rhtml document through code chunks, because Rhtml gives more control over text layout than Rmd.
One of the database fields contained text with multiple newline characters (\n), marking the places where I had pressed ENTER. An example is shown below:
  string 1\nstring 2\nstring 3\nstring 4
I ran into problems when trying to render this text as follows in the resulting HTML document:
  1.  string 1
  2.  string 2
  3.  string 3
  4.  string 4

The method I tried first, which failed

I assigned the whole string to a variable.
s <- "string 1\nstring 2\nstring 3\nstring 4"
I replaced each \n with <br>, the HTML tag for a line break, using str_replace_all:
s1 <- str_replace_all(string = s, pattern = "\\n", replacement = "<br>")
s1

## [1] "string 1<br>string 2<br>string 3<br>string 4"
Then I tried to put the string s1 into the HTML document as follows. (In practice I was working with a data frame with multiple rows and wanted to present the data as a table.)
print(xtable(data.frame(s1)), type = "html")
s1
1 string 1< br> string 2< br> string 3< br> string 4
The <br> tags were not converted into line breaks; they appeared verbatim in the output. I checked the underlying HTML code and found the following:
  <TABLE border=1>
    <TR> <TH>  </TH> <TH> s1 </TH>  </TR>
    <TR>
      <TD align="right"> 1 </TD>
      <TD> string 1&lt br&gt string 2&lt br&gt string 3&lt br&gt string 4 </TD>
    </TR>
  </TABLE>
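What happened internally was that print.xtable sanitized the cell text, escaping the < and > characters into their HTML character entities (&lt; and &gt;), so the browser displayed <br> as literal text instead of treating it as a tag. The problem, then, was how to prevent <br> from being escaped so that the line breaks would actually be inserted.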
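To make the escaping step concrete, here is a minimal sketch in base R of what an HTML sanitizer does to the string. The helper name escape_html is hypothetical (it is not an xtable function), and the exact entity text that a given xtable version emits may differ slightly from what this sketch produces.

```r
# Hypothetical helper illustrating HTML escaping; not part of xtable.
escape_html <- function(x) {
  # Escape the ampersand first, so the entities added below are not re-escaped
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub(">", "&gt;", x, fixed = TRUE)
  x
}

s1 <- "string 1<br>string 2<br>string 3<br>string 4"
escaped <- escape_html(s1)
cat(escaped, "\n")
```

Once the angle brackets are turned into entities, the browser no longer sees a <br> tag at all, only text that looks like one.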

Method 1

I split the original string s at the newlines and trimmed the resulting components.
s2 <- str_trim(unlist(str_split(s, "\\n")))
s2

## [1] "string 1" "string 2" "string 3" "string 4"
I made a data frame from the character vector and printed the required output.
d <- data.frame(str_c(seq(from = 1, by = 1, along.with = s2), "."), s2)
print(xtable(d), type = "html", include.colnames = FALSE, include.rownames = FALSE,
    html.table.attributes = "style='border-width:0;'")
1. string 1
2. string 2
3. string 3
4. string 4
which is the required output!
However, I have not actually prevented <br> from being escaped here; I have only bypassed the issue.
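For readers without stringr, the same split-and-number step can be sketched in base R. This is only an illustration of Method 1, not a replacement for it: the column names no and text are made up for the example, and trimws assumes R 3.2.0 or later.

```r
s <- "string 1\nstring 2\nstring 3\nstring 4"

# Split on literal newlines, then strip surrounding whitespace
parts <- trimws(strsplit(s, "\n", fixed = TRUE)[[1]])

# One row per line, with a running number in the first column
d <- data.frame(no = paste0(seq_along(parts), "."), text = parts,
                stringsAsFactors = FALSE)
print(d, row.names = FALSE)
```

The resulting data frame can be passed to xtable exactly as in the code above.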

Method 2

This method uses {{Mustache}} templates through the whisker package, its R implementation.
I used the string s1 and took the following steps:
l <- list(s1 = s1)
html.templ <- "<table><tr><td>{{{s1}}}</td></tr></table>"
cat(whisker.render(template = html.templ, data = l))

## <table><tr><td>string 1<br>string 2<br>string 3<br>string 4</td></tr></table>
The triple braces {{{}}} prevent the <br> from being escaped.
The rendered output is as follows:
cat(whisker.render(template = html.templ, data = l))
string 1
string 2
string 3
string 4
This code is much shorter and cleaner.
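The difference between double and triple braces is easy to check side by side. The sketch below uses a shortened two-line string and a bare <td> template purely for illustration; with {{s1}} whisker HTML-escapes the value, while {{{s1}}} inserts it unchanged.

```r
library(whisker)

l <- list(s1 = "string 1<br>string 2")

# Double braces: the value is HTML-escaped before insertion
escaped <- whisker.render("<td>{{s1}}</td>", data = l)

# Triple braces: the value is inserted as-is
raw <- whisker.render("<td>{{{s1}}}</td>", data = l)

cat(escaped, "\n")
cat(raw, "\n")
```

Because triple braces switch escaping off entirely, they are best reserved for text whose markup you control.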

Concluding remarks

If you know of other techniques to prevent < and > from being escaped when rendering HTML, I would be glad to hear of them.
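One candidate, offered as a sketch rather than something tested in the setup described here: print.xtable accepts a sanitize.text.function argument, and passing a function that returns its input unchanged (identity, below) should leave the <br> tags in the cell text intact.

```r
library(xtable)

s1 <- "string 1<br>string 2<br>string 3<br>string 4"

# identity() returns its input unchanged, so no characters are escaped
out <- capture.output(
  print(xtable(data.frame(s1)), type = "html",
        sanitize.text.function = identity)
)
cat(out, sep = "\n")
```

The trade-off is that nothing in the cell text is escaped any more, so a stray < or & coming from the database would also reach the HTML untouched.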

Session Information

sessionInfo()

## R version 3.0.2 (2013-09-25)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=en_IN.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_IN.UTF-8        LC_COLLATE=en_IN.UTF-8    
##  [5] LC_MONETARY=en_IN.UTF-8    LC_MESSAGES=en_IN.UTF-8   
##  [7] LC_PAPER=en_IN.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_IN.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] datasets  grid      grDevices splines   graphics  utils     stats    
## [8] methods   base     
## 
## other attached packages:
##  [1] whisker_0.3-2    xtable_1.7-1     knitr_1.5        mypackage_1.0   
##  [5] devtools_1.4.1   dplyr_0.1.2      ggplot2_0.9.3.1  rms_4.0-0       
##  [9] SparseM_0.99     Hmisc_3.13-0     Formula_1.1-1    cluster_1.14.4  
## [13] car_2.0-19       stringr_0.6.2    lubridate_1.3.3  lattice_0.20-24 
## [17] epicalc_2.15.1.0 nnet_7.3-7       MASS_7.3-29      survival_2.37-4 
## [21] foreign_0.8-57   deSolve_1.10-8  
## 
## loaded via a namespace (and not attached):
##  [1] assertthat_0.1     colorspace_1.2-4   dichromat_2.0-0   
##  [4] digest_0.6.4       evaluate_0.5.1     formatR_0.10      
##  [7] gtable_0.1.2       httr_0.2           labeling_0.2      
## [10] memoise_0.1        munsell_0.4.2      parallel_3.0.2    
## [13] plyr_1.8           proto_0.3-10       RColorBrewer_1.0-5
## [16] Rcpp_0.11.0        RCurl_1.95-4.1     reshape2_1.2.2    
## [19] scales_0.2.3       tools_3.0.2
Bye and regards.
