Summary

Convert the elements of a numerical vector or data frame column to character strings in which the numbers are formatted using powers-of-ten notation in scientific or engineering form and delimited for rendering as inline equations in an rmarkdown document.

Initial release of the formatdown R package providing tools for formatting output in rmarkdown or quarto markdown documents.

This first version has one function only, format_power(), for converting numbers to character strings formatted in powers-of-ten notation and delimited in $...$ for rendering as inline equations in .Rmd or .qmd output documents. Provides two powers-of-ten formatting options—scientific notation and engineering notation—with an option to omit powers-of-ten notation for a specified range of exponents.

To illustrate the different formats, I show in Table 1 the same number rendered using different formats, all with 4 significant digits.

The R code for the post is listed under the “R code” pointers. In the examples, I use data.table syntax for data manipulation, though the code can be translated into base R or dplyr syntax if desired.

R code
library("formatdown")
library("data.table")

x <- 4.567E-4                                   # value
x1 <- format_power(x, 4, omit_power = c(-6, 0)) # omit power-of-ten
x2 <- format_power(x, 4, format = "sci")        # scientific
x3 <- format_power(x, 4)                        # engineering

# render in markdown table below

Table 1: Rendering a number using different formats

| Notation | Name | Value | Rendered as |
|:---------|:-----|:------|:------------|
| without $10^n$ | x1 | `"$0.0004567$"` | $0.0004567$ |
| scientific | x2 | `"$4.567\\times{10}^{-4}$"` | $4.567\times{10}^{-4}$ |
| engineering | x3 | `"$456.7\\times{10}^{-6}$"` | $456.7\times{10}^{-6}$ |
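
The difference between the scientific and engineering strings comes down to how the exponent is chosen. The following base R sketch illustrates that idea only; it is not the formatdown implementation. The name power_sketch() and its arguments are hypothetical, it handles a single positive number, and it ignores edge cases such as zero or a coefficient that rounds up to the next power of ten.

```r
# Illustrative sketch only; power_sketch() is a hypothetical helper,
# not a function from formatdown. Expects one positive number.
power_sketch <- function(x, digits = 4, format = c("engr", "sci")) {
  format <- match.arg(format)

  # scientific exponent: the coefficient falls in [1, 10)
  exponent <- floor(log10(abs(x)))

  # engineering exponent: next lower multiple of 3,
  # so the coefficient falls in [1, 1000)
  if (format == "engr") {
    exponent <- 3 * floor(exponent / 3)
  }

  # coefficient and the decimal places that give `digits` significant digits
  coeff    <- x / 10^exponent
  decimals <- max(digits - 1 - floor(log10(abs(coeff))), 0)

  # assemble the math markup delimited by $...$
  paste0("$", formatC(coeff, format = "f", digits = decimals),
         "\\times{10}^{", exponent, "}$")
}

power_sketch(4.567E-4, 4, "sci")   # scientific
power_sketch(4.567E-4, 4, "engr")  # engineering
```

For this input the two calls return the same strings as x2 and x3 in Table 1.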

## Background

My first attempt to provide powers-of-ten formatting was in my 2016 package, docxtools. That implementation has several shortcomings.

I wrote its formatting function to accept a data frame as input, which entailed a lot of programming overhead to separate numerical from non-numerical variable classes and to reassemble them after the numerical columns were formatted. This could have been simplified with judicious use of lapply(), with which I was not sufficiently experienced at the time. I also failed to take advantage of formatC() in constructing the output.

With formatdown, my goal is to provide similar functionality but with more concise code, greater flexibility, and a more balanced approach to package dependencies.

## Improvements

The primary design change is that the format_power() function operates on a numerical vector instead of a data frame. The benefits of this change are: 1) simpler code that should be easier to revise and maintain; 2) scalar values can be formatted for rendering inline; and 3) data frames can still be formatted, by column, using lapply().
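
The by-column approach in the third benefit can be sketched in base R as follows. The data frame is made up for illustration, and fmt() is a hypothetical stand-in for a vector formatter; with formatdown installed, a call to format_power() would presumably take its place.

```r
# Small example data frame (values made up for illustration)
df <- data.frame(
  trial = c("a", "b"),
  T_K   = c(294.05, 294.15),
  p_Pa  = c(101100, 101000)
)

# Hypothetical stand-in for a vector formatter such as format_power()
fmt <- function(x) paste0("$", formatC(x, format = "f", digits = 2), "$")

# Format only the numeric columns; other columns are left untouched
is_num <- vapply(df, is.numeric, logical(1))
df[is_num] <- lapply(df[is_num], fmt)

df
```

Because the formatter works on one vector at a time, no code is needed to split and reassemble the data frame by variable class.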

To illustrate formatting a scalar value inline, the markup for Avogadro’s number (x = 6.0221E+23) in engineering format is given by,

    $N_A =$ `r format_power(x, digits = 5, format = "engr")`

which is rendered (in this output document) as $N_A =$ $602.21\times{10}^{21}$.

The second improvement is the addition of an option for scientific notation. For example, the markup for Avogadro’s number in scientific notation is given by,

    $N_A =$ `r format_power(x, digits = 5, format = "sci")`

which renders as $N_A =$ $6.0221\times{10}^{23}$.

The third improvement is the addition of an option for omitting powers-of-ten notation over a range of exponents. For example, the markup for x = 1.234E-4 in decimal notation, rounded to the default three significant digits, is given by,

    $x =$ `r format_power(x = 1.234E-4, omit_power = c(-4, 0))`

which renders as $x =$ $0.000123$.
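
The range check behind this option can be sketched in base R. This is an illustration under stated assumptions, not the package code: omit_sketch() is a hypothetical name, it handles a single positive number, it compares the scientific exponent against the range, and it leaves out the powers-of-ten branch entirely.

```r
# Illustrative sketch only; omit_sketch() is hypothetical, not formatdown code.
omit_sketch <- function(x, omit_power, digits = 3) {
  # scientific exponent of x (assumed to be the one compared to the range)
  exponent <- floor(log10(abs(x)))

  in_range <- exponent >= min(omit_power) && exponent <= max(omit_power)
  if (!in_range) {
    return(NA_character_)  # powers-of-ten branch not shown in this sketch
  }

  # decimal places that keep `digits` significant digits in decimal notation
  decimals <- max(digits - 1 - exponent, 0)
  paste0("$", formatC(x, format = "f", digits = decimals), "$")
}

omit_sketch(1.234E-4, omit_power = c(-4, 0))
```

Here the exponent is -4, which lies inside the range, so six decimal places are kept and the call returns the string "$0.000123$".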

A final (internal) improvement is a more balanced approach to package dependencies. With a tighter focus on what formatdown is to accomplish compared to docxtools, I have reduced the dependencies to checkmate, wrapr, and data.table.

The package vignette illustrates package usage in detail.

However, having successfully submitted the package to CRAN, I started working on this post and immediately (!) uncovered an issue that had not appeared while working on the package vignettes.

## Delimiter issue

I wrote the package vignette using the rmarkdown::html_vignette output style per usual. All the formatted output rendered as expected in that document. I write this blog using quarto. As seen in the examples above, inline math is rendered as expected.

The issue arises when using knitr::kable() and kableExtra::kbl() to display data tables in this blog post. To illustrate, consider this data frame, included with formatdown (ideal gas properties of air at room temperature).

R code
density
         date  trial humidity    T_K   p_Pa     R  density
       <Date> <char>   <fctr>  <num>  <num> <int>    <num>
1: 2018-06-12      a      low 294.05 101100   287 1.197976
2: 2018-06-13      b     high 294.15 101000   287 1.196384
3: 2018-06-14      c   medium 294.65 101100   287 1.195536
4: 2018-06-15      d      low 293.35 101000   287 1.199647
5: 2018-06-16      e     high 293.85 101100   287 1.198791

After formatting the pressure column, the markup looks as expected.

R code
DT <- copy(density)
DT$p_Pa <- format_power(DT$p_Pa, 4)
DT
         date  trial humidity    T_K                   p_Pa     R  density
       <Date> <char>   <fctr>  <num>                 <char> <int>    <num>
1: 2018-06-12      a      low 294.05 $101.1\\times{10}^{3}$   287 1.197976
2: 2018-06-13      b     high 294.15 $101.0\\times{10}^{3}$   287 1.196384
3: 2018-06-14      c   medium 294.65 $101.1\\times{10}^{3}$   287 1.195536
4: 2018-06-15      d      low 293.35 $101.0\\times{10}^{3}$   287 1.199647
5: 2018-06-16      e     high 293.85 $101.1\\times{10}^{3}$   287 1.198791

knitr::kable() yields the expected output with pressure formatted in engineering notation.

R code
knitr::kable(DT, align = "r")
date trial humidity T_K p_Pa R density
2018-06-12 a low 294.05 $101.1\times{10}^{3}$ 287 1.197976
2018-06-13 b high 294.15 $101.0\times{10}^{3}$ 287 1.196384
2018-06-14 c medium 294.65 $101.1\times{10}^{3}$ 287 1.195536
2018-06-15 d low 293.35 $101.0\times{10}^{3}$ 287 1.199647
2018-06-16 e high 293.85 $101.1\times{10}^{3}$ 287 1.198791

### Problem

kableExtra::kbl() does not render the math markup as expected.

R code
kableExtra::kbl(DT, align = "r")
date trial humidity T_K p_Pa R density
2018-06-12 a low 294.05 $101.1\times{10}^{3}$ 287 1.197976
2018-06-13 b high 294.15 $101.0\times{10}^{3}$ 287 1.196384
2018-06-14 c medium 294.65 $101.1\times{10}^{3}$ 287 1.195536
2018-06-15 d low 293.35 $101.0\times{10}^{3}$ 287 1.199647
2018-06-16 e high 293.85 $101.1\times{10}^{3}$ 287 1.198791

In fact, having loaded kableExtra above, knitr::kable() now fails in the same way.

R code
knitr::kable(DT, align = "r")
date trial humidity T_K p_Pa R density
2018-06-12 a low 294.05 $101.1\times{10}^{3}$ 287 1.197976
2018-06-13 b high 294.15 $101.0\times{10}^{3}$ 287 1.196384
2018-06-14 c medium 294.65 $101.1\times{10}^{3}$ 287 1.195536
2018-06-15 d low 293.35 $101.0\times{10}^{3}$ 287 1.199647
2018-06-16 e high 293.85 $101.1\times{10}^{3}$ 287 1.198791

### Solution

I found a suggestion from MathJax to replace the $...$ delimiters with `\(...\)`. I wrote a short function (below) to do that.

R code
# Substitute math delimiters
sub_delim <- function(x) {
  x <- sub("\\$", "\\\\(", x) # first $
  x <- sub("\\$", "\\\\)", x) # second $
}

DT$p_Pa <- sub_delim(DT$p_Pa)
DT
         date  trial humidity    T_K                       p_Pa     R  density
       <Date> <char>   <fctr>  <num>                     <char> <int>    <num>
1: 2018-06-12      a      low 294.05 \\(101.1\\times{10}^{3}\\)   287 1.197976
2: 2018-06-13      b     high 294.15 \\(101.0\\times{10}^{3}\\)   287 1.196384
3: 2018-06-14      c   medium 294.65 \\(101.1\\times{10}^{3}\\)   287 1.195536
4: 2018-06-15      d      low 293.35 \\(101.0\\times{10}^{3}\\)   287 1.199647
5: 2018-06-16      e     high 293.85 \\(101.1\\times{10}^{3}\\)   287 1.198791
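
As an alternative sketch, the same substitution can be done in one pass with a capture group. The name sub_delim_once() is hypothetical. It assumes each string is either wrapped in exactly one leading and one trailing $ or contains no math at all, in which case it is returned unchanged.

```r
# Swap a leading and trailing $ for \( and \) in a single substitution.
# Strings that are not wrapped in $...$ do not match and pass through as-is.
sub_delim_once <- function(x) {
  sub("^\\$(.*)\\$$", "\\\\(\\1\\\\)", x)
}

sub_delim_once(c("$101.1\\times{10}^{3}$", "no math"))
```

The pattern anchors on the first and last characters, so a column that mixes formatted and unformatted entries can be passed through safely.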

knitr::kable() yields the expected output.

R code
knitr::kable(DT, align = "c")
date trial humidity T_K p_Pa R density
2018-06-12 a low 294.05 $$101.1\times{10}^{3}$$ 287 1.197976
2018-06-13 b high 294.15 $$101.0\times{10}^{3}$$ 287 1.196384
2018-06-14 c medium 294.65 $$101.1\times{10}^{3}$$ 287 1.195536
2018-06-15 d low 293.35 $$101.0\times{10}^{3}$$ 287 1.199647
2018-06-16 e high 293.85 $$101.1\times{10}^{3}$$ 287 1.198791

kableExtra::kbl() yields the expected output.

R code
kableExtra::kbl(DT, align = "c")
date trial humidity T_K p_Pa R density
2018-06-12 a low 294.05 $$101.1\times{10}^{3}$$ 287 1.197976
2018-06-13 b high 294.15 $$101.0\times{10}^{3}$$ 287 1.196384
2018-06-14 c medium 294.65 $$101.1\times{10}^{3}$$ 287 1.195536
2018-06-15 d low 293.35 $$101.0\times{10}^{3}$$ 287 1.199647
2018-06-16 e high 293.85 $$101.1\times{10}^{3}$$ 287 1.198791

I can use the features from kableExtra to print a pretty table.

R code
library("kableExtra")

var_names <- c("Date", "Trial", "Humidity", "Temperature", "Pressure", "Gas constant", "Density")
var_units <- c("", "", "", "[K]", "[Pa]", "[J/(kg K)]", "[kg/m\\(^3\\)]")
var_align <- "r"

DT |>
  kbl(align = var_align, col.names = var_units) |>
  column_spec(1:6, color = "black", background = "white") |>
  kable_paper(lightable_options = "basic", full_width = TRUE)

Table 2: Data frame displayed using kableExtra

| Date | Trial | Humidity | Temperature [K] | Pressure [Pa] | Gas constant [J/(kg K)] | Density [kg/m$^3$] |
|-----:|------:|---------:|----------------:|--------------:|------------------------:|-------------------:|
| 2018-06-12 | a | low | 294.05 | $101.1\times{10}^{3}$ | 287 | 1.197976 |
| 2018-06-13 | b | high | 294.15 | $101.0\times{10}^{3}$ | 287 | 1.196384 |
| 2018-06-14 | c | medium | 294.65 | $101.1\times{10}^{3}$ | 287 | 1.195536 |
| 2018-06-15 | d | low | 293.35 | $101.0\times{10}^{3}$ | 287 | 1.199647 |
| 2018-06-16 | e | high | 293.85 | $101.1\times{10}^{3}$ | 287 | 1.198791 |

To address this issue, the next version of format_power() will include a new delim argument,

    format_power(x, digits, format, omit_power, delim)

that allows a user to set the math delimiters to $...$ or `\(...\)`, or even custom left and right markup, to suit their environment.
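
One way such an argument could behave is sketched below in base R. This is a guess at the idea, not the planned implementation: wrap_math() is a hypothetical helper, and the rule that a length-two vector supplies custom left and right markup is an assumption.

```r
# Illustrative sketch only; wrap_math() is hypothetical and does not show
# how the delim argument will actually be implemented.
wrap_math <- function(x, delim = "$") {
  if (length(delim) == 2) {
    left  <- delim[1]          # custom left and right markup
    right <- delim[2]
  } else if (identical(delim, "\\(")) {
    left  <- "\\("             # paired delimiters
    right <- "\\)"
  } else {
    left <- right <- delim     # same markup on both sides, e.g. "$"
  }
  paste0(left, x, right)
}

wrap_math("101.1\\times{10}^{3}")                 # default $...$
wrap_math("101.1\\times{10}^{3}", delim = "\\(")  # paired delimiters
```

With something like this built in, the separate delimiter-substitution step after formatting would no longer be needed.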

## Fixed exponents

Preparing this post, I adapted a table of water properties from the hydraulics package to use as an example and discovered another, more subtle issue. First, I’ll construct the data frame.

R code
# Construct a table of water properties
temperature     <- seq(0, 45, 10) + 273.15
density         <- c(1000, 1000, 998, 996, 992)
specific_weight <- c(9809, 9807, 9793, 9768, 9734)
viscosity       <- c(173, 131, 102, 81.7, 67.0) * 1E-8
bulk_modulus    <- c(202, 210, 218, 225, 228) * 1E+7

water <- data.table(temperature, density, specific_weight, viscosity,  bulk_modulus)

water
   temperature density specific_weight viscosity bulk_modulus
         <num>   <num>           <num>     <num>        <num>
1:      273.15    1000            9809  1.73e-06     2.02e+09
2:      283.15    1000            9807  1.31e-06     2.10e+09
3:      293.15     998            9793  1.02e-06     2.18e+09
4:      303.15     996            9768  8.17e-07     2.25e+09
5:      313.15     992            9734  6.70e-07     2.28e+09

### Problem

I format all the columns and change the delimiters as described earlier and display the result. The viscosity column reveals the problem.

R code
DT <- copy(water)

# 5 signif digits
cols_to_format <- c("temperature")
DT[, (cols_to_format) := lapply(.SD, function(x) format_power(x, 5)), .SDcols = cols_to_format]

# 4 signif digits
cols_to_format <- c("specific_weight")
DT[, (cols_to_format) := lapply(.SD, function(x) format_power(x, 4)), .SDcols = cols_to_format]

# 3 signif digits
cols_to_format <- c("viscosity", "bulk_modulus")
DT[, (cols_to_format) := lapply(.SD, function(x) format_power(x)), .SDcols = cols_to_format]

# 3 signif digits omit powers
cols_to_format <- c("density")
DT[, (cols_to_format) := lapply(.SD, function(x) format_power(x, omit_power = c(0, 3))), .SDcols = cols_to_format]

# change the delimiters
DT <- DT[, lapply(.SD, function(x) sub_delim(x))]

# Table
DT |>
  kbl(align = "cclrrrr") |>
  kable_paper(lightable_options = "basic", full_width = TRUE) |>
  row_spec(0, background = "#c7eae5") |>
  column_spec(1:5, color = "black", background = "white")
temperature density specific_weight viscosity bulk_modulus
$$273.15$$ $$1000$$ $$9.809\times{10}^{3}$$ $$1.73\times{10}^{-6}$$ $$2.02\times{10}^{9}$$
$$283.15$$ $$1000$$ $$9.807\times{10}^{3}$$ $$1.31\times{10}^{-6}$$ $$2.10\times{10}^{9}$$
$$293.15$$ $$998$$ $$9.793\times{10}^{3}$$ $$1.02\times{10}^{-6}$$ $$2.18\times{10}^{9}$$
$$303.15$$ $$996$$ $$9.768\times{10}^{3}$$ $$817\times{10}^{-9}$$ $$2.25\times{10}^{9}$$
$$313.15$$ $$992$$ $$9.734\times{10}^{3}$$ $$670\times{10}^{-9}$$ $$2.28\times{10}^{9}$$

The viscosity column displays three values using $10^{-6}$ and two using $10^{-9}$. Visually comparing the values in a column is easier if the powers of ten are identical. The table below illustrates the desired result, created by manually editing the two viscosity values.

R code
# Manually edit strings to illustrate
DT$viscosity[4] <- "\\(0.82\\times{10}^{-6}\\)"
DT$viscosity[5] <- "\\(0.67\\times{10}^{-6}\\)"

# Table
DT |>
  kbl(align = "cclrrrr") |>
  kable_paper(lightable_options = "basic", full_width = TRUE) |>
  row_spec(0, background = "#c7eae5") |>
  column_spec(1:5, color = "black", background = "white")
temperature density specific_weight viscosity bulk_modulus
$$273.15$$ $$1000$$ $$9.809\times{10}^{3}$$ $$1.73\times{10}^{-6}$$ $$2.02\times{10}^{9}$$
$$283.15$$ $$1000$$ $$9.807\times{10}^{3}$$ $$1.31\times{10}^{-6}$$ $$2.10\times{10}^{9}$$
$$293.15$$ $$998$$ $$9.793\times{10}^{3}$$ $$1.02\times{10}^{-6}$$ $$2.18\times{10}^{9}$$
$$303.15$$ $$996$$ $$9.768\times{10}^{3}$$ $$0.82\times{10}^{-6}$$ $$2.25\times{10}^{9}$$
$$313.15$$ $$992$$ $$9.734\times{10}^{3}$$ $$0.67\times{10}^{-6}$$ $$2.28\times{10}^{9}$$

This revision satisfies two conventions of tabulating empirical engineering information.

1. Units.   With all values reported to the same power of ten, the units can all be interpreted in the same way. In this case, for example, the viscosity coefficients (1.73, 1.31, etc.) are all in micropascal-seconds ($\mu$Pa-s).

2. Uncertainty.   In rewriting the two viscosity values, I changed from three significant digits to two decimal places, consistent with the assumption that empirical information is reported to the same level of uncertainty unless noted otherwise.

### Potential revision

Add the water data to formatdown and the following functionality to format_power().

1. A new argument (perhaps fixed_power) that automatically selects a fixed exponent for a numerical vector or permits the user to directly assign a fixed exponent.

        format_power(x, digits, format, omit_power, delim, fixed_power)

2. In conjunction with the fixed power-of-ten, I would also round all numbers in a column to the same number of decimal places to address the uncertainty assumption. This could be a separate argument.
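
A minimal base R sketch of both ideas follows. Everything in it is an assumption made for illustration: the name fixed_power_sketch(), the rule that the automatic exponent is the engineering exponent of the largest magnitude in the vector, and the use of a decimals argument for the common number of decimal places. The output uses $...$ delimiters.

```r
# Illustrative sketch only; fixed_power_sketch() is hypothetical.
fixed_power_sketch <- function(x, decimals = 2, fixed_power = NULL) {
  if (is.null(fixed_power)) {
    # assumed rule: engineering exponent of the largest magnitude
    fixed_power <- 3 * floor(floor(log10(max(abs(x)))) / 3)
  }

  # every coefficient uses the same exponent and the same decimal places
  coeff <- formatC(x / 10^fixed_power, format = "f", digits = decimals)
  paste0("$", coeff, "\\times{10}^{", fixed_power, "}$")
}

viscosity <- c(173, 131, 102, 81.7, 67.0) * 1E-8
fixed_power_sketch(viscosity)
```

For the viscosity column, the selected exponent is -6 and the coefficients come out as 1.73, 1.31, 1.02, 0.82, and 0.67, matching the manually edited table.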

## Units

And now for something completely different!

Thinking about measurement units, I looked for relevant R packages and found units. With appropriate units, powers-of-ten notation can be practically eliminated. For example, a pressure reading of $2.02\times{10}^{9}$ Pa can be reported as $2.02$ GPa.

R code
water
   temperature density specific_weight viscosity bulk_modulus
         <num>   <num>           <num>     <num>        <num>
1:      273.15    1000            9809  1.73e-06     2.02e+09
2:      283.15    1000            9807  1.31e-06     2.10e+09
3:      293.15     998            9793  1.02e-06     2.18e+09
4:      303.15     996            9768  8.17e-07     2.25e+09
5:      313.15     992            9734  6.70e-07     2.28e+09

With tools from the units package, I can define a symbol uP to represent micropoise (a non-SI viscosity unit equal to 10$^{-7}$ Pa-s). And I can write a short function to convert the numbers from basic units to displayed units, for example, converting Pa to GPa (gigapascal) or Pa-s to $\mu$P (micropoise).

R code
library("units")

# Define the uP units
install_unit("uP", "micropoise", "micropoise")

# Function to assign and convert units
assign_units <- function(x, base_unit, display_unit) {

  # convert x to "Units" class in base units
  units(x) <- base_unit

  # convert from basic to display units
  units(x) <- as_units(display_unit)

  # return value
  x
}

Convert each column and output the results.

R code
# Apply to one variable at a time
DT <- copy(water)
DT$temperature <- assign_units(DT$temperature, "K", "degree_C")
DT$density <- assign_units(DT$density, "kg/m^3", "kg/m^3")
DT$specific_weight <- assign_units(DT$specific_weight, "N/m^3", "kN/m^3")
DT$viscosity <- assign_units(DT$viscosity, "Pa*s", "uP")
DT$bulk_modulus <- assign_units(DT$bulk_modulus, "Pa", "GPa")

# Output
DT |>
  kbl(align = "r") |>
  kable_paper(lightable_options = "basic", full_width = TRUE) |>
  row_spec(0, background = "#c7eae5") |>
  column_spec(1:5, color = "black", background = "white")
temperature density specific_weight viscosity bulk_modulus
0 [°C] 1000 [kg/m^3] 9.809 [kN/m^3] 17.30 [uP] 2.02 [GPa]
10 [°C] 1000 [kg/m^3] 9.807 [kN/m^3] 13.10 [uP] 2.10 [GPa]
20 [°C] 998 [kg/m^3] 9.793 [kN/m^3] 10.20 [uP] 2.18 [GPa]
30 [°C] 996 [kg/m^3] 9.768 [kN/m^3] 8.17 [uP] 2.25 [GPa]
40 [°C] 992 [kg/m^3] 9.734 [kN/m^3] 6.70 [uP] 2.28 [GPa]

The entries in the data frame are still numeric but are of the “Units” class, enabling math operations among values with compatible units. See the units website for details.

R code
str(DT)
Classes 'data.table' and 'data.frame':  5 obs. of  5 variables:
 $ temperature    : Units: [°C] num  0 10 20 30 40
 $ density        : Units: [kg/m^3] num  1000 1000 998 996 992
 $ specific_weight: Units: [kN/m^3] num  9.81 9.81 9.79 9.77 9.73
 $ viscosity      : Units: [uP] num  17.3 13.1 10.2 8.17 6.7
 $ bulk_modulus   : Units: [GPa] num  2.02 2.1 2.18 2.25 2.28
 - attr(*, ".internal.selfref")=<externalptr>

If I were to refine this table further, I would report the numerical values without labels in each cell, moving the unit labels to a sub-header row. Possible future work.

### Potential revision

Incorporate tools from the units package to create a new function (perhaps format_units()) that would convert basic units to display units that can substitute for powers-of-ten notation.
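
To make the idea concrete without depending on the units package, here is a base R sketch that trades the power of ten for an SI prefix. It is only an illustration: si_sketch() is a hypothetical name, the prefix table is deliberately short, and "u" stands in for the micro symbol. A real format_units() would presumably rely on the units package for the conversions.

```r
# Illustrative sketch only; si_sketch() is hypothetical and does not use
# the units package.
si_sketch <- function(x, unit, decimals = 2) {
  exponents <- c(-9, -6, -3, 0, 3, 6, 9)
  symbols   <- c("n", "u", "m", "", "k", "M", "G")

  # engineering exponent of the largest magnitude, limited to the table above
  exponent <- 3 * floor(floor(log10(max(abs(x)))) / 3)
  exponent <- min(max(exponent, -9), 9)

  # scale the values and attach the prefixed unit label
  value <- formatC(x / 10^exponent, format = "f", digits = decimals)
  paste0(value, " ", symbols[match(exponent, exponents)], unit)
}

bulk_modulus <- c(202, 210, 218, 225, 228) * 1E+7
si_sketch(bulk_modulus, "Pa")
```

For the bulk modulus column, this reports the values in GPa, with one prefix shared by the whole column.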

## Closing

The new formatdown package formats numbers in powers-of-ten notation for inline math markup. A new argument is already in the works for managing the math delimiters. Potential new features include a fixed power-of-ten option as well as replacing powers-of-ten notation with deliberate manipulation of physical units.