[This article was first published on R – What You're Doing Is Rather Desperate, and kindly contributed to R-bloggers].

Highcharts has long been a favourite visualisation library of mine, and I’ve written before about Highcharter, my preferred way to use Highcharts in R.

Highcharter has a nice simple function, `hcboxplot()`, to generate boxplots. I recently generated some for a project at work and was asked: can we see how many observations make up the distribution for each category? This is a common issue with boxplots and there are a few solutions such as: overlay the box on a jitter plot to get some idea of the number of points, or try a violin plot, or a so-called bee-swarm plot. In Highcharts, I figured there should be a method to get the number of observations, which could then be displayed in a tool-tip on mouse-over.

There wasn’t, so I wrote one like this.

First, you’ll need to install `highcharter` from GitHub to make it work with the latest `dplyr`.

Next, we generate a reproducible dataset using the `wakefield` package. For some reason, we want to look at age by gender, but only for redheads:

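```
library(dplyr)
library(tidyr)
library(highcharter)
library(wakefield)
library(tibble)

# Generate 1000 random people, then keep only the redheads.
# wakefield capitalises its column names (Age, Gender, Hair).
set.seed(1001)
sample_data <- r_data_frame(
  n = 1000,
  age(x = 10:90),
  gender,
  hair
) %>%
  filter(Hair == "Red")

# Number of observations per gender
sample_data %>%
  count(Gender)

## # A tibble: 2 x 2
##   Gender     n
##
## 1   Male    62
## 2 Female    48
```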

This gives us 62 male and 48 female redheads. The `tibble` package is required because our boxplot function later calls `has_name()` from that package.

The standard `hcboxplot` function shows us, on mouse-over, the summary data used in the boxplot, as in the image below.

```
hcboxplot(x = sample_data$Age, var = sample_data$Gender) %>%
  hc_chart(type = "column")
```

To replace that with the number of observations per group, we need to edit the function. In RStudio, `View(hcboxplot)` will open a tab with the (read-only) code, which can be copied, pasted and edited. Look for the internal function named `get_box_values`, which uses the R `boxplot.stats` function to generate a data frame:

```
get_box_values <- function(x) {
  boxplot.stats(x)$stats %>% t() %>% as.data.frame() %>%
    setNames(c("low", "q1", "median", "q3", "high"))
}
```

Edit it to look like this – the new function just adds a column `obs` with number of observations:

```
get_box_values <- function(x) {
  boxplot.stats(x)$stats %>% t() %>% cbind(boxplot.stats(x)$n) %>% as.data.frame() %>%
    setNames(c("low", "q1", "median", "q3", "high", "obs"))
}
```
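To see what the edited helper returns without loading Highcharter, here is a minimal base-R sketch. It is not the package's own code: it calls `boxplot.stats()` once instead of twice and avoids the pipe, and the `ages` vector is made-up example data.

```
# Sketch of the edited helper in base R (no pipes, no highcharter needed).
get_box_values <- function(x) {
  bs <- boxplot.stats(x)  # list with elements stats, n, conf, out
  # bs$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
  # bs$n:     number of non-NA observations
  vals <- as.data.frame(t(c(bs$stats, bs$n)))
  setNames(vals, c("low", "q1", "median", "q3", "high", "obs"))
}

ages <- c(12, 25, 31, 38, 44, 47, 52, 60, 71, NA)  # hypothetical data
box_values <- get_box_values(ages)
print(box_values)
```

Note that `obs` counts non-missing values only, so it can be smaller than a row count from `count()` when the variable contains `NA`s.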

Save the new function as, for example, `my_hcboxplot`. Now we can customise the tooltip to use the `obs` property of the `point` object:

```
my_hcboxplot(x = sample_data$Age, var = sample_data$Gender) %>%
  hc_chart(type = "column") %>%
  hc_tooltip(pointFormat = 'n = {point.obs}')
```

Voilà.