Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I was recently pointed in the direction of a thermal comfort model by the engineering company Arup (p27–28 of this pdf). Figure 3 at the top of p28 caught my attention.

It’s mostly a nice graph; there’s not too much junk in it. One thing that struck me was that there is an awful lot of information in the legend, and that I found it impossible to retain all that information while switching between the plot and the legend.

The best way to improve this plot then is to find a way to simplify the legend. Upon closer inspection, it seems that there is a lot of information that is repeated. For example, there are only two temperature combinations, and three levels of direct solar energy. Humidity and diffused solar energy are kept the same in all cases. That makes it really easy for us: our five legend options are

Outdoor temp (deg C) Direct solar energy (W/m^2)
32 700
32 150
32 500
29 500
29 150

Elsewhere we can explain that the mezannine/platform temps are always 2/4 degrees higher than outdoors, and that the humidity is always 50%, and that the diffused solar energy is always 100W/m^2.

Living in Buxton, one of the coldest, rainiest towns in the UK, it amuses me to see that their “low” outdoor temperature is 29°C.

The other thing to note is that we have two variables mapped to the hue. For just five cases, this is just about acceptable, but it isn’t the best option and it won’t scale to many more categories. It’s generally considered best practice to work in HCL color space when mapping variables to colours. I would be tempted to map temperature to hue – whether you pick red as hot and blue as cold or the other way around depends upon how many astronomers you have in your target audience. Then I’d map luminance (lightness) to solar energy: more sunlight = lighter line.

I don’t have the values to exactly recreate the dataset, but here are some made up numbers with the new legend. Notice the combined outdoor temp/direct solar energy variable.

time_points <- 0:27
n_time_points <- length(time_points)
n_cases <- 5
comfort_data <- data.frame(
time = rep.int(time_points, n_cases),
comfort = jitter(rep(-2:2, each = n_time_points)),
outdoor.temperature = rep(
c(32, 29),
times = c(3 * n_time_points, 2 * n_time_points)
),
direct.solar.energy = rep(
c(700, 150, 500, 500, 150),
each = n_time_points
)
)
comfort_data\$combined <- with(comfort_data,
factor(paste(outdoor.temperature, direct.solar.energy, sep = ", "))
)


We manually pick the colours to use in HCL space (using str_detect to examine the factor levels).

library(stringr)
cols <- hcl(
h = with(comfort_data, ifelse(str_detect(levels(combined), "29"), 0, 240)),
c = 100,
l = with(comfort_data,
ifelse(str_detect(levels(combined), "150"), 20,
ifelse(str_detect(levels(combined), "500"), 50, 80))
)
)


Drawing the plot is very straightforward, it’s just a line plot.

library(ggplot2)
p <- ggplot(comfort_data, aes(time, comfort, colour = combined)) +
geom_line(size = 2) +
scale_colour_manual(
name = expression(paste(
"Outdoor temp (", degree, C, "), Direct solar (", W/m^2, ")"
)),
values = cols) +
xlab("Time (minutes)") +
ylab("Comfort")
p


Sensible people should stop here, and write the additional detail in the figure caption. There is currently no sensible way of writing annotations outside of the plot area (annotate only works inside panels). The following hack was devised by Baptiste Auguie, read this forum thread for other variations.

library(gridExtra)
caption <- tableGrob(
matrix(
expression(
paste(
"Mezzanine temp is 2", degree, C, " warmer than outdoor temp"
),
paste(
"Platform temp is 4", degree, C, " warmer than outdoor temp"
),
paste("Humidity is always 50%"),
paste(
"Diffused solar energy is always 100", W/m^2
)
)
),
parse = TRUE,
theme = theme.list(
gpar.corefill = gpar(fill = NA, col = NA),
core.just = "center"
)
)
grid.arrange(p,  sub=caption)


Tagged: dataviz, ggplot2, keys, legends, r