Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I’ve now down a few collages from R using magick: the faces of #rstats Twitter, We R-Ladies with Lucy D’Agostino McGowan, and a holiday card for R-Ladies. The faces of #rstats Twitter and holiday card collages were arranged at random, while the We R-Ladies one was a mosaic forming the R-Ladies logo. I got the idea to up my collage skills by trying to learn how to arrange pics by their main colour, like a rainbow. The verb rainbow doesn’t exist, and “rainbowing” doesn’t mean ordering by colour, but I didn’t let this stop me.

It was the occasion to grab some useful knowledge about colours, not useless for someone who did not even know about Pantone’s Colors of the Year a few weeks ago…

This post has nothing to do with Kesha’s new album. However, you can listen to it while reading since it’s so good, but maybe switch to something older from her when I use “$”. # Getting some pics to play with The first pictures I tried to arrange were all the pictures ever posted by R-Ladies local chapters on their Twitter account. While it was fun to grab them all, it was not very interesting to play with them as so many of them were pictures of screens. I therefore grabbed “nature” pictures from Pexels using the same method [as when creating the Bubblegum Puppies] following this Stack Overflow thread. I chose “nature” as a keyword because 1) it lead to many hits 2) it offered a good variety of colours. library("rvest") library("RSelenium") library("magrittr") rD <- rsDriver() remDr <- rD[["client"]] # open the webpage remDr$navigate("https://www.pexels.com/search/nature/")

# scroll down
for(i in 1:130){
remDr$executeScript(paste("scroll(0,",i*10000,");"), args = list("dummy")) # be nice and wait Sys.sleep(1) } # https://www.pexels.com/faq/ page_content <- remDr$getPageSource()
(500-newinfo$height)/2)) info <- magick::image_info(image) # odd numbers out! if(info$height/2 != floor(info$height/2)){ image <- magick::image_crop(image, "0x500+0") } if(info$width/2 != floor(info$width/2)){ image <- magick::image_crop(image, "500x0+0") } magick::image_write(image, stringr::str_replace(path, "nature", "formatted_pics")) tibble::tibble(path = path, colour = colour) }else{ NULL } } pics_main_colours <- purrr::map_df(dir("nature", full.names = TRUE), format_image) readr::write_csv(pics_main_colours, path = "pics_main_colours.csv")  And because I’m apparently a bad planner, I had to reduce pictures afterwards. # we need smaller images reduce_image <- function(path){ magick::image_read(path) %>% magick::image_scale("50x50!") %>% magick::image_write(path) } purrr::walk(dir("formatted_pics", full.names = TRUE), reduce_image)  # Preparing a function to order and paste pictures This function has a collage part which you might recognize from my the faces of #rstats Twitter blog post, and a ordering pictures according to a variable part that’s new and uses a bit of tidy eval… Maybe I’ll really learn tidy eval this year! pics_info needs to be a data.frame with the path to pictures and well the variable one wants to use to order them. library("rlang") make_column <- function(i, files, no_rows){ magick::image_read(files[(i*no_rows+1):((i+1)*no_rows)]) %>% magick::image_append(stack = TRUE) %>% magick::image_write(paste0("cols/", i, ".jpg")) } make_collage <- function(pics_info, no_rows, no_cols, ordering_col){ pics_info <- dplyr::arrange(pics_info, !!!syms(ordering_col)) pics_info <- pics_info[1:(no_rows*no_cols),] pics_info <- dplyr::mutate(pics_info, column = rep(1:no_cols, each = no_rows)) pics_info <- dplyr::group_by(pics_info, column) %>% dplyr::arrange(!!!syms(ordering_col)) %>% dplyr::mutate(row = 1:no_rows) %>% dplyr::ungroup() pics_info <- dplyr::arrange(pics_info, column, row) dir.create("cols") purrr::walk(0:(no_cols-1), make_column, files = pics_info$path,
no_rows = no_rows)

banner <- magick::image_read(dir("cols/", full.names = TRUE)) %>%
magick::image_append(stack = FALSE)

return(banner)

}



The function returns a magick object that one can then write to disk as a PNG for instance.

I first tested it using a random approach added to the data.frame created in the next section, and show the result here to give an idea of the variety of pictures. For many of them, however, the main colour that you can see in their border is greyish.

set.seed(42)
pics_info <- dplyr::mutate(pics_info, random = sample(1:nrow(pics_info), nrow(pics_info)))
make_collage(pics_info, 19, 59, "random") %>%
magick::image_write("data/2018-01-07-rainbowing-banner_random.png")


# Testing a first (bad) approach: using hue

Once I had the main colour as an hex code, I had no idea how to order the colours and thought a good idea would be to use hue, which is the main wave length in a colour. Most observed colours are a mix of wave lengths unless you’re using a laser for instance. To get hue from a colour identified by its hex code, one needs two functions: colorspace::hex2rgb and grDevices::rgb2hsv. The latter one outputs hue, saturation and value. Hue is the main wavelength, saturation the amount of that wavelength in the colour and value the amount of light in the colour. The smaller the saturation, the less representative the hue is of the main colour. Add to that the fact that the main colour can also be only a little representative of your original picture… Ordering by hue isn’t too perfect, but I tried that anyway.

# now work on getting the hue and value for all pics
# create a data.frame with path, hue, value
get_values <- function(path, pics_main_colours){
print(path)
# get main color
colour <- pics_main_colours$colour[pics_main_colours$path == stringr::str_replace(path,
"formatted_pics",
"nature")]

# translate it
rgb <- colorspace::hex2RGB(colour)
values <- grDevices::rgb2hsv(t([email protected]))

tibble::tibble(path = path,
hue = values[1,1],
saturation = values [2,1],
value = values[3,1])
}

# all values
pics_col <- purrr::map_df(dir("formatted_pics", full.names = TRUE),
get_values, pics_main_colours)

make_collage(pics_info, 19, 59, "hue") %>%
magick::image_write("banner_hue.png")



So this is not too pretty. Blue and green pictures seem to cluster together but there are very dark pictures which we’d intuitively put aside.

So I thought a bit more and decided to first assign main colours to a reference colour and then order pictures based on this…

# Choosing a better approach: RGB and distances

The first challenge was to choose reference colours which’d be a rainbow slices. I could have looked up RGB codes corresponding to ROYGBIV (red, orange, yellow, green, blue, indigo and violet.) but I had read about xkcd colors survey in this interesting post and therefore decided to use XKCD colors, available in R via the xkcdcolors package. I chose to use the 18 most common ones, and add black to that lot. It was no longer really a rainbow, I agree. The colors present in the pictures were ordered by hand by my husband, and I like his choices.

Then to assign each pic to a reference colour via its main colour, I calculated the Euclidian distance between that colour and all reference colours to find the closes reference colours, using the RGB values.

library("xkcdcolors")
library("magrittr")
# version of colorspace::hex2RGB returning a df
hex2rgb <- function(hex){
result <- colorspace::hex2RGB(hex)@coords
}

# https://stackoverflow.com/questions/45328221/unnest-one-column-list-to-many-columns-in-tidyr
colors <- tibble::tibble(name = c(xcolors()[1:18], "black"),
hex = name2color(name),
rgb = purrr::map(hex, hex2rgb)) %>%
dplyr::mutate(rgb = purrr::map(rgb, tibble::as_tibble)) %>%
tidyr::unnest()

# for each colour I want the closest one.
find_closest_colour <- function(hex, colors){
test <- tibble::tibble(hex = hex) %>%
dplyr::mutate(rgb = purrr::map(hex, hex2rgb),
rgb = purrr::map(rgb, tibble::as_tibble)) %>%
tidyr::unnest()

distance <- stats::dist(rbind(test[, c("R", "G", "B")],
colors[, c("R", "G", "B")]))

colors\$name[which.min(as.matrix(distance)[,1][2:(nrow(colors) + 1)])]
}

imgs_col <- dplyr::mutate(pics_main_colours,
xkcd_col = purrr::map_chr(colour, find_closest_colour,
colors = colors))



Once I had this information about each pic, I could order the pictures, after having defined the order of the reference colours.

pics_info <- readr::read_csv("imgs_xkcd_col.csv")
pics_info <- dplyr::mutate(pics_info,
path = stringr::str_replace(path, "nature", "formatted_pics"))

pics_info <- dplyr::mutate(pics_info,
xkcd_col = factor(xkcd_col, ordered = TRUE,
levels = c("black","brown","red","magenta","pink",
"lime green","green","dark green","teal",
"light blue","sky blue","blue","purple","grey")))



This looks much better, but of course the initial set (and maybe the used extraction method as well) don’t provide for enough colours to make this extremely pretty. I’m not sure how useful this end product is, but hey I got to look at pretty landscapes full of colours from my grey rainy city, and learnt a lot along the way… Besides, maybe you will find a cool use case of some of the colour methods featured here and will tell me about it in the comments?