A way of creating clear, transparent, and unified data visualizations

[This article was first published on R in ResponsibleML on Medium, and kindly contributed to R-bloggers.]

How to create appropriate data visualizations using the tidycharts package.

There is a wide range of R packages for data visualization, but something was still lacking: there was no simple and easily accessible way to create clean and transparent charts. Until now! tidycharts guarantees that your charts will be appropriate and consistent with each other. Furthermore, you won’t have to worry about whether your charts are transparent and tidy enough, because tidycharts takes care of it for you by following the International Business Communication Standards (IBCS) rules.

What is IBCS exactly?

The IBCS Association is an open, not-for-profit organization that supports the promotion, maintenance, and further development of the International Business Communication Standards (IBCS®). Version 1.0 of IBCS was published in 2013 by Rolf Hichert and Jürgen Faisst. Version 1.1 has been available since 2017.

The standards contain practical proposals for the design of business communication. The main goal is to design charts that are conceptually, perceptually, and semantically sound. To achieve this objective, the IBCS creators proposed the SUCCESS rules, an acronym that stands for Say, Unify, Condense, Check, Express, Simplify, Structure.

You can use the charts later in reports, presentations, and dashboards.

What does the package have to offer?

We implemented chart-generating functions for the most frequently used types of plots. The package returns charts in .svg format, which brings benefits such as a transparent background and no loss of image quality when zooming. Our package includes:

  • column bar charts (basic, aggregated, normalized, referenced, grouped)
  • horizontal bar charts (basic, aggregated, normalized, referenced, grouped)
  • line plots (basic, with markers, aggregated, normalized, referenced, with chosen points highlighted)
  • scatter and bubble plots

Additionally, we added functions that help you customize your plots and make generating plots for reports easier:

  • making your own color palette for charts
  • showing your charts in a grid, next to each other

Installation

The tidycharts package isn’t available on CRAN yet, but we encourage you to install it anyway! Simply run the following command:

devtools::install_github("MI2DataLab/tidycharts")

Once the package is available on CRAN, installation will be even simpler, since it proceeds like any other R package installation. Just run the following commands to install and load the library.

install.packages("tidycharts")
library(tidycharts)
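
The two installation routes above can be combined into one small helper. This is only a sketch: `ensure_tidycharts()` and its `from_cran` argument are hypothetical names introduced here, not part of the package, and the function installs nothing unless you call it.

```r
# Hypothetical helper (not part of tidycharts): install the package only
# when it is missing, from CRAN or from the GitHub repository named above.
ensure_tidycharts <- function(from_cran = TRUE) {
  if (requireNamespace("tidycharts", quietly = TRUE)) {
    return(invisible("already installed"))
  }
  if (from_cran) {
    install.packages("tidycharts")
    return(invisible("installed from CRAN"))
  }
  # The GitHub route needs devtools, as in the command shown earlier.
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  devtools::install_github("MI2DataLab/tidycharts")
  invisible("installed from GitHub")
}
```

Calling `ensure_tidycharts(from_cran = FALSE)` would use the GitHub route, and a plain `ensure_tidycharts()` would try CRAN.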

Usage

Let’s say we want to create a series bar chart to show sales of products and services in different European cities. First, we need to prepare a data frame.

# prepare the data frame
data <- data.frame(
  city = c("Berlin", "Munich", "Cologne", "London", "Vienna", "Paris", "Zurich"),
  Products = c(538, 250, 75, 301, 227, 100, 40),
  Services = c(621, 545, 302, 44, 39, 20, 34)
)
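
The data frame above has one column per series (a "wide" layout). If your data arrive in a "long" layout instead, with one row per city and category, a base R reshape is one possible way to get there. The names `sales_long`, `sales_wide`, `category` and `sales` are made up for this sketch, and it uses only three of the cities to stay short.

```r
# Hypothetical long-format input: one row per (city, category) pair.
sales_long <- data.frame(
  city = rep(c("Berlin", "Munich", "Cologne"), times = 2),
  category = rep(c("Products", "Services"), each = 3),
  sales = c(538, 250, 75, 621, 545, 302)
)

# Spread the categories into separate columns, one row per city.
sales_wide <- reshape(
  sales_long,
  idvar = "city",
  timevar = "category",
  direction = "wide"
)

# reshape() names the new columns "sales.Products" and "sales.Services";
# strip the prefix so the columns match the series names.
names(sales_wide) <- sub("^sales\\.", "", names(sales_wide))
```

The result has the columns city, Products and Services, the same shape as the data frame prepared above.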

Next, we need to generate the plot.

# generate the plot
barchart <- bar_chart(data, data$city, c("Products", "Services"), c("Products", "Services"))
# show the plot
barchart

This is the final result

A series bar chart generated using the tidycharts package
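
Since the charts are SVG, they can be written to a file and reused in a report or dashboard. The sketch below assumes that a chart object can be converted to a character string holding SVG markup. `save_svg()` is a hypothetical helper written for this illustration, not a tidycharts function, and the demo uses a tiny hand-written SVG so that it runs without the package.

```r
# Hypothetical helper (not part of tidycharts): write SVG markup to a file.
save_svg <- function(svg, path) {
  svg_text <- paste(as.character(svg), collapse = "\n")
  # Basic sanity check that the text looks like SVG markup.
  stopifnot(grepl("<svg", svg_text, fixed = TRUE))
  writeLines(svg_text, path)
  invisible(path)
}

# Demo with a small hand-written SVG, so no chart object is needed here.
demo_svg <- paste0(
  '<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20">',
  '<rect width="20" height="20" fill="grey"/></svg>'
)
demo_path <- save_svg(demo_svg, file.path(tempdir(), "demo.svg"))
```

Under the same assumption, `save_svg(barchart, "barchart.svg")` would store the chart generated above.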

Let’s see one more example using the well-known iris dataset.

scatter <- scatter_plot(iris, iris$Sepal.Length, iris$Sepal.Width, iris$Species, 1, 0.5, c("sepal length", "in cm"), c("sepal width", "in cm"), "Legend")
scatter

A scatter plot generated using the tidycharts package

Customizing your plots

IBCS advises using various shades of grey instead of other colors, but we left the choice of colors up to the user. The grey palette is the default one, but you can change it by calling the set_colors() function.
Let’s see an example.

#before the customization
data_time_series <- data.frame(
  time = month.abb[1:8],
  Poland = round(2 + 0.5 * sin(1:8), 1),
  Germany = round(3 + sin(3:10), 1),
  Slovakia = round(2 + 2 * cos(1:8), 1)
)
column_chart(data_time_series, x = 'time', 
             series = c('Poland', 'Germany', 'Slovakia'), interval = 'months')

A series column chart with default colors.

Now let’s use the set_colors() function

#changing the colors
color_df <- data.frame(
  bar_colors = c("rgb(61, 56, 124)", "rgb(0,89,161)", "rgb(0,120,186)", "rgb(0,150,193)", "rgb(0, 178, 184)", "rgb(0,178,184)"),
  text_colors = c("white", "white", "white", "white", "white", "black")
)
set_colors(color_df)

Generating the plot again

column_chart(data_time_series, x = 'time', 
             series = c('Poland', 'Germany', 'Slovakia'), interval = 'months')

A series column chart with custom colors

You can always switch back to the default options later by calling the restore_defaults() function.
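
The custom palette in this section is written as CSS-style rgb() strings. If your colors are stored as hex codes or R color names, base R can produce the same strings. In the sketch below, `to_css_rgb()` and `hex_palette` are hypothetical names introduced for illustration, and the hex codes are simply the six colors used above.

```r
# Hypothetical helper: convert hex codes or R color names
# into CSS-style "rgb(r,g,b)" strings.
to_css_rgb <- function(colors) {
  channels <- grDevices::col2rgb(colors)
  sprintf(
    "rgb(%d,%d,%d)",
    channels["red", ], channels["green", ], channels["blue", ]
  )
}

# The six colors from the example above, written as hex codes.
hex_palette <- c("#3D387C", "#0059A1", "#0078BA", "#0096C1", "#00B2B8", "#00B2B8")

# Same layout as the color_df used in this section: six rows, two columns.
color_df <- data.frame(
  bar_colors = to_css_rgb(hex_palette),
  text_colors = c("white", "white", "white", "white", "white", "black")
)
```

The resulting `color_df` has the same two columns as the one built by hand and can be passed to set_colors() in the same way.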

Gluing the plots together

When writing reports, you may need to show several plots next to each other. tidycharts has a function that is perfect for that! Let’s say you want to explore the relationships between several measurements across penguin species in the Palmer Penguins dataset. Scatter plots are an ideal tool for that!

First, let’s load the necessary libraries and drop NA values.

library(palmerpenguins)
library(tidyverse)
p <- penguins %>%
  drop_na(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g)

Next, generate the SVG strings with the scatter plots.

#--- bill length on the x-axis ---
scatter1 <-
  scatter_plot(
    p,
    p$bill_length_mm,
    p$bill_depth_mm,
    p$species,
    x_names = c("bill length", "in mm"),
    y_names = c("bill depth", "in mm")
  )
scatter2 <-
  scatter_plot(
    p,
    p$bill_length_mm,
    p$flipper_length_mm,
    p$species,
    x_names = c("bill length", "in mm"),
    y_names = c("flipper length", "in mm")
  )
scatter3 <-
  scatter_plot(
    p,
    p$bill_length_mm,
    p$body_mass_g,
    p$species,
    x_names = c("bill length", "in mm"),
    y_names = c("body mass", "in g")
  )

Finally, join the plots together and show the plot

join_charts(scatter1, scatter2, scatter3,
            nrows = 1, ncols = 3)

Joined scatter plots showing the relationship between the variables in the Palmer Penguins dataset
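
The three scatter_plot() calls differ only in the y variable and its axis label, so they can also be generated in a loop. The sketch below is one possible way to do that, not the package's recommended workflow. `y_specs` is a hypothetical lookup table, `complete.cases()` stands in for drop_na() to avoid the tidyverse dependency, and the plotting part is skipped when tidycharts or palmerpenguins is not installed.

```r
# Hypothetical lookup table: y column name -> axis label (name, unit).
y_specs <- list(
  bill_depth_mm = c("bill depth", "in mm"),
  flipper_length_mm = c("flipper length", "in mm"),
  body_mass_g = c("body mass", "in g")
)

if (requireNamespace("tidycharts", quietly = TRUE) &&
    requireNamespace("palmerpenguins", quietly = TRUE)) {
  p <- palmerpenguins::penguins
  # Keep only rows without missing values in the plotted columns.
  p <- p[complete.cases(p[, c("bill_length_mm", names(y_specs))]), ]

  # One scatter plot per y variable, bill length always on the x-axis.
  plots <- lapply(names(y_specs), function(col) {
    tidycharts::scatter_plot(
      p,
      p$bill_length_mm,
      p[[col]],
      p$species,
      x_names = c("bill length", "in mm"),
      y_names = y_specs[[col]]
    )
  })

  # Pass the plots to join_charts() together with the grid dimensions.
  grid <- do.call(tidycharts::join_charts, c(plots, list(nrows = 1, ncols = 3)))
}
```

Adding a fourth panel then only requires one more entry in `y_specs` and a matching change to `ncols`.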

Summary

We kindly encourage you to try out the tidycharts package, explore its possibilities, and start your journey with data visualization.

If you are also interested in posts about explainable, fair, and responsible ML, follow #ResponsibleML on Medium.

