5 Key Data Visualization Principles Explained – Examples in R

[This article was first published on Tag: r - Appsilon | Enterprise R Shiny Dashboards, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Data visualization can be tricky to get right, and there are many principles to keep in mind. Today we bring you 5 best practices for visualizing data, with examples in the R programming language. Incorporate these key principles into your toolset to improve your data storytelling.

After reading, you’ll know how to produce publication-ready charts that won’t leave users questioning the data or the logic. You’ll know how to use ggplot2 and plotly for both static and interactive charts, and also how to get maximum interactivity out of your visualizations with R Shiny.

Want a deeper dive into visualization principles in R Shiny? Read our guide for bar plots.

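These are the 5 key data visualization principles you must know:

1. Don’t Manipulate Axis Ranges
2. Always Add a Title and Axis Labels
3. Choose Appropriate and Appealing Color Palettes
4. Ditch 3D Charts – 2D Is Enough
5. Make Your Charts Interactive
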
Don’t Manipulate Axis Ranges

In the past, companies and individuals loved to exaggerate small, insignificant differences by manipulating axis ranges. For example, imagine a company earned a profit of $100M in 2020 and $105M in 2021. In relative terms, that’s only a 5% increase, nothing to write home about. The difference wouldn’t be immediately visible on a chart whose Y-axis (profit) ranges from 0 to 120.

What you could do – but shouldn’t – is to shorten the Y-axis range. A range between 99.5 and 105.5 would do the trick.


Let’s see the effect in action. Use the following code to declare a data.frame object containing the profit for these two years:
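Here is a minimal sketch. It assumes column names `year` and `profit`, which are our choice for this example, and it stores profit in millions of dollars. It also computes the relative change, confirming the 5% figure.

```r
# Profit in millions of USD for the two years discussed above.
# Column names are assumptions made for this example.
df <- data.frame(
  year = c("2020", "2021"),  # stored as text so ggplot2 treats it as a category
  profit = c(100, 105)
)

# Relative change from 2020 to 2021: (105 - 100) / 100 = 0.05, i.e. 5%
growth <- (df$profit[2] - df$profit[1]) / df$profit[1]
growth
```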

With ggplot2, you can use coord_cartesian(ylim = c(lower, upper)) to change the Y-axis range. Let’s set it to go from 99.5 to 105.5:
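The following is one way to build that chart. It is a sketch that assumes a two-row data frame with `year` and `profit` columns. The data is declared inline so the snippet runs on its own.

```r
library(ggplot2)

# Assumed example data: profit in millions of USD
df <- data.frame(year = c("2020", "2021"), profit = c(100, 105))

# DON'T do this in practice: zooming the Y-axis to 99.5-105.5
# makes a 5% difference between the two bars look enormous
p <- ggplot(df, aes(x = year, y = profit)) +
  geom_bar(stat = "identity") +
  coord_cartesian(ylim = c(99.5, 105.5))
p
```

coord_cartesian() only zooms into the plotting area and leaves the data untouched. Setting the same limits with ylim() or scale_y_continuous(limits = ...) behaves differently. It would treat the bars' baseline of 0 as out of range and drop the bars, and ggplot2 warns about the removed rows. That is why coord_cartesian() is the usual tool for this kind of zoom.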


Image 1 – Bar chart with manipulatively formatted Y-axis

It looks like the difference is huge, with the 2021 bar several times taller than the 2020 one. Technically, the chart doesn’t lie, but it violates a key data visualization principle. It’s easy to get the whole story wrong if you don’t look at the axis ticks.

The same chart looks nowhere near as impressive with the default Y-axis range:
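For comparison, here is a sketch of the same chart without any axis zoom. It uses the same assumed `year` and `profit` columns, declared inline. Bar charts in ggplot2 start at 0 by default.

```r
library(ggplot2)

# Assumed example data: profit in millions of USD
df <- data.frame(year = c("2020", "2021"), profit = c(100, 105))

# No coord_cartesian() call: each bar starts at 0, so the two bars
# look almost equally tall, which honestly reflects a 5% difference
p <- ggplot(df, aes(x = year, y = profit)) +
  geom_bar(stat = "identity")
p
```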


Image 2 – Bar chart with normally formatted Y-axis

Take-home point: Always read the axis ticks. Just because you’re obeying key data visualization principles, it doesn’t mean everyone else is.

Always Add a Title and Axis Labels

A chart without a title and axis labels is pretty much useless. It might look great otherwise, but how can you know what you’re looking at? There’s no way to tell. Sure, you can describe the contents in the surrounding text, but that’s a supplement at best, not a replacement.

Luckily, ggplot2 makes it easy to follow this principle. Use the labs() function to add a title, subtitle, caption, and axis labels, and use the theme() function to style them:
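The sketch below applies labs() and theme() to the two-year profit chart. The label wording, font sizes, and the `year`/`profit` column names are placeholders chosen for this example.

```r
library(ggplot2)

# Assumed example data: profit in millions of USD
df <- data.frame(year = c("2020", "2021"), profit = c(100, 105))

p <- ggplot(df, aes(x = year, y = profit)) +
  geom_bar(stat = "identity") +
  # labs() sets the text; the wording here is placeholder text
  labs(
    title = "Company profit, 2020 vs. 2021",
    subtitle = "Profit grew by 5% year over year",
    caption = "Source: example data",
    x = "Year",
    y = "Profit (millions of USD)"
  ) +
  # theme() controls how that text looks
  theme(
    plot.title = element_text(size = 16, face = "bold"),
    plot.subtitle = element_text(size = 12),
    plot.caption = element_text(face = "italic"),
    axis.title = element_text(size = 11)
  )
p
```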


Image 3 – Bar chart with title, subtitle, caption, and axis labels

Not every chart needs a subtitle and a caption; we added them just for fun. At the very least, every chart you make should include a title and axis labels.

Choose Appropriate and Appealing Color Palettes

There’s nothing worse than spending hours getting the most out of your data, only to end up with a chart that isn’t visually appealing. We get it: not everyone has an eye for design. If you’re a software engineer, you may find design and aesthetics a nightmare. Similarly, if you’re a graphic designer, you can create great-looking visuals, but can you implement them in code?

That’s where choosing an appropriate color palette comes in. Coolors (coolors.co) is a popular tool for picking color palettes, used and loved by many.


Image 4 – coolors.co – A website for generating color palettes

It’s mostly used for entire websites and brand identities, but there’s no reason you can’t pick one or more colors you like and use them in your data visualizations.

The second color, Prussian Blue, looks promising. Specify the fill parameter in the call to geom_bar() to change the bar color:
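The following sketch sets a single fill color for all bars. The hex value `#003049` is our assumption for Prussian Blue, so copy the exact code from the palette you picked on coolors.co.

```r
library(ggplot2)

# Assumed example data: profit in millions of USD
df <- data.frame(year = c("2020", "2021"), profit = c(100, 105))

# Assumed hex code for Prussian Blue; replace it with your palette's value
prussian_blue <- "#003049"

# fill is set outside aes(), so every bar gets the same color
p <- ggplot(df, aes(x = year, y = profit)) +
  geom_bar(stat = "identity", fill = prussian_blue) +
  labs(title = "Company profit, 2020 vs. 2021", x = "Year", y = "Profit (millions of USD)")
p
```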


Image 5 – Chart with single-color bars

Sometimes, a single color won’t work. If your dataset has a categorical feature (e.g., day of the week, gender, age group), you can use it to color the bars or other chart segments. Simply map the fill aesthetic to that variable inside aes() in the ggplot() function call:
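This sketch maps fill to the categorical `year` column. ggplot2 then assigns one color per category and adds a legend. The scale_fill_manual() call is optional, and both hex codes in it are assumptions, not values from the original chart.

```r
library(ggplot2)

# Assumed example data: profit in millions of USD
df <- data.frame(year = c("2020", "2021"), profit = c(100, 105))

# fill = year inside aes(): one color per category
p <- ggplot(df, aes(x = year, y = profit, fill = year)) +
  geom_bar(stat = "identity") +
  # Optional: choose the colors yourself (hex codes are assumptions)
  scale_fill_manual(values = c("2020" = "#003049", "2021" = "#D62828")) +
  labs(title = "Company profit, 2020 vs. 2021", x = "Year", y = "Profit (millions of USD)")
p
```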


Image 6 – Chart with multicolored bars

The selected column has only two distinct values, but you get the gist. Color is a key data visualization principle, so master it and use it wisely.

Ditch 3D Charts – 2D Is Enough

Take a look at the following three charts (don’t worry, we didn’t create them; we picked them from the Internet):


Image 7 – Various 3D charts

What do they all have in common? You’ve guessed it: they all look horrible. Depth has no place in most data visualizations, especially not in those aimed at business users and the general public. Also, you can’t embed interactive 3D visualizations in print publications.

You can use depth, or Z-axis, when analyzing data yourself. After all, you know best what works for you – but that’s where the story should end.

Most users find a third dimension confusing, and we get that. It’s easy to distort the data and draw the wrong conclusions; after all, everything is a matter of perspective. Two dimensions are enough in the vast majority of cases. If you want to convey extra information, consider encoding additional variables through the size or color of chart elements.
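The following sketch shows this idea with R's built-in mtcars dataset. It is a plain 2D scatter plot, and two extra variables are encoded without a Z-axis: point color shows the number of cylinders and point size shows horsepower.

```r
library(ggplot2)

# Built-in mtcars data: weight vs. fuel efficiency on the X and Y axes,
# cylinders mapped to color and horsepower mapped to point size
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl), size = hp)) +
  geom_point(alpha = 0.7) +
  labs(
    title = "Fuel efficiency vs. weight",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    color = "Cylinders",
    size = "Horsepower"
  )
p
```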

Make Your Charts Interactive – Go the Extra Mile for Better Data Visualizations

Interactivity is probably the most important data visualization principle of them all. There’s nothing wrong with static charts, especially if you’re just getting into data visualization, but interactivity will set you apart from the crowd.

The idea is that something should happen when you click or hover over a chart element. With bar charts, the most common thing you can do is to display the counts of the selected category.

Unfortunately, ggplot2 on its own produces only static charts. You’ll have to switch to an alternative such as plotly. The syntax is a bit different, but you’ll quickly get the hang of it. Its documentation is superb, and you’ll find everything you need there.

Here’s how to “redraw” the chart from the previous sections in Plotly:
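A minimal plotly sketch follows. It assumes the same two-row `year`/`profit` data frame and reuses the assumed Prussian Blue hex code. plot_ly() maps columns to chart properties with formulas such as `~year`, and hovering over a bar shows its values.

```r
library(plotly)

# Assumed example data: profit in millions of USD
df <- data.frame(year = c("2020", "2021"), profit = c(100, 105))

fig <- plot_ly(
  df,
  x = ~year,
  y = ~profit,
  type = "bar",
  marker = list(color = "#003049")  # assumed Prussian Blue hex code
) %>%
  layout(
    title = "Company profit, 2020 vs. 2021",
    xaxis = list(title = "Year"),
    yaxis = list(title = "Profit (millions of USD)")
  )

# Rendering opens the RStudio Viewer or a browser, so only do it interactively
if (interactive()) print(fig)
```

If you already have a ggplot2 chart, plotly’s ggplotly() function can also convert it into an interactive one. That can be a quick alternative to rewriting the chart with plot_ly().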


Image 8 – Interactive Plotly bar chart

You can see how detailed data is shown automatically as you hover over individual bars. What gets displayed can be tweaked, but more on that some other time.

Do you know what really sets your visualizations apart from the crowd? You’ve guessed it: dashboards, at least in the interactivity department. For demonstration’s sake, we’ll declare a new dataset of budgets for two departments over a two-year span. The end user can select a department on the dashboard, and the chart is redrawn instantly. Take a look:
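The following is a minimal Shiny sketch of such a dashboard, not the authors' actual app. The department names and budget figures are made up for illustration, and the helper `plot_department()` is a hypothetical name. The renderPlot() block re-runs whenever the user picks another department in the dropdown.

```r
library(shiny)
library(ggplot2)

# Hypothetical data: department names and budget figures are made up
budgets <- data.frame(
  department = c("Marketing", "Marketing", "IT", "IT"),
  year = c("2020", "2021", "2020", "2021"),
  budget = c(50, 60, 80, 75)  # millions of USD, illustrative only
)

# Hypothetical helper that draws the bar chart for one department
plot_department <- function(data, dept) {
  ggplot(data[data$department == dept, ], aes(x = year, y = budget, fill = year)) +
    geom_bar(stat = "identity") +
    labs(title = paste("Budget:", dept), x = "Year", y = "Budget (millions of USD)")
}

ui <- fluidPage(
  titlePanel("Department budgets"),
  sidebarLayout(
    sidebarPanel(
      selectInput("dept", "Department:", choices = unique(budgets$department))
    ),
    mainPanel(plotOutput("budget_plot"))
  )
)

server <- function(input, output, session) {
  # Re-runs, and redraws the chart, whenever input$dept changes
  output$budget_plot <- renderPlot({
    plot_department(budgets, input$dept)
  })
}

# Launching the app blocks the R session, so only do it interactively
if (interactive()) shinyApp(ui, server)
```

Keeping the plotting code in a plain function makes it easy to test outside Shiny. The server function then only wires the dropdown to the chart.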


Image 9 – Interactive R Shiny application

Embedding your visualizations in dashboards is light years ahead of anything you can do with a static graphing library. It gives end users the most flexibility, which is what matters most in the long run.

It’s safe to say interactivity is among the most important key data visualization principles of 2022 and beyond.


Summary of Key Data Visualization Principles

Data visualization is one of those things that looks easy but is easy to get wrong in practice. A small error like forgetting axis labels can cost you a lot in the long run, especially if you can’t add them afterward.

Today you’ve learned five key principles of data visualization and gained hands-on experience visualizing data in R with ggplot2, plotly, and shiny. It’s a lot to process in a single article, but we hope you managed to follow along.

If you want to dive deeper into data visualization with R, look no further than our in-depth guides:

The post 5 Key Data Visualization Principles Explained – Examples in R appeared first on Appsilon | Enterprise R Shiny Dashboards.
