Visualizing Categorical Data in R: A Guide with Engaging Charts Using the Iris Dataset

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Introduction

Categorical data is a type of data that represents distinct groups or categories. Visualizing categorical data can provide valuable insights and help in understanding patterns and relationships within the data. In this blog post, we will explore three popular charts for visualizing categorical data in R using the iris dataset: geom_bar() from ggplot2, a grouped boxplot with base R and ggplot2, and a mosaic plot. We will explain each section of code in simple terms and encourage readers to try these charts on their own.

Examples

Example 1 Barplots with geom_bar() from ggplot2:

Barplots are a common and effective way to visualize categorical data. We can use the geom_bar() function from the ggplot2 package to create barplots in R. The geom_bar() function accepts a variable for the x-axis and plots the number of times each value of the variable appears in the dataset[3].

library(ggplot2)

# Create a barplot using geom_bar()
ggplot(data = iris, aes(x = Species, fill = factor(Species))) +
  geom_bar() +
  theme_minimal() +
  labs(
    title = "Bar Chart of Species Count",
    ylab = "Count",
    fill = "Species"
  )

Explanation: – Load the ggplot2 package using library(ggplot2). – The iris dataset is already available in R, so we can directly use it. – The aes() function specifies the aesthetic mappings, where x represents the variable on the x-axis. – The geom_bar() function creates the barplot.

Try creating a barplot with the Species variable from the iris dataset using the provided code. Experiment with different variables and datasets to explore the patterns and distributions within your data.

Example 2 Grouped Boxplot with base R and ggplot2

A grouped boxplot is a useful chart for comparing the distribution of a continuous variable across different categories of a categorical variable. We can create a grouped boxplot using both base R and ggplot2.

Base R

# Create a grouped boxplot using  base R
boxplot(Sepal.Length ~ Species, data = iris)

ggplot2

# Create a grouped boxplot using ggplot2
ggplot(data = iris, aes(x = Species, y = Sepal.Length,
                        fill = factor(Species))) +
  geom_boxplot() +
  theme_minimal() +
  labs(fill = "Species")

Explanation: – In base R, we use the boxplot() function to create a grouped boxplot. The formula Sepal.Length ~ Species specifies that the Sepal.Length variable should be plotted against the Species variable[2]. – In ggplot2, we use the geom_boxplot() function to create a grouped boxplot. The aes() function specifies the aesthetic mappings, where x represents the categorical variable and y represents the numeric variable.

Create a grouped boxplot with the Sepal.Length variable across different species in the iris dataset using either base R or ggplot2. Compare the distributions of Sepal.Length for each species and observe any differences.

Example 3 Mosaic Plot

A mosaic plot is a graphical representation of the relationship between two or more categorical variables. It displays the proportions of each category within the variables and allows for visual comparison.

mosaicplot(table(iris$Species, iris$Petal.Width))

Explanation: – The table() function creates a contingency table of the two variables, Species and Petal.Width, from the iris dataset. – The mosaicplot() function creates the mosaic plot.

Create a mosaic plot with the Species and Petal.Width variables from the iris dataset using the provided code. Explore the relationships and proportions between the variables. Experiment with different combinations of variables to gain insights from the mosaic plot.

Conclusion

Visualizing categorical data is essential for understanding patterns and relationships within the data. In this blog post, we explored three engaging charts for visualizing categorical data in R using the iris dataset: geom_bar() from ggplot2, a grouped boxplot with base R and ggplot2, and a mosaic plot. We explained each section of code in simple terms and encouraged readers to try these charts on their own. By experimenting with these charts, readers can gain valuable insights from their own categorical data and make informed decisions based on the visualizations.

Happy plotting!

Resources:

  • [1] https://community.rstudio.com/t/how-to-plot-categorical-data-in-r/21285
  • [2] https://gexijin.github.io/learnR/step-into-r-programmingthe-iris-flower-dataset.html
  • [3] https://www.statology.org/plot-categorical-data-in-r/
  • [4] https://www.r-bloggers.com/2022/01/handling-categorical-data-in-r-part-4/
  • [5] https://www.geeksforgeeks.org/how-to-plot-categorical-data-in-r/
  • [6] https://rpubs.com/odenipinedo/introduction-to-data-visualization-with-ggplot2
To leave a comment for the author, please follow the link and comment on their blog: Steve's Data Tips and Tricks.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)