Creating Pareto Charts in R with the qcc Package

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers].

Introduction

A Pareto chart is a type of bar chart that shows the frequency of different categories in a dataset, ordered by frequency from highest to lowest. It is often used to identify the most common problems or causes of a problem, so that resources can be focused on addressing them.

To create a Pareto chart in R, we can use the qcc package. The qcc package provides a number of functions for quality control, including the pareto.chart() function for creating Pareto charts.

Examples

Example 1: Creating a Pareto chart from a data frame

The following code shows how to create a Pareto chart from a data frame:

library(qcc)

# Create a data frame with the product and its count
df <- data.frame(
  product = c("Office desks", "Chairs", "Filing cabinets", "Bookcases"),
  count = c(100, 80, 70, 60)
)

# Create the Pareto chart
pareto.chart(df$count, main = "Pareto Chart of Product Sales")

   
Pareto chart analysis for df$count
    Frequency Cum.Freq. Percentage Cum.Percent.
  A 100.00000 100.00000   32.25806     32.25806
  B  80.00000 180.00000   25.80645     58.06452
  C  70.00000 250.00000   22.58065     80.64516
  D  60.00000 310.00000   19.35484    100.00000

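This code creates a Pareto chart of the product sales, with the bars arranged from left to right in decreasing order of count. Because df$count is an unnamed vector, the bars are labeled A to D rather than by product name, so A corresponds to office desks and D to bookcases. The cumulative percentage line is also plotted; at each bar it shows the share of total sales accounted for by that product together with all the products to its left.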
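To label the bars with the product names instead of letters, one option is to attach the names to the counts before plotting. The following is a minimal sketch; the object names sales and pareto_sales are introduced here only for illustration and are not part of the qcc package.

```r
library(qcc)

df <- data.frame(
  product = c("Office desks", "Chairs", "Filing cabinets", "Bookcases"),
  count = c(100, 80, 70, 60)
)

# Attach the product names to the counts so each bar is labeled by product
sales <- setNames(df$count, df$product)

# The returned table is stored so it can be inspected later
pareto_sales <- pareto.chart(sales, main = "Pareto Chart of Product Sales")
```

With a named vector, the row labels of the printed table should also show the product names rather than A to D.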

Example 2: Creating a Pareto chart from a vector

We can also create a Pareto chart from a vector. The following code shows how to create a Pareto chart of the number of defects found in a manufacturing process:

# Create a vector with the number of defects found in each category
defects <- c(10, 8, 7, 6, 5)

# Create the Pareto chart
pareto.chart(defects, main = "Pareto Chart of Defects")

   
Pareto chart analysis for defects
    Frequency Cum.Freq. Percentage Cum.Percent.
  A  10.00000  10.00000   27.77778     27.77778
  B   8.00000  18.00000   22.22222     50.00000
  C   7.00000  25.00000   19.44444     69.44444
  D   6.00000  31.00000   16.66667     86.11111
  E   5.00000  36.00000   13.88889    100.00000

This code creates a Pareto chart of the number of defects found, with the most common defect category on the left and the least common on the right. As before, the categories are labeled A to E because the vector has no names. The cumulative percentage line is also plotted; at each bar it shows the share of total defects accounted for by that category together with all the categories to its left.
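To see where the cumulative figures in the table come from, we can reproduce them by hand in base R. This is only a sketch of the arithmetic, not the package's internal code; the names sorted, cum_freq and cum_pct are made up for this illustration.

```r
defects <- c(10, 8, 7, 6, 5)

# Sort from most to least frequent, as a Pareto chart does
sorted <- sort(defects, decreasing = TRUE)

# Running total of defects, and that total as a share of all defects
cum_freq <- cumsum(sorted)
cum_pct <- 100 * cum_freq / sum(sorted)

round(cum_pct, 2)
```

The rounded values agree with the Cum.Percent. column printed above: the first two categories together account for exactly half of the 36 defects.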

Customizing the Pareto chart

We can customize the appearance of the Pareto chart using a number of arguments to the pareto.chart() function. For example, we can change the title of the chart, the labels of the x- and y-axes, the colors of the bars, and the line width of the cumulative percentage line.

The following code shows how to customize the Pareto chart from the first example:

# Create a data frame with the product and its count
df <- data.frame(
  product = c("Office desks", "Chairs", "Filing cabinets", "Bookcases"),
  count = c(100, 80, 70, 60)
)

# Create the Pareto chart
pareto.chart(
  df$count,
  main = "Pareto Chart of Product Sales",
  xlab = "Product",
  ylab = "Count",
  col = heat.colors(length(df$count)),
  lwd = 2
)

   
Pareto chart analysis for df$count
    Frequency Cum.Freq. Percentage Cum.Percent.
  A 100.00000 100.00000   32.25806     32.25806
  B  80.00000 180.00000   25.80645     58.06452
  C  70.00000 250.00000   22.58065     80.64516
  D  60.00000 310.00000   19.35484    100.00000

This code creates a Pareto chart with the title “Pareto Chart of Product Sales”, an x-axis label of “Product”, a y-axis label of “Count”, bar colors taken from the heat.colors() palette, and a cumulative percentage line width of 2.
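The right-hand axis, which carries the cumulative percentage scale, can be adjusted as well. The sketch below uses the ylab2 and cumperc arguments as I understand them from the package help page; it is worth checking ?pareto.chart for the version of qcc you have installed. The names counts and custom_chart are introduced only for this example.

```r
library(qcc)

counts <- c(100, 80, 70, 60)

custom_chart <- pareto.chart(
  counts,
  main = "Pareto Chart of Product Sales",
  ylab = "Count",
  ylab2 = "Cumulative share (%)",  # label for the right-hand axis
  cumperc = seq(0, 100, by = 20)   # percentage tick marks on the right-hand axis
)
```

Here the right-hand axis is marked every 20 percentage points instead of the default 25.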

Conclusion

The qcc package provides a convenient way to create Pareto charts in R. Pareto charts can be used to identify the most common problems or causes of a problem, so that resources can be focused on addressing them.

Encouragement

I encourage readers to try creating their own Pareto charts in R. You can use the examples in this blog post as a starting point. You can also find more examples and documentation for the qcc package on the CRAN website.

Here are some ideas for Pareto charts that you could create:

  • Pareto chart of the most common customer complaints
  • Pareto chart of the most common causes of manufacturing defects
  • Pareto chart of the most common reasons for website bounces
  • Pareto chart of the most time-consuming tasks in your workflow

Once you have created a Pareto chart, you can use the insights that you gain from it to improve your processes or products.
