Pie Charts in ggplot2

July 29, 2010
(This article was first published on R-Chart, and kindly contributed to R-bloggers)

...and other isomorphic data shape presentations...

The pie chart has been widely criticized in recent times by statisticians. Edward Tufte goes as far as to call this the "prevailing orthodoxy." The reasons generally cited are:
  • The relative size of each slice is difficult to interpret. Studies have shown that pie charts are hard to read.
  • Pie charts require too much space to present too little information.
  • They are frequently rendered in 3D (which makes the previous two issues worse).
  • There are better visualization alternatives. For example, bar or point charts can display the same data.
The second chapter of Leland Wilkinson's "The Grammar of Graphics" is "How to make a Pie." He is less critical of pie charts and in fact uses one as an example of how his "grammar of graphics" provides a framework that defines the steps required to construct one. It is an interesting exercise that shows how the decisions required to construct a pie chart can be described in terms of an underlying grammar that can also describe the construction of a wide variety of other visualizations.

His pie chart example is based upon a survey that asked the question "How often, if at all, do you think the peer review refereeing system for scholarly journals in your field is biased in favor of the following categories of people?"

Hadley Wickham's R package ggplot2 was created based upon Wilkinson's writings. It also incorporates design principles championed by Edward Tufte. Pie charts are created by transforming a stacked bar chart using polar coordinates. Polar coordinates are also used to create some other circular charts (like bullseye charts). The final chart created using ggplot2 appears above. The ggplot2 book lists the following components that make up a plot:
  • Data
  • Aesthetic Mappings
  • Geometric Objects
  • Statistical Transformations
  • Position Adjustment
  • Faceting
  • Coordinate System
Each of these categories will be cited below along with its ggplot2 expression.



    On page 38 Wilkinson shows the final results in two pie charts.  A table that replicates the responses that he presented is as follows.

    Summary Response Gender
    0.08 1 1
    0.11 2 1
    0.17 3 1
    0.32 4 1
    0.32 5 1
    0.3 1 2
    0.15 2 2
    0.1 3 2
    0.07 4 2
    0.38 5 2

    Data
    The following R code reads the table into a data frame. I also used the sqldf package to replace the numeric codes with corresponding string values. (Not the most R-ish way of approaching the problem.)


    library(ggplot2)
    library(sqldf)
    df = read.csv('data.csv')


    df=sqldf("select Summary,
      CASE WHEN Gender==1 THEN 'Female'
           WHEN Gender==2 THEN 'Male'
      END gender,
      CASE WHEN Response==1 THEN '1) rarely' 
           WHEN Response==2 THEN '2) infrequently' 
           WHEN Response==3 THEN '3) occasionally' 
           WHEN Response==4 THEN '4) frequently' 
           WHEN Response==5 THEN '5) not sure' 
      END response from df")
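As a sketch of the more R-ish route hinted at above, the same recoding can be done with `factor()` in base R and no SQL. The numbers are typed in from the table rather than read from data.csv (whose exact layout is not shown), and the name `survey` is only illustrative.

```r
# Proportions typed in from the table above (data.csv itself is not shown).
survey <- data.frame(
  Summary  = c(0.08, 0.11, 0.17, 0.32, 0.32,
               0.30, 0.15, 0.10, 0.07, 0.38),
  Response = rep(1:5, times = 2),
  Gender   = rep(1:2, each = 5)
)

# Recode the numeric codes as labelled factors, mirroring the CASE statements.
survey$gender <- factor(survey$Gender, levels = 1:2,
                        labels = c("Female", "Male"))
survey$response <- factor(survey$Response, levels = 1:5,
                          labels = c("1) rarely", "2) infrequently",
                                     "3) occasionally", "4) frequently",
                                     "5) not sure"))

# Keep the same three columns the sqldf query returns.
df <- survey[, c("Summary", "gender", "response")]

# Sanity check: each gender's proportions should add up to 1.
tapply(df$Summary, df$gender, sum)
```

Unlike the sqldf version, this yields factor columns rather than character columns. The plotting code wraps `response` in `factor()` anyway, so it should treat both forms the same way.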


    Aesthetic Mappings
    Now that we have a data frame with the data in the desired format, our initial intent is to create a stacked bar chart.  The chart will have a single column, so the x coordinate will be set to 1.  The y coordinate represents the amount reported in the "Summary" column.  The color (or fill) is based upon the response column.


    p = ggplot(data=df, 
           aes(x=factor(1),
               y=Summary,
               fill = factor(response)
          ))


    Geometric Objects
    The geometric object (or geom) in this case will be used to create a bar chart. In ggplot2 this is identified as geom_bar.

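    # stat="identity" plots the Summary values as given; current ggplot2
    # versions require it when a y aesthetic is supplied to geom_bar
    p=p + geom_bar(stat="identity", width = 1) 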

    Statistical Transformations
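    A statistical transformation (or stat) is used to transform or summarize the data. We do not summarize anything in this example: the values in the Summary column are plotted as given (in ggplot2 terms, the identity stat).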

    Position Adjustment
    A position adjustment is used to modify the position of displayed elements in some way. For example, you can modify a stacked bar chart so that each column is the same height (and sections of a given bar are therefore proportional). Again, we make no explicit adjustment in the current example; the bars simply use the default stacking.
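To illustrate the equal-height adjustment just described, here is a small sketch using `position = "fill"`. The survey proportions already sum to 1 for each gender, so the data frame `counts` below is hypothetical, invented only to give the two columns different heights.

```r
library(ggplot2)

# Hypothetical counts, invented only so the two columns differ in height.
counts <- data.frame(
  group  = c("A", "A", "B", "B"),
  answer = c("yes", "no", "yes", "no"),
  n      = c(30, 10, 5, 15)
)

# Default position ("stack"): column heights are the group totals (40 and 20).
stacked <- ggplot(counts, aes(x = group, y = n, fill = answer)) +
  geom_bar(stat = "identity")

# position = "fill": every column is rescaled to height 1, so the
# segments show proportions within each group.
filled <- ggplot(counts, aes(x = group, y = n, fill = answer)) +
  geom_bar(stat = "identity", position = "fill")

stacked
filled
```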

    Faceting
    Faceting is used to represent each value for a given variable in its own chart.  In the current example, there will be two charts created - one for male and the other for female.

    p=p+facet_grid(. ~ gender)

    At this point, we can stop and display the chart as it stands.

    p



    Coordinate System
    The only remaining step is to switch from Cartesian to polar coordinates, which turns each stacked bar into a pie. The grammar thus makes the close relationship between stacked bar charts and pie charts evident. This is not obvious in systems where a pie chart is created by rendering a circle and modifying the image.

    p = p + coord_polar(theta="y") 
    p
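To make the link concrete: with `theta="y"`, the stacked heights are mapped to angle, so each segment's share of the bar becomes its share of the full circle. A rough base-R sketch of that arithmetic for the female responses follows (values from the table; the variable names are illustrative, and ggplot2 itself decides the starting angle and direction).

```r
# Female proportions from the table above, in response order 1..5.
female <- c(0.08, 0.11, 0.17, 0.32, 0.32)

# Each slice's angle is its share of the stacked bar times 360 degrees.
angle <- 360 * female / sum(female)

# Cumulative angles mark where each slice ends, just as cumulative
# heights mark where each segment of the stacked bar ends.
data.frame(proportion = female, angle = angle, ends_at = cumsum(angle))
```

The "rarely" slice, for instance, covers about 28.8 degrees (0.08 of 360), and the five angles together make up the full circle.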

    The final few lines clean up the x and y labels and modify the title for the legend.

    p = p + xlab('') +
    ylab('') +
    labs(fill='Response')  
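For convenience, the steps so far can be collected into one script. This is a sketch rather than the original code: it types the data in directly instead of reading data.csv, and it assumes a recent ggplot2, where `geom_bar` needs `stat = "identity"` when a y aesthetic is supplied.

```r
library(ggplot2)

# Data typed in from the table above; labels follow the sqldf recoding.
df <- data.frame(
  Summary  = c(0.08, 0.11, 0.17, 0.32, 0.32,
               0.30, 0.15, 0.10, 0.07, 0.38),
  response = rep(c("1) rarely", "2) infrequently", "3) occasionally",
                   "4) frequently", "5) not sure"), times = 2),
  gender   = rep(c("Female", "Male"), each = 5)
)

p <- ggplot(df, aes(x = factor(1), y = Summary, fill = factor(response))) +
  geom_bar(stat = "identity", width = 1) +  # stacked bar, values used as given
  facet_grid(. ~ gender) +                  # one panel per gender
  coord_polar(theta = "y") +                # bar height becomes angle
  xlab("") +
  ylab("") +
  labs(fill = "Response")

p
```

Commenting out the `coord_polar` line gives the stacked bar chart again, which is a quick way to compare the two views.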

    And this completes the chart displayed at the top of this article. You can also create a chart for each gender individually by subsetting the data frame and removing the faceting:

    ggplot(data=df[df$gender=='Male',], 
             aes(x=factor(1),
             y=Summary,
             fill = factor(response))) + 
     geom_bar(stat="identity", width = 1) + 
     coord_polar(theta="y") +
     xlab('Males') +
     ylab('') +
     labs(fill='Response')

    ggplot(data=df[df$gender=='Female',], 
             aes(x=factor(1),
             y=Summary,
             fill = factor(response))) + 
     geom_bar(stat="identity", width = 1) + 
     coord_polar(theta="y") +
     xlab('Females') +
     ylab('') +
     labs(fill='Response')
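Since the two calls differ only in the gender, they could be folded into a small function. The helper name `pie_for` is hypothetical and not part of ggplot2; the sketch assumes a data frame with the Summary, gender and response columns used in this article, and a recent ggplot2 (hence `stat = "identity"`).

```r
library(ggplot2)

# Hypothetical helper: draw the pie for one gender from a data frame with
# the columns Summary, gender and response used in this article.
pie_for <- function(data, who) {
  ggplot(data[data$gender == who, ],
         aes(x = factor(1), y = Summary, fill = factor(response))) +
    geom_bar(stat = "identity", width = 1) +
    coord_polar(theta = "y") +
    xlab(paste0(who, "s")) +   # "Males" or "Females"
    ylab("") +
    labs(fill = "Response")
}

# Usage, assuming df has been built as earlier in the article:
# pie_for(df, "Male")
# pie_for(df, "Female")
```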

    Certain visualizations are just "right" and clearly convey information about the underlying data. Compared to other packages, ggplot2 makes it easy to create good-looking charts. The grammar of graphics really does provide a way of abstracting the components common to many types of visualizations, which makes it easier to understand which visualizations are, for lack of a better term, isomorphic (in that they present the same data shape).


