Summarising data using bar charts

December 12, 2009
(This article was first published on Software for Exploratory Data Analysis and Statistical Modelling, and kindly contributed to R-bloggers)

A bar graph is a frequently used type of display that compares counts, frequencies, totals or other summary measures across a series of categories, e.g. sales in different market sectors or in the quarters of a financial year. The categories can be laid out on either the vertical or the horizontal axis of the display, depending on which orientation makes the comparison easier to read.

In R there are several ways to create graphs, including base graphics, lattice graphics and the ggplot2 grammar of graphics approach. To illustrate how to create a bar chart with each of these, we will use some data taken from the FAO statistics website for the UK in 2007. The data give the production (in metric tonnes) of the five food and agricultural commodities with the highest production.

The first step before creating the graphs is to prepare the data in a format that can be used by the graphing functions. As this dataset is small we can create the data object manually. To make the labels on the graph less cluttered, production is recorded in thousands of metric tonnes.

The R code to create the data object is shown here:

uk2007 = data.frame(Commodity =
  factor(c("Cow milk", "Wheat", "Sugar beet", "Potatoes", "Barley"),
    levels = c("Cow milk", "Wheat", "Sugar beet", "Potatoes", "Barley")),
  Production = c(14023, 13221, 6500, 5635, 5079))

The levels argument is explicitly defined so that the categories are ordered from largest to smallest production. Without it, the factor levels would be ordered alphabetically.
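If typing the category names twice feels error-prone, the level order can also be derived from the data. The following is a minimal sketch rather than the author's method; the object names commodity, production and uk2007_sorted are illustrative assumptions, and the figures are copied from the data frame above.

```r
# Commodity names and production (1,000 MT), copied from the article
commodity  <- c("Cow milk", "Wheat", "Sugar beet", "Potatoes", "Barley")
production <- c(14023, 13221, 6500, 5635, 5079)

# Without explicit levels, factor() sorts the levels alphabetically
default_levels <- levels(factor(commodity))

# Deriving the level order from the data: largest production first
ordered_levels <- commodity[order(production, decreasing = TRUE)]

# uk2007_sorted is a hypothetical name for the resulting data frame
uk2007_sorted <- data.frame(
  Commodity  = factor(commodity, levels = ordered_levels),
  Production = production
)

print(default_levels)
print(levels(uk2007_sorted$Commodity))
```

This approach keeps the ordering correct if the production figures are later changed.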

Base Graphics

The base graphics in R provide a function barplot that we can use to create a bar chart. The first argument to the function is the vector of bar heights. The names.arg argument (which can be abbreviated to names through R's partial argument matching) provides the labels for the categories in the graph. We also specify the text for the x-axis label, y-axis label and title of the graph with the xlab, ylab and main arguments respectively.

The function call is:

barplot(uk2007$Production, names.arg = uk2007$Commodity,
  xlab = "Commodity", ylab = "Production (1,000 MT)",
  main = "UK 2007 Top 5 Food and Agricultural Commodities")

to produce the following graph:

Base Graphics Bar Chart

This graph is visually appealing with sensible space between the bars for the five commodity categories.
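When a horizontal comparison is preferred, barplot can draw the same data with the categories on the vertical axis. This is a small sketch under stated assumptions, not part of the original example: the object names and the margin values are illustrative choices, and the data are repeated so that the block runs on its own.

```r
# Data repeated from the article so that the example is self-contained
commodity  <- c("Cow milk", "Wheat", "Sugar beet", "Potatoes", "Barley")
production <- c(14023, 13221, 6500, 5635, 5079)

# Widen the left margin so the category labels fit (values are a guess)
old_par <- par(mar = c(5, 7, 4, 2))

# horiz = TRUE draws horizontal bars; las = 1 keeps axis labels horizontal.
# barplot() returns the bar midpoints, which can be used to add annotations.
bar_mids <- barplot(production, names.arg = commodity, horiz = TRUE, las = 1,
  xlab = "Production (1,000 MT)",
  main = "UK 2007 Top 5 Food and Agricultural Commodities")

# Restore the previous graphics settings
par(old_par)
```

Note that with horizontal bars the first category is drawn at the bottom of the display, so the data may need reversing if the largest value should appear at the top.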

Lattice Graphics

In the lattice graphics package the barchart function is used to create bar charts. The x and y variables are specified using a formula, which is the standard approach in Trellis graphics. The variable for the vertical axis goes on the left hand side of the formula and the variable for the horizontal axis on the right hand side, separated by the tilde character (~).

library(lattice)
barchart(Production ~ Commodity, data = uk2007, xlab = "Commodity",
  ylab = "Production (1,000 MT)",
  main = "UK 2007 Top 5 Food and Agricultural Commodities")

This code produces the following graph:

Lattice Graphics Bar Chart

The main visual difference from the base graphics example is the default colour of the bars, which is much brighter. There is also a large gap between the bars in the display.
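These defaults can be adjusted through arguments that barchart passes on to its panel function. The sketch below is an illustration under stated assumptions and not part of the original example: col sets the bar colour, box.ratio is the ratio of bar width to the space between bars, and origin = 0 asks for the bars to start at zero, since lattice does not necessarily start the value axis there by default. The object name p and the chosen values are assumptions.

```r
library(lattice)

# Data repeated from the article so that the example is self-contained
uk2007 <- data.frame(
  Commodity = factor(c("Cow milk", "Wheat", "Sugar beet", "Potatoes", "Barley"),
    levels = c("Cow milk", "Wheat", "Sugar beet", "Potatoes", "Barley")),
  Production = c(14023, 13221, 6500, 5635, 5079))

# Grey bars, narrower gaps between bars, and a value axis anchored at zero
p <- barchart(Production ~ Commodity, data = uk2007,
  col = "grey", box.ratio = 3, origin = 0,
  xlab = "Commodity", ylab = "Production (1,000 MT)",
  main = "UK 2007 Top 5 Food and Agricultural Commodities")

# lattice functions return an object; print() draws it, which is
# required inside scripts and functions
print(p)
```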

ggplot2

To create the bar chart in the ggplot2 package we use the ggplot function to specify the data to appear in the graph and then gradually add the other components of the graph.

We specify the data frame where the data are stored and then use the aes function to identify the x and y variables. The geom_bar function creates the bar chart display from the specified data, and the last three components in the example create the various labels added to the graph.

The graph itself is constructed piece by piece to add the various layers and components on top of the base layer:

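library(ggplot2)
# stat = "identity" plots the Production values as given instead of counting rows;
# ggtitle() replaces opts(title = ...), which was removed from later ggplot2 releases
ggplot(uk2007, aes(Commodity, Production)) + geom_bar(stat = "identity") +
  xlab("Commodity") + ylab("Production (1,000 MT)") +
  ggtitle("UK 2007 Top 5 Food and Agricultural Commodities")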

This code produces the following graph:

ggplot2 Bar Chart

This graph differs from the other two mainly in its background, which by default is grey with white grid lines.
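If the grey background is not wanted, a different theme can be added as one more layer. This is a hedged sketch rather than part of the original example: it assumes a reasonably recent ggplot2 release, uses the built-in theme_bw theme as one possible choice, and the object name p is an assumption.

```r
library(ggplot2)

# Data repeated from the article so that the example is self-contained
uk2007 <- data.frame(
  Commodity = factor(c("Cow milk", "Wheat", "Sugar beet", "Potatoes", "Barley"),
    levels = c("Cow milk", "Wheat", "Sugar beet", "Potatoes", "Barley")),
  Production = c(14023, 13221, 6500, 5635, 5079))

# Same chart as before, with labels set through labs() and a
# white-background theme added as the final component
p <- ggplot(uk2007, aes(Commodity, Production)) +
  geom_bar(stat = "identity") +
  labs(x = "Commodity", y = "Production (1,000 MT)",
    title = "UK 2007 Top 5 Food and Agricultural Commodities") +
  theme_bw()

# Printing the object draws the plot
print(p)
```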
