Sales Dashboard in R with qplot and ggplot2 – Part 2

May 8, 2014
By

(This article was first published on Milano R net, and kindly contributed to R-bloggers)

In Part 1 of this series we moved the first steps into building our Sales Dashboard in R. In this Part 2 we explore additional ways to display sales related data.

If you haven't read Part 1, it is highly recommended that you do so first because we will build on what was covered there.

1. Bar charts

A useful way to visualize the total order intake by sales person is to produce a bar chart with the total order amount for each sales person. While this is also an easy task for qplot, we have to be careful about which additional parameters are needed to obtain exactly what we want. Here is the right syntax.

qplot(x=Salesperson, y=Order.Amount, geom="bar", stat="identity", data=data)

Note how qplot has automatically calculated the total order amount per sales person. This is the default behavior for geom="bar" and stat="identity". What actually happens is that qplot generates one bar per order, with an height proportional to the order amount, and then groups and stacks all bars belonging to the same sales person. You can "see" the stacking by coloring the bar outlines in a different color.

qplot(x=Salesperson, y=Order.Amount, color=I("blue"), geom="bar", stat="identity", data=data)

Note that to specify a fixed color for the outline we used the color parameter with an argument of I("blue"). The function I() tells qplot to use the value as is, without any conversion or attempt to interpret it as a variable name.

Now that we understand better how qplot thinks, we can improve the way our data are visualized. For example, we could color code each bar according to the country where the order was taken. As  there is a 1:1 relationship between each sales person and its country, the effect is to color each bar uniformly either for USA or for UK.

qplot(x=Salesperson, y=Order.Amount, fill=Country, geom="bar", stat="identity", data=data)

The parameter useful to color the interior of each bar according to the Country is fill.

Note that the color coding by country has had the effect to add a legend on the left, which in turn has reduced the area available for the chart, causing an annoying overlapping effect for the names of the sales person. In order to fix this, we need to use a more advanced feature that goes beyond qplot, but bear with me because it is not that complex. We are going to rotate the labels for the x axis by 90 degrees and align them properly under the tick marks.

qplot(x=Salesperson, y=Order.Amount, fill=Country, geom="bar", stat="identity", data=data) + theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0))

The call to the theme function has the effect to override the default appearance for the specified element, in this case axis.text.xangle=90 rotates the text clockwise by 90 degrees while hjust=1 and vjust=0 align it properly under the tick marks. You can experiment with different values to see the effect. For example, vjust=0.5 centers the text under the axis tick mark.

In the case of this data set, color coding by Country doesn't add a lot of meaning to the visualization. It would be much more useful for example to color code each portion of the bar according to the calendar year in which the order was taken.

2. Stacked bars

Earlier we found out that each bar in our bar chart is actually made of a series of stacked bars where each one has an height proportional to the order amount. Let's try to color code them by year instead that by country and see what it looks like. Experimenting (and, yes!, making mistakes) is often the best way to learn how qplot works!

qplot(x=Salesperson, y=Order.Amount, fill=Order.Date, geom="bar", stat="identity", data=data) + theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0))

This is indeed a fancy looking chart! I am sure any Sales Director would be absolutely pleased with it (ok, just kidding!)

What's the problem here? Well, we have told qplot to color code each stacked bar with a fill color corresponding to the Order.Date. Since almost all order dates are different one from the other, qplot has used a large range of discrete colors to try to code them all. The result is the Arlecchino chart above.

What we actually need to do is to have one different color for each year, which means we need to extract the year from each order date and pass it to fill in qplot. Having earlier converted Order.Date to a Date class allows us to use as.character to extract the year and convert it to a character format.

qplot(x=Salesperson, y=Order.Amount, fill=as.character(Order.Date, "%Y"), geom="bar", stat="identity", data=data) + theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0))

While the result is graphically as expected, there are a couple of annoyances in this chart. First, the title of the legend includes the function used to extract the year and convert it to a character sequence. Second, the sequence of the colors in the legend is exactly the opposite of the sequence of colors in the bars.

To fix the first problem we have different possibilities. One would be to add a Year variable to the data set, containing the order year already in the needed format. However this would represent an unnecessary duplication of information. The second way is to assign a different title to the legend. This is straightforward to do through the labs function.

qplot(x=Salesperson, y=Order.Amount, fill=as.character(Order.Date, "%Y"), geom="bar", stat="identity", data=data) + theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0)) + labs(fill="Year")

The call to labs is telling qplot that whatever variable is used to encode the fill attribute (or "aesthetic" in ggplot2 jargon) should be labeled as specified by the fill argument.

To fix the order of colors in the legend, so that their sequence correspond to the one in the bars, we can use the guides function.

qplot(x=Salesperson, y=Order.Amount, fill=as.character(Order.Date, "%Y"), geom="bar", stat="identity", data=data) + theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0)) + labs(fill="Year") + guides(fill=guide_legend(reverse=TRUE))

guides has a similar logic to labs. It takes the name of the attribute (aesthetic) that we want to modify in the legend and uses guide_legend to set an attribute for it. A way to read it is: in the guides (aka the legend), reverse the sequence of colors for the fill attribute.

It works exactly as expected, but our plotting command is getting long! Is there any chance to simplify it? As it turns out guide_legend supports another attributes among the many which is title, meant to set the title of the legend (or guide). In this case it is equivalent to what we achieved with labs, so we can omit the latter and move the definition of the lagend title within guide_legend.

qplot(x=Salesperson, y=Order.Amount, fill=as.character(Order.Date, "%Y"), geom="bar", stat="identity", data=data) + theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0)) + guides(fill=guide_legend(title="Year", reverse=TRUE))

This produces the exact same chart as above.

3. A final touch

We have generated with little code a professionally looking sales chart. I guess your Sales Director will be very much pleased with it. For the perfectionists out there, we could add a final touch to it though.

First, the labels for the axis are still the name of the variables in the data set. We could do better for sure. Second, we are missing a title for the chart. qplot can accommodate our needs through three additional parameters that can be specified directly into its call.

• xlab sets the label for the x axis
• ylab (you guessed it!) sets the label for the y axis
• main sets the label for the chart title

qplot(x=Salesperson, y=Order.Amount, fill=as.character(Order.Date, "%Y"), geom="bar", stat="identity", data=data, xlab="Sales Person", ylab="Order Amount (USD)", main="Orders by Sales Person and Year") + theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0)) + guides(fill=guide_legend(title="Year", reverse=TRUE))

This is it for Part 2. In Part 3 we will cover some more variations to the bar charts and other type of data visualization. Till next time!

* This article originally appeared in Sales Dashboard in R with qplot and ggplot2 - Part 2