# Descriptive Analytics-Part 5: Data Visualisation (Categorical variables)

December 11, 2016
By

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Descriptive Analytics is the examination of data or content, usually manually performed, to answer the question “What happened?”.

To solve this set of exercises you should have already solved part 0, part 1, part 2, part 3 and part 4 of this series, and you should also run this script, which contains some additional data cleaning. In case you haven’t solved the previous parts, run this script on your machine; it contains the lines of code we used to modify our data set. This is the sixth set of exercises in a series that aims to provide a descriptive analytics solution to the ‘2008’ data set from here. This data set, which contains the arrival and departure information for all domestic flights in the US from 2008, has become the “iris” data set for Big Data.

The goal of descriptive analytics is to inform the user about what is going on in the data set. A great way to do that quickly and effectively is data visualisation. Data visualisation is also a form of art: it has to be simple, comprehensible and full of information. In this set of exercises we will explore different ways of visualising categorical variables using the famous `ggplot2` package. Before proceeding, it might be helpful to look over the help pages for `ggplot`, `geom_bar`, `facet_wrap`, `facet_grid`, `coord_polar`, `geom_raster` and `scale_fill_distiller`.

For this set of exercises you will need to install and load the packages `ggplot2`, `dplyr` and `RColorBrewer`.

```r
install.packages('ggplot2')
library(ggplot2)
install.packages('dplyr')
library(dplyr)
install.packages('RColorBrewer')
library(RColorBrewer)
```

I have also changed the values of the `DayOfWeek` variable to abbreviated day names; if you wish to do the same, the code is:

```r
install.packages('lubridate')
library(lubridate)
# Parse the date column and replace DayOfWeek with labelled weekdays (Sun, Mon, ...)
flights$DayOfWeek <- wday(as.Date(flights$Full1_Date, '%m/%d/%Y'), label = TRUE)
```

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Exercise 1
Construct a barplot which illustrates the number of flights per carrier.
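A minimal sketch, not necessarily the answer on the solutions page: `geom_bar()` counts the rows for each value of `x`, so mapping the carrier to `x` is all that is needed. The `flights` data frame below is hypothetical toy data, and the column names `UniqueCarrier` and `Cancelled` (coded 0/1) are assumed from the raw 2008 file; rename them if your cleaned data differ.

```r
library(ggplot2)

# Hypothetical toy data standing in for the cleaned 2008 flights data;
# the column names are assumed from the raw file.
flights <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "DL", "DL", "WN", "WN", "WN", "WN"),
  Cancelled     = c(0, 1, 0, 0, 0, 1, 0, 0, 1)
)

# geom_bar() uses stat_count(), so each bar's height is the number of
# rows (flights) for that carrier.
p1 <- ggplot(flights, aes(x = UniqueCarrier)) +
  geom_bar() +
  labs(x = "Carrier", y = "Number of flights")
```

Printing `p1` draws the chart; on the real data, drop the toy data frame and use your cleaned `flights`.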

Exercise 2
Make a barplot that illustrates the number of flights per carrier, where each bar also shows the number of cancellations for that carrier.
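One way to do this, sketched on hypothetical toy data with assumed column names: map the cancellation flag to `fill`. `geom_bar()` then stacks the cancelled and non-cancelled flights inside each carrier's bar, so the total height is still the number of flights. `Cancelled` is wrapped in `factor()` so that ggplot2 treats the 0/1 code as two categories rather than a continuous value.

```r
library(ggplot2)

# Hypothetical toy data; column names assumed from the raw 2008 file.
flights <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "DL", "DL", "WN", "WN", "WN", "WN"),
  Cancelled     = c(0, 1, 0, 0, 0, 1, 0, 0, 1)
)

# The default position of geom_bar() is "stack": one bar per carrier,
# split into a segment for Cancelled = 0 and one for Cancelled = 1.
p2 <- ggplot(flights, aes(x = UniqueCarrier, fill = factor(Cancelled))) +
  geom_bar() +
  labs(x = "Carrier", y = "Number of flights", fill = "Cancelled")
```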

Exercise 3
Make a barplot that illustrates the number of flights per carrier, with two bars for every carrier: one for the flights that were cancelled and one for those that departed.
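Compared with a stacked bar, the key change is `position = "dodge"`, which places the groups side by side instead of on top of each other. This is a sketch on hypothetical toy data with assumed column names; here `Cancelled = 0` stands for the non-cancelled (departed) flights.

```r
library(ggplot2)

# Hypothetical toy data; column names assumed from the raw 2008 file.
flights <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "DL", "DL", "WN", "WN", "WN", "WN"),
  Cancelled     = c(0, 1, 0, 0, 0, 1, 0, 0, 1)
)

# position = "dodge" draws a separate bar for each Cancelled value within
# every carrier; every bar starts at zero.
p3 <- ggplot(flights, aes(x = UniqueCarrier, fill = factor(Cancelled))) +
  geom_bar(position = "dodge") +
  labs(x = "Carrier", y = "Number of flights", fill = "Cancelled")
```

A carrier with no cancellations (DL in the toy data) gets a single, full-width bar, because dodging only splits the groups that actually occur.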

Exercise 4
Make a barplot that shows the proportion of cancelled flights per carrier.
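`position = "fill"` rescales every stacked bar to a height of 1, so each bar shows the share of cancelled versus non-cancelled flights for that carrier. This is one possible approach, sketched on hypothetical toy data with assumed column names; an alternative is to compute `mean(Cancelled)` per carrier and plot it with `geom_col()`.

```r
library(ggplot2)

# Hypothetical toy data; column names assumed from the raw 2008 file.
flights <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "DL", "DL", "WN", "WN", "WN", "WN"),
  Cancelled     = c(0, 1, 0, 0, 0, 1, 0, 0, 1)
)

# Each carrier's bar is normalised to 1; the segment heights are the
# proportions of non-cancelled (0) and cancelled (1) flights.
p4 <- ggplot(flights, aes(x = UniqueCarrier, fill = factor(Cancelled))) +
  geom_bar(position = "fill") +
  labs(x = "Carrier", y = "Proportion of flights", fill = "Cancelled")
```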

Exercise 5
Make seven barplots, one for every day of the week, that illustrate the number of flights per carrier, where each bar also shows the number of cancellations for that carrier.
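A sketch using `facet_wrap()`, which repeats one plot specification in a separate panel for every level of the faceting variable; with all seven days present in the data this yields the seven barplots. The toy data are hypothetical (only three days appear, so only three panels are drawn), and `DayOfWeek` is built as an ordered factor to mimic the labelled output of `wday(..., label = TRUE)`.

```r
library(ggplot2)

# Hypothetical toy data; DayOfWeek mimics the ordered factor returned by
# lubridate::wday(..., label = TRUE). Column names are assumptions.
days <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
flights <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "DL", "DL", "WN", "WN", "WN", "WN"),
  Cancelled     = c(0, 1, 0, 0, 0, 1, 0, 0, 1),
  DayOfWeek     = factor(c("Mon", "Tue", "Mon", "Wed", "Mon", "Tue", "Wed", "Mon", "Tue"),
                         levels = days, ordered = TRUE)
)

# One stacked barplot per day of the week.
p5 <- ggplot(flights, aes(x = UniqueCarrier, fill = factor(Cancelled))) +
  geom_bar() +
  facet_wrap(~ DayOfWeek) +
  labs(x = "Carrier", y = "Number of flights", fill = "Cancelled")
```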

Exercise 6
Make a single barplot that illustrates the number of flights per carrier for every day of the week, where each bar also shows the number of cancellations for that carrier.
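How to read "one barplot" is a judgment call. One possible reading (an assumption; the solutions page may differ) is a single figure built with `facet_grid(. ~ DayOfWeek)`, which lays the days out in one row sharing the same y axis, so bar heights can be compared directly across days. The toy data and column names are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data; DayOfWeek mimics lubridate::wday(..., label = TRUE).
days <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
flights <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "DL", "DL", "WN", "WN", "WN", "WN"),
  Cancelled     = c(0, 1, 0, 0, 0, 1, 0, 0, 1),
  DayOfWeek     = factor(c("Mon", "Tue", "Mon", "Wed", "Mon", "Tue", "Wed", "Mon", "Tue"),
                         levels = days, ordered = TRUE)
)

# ". ~ DayOfWeek" means: no row facets, one column of panels per day,
# all in a single row with a common y scale.
p6 <- ggplot(flights, aes(x = UniqueCarrier, fill = factor(Cancelled))) +
  geom_bar() +
  facet_grid(. ~ DayOfWeek) +
  labs(x = "Carrier", y = "Number of flights", fill = "Cancelled")
```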

Exercise 7
Create a pie chart that illustrates the number of flights per carrier.
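ggplot2 has no dedicated pie geom; a common approach is to draw a single stacked bar and wrap it in polar coordinates with `coord_polar(theta = "y")`, so that the stacked counts become angles. In this sketch `factor(1)` is only a dummy x value that puts every flight into that one bar; the data are hypothetical toy data with an assumed `UniqueCarrier` column.

```r
library(ggplot2)

# Hypothetical toy data; column name assumed from the raw 2008 file.
flights <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "DL", "DL", "WN", "WN", "WN", "WN")
)

# One full-width stacked bar (width = 1 removes the hole in the middle),
# with the count mapped to the angle.
p7 <- ggplot(flights, aes(x = factor(1), fill = UniqueCarrier)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y") +
  labs(x = NULL, y = NULL, fill = "Carrier")
```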

Exercise 8
Create a wind rose that illustrates the number of flights per carrier for every day of the week.
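A wind rose is essentially a stacked bar chart drawn in polar coordinates with the default `theta = "x"`: each day becomes a wedge whose length is the number of flights, split by carrier. Which variable goes around the circle is a design choice; this sketch puts the day of the week there and uses hypothetical toy data with assumed column names.

```r
library(ggplot2)

# Hypothetical toy data; DayOfWeek mimics lubridate::wday(..., label = TRUE).
days <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
flights <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "DL", "DL", "WN", "WN", "WN", "WN"),
  DayOfWeek     = factor(c("Mon", "Tue", "Mon", "Wed", "Mon", "Tue", "Wed", "Mon", "Tue"),
                         levels = days, ordered = TRUE)
)

# Bars for each day, stacked by carrier, bent around the circle.
p8 <- ggplot(flights, aes(x = DayOfWeek, fill = UniqueCarrier)) +
  geom_bar(width = 1) +
  coord_polar() +
  labs(x = NULL, y = "Number of flights", fill = "Carrier")
```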

Exercise 9
Make a heat map that illustrates the number of flights per carrier for every day of the week.
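A sketch of one approach: first count the flights for every carrier/day combination with dplyr's `count()` (which names the result column `n`), then draw one tile per combination with `geom_raster()`, coloured by `n`. The toy data and column names are hypothetical; combinations that never occur are simply left blank.

```r
library(ggplot2)
library(dplyr)

# Hypothetical toy data; DayOfWeek mimics lubridate::wday(..., label = TRUE).
days <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
flights <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "DL", "DL", "WN", "WN", "WN", "WN"),
  DayOfWeek     = factor(c("Mon", "Tue", "Mon", "Wed", "Mon", "Tue", "Wed", "Mon", "Tue"),
                         levels = days, ordered = TRUE)
)

# Number of flights for each observed carrier/day pair.
counts <- flights %>% count(UniqueCarrier, DayOfWeek)

# One tile per pair, filled by the flight count.
p9 <- ggplot(counts, aes(x = DayOfWeek, y = UniqueCarrier, fill = n)) +
  geom_raster() +
  labs(x = "Day of week", y = "Carrier", fill = "Flights")
```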

Exercise 10
Using the same data as the heat map in the previous exercise, also show the cancellation ratio (rounding to 2 digits is recommended), and customise the heat map so that higher values are more distinctive.
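A possible sketch: compute both the number of flights and the cancellation ratio (rounded to 2 digits) per cell with `group_by()` and `summarise()`, print the ratio on each tile with `geom_text()`, and colour the tiles with `scale_fill_distiller()`. By default `scale_fill_distiller()` reverses the Brewer palette (`direction = -1`); setting `direction = 1` with the sequential `"YlOrRd"` palette gives the highest counts the darkest red. The palette choice, the toy data and the column names are assumptions, and `.groups = "drop"` needs dplyr 1.0.0 or later.

```r
library(ggplot2)
library(dplyr)

# Hypothetical toy data; column names assumed, Cancelled coded 0/1.
days <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
flights <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "DL", "DL", "WN", "WN", "WN", "WN"),
  Cancelled     = c(0, 1, 0, 0, 0, 1, 0, 0, 1),
  DayOfWeek     = factor(c("Mon", "Tue", "Mon", "Wed", "Mon", "Tue", "Wed", "Mon", "Tue"),
                         levels = days, ordered = TRUE)
)

# Flights and cancellation ratio (mean of the 0/1 flag) per carrier/day cell.
cells <- flights %>%
  group_by(UniqueCarrier, DayOfWeek) %>%
  summarise(n = n(), cancel_ratio = round(mean(Cancelled), 2), .groups = "drop")

# Tiles coloured by flight count, labelled with the cancellation ratio;
# direction = 1 keeps YlOrRd in light-to-dark order for low-to-high values.
p10 <- ggplot(cells, aes(x = DayOfWeek, y = UniqueCarrier, fill = n)) +
  geom_raster() +
  geom_text(aes(label = cancel_ratio)) +
  scale_fill_distiller(palette = "YlOrRd", direction = 1) +
  labs(x = "Day of week", y = "Carrier", fill = "Flights")
```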
