Descriptive Analytics - Part 4: Data Manipulation

November 21, 2016

(This article was first published on R-exercises, and kindly contributed to R-bloggers)

Descriptive Analytics is the examination of data or content, usually manually performed, to answer the question “What happened?”.

To solve this set of exercises you should have solved part 0, part 1, part 2, and part 3 of this series, and you should also run this script, which contains some more data cleaning. In case you haven’t, run this script on your machine; it contains the lines of code we used to modify our data set. This is the fifth set of exercises in a series that aims to provide a descriptive analytics solution to the ‘2008’ data set from here. This data set, which contains the arrival and departure information for all domestic flights in the US in 2008, has become the “iris” data set for Big Data. Descriptive analytics is all about answering questions, and the goal of this set of exercises is to ‘answer’ questions with very few lines of code using the dplyr package. dplyr is a great package for data manipulation (if you are familiar with SQL, it will be a piece of cake for you). Before proceeding, it might be helpful to look over the help pages for select, contains, filter, summarise, mutate, group_by, and arrange.

For this set of exercises you will need to install and load the packages dplyr and chron.

install.packages('dplyr')
library(dplyr)
install.packages('chron')
library(chron)
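
As a quick warm-up, here is a minimal sketch of how the dplyr verbs mentioned above chain together. It runs on a tiny made-up data frame, not on the flights data; the column names carrier, dest and delay are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Toy data (hypothetical values and column names, not the 2008 data set)
toy <- data.frame(
  carrier = c("AA", "AA", "UA", "UA", "DL"),
  dest    = c("JFK", "LAX", "JFK", "ORD", "ATL"),
  delay   = c(10, 200, 5, 45, 190)
)

# select() keeps columns (contains() matches them by name),
# filter() keeps the rows that satisfy a condition
long_delays <- toy %>%
  select(carrier, contains("del")) %>%
  filter(delay > 180)

# group_by() + summarise() collapses each group to one row,
# arrange(desc()) sorts the result from largest to smallest
by_carrier <- toy %>%
  group_by(carrier) %>%
  summarise(mean_delay = mean(delay), flights = n()) %>%
  arrange(desc(mean_delay))

print(long_delays)
print(by_carrier)
```

Each verb takes a data frame and returns a data frame, which is why they can be piped one after another with %>%.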

Since we use the dplyr package, we will also make our data frame a local data frame.
flights <- tbl_df(flights)
We do that because a local data frame has some useful properties. First of all, if we (accidentally) type ‘flights’, a local data frame prints only the first 10 rows, while a regular data frame prints far more rows, which can be distracting and may cause memory issues down the road. Another reason is that when we type the name of a local data frame, it also shows the number of rows and columns and the type of each column.
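
The difference is easy to see on a small example. The sketch below uses a made-up 100-row data frame as a stand-in for flights, and it uses as_tibble(), which recent dplyr releases recommend in place of the now-deprecated tbl_df().

```r
library(dplyr)

# Made-up data frame with 100 rows (a stand-in for flights)
df <- data.frame(id = 1:100, value = (1:100) / 2)

# Convert it to a local data frame (a tibble)
local_df <- as_tibble(df)

# Printing the tibble shows its dimensions, the column types,
# and only the first 10 rows
print(local_df)

# Printing the plain data frame shows all 100 rows
print(df)
```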

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Exercise 1
Print the destination, the delay of arrivals and the air time of each flight.
Hint: Use the select function.

Exercise 2
Print the columns whose names contain the word ‘Delay’.

Exercise 3
Print the carrier name, the month, and the day of the week of the flights whose carrier delay is higher than 180.

Exercise 4
Print out all the flights grouped by carrier names.

Exercise 5
Print out the mean of the arrival delay using the summarise function.

Exercise 6
Print out the minimum, mean, median, variance, standard deviation, maximum, and count of AirTime.

Exercise 7
Print out the mean delay and the number of flights of each carrier.

Exercise 8
Print out the mean delay and the number of flights of each carrier in descending order.

Exercise 9
This exercise is a bit out of context, but it will demonstrate a great way of manipulating data and it is a prerequisite for the next exercise.

Create a new column Full_Date, which will contain the date of each flight, and then print it out.
Hint: Use the mutate function.
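
As an illustration of the hint, here is a minimal sketch of building a date column with mutate(). The toy data frame events and its Year, Month, Day and event_date columns are assumptions made for this example; the column names in your cleaned flights data may differ. Base R's as.Date() is used here rather than chron.

```r
library(dplyr)

# Toy data (hypothetical column names and values)
events <- data.frame(Year = c(2008, 2008), Month = c(1, 12), Day = c(3, 25))

# mutate() adds a new column computed from existing ones;
# paste() glues the parts into "2008-1-3", which as.Date() then parses
events <- events %>%
  mutate(event_date = as.Date(paste(Year, Month, Day, sep = "-"),
                              format = "%Y-%m-%d"))

print(events$event_date)
```

Because the new column is a proper Date, it can be used directly in group_by() and arrange().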

Exercise 10
Print out the dates that had the most flights and then print out the dates that had the highest ratio of cancelled flights.
