Descriptive Analytics-Part 1: Data Formatting Exercises

October 26, 2016

(This article was first published on R-exercises, and kindly contributed to R-bloggers)

Descriptive Analytics is the examination of data or content, usually performed manually, to answer the question “What happened?”.

In order to solve this set of exercises, you should have solved ‘Part 0’ of this series; in case you haven’t, you can find the solutions here and run them on your machine. This is the second set of exercises in a series that aims to provide a descriptive analytics solution to the ‘2008’ data set from here. This data set, which contains the arrival and departure information for all domestic flights in the US in 2008, has become the “iris” data set for Big Data. In the exercises below, we will make the format of the departure and arrival times suitable for further processing. Before proceeding, it might be helpful to look over the help pages for str_pad, substring, paste, chron, and head.

For this set of exercises, you will need to install and load the packages stringr and chron.

install.packages('stringr')
install.packages('chron')
library(stringr)
library(chron)

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Exercise 1
Print the first five rows of the dataset. What do you think about the date formatting?
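A minimal sketch of this step, using a small made-up data frame as a hypothetical stand-in for flights (the real object comes from Part 0 and has many more rows and columns). Only a few of the 2008 column names are reproduced, and all values are illustrative:

```r
# Hypothetical stand-in for the real 'flights' data frame (made-up values).
# Times are stored as numbers in hhmm form, e.g. 754 means 07:54.
flights <- data.frame(
  Year       = 2008,
  Month      = 1,
  DayofMonth = 3,
  DepTime    = c(2003, 754, 628, 926, 1829, 1940),
  CRSDepTime = c(1955, 735, 620, 930, 1755, 1915),
  ArrTime    = c(2211, 1002, 804, 1054, 1959, 2121),
  CRSArrTime = c(2225, 1000, 750, 1100, 1925, 2110)
)

# Print the first five rows
head(flights, 5)
```

Note that values such as 754 have only three digits, so hours and minutes cannot be read off at fixed positions yet.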

Exercise 2
Create a new object named dep_time and assign it the values of flights$DepTime. If a value has fewer than four digits, pad it on the left with zeros so that it is four digits long. For example, 123 -> 0123.
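One way to do the padding is with stringr’s str_pad(), which coerces the numbers to character before padding. The sketch below assumes a hypothetical five-value sample in place of the real flights$DepTime column:

```r
library(stringr)

# Hypothetical sample of DepTime values (hhmm stored as numbers)
flights <- data.frame(DepTime = c(2003, 754, 628, 926, 5))

# Left-pad each value with "0" until it is four characters wide: 754 -> "0754"
dep_time <- str_pad(flights$DepTime, width = 4, side = "left", pad = "0")
dep_time
```

A base R alternative would be sprintf("%04d", as.integer(flights$DepTime)). In the real data, missing departure times (NA) should stay NA after str_pad().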

Exercise 3
Create a new object named hour and assign it the first two characters of each value in dep_time. Can you figure out why I am asking for that?
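A possible sketch with base R’s substring(), which extracts characters by position from every element of a vector. It assumes dep_time already holds zero-padded strings like those from Exercise 2 (the values here are hypothetical):

```r
# Hypothetical zero-padded values, as produced in Exercise 2
dep_time <- c("2003", "0754", "0628", "0926")

# Characters 1 to 2 of each string hold the hour
hour <- substring(dep_time, 1, 2)
hour
```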

Exercise 4
Create a new object named minutes and assign it the last two characters of each value in dep_time.
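The same substring() call works for the minutes, this time taking characters 3 to 4. Again, the input values are a hypothetical sample of padded times:

```r
# Hypothetical zero-padded values, as produced in Exercise 2
dep_time <- c("2003", "0754", "0628", "0926")

# Characters 3 to 4 of each four-character string hold the minutes
minutes <- substring(dep_time, 3, 4)
minutes
```

With stringr loaded, str_sub(dep_time, -2, -1) counts from the end of the string instead and should give the same result here.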

Exercise 5
Assign to dep_time the time in the format ‘HH:MM:SS’. The seconds should be ‘00’; we make this assumption for the sake of formatting.
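A sketch using paste() with sep = ":", which glues the pieces element by element and recycles the single "00" seconds field. The hour and minutes vectors are hypothetical outputs of Exercises 3 and 4:

```r
# Hypothetical results of Exercises 3 and 4
hour    <- c("20", "07", "06", "09")
minutes <- c("03", "54", "28", "26")

# Build "HH:MM:SS" strings; "00" is recycled for every element
dep_time <- paste(hour, minutes, "00", sep = ":")
dep_time
```

With the real data, missing times (NA, e.g. for cancelled flights) come out of paste() as the string "NA:NA:00", so it may be worth checking how those behave in the next step.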

Exercise 6
Change the class of dep_time from character to times.
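The chron package provides times(), which parses "HH:MM:SS" strings and stores each time as a fraction of a day. A minimal sketch on hypothetical input strings:

```r
library(chron)

# Hypothetical "HH:MM:SS" strings, as produced in Exercise 5
dep_time <- c("20:03:00", "07:54:00", "06:28:00", "09:26:00")

# Convert the character vector into a chron 'times' object
dep_time <- times(dep_time)
class(dep_time)
```

Because a times object is numeric underneath (07:54:00 is 474 minutes out of 1440), it can later be used in arithmetic and comparisons, which plain strings cannot.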

Exercise 7
Print the first 10 and the last 10 values of dep_time. If the object is formatted as ‘HH:MM:SS’ (as it should be), assign dep_time to flights$DepTime.
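A sketch of the inspection and write-back, assuming a hypothetical 12-row stand-in for flights. Exercises 2 to 6 are condensed into two lines so that the block runs on its own:

```r
library(stringr)
library(chron)

# Hypothetical stand-in data with 12 departure times in hhmm form
flights <- data.frame(DepTime = c(2003, 754, 628, 926, 1829, 1940,
                                  1937, 1039, 617, 1620, 706, 1644))

# Exercises 2-6, condensed: pad, split, paste as HH:MM:SS, convert to times
padded   <- str_pad(flights$DepTime, width = 4, side = "left", pad = "0")
dep_time <- times(paste(substring(padded, 1, 2), substring(padded, 3, 4), "00", sep = ":"))

# Inspect both ends of the object
head(dep_time, 10)
tail(dep_time, 10)

# If everything looks like HH:MM:SS, store it back in the data frame
flights$DepTime <- dep_time
```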

Exercise 8
Repeat the exact same process for flights$ArrTime and create the variable arr_time.

Exercise 9
Repeat the exact same process for flights$CRSDepTime and create the variable crs_dep_time.

Exercise 10
Repeat the exact same process for flights$CRSArrTime and create the variable crs_arr_time.
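Since Exercises 8 to 10 repeat the same steps, one option is to wrap them in a small function. The helper name hhmm_to_times and the three-row data frame below are hypothetical, not part of the original exercises:

```r
library(stringr)
library(chron)

# Hypothetical helper bundling Exercises 2-6:
# pad to four digits, split into hour and minutes, paste as HH:MM:SS, convert
hhmm_to_times <- function(x) {
  padded  <- str_pad(x, width = 4, side = "left", pad = "0")
  hour    <- substring(padded, 1, 2)
  minutes <- substring(padded, 3, 4)
  times(paste(hour, minutes, "00", sep = ":"))
}

# Hypothetical stand-in for the relevant columns
flights <- data.frame(
  ArrTime    = c(2211, 1002, 804),
  CRSDepTime = c(1955, 735, 620),
  CRSArrTime = c(2225, 1000, 750)
)

arr_time     <- hhmm_to_times(flights$ArrTime)
crs_dep_time <- hhmm_to_times(flights$CRSDepTime)
crs_arr_time <- hhmm_to_times(flights$CRSArrTime)

# After inspecting them as in Exercise 7, write them back
flights$ArrTime    <- arr_time
flights$CRSDepTime <- crs_dep_time
flights$CRSArrTime <- crs_arr_time
```

The real columns may contain values such as 2400 for midnight, so it is worth checking how times() treats those before relying on the result.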
