Data wrangling: Transforming (1/3)

July 5, 2017

(This article was first published on R-exercises, and kindly contributed to R-bloggers)


Data wrangling is a task of great importance in data analysis. It is the process of importing, cleaning, and transforming raw data into actionable information for analysis. It is a time-consuming process, estimated to take about 60-80% of an analyst’s time. In this series we will go through this process. It will be a brief series whose goal is to build the reader’s data wrangling skills. This is the third part of the series, and it covers transforming data. This can include filtering, summarizing, and ordering your data by different means. It also includes combining various data sets, creating new variables, and many other manipulation tasks. In this post, we will go through the most basic tasks, including slicing and filtering, on the famous mtcars data set.

Before proceeding, it might be helpful to look over the help pages for select, rename, sample_frac, slice, distinct, filter, rownames, and %in%.

Moreover, please load the following library, installing it first if you have not already done so.
install.packages("dplyr")
library(dplyr)

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Exercise 1

Print out the hp column using the select function.

Exercise 2

Print out all columns except hp using the select function.

Exercise 3

Print out the mpg, hp, vs, am, gear columns. Consider using the colon (:) symbol.
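
As a hedged illustration of the colon helper, the sketch below uses the built-in iris data set rather than mtcars, so it does not give away the exercises. It shows select keeping a single column, dropping a column, and mixing an individual name with a range of adjacent columns. The object names are illustrative only.

```r
library(dplyr)

# iris columns: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species
flowers <- head(iris, 3)

one_col   <- select(flowers, Species)    # keep a single column
all_but   <- select(flowers, -Species)   # drop a single column

# A single name followed by a range of adjacent columns
col_range <- select(flowers, Sepal.Length, Petal.Length:Species)

names(col_range)
```

The colon follows the column order of the data frame, so it only helps when the columns you want sit next to each other.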

Exercise 4

Create the object cars_m_h containing the mpg and hp columns, but with the column names ‘miles_per_gallon’ and ‘horse_power’ respectively.

Exercise 5

Change the column names of cars_m_h from ‘miles_per_gallon’ and ‘horse_power’ to ‘mpg’ and ‘hp’ respectively.
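
Both select and rename accept the form new_name = old_name. A minimal sketch on a hypothetical toy data frame (toy and its columns are made up for illustration and are not part of the exercises):

```r
library(dplyr)

# Hypothetical toy data, not part of the exercises
toy <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5), c = letters[1:3])

# select() can rename while it picks columns: new_name = old_name
picked <- select(toy, first = a, second = b)

# rename() changes names but keeps every column of its input
restored <- rename(picked, a = first, b = second)

names(picked)
names(restored)
```

The difference to keep in mind is that select drops any column you do not mention, while rename leaves the rest of the data frame untouched.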

Exercise 6

Print out a random half of the observations of cars_m_h.
Hint: Consider using the sample_frac function.
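
Because the draw is random, your output will differ from run to run unless you fix the seed. A small sketch on the built-in iris data set, assuming you want a repeatable result (the seed value 42 is arbitrary):

```r
library(dplyr)

set.seed(42)  # fix the random number generator so the draw is repeatable

# Draw half of the 150 rows of iris, without replacement by default
half <- sample_frac(iris, 0.5)

nrow(half)
```

In dplyr 1.0.0 and later, sample_frac is superseded by slice_sample(prop = ...), although sample_frac still works.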

Exercise 7

Create a cars_m_h_s object containing the 10th to 35th rows of cars_m_h.
Hint: Consider using the slice function.

Exercise 8

Print out the cars_m_h_s object without any duplicates.
Hint: Consider using the distinct function.
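
Note that mtcars has 32 rows, so the range requested in Exercise 7 runs past the end of the data. The sketch below, on a hypothetical toy data frame, illustrates how slice treats positions beyond the last row and how distinct removes repeated rows.

```r
library(dplyr)

# Hypothetical toy data with repeated rows
toy <- data.frame(x = c(1, 1, 2, 2, 3), y = c("a", "a", "b", "b", "c"))

# Ask for rows 2 to 10 of a 5-row data frame: positions past the end
# are silently ignored, so only rows 2 to 5 come back
tail_rows <- slice(toy, 2:10)

# distinct() keeps the first occurrence of each unique row
unique_rows <- distinct(tail_rows)

nrow(tail_rows)
nrow(unique_rows)
```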

Exercise 9

Print out all observations of the cars_m_h_s object that have mpg > 20 and hp > 100.
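
When filter receives several conditions separated by commas, it keeps only the rows that satisfy all of them. A minimal sketch on a hypothetical toy data frame (column names and thresholds are made up for illustration):

```r
library(dplyr)

# Hypothetical toy data, not part of the exercises
toy <- data.frame(x = c(5, 15, 25, 35), y = c(200, 50, 150, 300))

# Conditions separated by commas are combined with AND
both <- filter(toy, x > 10, y > 100)

# The same filter written with an explicit &
same <- filter(toy, x > 10 & y > 100)

both
```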

Exercise 10

Select the ‘Lotus Europa’ car.
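
In mtcars the car names are stored as row names rather than in a column, which is where rownames and %in% become useful. A sketch of the general idea in base R, using a hypothetical toy data frame:

```r
# Hypothetical toy data whose labels live in the row names
toy <- data.frame(score = c(90, 75, 60), row.names = c("ann", "bob", "cy"))

# %in% returns a logical vector with one entry per row name
keep <- rownames(toy) %in% c("bob", "cy")

# Base R subsetting with a logical vector keeps the row names;
# drop = FALSE stops a one-column result collapsing to a vector
picked <- toy[keep, , drop = FALSE]

rownames(picked)
```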
