Data wrangling: Cleansing – Regular expressions (1/3)

August 16, 2017

(This article was first published on R-exercises, and kindly contributed to R-bloggers)


Data wrangling is the process of importing, cleaning and transforming raw data into actionable information for analysis. It is a time-consuming process, estimated to take about 60-80% of an analyst’s time. In this series we will go through this process. It will be a brief series that aims to sharpen the reader’s data wrangling skills. This is the fourth part of the series and it covers cleaning data. In the previous parts we learned how to import, reshape and transform data. The rest of the series will be dedicated to the data cleansing process. In this post we will go through regular expressions: sequences of characters that define a search pattern, mainly for use in pattern matching with text strings. In particular, we will cover the foundations of regular expression syntax.
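As a rough illustration of what "a search pattern" means in practice, the sketch below tests for and extracts runs of digits using base R. The string x is a made-up example, not one of the exercise objects.

```r
# Hypothetical sample string (an assumption for illustration only)
x <- "Order 66 shipped on 2017-08-16"

# grepl() reports whether the pattern matches anywhere in the string.
# "[0-9]+" means: one or more characters from the set 0 to 9.
has_digits <- grepl("[0-9]+", x)

# "^" anchors the pattern to the start of the string
starts_with_order <- grepl("^Order", x)

# gregexpr() locates every match; regmatches() extracts the matched text
digits <- regmatches(x, gregexpr("[0-9]+", x))[[1]]
print(digits)
```

Here the pattern is built from a character set ([0-9]), a quantifier (+) and an anchor (^), three of the basic building blocks of regular expression syntax.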

Before proceeding, it might be helpful to look over the help pages for sub and gsub.
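The main difference between the two functions is that sub replaces only the first match, while gsub replaces every match. A minimal sketch on a made-up string (not one of the exercise objects):

```r
# Hypothetical sample string (an assumption for illustration only)
x <- "one fish, two fish, red fish"

# sub() replaces only the first match of the pattern
first_only <- sub("fish", "cat", x)

# gsub() replaces every match of the pattern
every_match <- gsub("fish", "cat", x)

print(first_only)   # "one cat, two fish, red fish"
print(every_match)  # "one cat, two cat, red cat"
```

Both functions take the arguments in the order pattern, replacement, input string, and both return a new string rather than modifying x.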

Moreover, please run the following commands to create the strings that we will work on:
textmeta <- "R|is|cool,|so|are|you|that|you|are|for|__|your|skills|by|solving|this|exercise. Moreover parenthesis symbol is []! Finally once you are done with this set go for a coffee, you deserve it!"

textseq <- "I hope you are using R version 3.4.0 and you have updated on 2017-04-21, with nickname:'You stupid Darkness'."
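Several characters in these strings have a special meaning in regular expressions, so they need to be treated as literals before they can be matched. The sketch below shows three common ways to do that in base R, using a made-up string and the dollar sign as the example metacharacter; it is one possible approach, not necessarily the one used in the published solutions.

```r
# Hypothetical sample string (an assumption for illustration only)
x <- "cost $5"

# Unescaped, "$" is an anchor for the end of the string, not a dollar sign.
# Option 1: escape it. The backslash itself is doubled inside an R string.
escaped <- gsub("\\$", "USD ", x)

# Option 2: put it in a character set, where "$" is taken literally
bracketed <- gsub("[$]", "USD ", x)

# Option 3: switch off regular expression matching with fixed = TRUE
fixed_match <- gsub("$", "USD ", x, fixed = TRUE)

print(escaped)  # "cost USD 5"
```

All three calls return the same result; fixed = TRUE is the simplest choice when the pattern contains no regular expression features at all.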

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Exercise 1

From the object textmeta, substitute the full stop (‘.’) with an exclamation mark (‘!’) and assign the result to the object text.

Exercise 2

From the object text, substitute the double underscore (‘__’) with ‘enhancing’ and assign the result to the object text.

Exercise 3

From the object text, substitute the vertical bar (‘|’) with a space (‘ ’) and assign the result to the object text.

Exercise 4

From the object text, substitute the square brackets (‘[]’) with parentheses (‘()’) and assign the result to the object text.

Exercise 5

From the object text, substitute ‘coffee’ with ‘your favourite beverage’ and assign the result to the object text.

Exercise 6

From the object textseq, substitute all the digits with a hash sign (‘#’) and assign the result to the object text.

Exercise 7

From the object textseq, substitute any space with an underscore (‘_’) and assign the result to the object text.

Exercise 8

From the object textseq, substitute every word with ‘blah’ and assign the result to the object text.

Exercise 9

From the object textseq, separate each word with a backslash (‘\’) and assign the result to the object text.

Exercise 10

From the object textseq, separate each character boundary with a backslash (‘\’) and assign the result to the object text.
