Let’s be real: data cleaning is not the reason most people sign up to be data scientists. It’s probably the least sexy thing that holders of this century’s sexiest job have to do on a regular basis.
But it’s also one of the most important. For example, when Kaggle conducted a survey of data workers in 2017, roughly half of them said that dirty data was a major barrier they faced at work. And data scientists regularly say that they spend more time on data cleaning than any other task.
Why is data cleaning so important? Because of that fundamental principle of computing: garbage in, garbage out.
That’s why we’re excited to launch yet another new data cleaning course! Data Cleaning in R: Advanced is the latest addition to our fast-growing Data Analyst in R path. It’s designed to help you build on the skills you learned in our first R data cleaning course as you apply them to dirtier data sets and tougher challenges.
Ready to rise to the challenge and add some powerful new data cleaning skills to your skill set?
(By the way, we offer a similar course for Python coders if you’re looking to build your data cleaning skills in Python.)
What You’ll Learn About R Data Cleaning
In Data Cleaning in R: Advanced, you’ll get hands-on with messy, real-world data sets including Hacker News headlines, Hacker News posts, and NYC traffic data, as you learn and apply new skills to prepare data for analysis.
You’ll start by learning how to use regular expressions in R using the stringr package. Regular expressions (often called regex) provide you with powerful ways to manipulate strings and quickly clean text data. You’ll also learn to clean and work with data in the JSON format using jsonlite. This is an important skill because you’re likely to encounter JSON data often when you’re working with data from web APIs.
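To give a rough feel for how these two skills fit together, here is a minimal sketch. It is not taken from the course: the JSON string, the column names, and the "Show HN" / "Ask HN" tag pattern are all made up for illustration, and it assumes the jsonlite and stringr packages are installed.

```r
library(jsonlite)
library(stringr)

# Made-up JSON standing in for a web API response (not the course's data)
raw_json <- '[
  {"id": 1, "title": "Show HN: A tiny regex tester", "points": 42},
  {"id": 2, "title": "Ask HN: How do you clean JSON?", "points": 17},
  {"id": 3, "title": "Why R is great for data cleaning", "points": 8}
]'

# fromJSON() simplifies an array of records into a data frame
posts <- fromJSON(raw_json)

# Flag titles that start with a "Show HN" or "Ask HN" tag
posts$tagged <- str_detect(posts$title, "^(Show|Ask) HN")

# Pull out the tag itself; titles without a tag give NA
posts$tag <- str_extract(posts$title, "^(Show|Ask) HN")

# Strip the tag, the colon, and any whitespace that follows it
posts$clean_title <- str_replace(posts$title, "^(Show|Ask) HN:\\s*", "")

print(posts[, c("tag", "clean_title")])
```

The regex anchors on `^` so that a tag appearing in the middle of a title is left alone, which is usually what you want when cleaning headline prefixes.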
With the purrr package, you’ll learn to apply map functions to streamline your work with list-like data. And you’ll also learn about anonymous functions and how they’re useful in R data cleaning workflows.
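As a small illustration of what map functions and anonymous functions look like in practice, here is a sketch that assumes the purrr package is installed. The list of posts and its field names are hypothetical, not the course’s data.

```r
library(purrr)

# Hypothetical list-like data: one element per post, with uneven tag vectors
posts <- list(
  list(title = "First post", tags = c("r", "regex")),
  list(title = "Second post", tags = character(0)),
  list(title = "Third post", tags = c("json"))
)

# Passing a name to map_chr() extracts that field from every element
titles <- map_chr(posts, "title")

# An anonymous function written with base R's function() syntax
tag_counts <- map_int(posts, function(p) length(p$tags))

# The same idea with purrr's formula shorthand, where .x is the current element
has_tags <- map_lgl(posts, ~ length(.x$tags) > 0)

print(titles)
print(tag_counts)  # 2 0 1
print(has_tags)    # TRUE FALSE TRUE
```

The typed variants (`map_chr()`, `map_int()`, `map_lgl()`) return plain vectors and raise an error if an element has the wrong type, which can surface dirty records early.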
Then, you’ll dig into the course’s final mission, which focuses on dealing with missing data. You’ll learn efficient ways of visualizing missing data, and examine a variety of statistically sound ways you might be able to handle that missing data without having to drop rows from your data set.
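For a taste of the idea, here is a minimal base R sketch that counts the gaps in each column and fills a numeric column with its median instead of dropping rows. The data frame and its column names are invented for illustration, and median imputation is just one simple option chosen for this example, not necessarily the approach the course teaches.

```r
# Made-up data with gaps (hypothetical columns, not the course's data set)
crashes <- data.frame(
  borough = c("Bronx", "Queens", NA, "Brooklyn", "Queens"),
  injured = c(1, NA, 0, 2, NA),
  stringsAsFactors = FALSE
)

# Count missing values per column to see where the gaps are
na_counts <- colSums(is.na(crashes))
print(na_counts)

# Median of the observed values, ignoring the gaps
injured_median <- median(crashes$injured, na.rm = TRUE)

# Fill the gaps in a new column so the original values stay available
crashes$injured_filled <- ifelse(
  is.na(crashes$injured),
  injured_median,
  crashes$injured
)

print(crashes)
```

Writing the filled values to a separate column keeps the original intact, so you can still tell which values were observed and which were imputed.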
By the end of the course, you’ll be as comfortable working with JSON data as you are importing and working with CSVs. You’ll be able to manipulate strings quickly and efficiently using regular expressions, and you’ll have improved your numerical data cleaning abilities by using map functions and anonymous functions for increased efficiency.
Finally, you’ll be able to quickly find and visualize the gaps in your data set, and you’ll have mastered a variety of techniques for replacing that missing data, either with external data you’re merging into your data set, or with statistical techniques.
Why Study At Dataquest
Like all of our courses, Data Cleaning in R: Advanced uses our interactive, browser-based learning system, so that you can apply what you’re learning as you’re learning it.
Studies have repeatedly confirmed that students who apply what they’re learning perform better and retain more than students who don’t, and our interactive platform allows you to write code using not just R but also all of the popular packages you’ll be using in real-world data science workflows.
That makes it really easy for you to make the jump from online learning to real-world projects.
And speaking of “real-world,” all of our courses use real-world data sets. When you’re watching a video, it’s easy to feel like you’re learning without actually retaining all that much, but in our courses you’ll be challenged to immediately apply what you’ve learned to solve real data science problems.
Of course, you don’t have to take our word for it! You can check out how students answered our student survey, or read what some of our students have to say about their Dataquest experiences. Or you can check out our reviews on third-party sites like Switchup, G2 Crowd, and Course Report.
Data cleaning is a critical part of any data analyst’s or data scientist’s toolkit. If you’ve already mastered the basics of R data cleaning and want to take the next step, this new advanced course will have you cleaning data like a seasoned professional. Start learning today!
Charlie is a student of data science, and also a content marketer at Dataquest. In his free time, he’s learning to mountain bike and making videos about it.