21+ Online Courses to Get Started Today with Data Cleaning

May 26, 2016

(This article was first published on R - Blendo Blog, and kindly contributed to R-bloggers)


Yeah… working with data sets means that you first need a way to get them. Once you have them, you have to clean them.

Data scientists are often said to spend 80% of their time on data cleaning and data manipulation and only 20% of their time actually analyzing the data.

And then you find yourself spending 80% of your time cleaning this data. At the same time, deadlines and management demands keep you up at night.

This is one reason data analysts and data scientists regularly scour the web looking for anything that could help. Tools, tutorials, resources.

I have stumbled upon many posts about general Data Science MOOCs or tutorials, but never one with a list of resources on one of the most time-consuming processes in the data pipeline: data cleaning.

In this post, I did my best to gather everything there is online. If you find a resource that I missed, please let me know in the comments below.

Let’s start with the basics…

What is data cleaning?

Data cleaning, data cleansing or data scrubbing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database.
Source: Wikipedia
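
To make that definition a bit more concrete, here is a minimal sketch in plain Python (standard library only). The sample records, the field names and the 0 to 120 age range are all made up for illustration and do not come from any of the courses in this list. It shows the two halves of the definition: detecting bad records, then either correcting them (trimming whitespace, normalizing case) or removing them.

```python
# Hypothetical raw records: field names and values are invented for this example.
RAW_RECORDS = [
    {"name": " Alice ", "age": "34", "country": "us"},
    {"name": "Bob", "age": "-5", "country": "GR"},       # inaccurate: negative age
    {"name": "", "age": "41", "country": "DE"},          # corrupt: missing name
    {"name": "Carol", "age": "n/a", "country": " gr "},  # corrupt: age is not a number
    {"name": "Dave", "age": " 29", "country": " gr "},   # fixable: stray whitespace, lowercase
    {"name": " Alice ", "age": "34", "country": "us"},   # duplicate of the first record
]


def clean_record(record):
    """Return a corrected copy of the record, or None if it cannot be repaired."""
    name = record["name"].strip()
    country = record["country"].strip().upper()
    try:
        age = int(record["age"])  # int() tolerates surrounding whitespace
    except ValueError:
        return None
    # Assumed validity rules: a non-empty name and an age between 0 and 120.
    if not name or not 0 <= age <= 120:
        return None
    return {"name": name, "age": age, "country": country}


def clean_records(records):
    """Split records into (cleaned, rejected), dropping duplicates after correction."""
    cleaned, rejected, seen = [], [], set()
    for record in records:
        fixed = clean_record(record)
        if fixed is None:
            rejected.append(record)
            continue
        key = (fixed["name"], fixed["age"], fixed["country"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(fixed)
    return cleaned, rejected


cleaned, rejected = clean_records(RAW_RECORDS)
print(cleaned)
print(len(rejected), "records rejected")
```

Keeping the rejected records around, instead of silently dropping them, makes it easier to check later whether the cleaning rules were too strict.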

Note: Some of the courses below belong to specializations or batches of courses, such as Coursera’s Data Science Specialization or Udacity’s Nanodegree Program, but you may also take each course individually. If you are interested in a certificate, there is usually a fee. If not, you may “audit” the course (on Coursera at least). Some courses are free and others are subscription-based services.

Data Cleaning in R

Getting and Cleaning Data (Coursera)

  • Course Name: Getting and Cleaning Data
  • Institution: Johns Hopkins University
  • Coursera Specialization: Data Science Specialization
  • Price: Free
  • It belongs to Coursera’s Data Science Specialization from Johns Hopkins University and is one of the best data cleaning courses out there. The course covers the basics needed for collecting, cleaning, and sharing data.

Data Science and Machine Learning Essentials (edX)

  • Course Name: Data Science and Machine Learning Essentials
  • Institution: Microsoft
  • Price: Free, paid for certificate
  • Another one of the best data science MOOCs. It covers tools like R, Python and SQL and, among other topics, data acquisition, ingestion, sampling, quantization, cleaning, and transformation.

Data Science with R (O’Reilly)

  • Course Name: Data Science with R
  • Price: Paid
  • It is part of one of O’Reilly’s Learning Paths. It goes from the basics to more advanced techniques, including R Graph and machine learning. It contains an intro to Data Science with R, how to manipulate data sets and expert Data Wrangling with R.

Cleaning Data in R (DataCamp)

Foundations of Data Science (Springboard)

  • Course Name: Foundations of Data Science
  • Price: Free (some chapters), Subscription based or one-time payment
  • It has a unit about Data Wrangling and data cleaning with R.

Udemy Courses

You may want to take a look at the list of resources about Data cleaning and R inside Udemy. There are a lot to choose from, but it might require some searching to find which one is valuable to you.

 

Data Cleaning in Python

Data Science and Machine Learning Essentials

See the Data Science and Machine Learning Essentials (edX) course above.

Intro to Data Analysis – Data Analysis Using NumPy and Pandas (Udacity)

Data Wrangling with MongoDB – Data Manipulation and Retrieval (Udacity)

  • Course Name: Data Wrangling with MongoDB – Data Manipulation and Retrieval
  • Udacity Nanodegree Program: Data Analyst Nanodegree
  • Institution: Udacity + MongoDB
  • Price: Free.
  • It belongs to Udacity’s Data Analyst Nanodegree. It provides information on how to gather and extract data in widely used data formats, how to assess the quality of data, and best practices for data cleaning. It also covers the essentials of storing data, the MongoDB query language and how to perform exploratory analysis using the MongoDB aggregation framework.

Python for Data Analysis (Big Data University)

Intermediate Python and Pandas (DataQuest)

  • Course Name: Intermediate Python and Pandas
  • Price: Free (some chapters), Subscription based.
  • It helps you acquire more advanced Python and Pandas skills that, among other things, will help you improve your data munging and data cleaning skills.

Data Analysis and Visualization (DataQuest)

  • Course Name: Data Analysis and Visualization
  • Price: Free (some chapters), Subscription based.
  • Play with NumPy, Pandas and Jupyter while learning how to clean your data.

Data Science Intensive (Springboard)

  • Course Name: Data Science Intensive
  • Price: Free (some chapters), Subscription based or one-time payment
  • It has a unit about Data Wrangling and data cleaning with Python.

Big Data Science with BD2K-LINCS (Coursera)

  • Course Name: Big Data Science with BD2K-LINCS
  • Institution: BD2K-LINCS Data Coordination and Integration Center
  • Coursera Specialization: None
  • Price: Free
  • This is a life-science-related statistics course, but it provides info on how to collect data, basic data processing and data normalization methods that can be used for data cleaning. Basic courses in statistics and molecular biology are useful but not required. The ability to write short scripts in languages such as Python would be useful.

Exploring CO2 Emissions Data using Pandas data frames in Python (Big Data University)

Python Applications (DataQuest)

  • Course Name: Python Applications
  • Price: Free (some chapters), Subscription based.
  • Learn how to use Python to visualize, explore and clean data using real datasets.

Python for Business Analysts (DataQuest)

  • Course Name: Python for Business Analysts
  • Price: Free (some chapters), Subscription based.
  • Use Python to clean, visualize, and explore datasets.

Udemy Courses

You may want to take a look at the list of resources about Data cleaning and Python inside Udemy. There are a lot to choose from, but it might require some searching to find which one is valuable to you.

 



Data Cleaning (SQL, Spark etc.)

Introduction to Big Data Analytics (Coursera)

  • Course Name: Introduction to Big Data Analytics
  • Institution: University of California, San Diego
  • Coursera Specialization: Big Data Specialization
  • Price: Free, paid for certificate
  • This is a (really) quick intro to Big Data query interfaces, environments, and tools like HBase, Hive, Pig or Spark. Some parts focus on data exploration and data cleaning with Spark.

Working With Large Datasets (DataQuest)

  • Course Name: Working With Large Datasets
  • Price: Free (some chapters), Subscription based.
  • Work with MapReduce and Spark to clean, process and analyze large datasets.

Data Cleaning (OpenRefine, Tableau, Excel or other tools)

Introduction to OpenRefine (Big Data University)

  • Course Name: Introduction to OpenRefine
  • Price: Free.
  • It covers the basics of OpenRefine and its scripting language GREL and provides info on data cleaning.

How to clean your data (European Data Portal)

  • Course Name: How to clean your data
  • Price: Free.
  • It covers the topic of cleaning up data, and explores common errors found in open datasets and how they affect the way we work with this data. You can find more training material at the European Data Portal.

Data, Analytics and Learning (edX)

  • Course Name: Data, Analytics and Learning
  • Institution: University of Texas Arlington + Tableau Software
  • Price: Free
  • The course provides a great overview of the field, suitable for a broad audience. Explore the logic of analytics and the basics of finding, cleaning, and using educational data to build predictive models and perform text analysis.

Data Analysis for your Business (edX)

  • Course Name: Data Analysis for your Business
  • Institution: DelftX
  • Specialization: XSeries
  • Price: Paid for certificate
  • Use Excel for importing data, data cleaning, data wrangling, interpreting and visualizing, with special emphasis on real-time dashboards.

Videos

When I was searching for these courses I stumbled upon some great videos of presentations at conferences. I added them here in case anybody is interested.

Closure

I hope this list will help anyone who is looking to clean her data or is looking for a smooth start with the subject of data wrangling. If you know of any course that I missed, or if any of the above does not fit the list, please let me know in the comments or on Twitter below.


– or if you liked it until now you are more than welcome to share 🙂
