JHU-Coursera Data Science Specialization and MOOCs Interest

[This article was first published on Reimagined Invention, and kindly contributed to R-bloggers].


I completed this specialization more than a year ago, but I decided to write about it again because I’m teaching Data Visualization at Universidad de Chile and the JHU materials have proven to be an excellent source for my students. If you want to read more about this specialization you can do that here.

Why do I recommend this specialization?

During my major I had to take four statistics courses covering probability, inference and econometrics. In those courses I learned, among other things, that the OLS estimator in a linear model is BLUE (the best linear unbiased estimator) under the Gauss-Markov assumptions, but I never learned how to report statistical information, and that is a problem.
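For readers who have not met that result, here is a compact statement of it. This is only a sketch in standard textbook notation; the symbols $y$, $X$, $\beta$, $\varepsilon$ and $\sigma^2$ are assumptions chosen for illustration, not notation taken from the courses.

```latex
\begin{align*}
  % Linear model with exogenous, homoskedastic, uncorrelated errors
  y &= X\beta + \varepsilon, &
  \mathbb{E}[\varepsilon \mid X] &= 0, &
  \operatorname{Var}(\varepsilon \mid X) &= \sigma^2 I \\
  % OLS estimator (X assumed to have full column rank)
  \hat{\beta} &= (X^\top X)^{-1} X^\top y, &
  \mathbb{E}[\hat{\beta} \mid X] &= \beta, &
  \operatorname{Var}(\hat{\beta} \mid X) &= \sigma^2 (X^\top X)^{-1} \\
  % "Best": any other linear unbiased estimator \tilde{\beta} has weakly larger variance
  \operatorname{Var}(\tilde{\beta} \mid X) - \operatorname{Var}(\hat{\beta} \mid X) &\succeq 0
\end{align*}
```

The last line is what "best" means here: the difference of the two covariance matrices is positive semidefinite for every linear unbiased competitor.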

What did I learn in this specialization? How to ask questions, make inferences, publish results, and work with something really important called Reproducible Research. The specialization focuses on Reproducible Research and on communicating results. Most courses have both quizzes and projects. I had the chance to see projects made with an approach totally different from mine, and I learned a lot from that.

I liked it because I had no knowledge of R, and I needed R to complete my thesis on Structural Equation Modeling, because my advisor Edgar E. Kausel is cool and wanted it to be reproducible. The courses are well structured and focused on practical applications rather than on statistical theory. At first it was hard, as I had to read a lot and write a lot of code that is not needed in programs such as SPSS or Stata, but that code is a fundamental piece of open science.

MOOCs and people’s interest

I am lucky enough to have students from quite different backgrounds, such as Engineering and Obstetrics. Why would a student of Obstetrics or English Literature be interested in Data Visualization? When I uploaded the syllabus I stated that the first weeks of the course would cover Google Sheets and R, so that students learn to process data first.

Talking to my students, I realized that some of them wanted to create elegant plots, understand Public Health statistics, or use R to analyze texts, as in this article by Julia Silge. It was a pleasant surprise to see that they found my course useful. Some of them had heard about R but had not taken a MOOC before because some of those courses are in English.

When I am introduced as a person who knows statistics, I am often asked about Big Data and how I think it is going to change our lives. Some journalist and economist friends call or email me to ask about my book and which MOOC I recommend for learning R, because they run into cell limits in spreadsheet software or cannot use the proprietary software from work on their own laptops.

The good and the bad about JHU Data Science Specialization

I recommend this specialization because, in my opinion, its strengths outweigh its weaknesses, and the weaknesses relate more to the experience than to the content.

Good points

  • Self-contained courses
  • Good course materials (texts and videos)
  • You can study at your own pace and learn from others’ projects

Bad points

  • Assignments are partially based on peer reviewing
  • Some reviewers give low grades without providing details
  • Good feedback should be promoted and enhanced
  • The final Capstone Project is really demanding and you’ll need to push yourself to be creative and study things that were not covered in the courses

Course descriptions

In case you want a detailed description, here’s the content of each course.

Course 1 • The Data Scientist’s Toolbox


This course teaches you how to set up a GitHub account and sync files. There are no quizzes or assignments other than those related to configuring and using GitHub.

Course 2 • R Programming


  • Week 1: Overview of R, R data types and objects, reading and writing data.
  • Week 2: Control structures, functions, scoping rules, dates and times.
  • Week 3: Loop functions, debugging tools.
  • Week 4: Simulation, code profiling.

Course 3 • Getting and Cleaning Data


  • Obtain data from a variety of sources.
  • Apply the basic tools for data cleaning and manipulation.

Course 4 • Exploratory Data Analysis


  • Visual representations of data using the base, lattice, and ggplot2 plotting systems in R.
  • Exploratory summaries of data.
  • Create visualizations of multidimensional data using exploratory multivariate statistical techniques.

Course 5 • Reproducible Research


  • Use of R markdown.
  • Integrate R code into a literate statistical program.
  • Organize a data analysis so that it is reproducible and accessible to others.

Course 6 • Statistical Inference


  • Fundamentals of statistical inference.
  • Assumptions and modes of performing statistical inference.

Course 7 • Regression Models


  • How to fit regression models.
  • How to interpret coefficients.
  • How to investigate residuals and variability.
  • Special cases of regression models including use of dummy variables and multivariable adjustment.
  • Extensions to generalized linear models, especially considering Poisson and logistic regression.

Course 8 • Practical Machine Learning


  • Components of a machine learning algorithm.
  • Apply multiple basic machine learning tools.
  • Apply machine learning tools to build and evaluate predictors on real data.

Course 9 • Developing Data Products


  • How to communicate using statistics and statistical products.
  • Emphasis on communicating uncertainty in statistical results.
  • How to create simple Shiny web applications and R packages.

Here’s my Course project.

Course 10 • Data Science Capstone


The sole purpose of this course was to write a Shiny application for text prediction. This project required me to study a lot and to use everything I learned during the specialization.

When I took this specialization, SwiftKey was paying attention to what the students were doing with techniques such as the Empirical Bayes method (which Turing himself used to decrypt messages) to create an efficient application within Shiny’s limits. Also, the best students who had a blog could be accepted as R-bloggers writers.
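The post does not say which estimator was used. One classic technique associated with Turing, and often presented as an empirical Bayes method, is Good-Turing frequency estimation. The sketch below assumes that is the method meant. $N_c$ is the number of distinct n-grams seen exactly $c$ times in the training text and $N$ is the total number of n-gram occurrences; both symbols are introduced here for illustration.

```latex
\begin{align*}
  % Total number of observed n-gram occurrences
  N &= \sum_{c \ge 1} c \, N_c \\
  % Adjusted (discounted) count for an n-gram seen c times
  c^{*} &= (c + 1) \, \frac{N_{c+1}}{N_c} \\
  % Probability mass reserved for n-grams never seen in training
  P_{\text{unseen}} &= \frac{N_1}{N}
\end{align*}
```

The intuition is that n-grams seen once tell you how much probability to hold back for n-grams not seen at all, which matters for a text predictor that will meet word sequences missing from its training data.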

Here’s my Course Project.
