Data Science With R Course Series – Week 3

[This article was first published on - Articles, and kindly contributed to R-bloggers.]

Data Science and Machine Learning in business begin with R. Why? R is the premier language for rapid exploration, modeling, and communication, and what sets it apart from other programming languages is SPEED! This is why you need to learn R. Time is money, and in a world where you are measured on productivity and skill, R is your machine-learning-powered productivity booster.

In this Data Science With R Course Series, we’ll cover what life is like in our ground-breaking, enterprise-grade course called Data Science For Business With R (DS4B 201-R). The objective is to experience the qualities that make R great for business by following a real-world data science project. We review the course that takes you to an advanced level in 10 weeks.

In this article, we’ll cover Week 3: Data Understanding, which is where we expand our exploratory techniques with the goal of exposing key characteristics of the features in our data set.

But, first, a quick recap of our trajectory and the course overview.

Data Science With R Course Series

You’re in Week 3: Data Understanding. Here’s our game plan over the 10 articles in this series. We’ll cover how to apply data science for business with R following our systematic process.

  • Week 1: Getting Started
  • Week 2: Business Understanding
  • Week 3: Data Understanding (You’re Here)
  • Week 4: Data Preparation
  • Week 5: Predictive Modeling With H2O
  • Week 6: H2O Model Performance
  • Week 7: Machine Learning Interpretability With LIME
  • Week 8: Link Data Science To Business With Expected Value
  • Week 9: Expected Value Optimization And Sensitivity Analysis
  • Week 10: Build A Recommendation Algorithm To Improve Decision Making

Week 3: Data Understanding

In data understanding, you’ll learn two key packages that can help identify characteristics of your data:

  1. `skimr`: For efficiently exploring data by data type (e.g., numeric, character)
  2. `GGally`: For visualizing pair plots for many features within the data

Let’s take a peek at the course.

EDA with skimr

We kick Week 3 off with skimr, a package for quickly skimming data by data type. In the course you’ll review both numeric data and character data. This helps you quickly identify issues that may be present, such as missing values or numeric data that should be categorical.

Here’s a snapshot of our first use of the skim() function.
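As a minimal sketch of what such a call might look like, here is skim() applied to a small, made-up employee data frame. The data frame and its column names are assumptions for illustration only and are not the course data set.

```r
# Minimal sketch: skim a small, made-up data frame.
# Column names are hypothetical, not the course data set.
library(skimr)

employees <- data.frame(
  Age        = c(41, 49, 37, NA, 27),            # one missing value
  JobLevel   = c(2, 2, 1, 1, 1),                 # numeric, but arguably categorical
  Department = c("Sales", "R&D", "R&D", "R&D", NA),
  Attrition  = c("Yes", "No", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# skim() summarizes each column, grouped by data type
skimmed <- skim(employees)
print(skimmed)
```

The summary is organized by data type, which makes it easy to spot missing values (Age, Department) and numeric columns with only a few distinct values (JobLevel) that may be better treated as categorical.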

EDA with GGally

Next, we build our knowledge of the data by making use of the GGally package for visually identifying relationships in the data. We focus on identifying relationships between the target (employee attrition) and various features in the data set. We make use of the ggpairs() function that enables us to visualize the complex relationships.
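A rough sketch of a basic ggpairs() call follows, assuming simulated data with hypothetical column names (Age, MonthlyIncome, Attrition). It only illustrates the idea of coloring a pair plot by the target and is not the course code.

```r
# Minimal sketch: pair plot colored by the target.
# The data are simulated and the column names are hypothetical.
library(GGally)
library(ggplot2)

set.seed(123)
n <- 60
employees <- data.frame(
  Age           = round(runif(n, 20, 60)),
  MonthlyIncome = round(runif(n, 2000, 15000)),
  Attrition     = factor(sample(c("No", "Yes"), n, replace = TRUE))
)

# Map color to the target so every panel is split by attrition
p <- ggpairs(employees, mapping = aes(color = Attrition))

# print(p) draws the full plot matrix
```

Each panel compares two columns, so a single plot gives a first look at how every feature varies with the target.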

Next, you’ll build a custom plotting function with Tidy Eval (learned in Week 2) to extend the functionality of ggpairs() and home in on how each feature relates to attrition.
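One way such a wrapper could be written is sketched below. The arguments (color, density_alpha) and the body are assumptions made for illustration, and the data are simulated; the function built in the course may differ.

```r
# Sketch of a tidy-eval wrapper around ggpairs().
# The signature and body are assumptions, not the course implementation.
library(GGally)
library(ggplot2)
library(rlang)

plot_ggpairs <- function(data, color = NULL, density_alpha = 0.5) {
  # Capture the unquoted column name supplied by the caller
  color_expr <- enquo(color)

  if (quo_is_null(color_expr)) {
    # No color column given: plain pair plot
    g <- ggpairs(data, lower = "blank")
  } else {
    # Convert the captured expression to a column name, then unquote it in aes()
    color_name <- as_name(color_expr)
    g <- ggpairs(
      data,
      mapping = aes(color = !!sym(color_name)),
      lower   = "blank",
      diag    = list(continuous = wrap("densityDiag", alpha = density_alpha))
    )
  }
  g
}

# Simulated data with hypothetical column names
set.seed(123)
n <- 60
employees <- data.frame(
  Age           = round(runif(n, 20, 60)),
  MonthlyIncome = round(runif(n, 2000, 15000)),
  Attrition     = factor(sample(c("No", "Yes"), n, replace = TRUE))
)

p <- plot_ggpairs(employees, color = Attrition)
# print(p) draws the plot matrix
```

Because the column is captured with enquo(), the caller can write color = Attrition without quotes, in the same style as dplyr and ggplot2.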

We end Week 3 with the second Challenge.

Challenge #2

In Challenge 2, you’ll use your custom plotting function plot_ggpairs() to investigate many complex relationships. You’ll combine features into logical groups based on business knowledge and then visualize the grouped features together to explore their complex relationships to attrition.
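The grouping step could look something like the sketch below, which selects a hypothetical "compensation" group alongside the target and plots it with ggpairs() directly. The data are simulated, and the column names and the grouping are assumptions, not the challenge solution.

```r
# Sketch: pick a business-driven feature group, then plot it against the target.
# Column names and the grouping are hypothetical.
library(dplyr)
library(GGally)
library(ggplot2)

set.seed(123)
n <- 60
employees <- data.frame(
  Attrition         = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  Age               = round(runif(n, 20, 60)),
  MonthlyIncome     = round(runif(n, 2000, 15000)),
  PercentSalaryHike = round(runif(n, 11, 25)),
  YearsAtCompany    = round(runif(n, 0, 20))
)

# Keep the target plus the features that belong to one logical group
compensation <- employees %>%
  select(Attrition, MonthlyIncome, PercentSalaryHike)

p_comp <- ggpairs(compensation, mapping = aes(color = Attrition), lower = "blank")
# print(p_comp) draws the plot matrix for this group
```

Repeating the select() step for other groups (for example, tenure-related features) keeps each plot small enough to read.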

Next Up

The next article in the Data Science With R Series covers Data Preprocessing. We’ll learn about an awesome package called recipes that enables preprocessing workflows. We’ll focus on two aspects of data preparation:

  1. Preparing data for people
  2. Preparing data for machines

Week 4: Data Preprocessing

New Course Coming Soon: Build A Shiny Web App!

You’re experiencing the magic of creating a high performance employee turnover risk prediction algorithm in DS4B 201-R. Why not put it to good use in an Interactive Web Dashboard?

In our new course, Build A Shiny Web App (DS4B 301-R), you’ll learn how to integrate the H2O model, LIME results, and recommendation algorithm built in the 201 course into an ML-Powered R + Shiny Web App!

Shiny Apps Course Coming in October 2018!!! Sign up for Business Science University Now!

DS4B 301-R Shiny Application: Employee Prediction

Building an R + Shiny Web App, DS4B 301-R

Get Started Today!
