New Skill Track: Tidyverse Fundamentals with R

Posted on September 19, 2018 by Chester Ismay in R bloggers

[This article was first published on DataCamp Community - r programming, and kindly contributed to R-bloggers.]

Here is the track link.

Track Details

In this track, you’ll learn the skills you need to get up and running with data science in R using the tidyverse. The tidyverse is a collection of R packages that share a common design philosophy and are designed to work together seamlessly, benefiting everyone from novices to data science professionals. You’ll begin by hopping into data wrangling and data visualization with the gapminder dataset in the Introduction to the Tidyverse course. Next, in Working with Data in the Tidyverse, you’ll learn about “tidy data” and see how to get data into tidy format with fun datasets from the British reality television series “The Great British Bake Off,” letting you see what’s cooking behind the scenes of so many R data analyses. After that, you’ll ease into general modeling concepts via regression using tidyverse principles in Modeling with Data in the Tidyverse. There you’ll explore Seattle housing prices and how different variables can be used to explain patterns in these prices.

Throughout these first three courses, you’ll use the dplyr, ggplot2, and tidyr packages, the powerhouses of the tidyverse, and see the power of readable code. In Communicating with Data in the Tidyverse, you’ll learn how to further customize your ggplot2 graphics and use R Markdown to write reproducible reports while working with data from the International Labour Organization in Europe. The track closes with Categorical Data in the Tidyverse, which explores ways to handle the sometimes tricky concept of factors in data science with R, using datasets from Kaggle’s Data Science and Machine Learning Survey and FiveThirtyEight.com.

The goal of the track is for you to gain experience using the tools and techniques of the whole data science pipeline made famous by Hadley Wickham and Garrett Grolemund: import, tidy, transform, visualize, model, and communicate. You’ll gain exposure to each component of this pipeline from a variety of perspectives in this track. We look forward to seeing you in the track!

Introduction to the Tidyverse

This is an introduction to the programming language R, focused on a powerful set of tools known as the “tidyverse”. In the course, you’ll learn the intertwined processes of data manipulation and visualization through the tools dplyr and ggplot2. You’ll learn to manipulate data by filtering, sorting and summarizing a real dataset of historical country data in order to answer exploratory questions. You’ll then learn to turn this processed data into informative line plots, bar plots, histograms, and more with the ggplot2 package. This gives a taste both of the value of exploratory data analysis and the power of tidyverse tools. This is a suitable introduction for people who have no previous experience in R and are interested in learning to perform data analysis.
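To give a flavor of the filtering, sorting, and summarizing described above, here is a minimal sketch with dplyr. It is not taken from the course: the `countries` data frame is a hypothetical stand-in shaped like the gapminder dataset, and every value in it is made up for illustration.

```r
library(dplyr)

# Hypothetical toy data shaped like gapminder (all values are made up)
countries <- data.frame(
  country   = c("A", "A", "B", "B", "C", "C"),
  continent = c("Asia", "Asia", "Europe", "Europe", "Europe", "Europe"),
  year      = c(2002, 2007, 2002, 2007, 2002, 2007),
  lifeExp   = c(60, 62, 78, 80, 74, 76),
  pop       = c(1.0e6, 1.2e6, 5.0e6, 5.1e6, 2.0e6, 2.1e6)
)

summary_2007 <- countries %>%
  filter(year == 2007) %>%                   # keep only the 2007 rows
  group_by(continent) %>%                    # one group per continent
  summarize(
    mean_life_exp = mean(lifeExp),           # average life expectancy per group
    total_pop     = sum(pop)                 # total population per group
  ) %>%
  arrange(desc(mean_life_exp))               # sort, highest life expectancy first

print(summary_2007)
```

Each verb does one job and the pipe (`%>%`) chains them in reading order, which is a large part of what makes tidyverse code readable.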

This course was even approved by Hadley himself.

Working with Data in the Tidyverse

In this course, you’ll learn to work with data using tools from the tidyverse in R. By data, we mean your own data, other people’s data, messy data, big data, small data – any data with rows and columns that comes your way! By work, we mean doing most of the things that sound hard to do with R, and that need to happen before you can analyze or visualize your data. But work doesn’t mean that it is not fun – you will see why so many people love working in the tidyverse as you learn how to explore, tame, tidy, and transform your data. Throughout this course, you’ll work with data from a popular television baking competition called “The Great British Bake Off.”
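As a rough illustration of what tidying can look like, the sketch below reshapes a wide table into one row per observation with tidyr. The `ratings_wide` data frame, its column names, and its values are all hypothetical; it only imitates the shape of per-episode ratings data. It uses `pivot_longer()`, which is available in current tidyr releases; material written when this post was published would likely have used the older `gather()` instead.

```r
library(tidyr)

# Hypothetical wide data: one row per series, one column per episode
# (viewer figures are made up for illustration)
ratings_wide <- data.frame(
  series = c(1, 2),
  e1     = c(2.2, 3.1),
  e2     = c(3.0, 3.5),
  e3     = c(3.0, 3.9)
)

# Tidy format: one row per series-episode pair
ratings_long <- pivot_longer(
  ratings_wide,
  cols         = e1:e3,              # the columns to stack
  names_to     = "episode",          # old column names go here
  names_prefix = "e",                # strip the leading "e" from the names
  values_to    = "viewers_millions"  # cell values go here
)

print(ratings_long)
```

In the long form, each variable (series, episode, viewers) has its own column, which is the layout that dplyr and ggplot2 functions generally expect.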

Modeling with Data in the Tidyverse

In this course, you will learn to model with data. Models attempt to capture the relationship between an outcome variable of interest and a series of explanatory/predictor variables. Such models can be used for both explanatory purposes, e.g., “Does knowing professors’ ages help explain their teaching evaluation scores?”, and predictive purposes, e.g., “How well can we predict a house’s price based on its size and condition?” You will leverage your tidyverse skills to construct and interpret such models. This course centers on linear regression, one of the most commonly used and easy-to-understand approaches to modeling. Such modeling and thinking is used in a wide variety of fields, including statistics, causal inference, machine learning, and artificial intelligence.
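The explanatory and predictive uses of a model can be sketched with base R's `lm()`. This is only an illustration under stated assumptions: the `houses` data frame and its values are made up, and the course itself may use different helper functions and the real Seattle housing data.

```r
# Hypothetical housing data (sizes in square meters, prices in thousands;
# all values are made up for illustration)
houses <- data.frame(
  size  = c(50, 80, 100, 120, 150),
  price = c(250, 345, 395, 465, 550)
)

# Fit a simple linear regression: price as a function of size
model <- lm(price ~ size, data = houses)

# Explanatory use: inspect the estimated intercept and slope
print(coef(model))

# Predictive use: estimate the price of a new 110 square meter house
new_house <- data.frame(size = 110)
predicted_price <- predict(model, newdata = new_house)
print(predicted_price)
```

The slope estimates how much the price changes, on average, for each additional unit of size, while `predict()` applies the fitted line to data the model has not seen.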

Communicating with Data in the Tidyverse

They say that a picture is worth a thousand words. Indeed, successfully promoting your data analysis is not only a matter of accurate and effective graphics, but also of aesthetics and uniqueness. This course teaches you how to leverage the power of ggplot2 themes to produce publication-quality graphics that stand out from the mass of boilerplate plots out there. It shows you how to tweak and get the most out of ggplot2 in order to produce unconventional plots that draw attention on social media. In the end, you will combine that knowledge to produce a slick, custom-styled report with R Markdown and CSS, all within the powerful tidyverse.
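Customizing a plot with ggplot2 themes might look something like the sketch below. The `hours` data frame and its values are hypothetical, and the particular theme choices are just one possible styling, not the one used in the course.

```r
library(ggplot2)

# Hypothetical data: made-up weekly working hours for three unnamed countries
hours <- data.frame(
  country      = c("Country A", "Country B", "Country C"),
  weekly_hours = c(38.5, 40.1, 35.2)
)

p <- ggplot(hours, aes(x = weekly_hours, y = reorder(country, weekly_hours))) +
  geom_col(fill = "steelblue") +                 # horizontal bars
  labs(
    title = "Weekly working hours (made-up data)",
    x     = "Hours per week",
    y     = NULL
  ) +
  theme_minimal() +                              # start from a built-in theme
  theme(
    plot.title         = element_text(face = "bold", size = 14),
    panel.grid.major.y = element_blank()         # drop horizontal grid lines
  )

print(p)
```

Starting from a complete theme such as `theme_minimal()` and then overriding individual elements with `theme()` keeps the styling code short and easy to reuse across plots.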

Categorical Data in the Tidyverse

As a data scientist, you will often find yourself working with non-numerical data, such as job titles, survey responses, or demographic information. This type of data is qualitative and can be ordinal, if the categories have a natural order, or nominal, if they don’t. R has a special way of representing such data, called factors, and this course will help you master working with them using the tidyverse package forcats. We’ll also work with other tidyverse packages, including ggplot2, dplyr, stringr, and tidyr, and use real-world datasets, such as the FiveThirtyEight flight dataset and Kaggle’s State of Data Science and ML Survey. Following this course, you’ll be able to identify and manipulate factor variables, quickly and efficiently visualize your data, and effectively communicate your results. Get ready to categorize!
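For a sense of what working with factors involves, here is a small sketch using forcats. The `responses` vector is hypothetical survey data invented for this example, not drawn from the Kaggle or FiveThirtyEight datasets.

```r
library(forcats)

# Hypothetical survey responses (made up for illustration)
responses <- c("Neutral", "Agree", "Neutral", "Disagree", "Neutral", "Agree")

# By default, factor levels are sorted alphabetically
answer <- factor(responses)
print(levels(answer))

# Reorder levels by how often each one occurs (most frequent first)
by_frequency <- fct_infreq(answer)
print(levels(by_frequency))

# Put the levels in a meaningful order by hand
by_meaning <- fct_relevel(answer, "Disagree", "Neutral", "Agree")
print(levels(by_meaning))
```

Level order matters because it controls the order of bars, legend entries, and table rows, so choosing it deliberately is often the first step toward a readable plot.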
