[This article was first published on DataCamp Blog, and kindly contributed to R-bloggers].
We just launched Joining Data in R with dplyr taught by Garrett Grolemund, the author of Hands-On Programming with R and R for Data Science from O’Reilly Media. This course builds on what you learned in Data Manipulation in R with dplyr by showing you how to combine data sets with dplyr’s two-table verbs. In the real world, data often comes split across many data sets, but dplyr’s core functions are designed to work with single tables of data. In this course, you’ll learn the best ways to combine data sets into single tables. You’ll learn how to augment one data set with columns from another using mutating joins, how to filter one data set against another using filtering joins, and how to sift through data sets with set operations. Along the way, you’ll discover best practices for building data sets and troubleshooting joins with dplyr. Afterward, you’ll be well on your way to data manipulation mastery!
Joining Data in R with dplyr features 84 interactive exercises that combine high-quality video, in-browser coding, and gamification for an engaging learning experience that will help you become a data manipulation master!
What you’ll learn
The first chapter of this course covers mutating joins and explains the various ways you can join datasets together and what happens when you do. Next, you will learn all about filtering joins and set operations, which combine information from datasets without adding new variables. Filtering joins filter the observations of one dataset based on whether or not they occur in a second dataset. Set operations use combinations of observations from both datasets to create a new dataset. The third chapter will show you how to build datasets from basic elements: vectors, lists, and individual datasets that do not require a join. dplyr contains a set of functions for assembling data that work more intuitively than base R’s functions. The chapter will also look at when dplyr does and does not use data type coercion.
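To give a feel for the verbs described above, here is a minimal sketch using dplyr. The data frames `artists`, `instruments`, `a`, and `b` are made-up toy data invented for this illustration, not material from the course.

```r
library(dplyr)

# Hypothetical toy data (an assumption, not taken from the course)
artists <- data.frame(
  name = c("Mick", "John", "Paul", "Keith"),
  band = c("Stones", "Beatles", "Beatles", "Stones"),
  stringsAsFactors = FALSE
)
instruments <- data.frame(
  name  = c("John", "Paul", "Keith", "Ringo"),
  plays = c("guitar", "bass", "guitar", "drums"),
  stringsAsFactors = FALSE
)

# Mutating join: keep every row of artists and add the plays column
with_plays <- left_join(artists, instruments, by = "name")

# Filtering joins: keep or drop rows of artists, adding no new columns
has_instrument <- semi_join(artists, instruments, by = "name")
no_instrument  <- anti_join(artists, instruments, by = "name")

# Set operations: both inputs must have the same columns
a <- data.frame(name = c("John", "Paul"), stringsAsFactors = FALSE)
b <- data.frame(name = c("Paul", "Ringo"), stringsAsFactors = FALSE)
in_either <- union(a, b)      # rows found in a or b, duplicates removed
in_both   <- intersect(a, b)  # rows found in both a and b
only_in_a <- setdiff(a, b)    # rows found in a but not in b

# Assembling data without a join: stack the rows of a and b
stacked <- bind_rows(a, b)
```

In this sketch `with_plays` has four rows, with `NA` in `plays` for Mick because he has no match in `instruments`, while `no_instrument` contains only Mick's row.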
Once you’ve mastered the basics, the fourth chapter dives deeper into the mechanics of joins. This chapter will show you how to spot common join problems, how to join based on multiple or mismatched keys, how to join multiple tables, and how to recreate dplyr’s joins with SQL and base R. The fifth and final chapter concludes the course with a case study that applies what you’ve learned to a real-world application.
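As one possible illustration of joining on multiple, mismatched keys and of recreating a dplyr join in base R, consider the sketch below. The data frames `bands` and `roles` and their column names are hypothetical and were invented for this example.

```r
library(dplyr)

# Hypothetical toy data: the key columns are named differently in each table
bands <- data.frame(
  first = c("John", "Paul", "Mick"),
  last  = c("Lennon", "McCartney", "Jagger"),
  band  = c("Beatles", "Beatles", "Stones"),
  stringsAsFactors = FALSE
)
roles <- data.frame(
  first_name = c("John", "Paul", "Ringo"),
  last_name  = c("Lennon", "McCartney", "Starr"),
  plays      = c("guitar", "bass", "drums"),
  stringsAsFactors = FALSE
)

# A named vector in `by` maps key columns of the left table to the right table
dplyr_join <- left_join(
  bands, roles,
  by = c("first" = "first_name", "last" = "last_name")
)

# A comparable left join in base R; note that merge() sorts rows by the keys,
# so the row order can differ from the dplyr result
base_join <- merge(
  bands, roles,
  by.x = c("first", "last"),
  by.y = c("first_name", "last_name"),
  all.x = TRUE
)
```

Both results keep all three rows of `bands` and carry an `NA` in `plays` for the one musician with no match in `roles`. Matching on both name columns rather than on the first name alone is one way to avoid accidental matches between different people who share a first name.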