# Introduction to dplyr

This article was first published on **Quantargo Blog** and kindly contributed to R-bloggers.

- Learn what **dplyr** does
- Get an overview of Select, Filter and Sort
- Learn what Joins, Aggregations and Pipelines are

## What is dplyr

> There’s the joke that 80 percent of data science is cleaning the data and 20 percent is complaining about cleaning the data.
>
> Anthony Goldbloom, Founder and CEO of Kaggle

Having *clean* data is essential in any data science project, because the results can only be as good as the data they are based on. Cleaning data is also the part that usually consumes the most time and causes the most pain for data scientists. R already offers a broad set of tools and functions to manipulate data frames. However, due to its long history, the available base R toolset is fragmented and hard for new users to learn.

The **dplyr** package simplifies the data transformation process through a consistent collection of functions. These functions support different transformations on data frames, including:

- filter rows
- select columns
- sort data
- aggregate data

Multiple data frames can also be joined together by common attribute values.
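As a rough sketch of these transformations, the snippet below applies each one to a small data frame. The `orders` and `customers` tables and all of their columns are invented for illustration. The functions `filter()`, `select()`, `arrange()`, `group_by()`, `summarise()` and `inner_join()` are the **dplyr** functions that correspond to the operations listed above.

```r
library(dplyr)

# Hypothetical example data, invented for illustration
orders <- data.frame(
  order_id    = 1:5,
  customer_id = c(1, 2, 1, 3, 2),
  amount      = c(20, 35, 15, 50, 10)
)
customers <- data.frame(
  customer_id = c(1, 2, 3),
  name        = c("Ann", "Bob", "Cem")
)

# Filter rows: keep only orders with an amount above 15
large_orders <- filter(orders, amount > 15)

# Select columns: keep only the order id and the amount
amounts_only <- select(orders, order_id, amount)

# Sort data: largest amount first
sorted_orders <- arrange(orders, desc(amount))

# Aggregate data: total amount per customer
totals <- summarise(group_by(orders, customer_id), total = sum(amount))

# Join two data frames on the common attribute customer_id
orders_named <- inner_join(orders, customers, by = "customer_id")
```

Each call takes a data frame and returns a new one, so the original `orders` table is left unchanged.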

The consistency of **dplyr** functions improves usability and enables users to connect transformations together to form *data pipelines*. These pipelines can also be seen as a high-level query language, much like SQL for database queries. It is even possible to translate data pipelines to other backends, including databases.
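One way to see this translation is to print the SQL that a transformation would generate. The sketch below assumes that the separate **dbplyr** package is installed, which the text above does not mention. It uses `lazy_frame()` from that package as a stand-in for a database table, so no real connection is needed. The table and its columns are invented for illustration.

```r
library(dplyr)
library(dbplyr)

# A simulated remote table (assumption: dbplyr is installed; no real database involved)
remote_orders <- lazy_frame(customer_id = c(1, 2), amount = c(20, 35))

# The same dplyr functions used on local data frames
filtered <- filter(remote_orders, amount > 15)
query    <- select(filtered, customer_id)

# Print the SQL that would be sent to the database instead of computing a result
show_query(query)
```

The exact SQL text depends on the backend and the **dbplyr** version, so it is not reproduced here.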

## Quiz: dplyr Facts

Which of the below statements are correct?

- **dplyr** provides a consistent set of functions for data visualization
- **dplyr** functions can be connected to data pipelines
- **dplyr** queries can be translated to database queries
- **dplyr** supports data transformations like aggregations and joins
- **dplyr** is built for vector transformations

## Function Framework

Every data transformation function in **dplyr** accepts a data frame as its first input parameter and returns the transformed data frame back as an output. A blueprint for a typical **dplyr** function looks like this:

`transformed <- dplyr_function(my_data_frame, param_one, param_two, ...)`

The `dplyr_function` can be customized further through additional arguments (`param_one`, `param_two`) placed after the first data frame parameter (`my_data_frame`).
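As a concrete instance of this framework, the sketch below uses `filter()` in the role of `dplyr_function`. The data frame contents are invented for illustration, and the two filter conditions stand in for `param_one` and `param_two`.

```r
library(dplyr)

# Hypothetical input data, invented for illustration
my_data_frame <- data.frame(
  name = c("Ann", "Bob", "Cem", "Dia"),
  age  = c(28, 35, 42, 19),
  city = c("Vienna", "Graz", "Vienna", "Linz")
)

# The data frame comes first, followed by the additional arguments
transformed <- filter(my_data_frame, age > 20, city == "Vienna")

# The output is again a data frame, so it can be passed on to the next function
is.data.frame(transformed)
```

Because the output has the same type as the input, the result of one function can serve directly as the first argument of another.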

The real power of **dplyr** comes with the pipe operator `%>%`, which allows users to chain **dplyr** functions into data pipelines. The pipe injects the data frame resulting from the previous calculation as the first argument of the next one. A data transformation consisting of three functions looks like

`dplyr_function_three(dplyr_function_two(dplyr_function_one(my_data_frame)))`

but can be written with the pipe as

`my_data_frame %>% dplyr_function_one() %>% dplyr_function_two() %>% dplyr_function_three()`

With the pipe, the functions appear in the same order in which they are applied. This makes pipelines easier to read than nested function calls, which have to be read from the inside out.
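To make the comparison concrete, here is a minimal sketch that writes the same three-step transformation both ways. The `scores` data frame is invented for illustration, and `filter()`, `select()` and `arrange()` take the place of the three placeholder functions.

```r
library(dplyr)

# Hypothetical example data, invented for illustration
scores <- data.frame(
  student = c("Ann", "Bob", "Cem", "Dia"),
  points  = c(82, 45, 91, 67),
  passed  = c(TRUE, FALSE, TRUE, TRUE)
)

# Nested form: the first step (filter) is the innermost call
nested <- arrange(select(filter(scores, passed), student, points), desc(points))

# Piped form: the steps appear in the order in which they run
piped <- scores %>%
  filter(passed) %>%
  select(student, points) %>%
  arrange(desc(points))

# Check that both forms give the same data frame
identical(nested, piped)
```

In the piped version each line is one step, so a step can be added, removed or reordered without rebalancing parentheses.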

## Quiz: Valid Functions

`dplyr_function` specifies the transformation function, `param_one` the parameter for the **dplyr** function and `input_data_frame` the data frame to be transformed. Which of the code lines below are valid according to the **dplyr** function framework?

`dplyr_function(param_one, input_data_frame)`

`dplyr_function(input_data_frame, param_one)`

`input_data_frame(dplyr_function, param_one)`

`param_one(dplyr_function, input_data_frame)`

Introduction to dplyr is an excerpt from the course Introduction to R, which is available for free at quantargo.com
