To find the rows you are interested in, it often helps to order a data frame by specific columns. The dplyr `arrange()` function orders data frames by one or more columns, in ascending or descending order.

• Use the `arrange()` function to sort data frames.
• Sort data frames by multiple columns using `arrange()`.
```
arrange(<data>, <column>)
arrange(<data>, <column1>, <column2>, ...)
```

## The arrange() function with a single column


The `arrange()` function orders the rows of a data frame. It takes a data frame or tibble as its first argument, followed by the names of the columns by which the rows should be ordered. Let’s assume we want to answer the question: Which states had the highest percentage of Republican voters in the 2016 US presidential election? To answer it, the following example uses the `pres_results_2016` data frame, which contains results for the 2016 US presidential election only. We `arrange()` the data frame by the `rep` column (the Republican share of the vote, stored as a proportion between 0 and 1):

```r
arrange(pres_results_2016, rep)
#> # A tibble: 51 x 6
#>    year state total_votes   dem    rep  other
#>   <dbl> <chr>       <dbl> <dbl>  <dbl>  <dbl>
#> 1  2016 DC         312575 0.905 0.0407 0.0335
#> 2  2016 HI         437664 0.610 0.294  0.0958
#> 3  2016 VT         320467 0.557 0.298  0.0737
#> # … with 48 more rows
```
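Because `pres_results_2016` comes with the course rather than with dplyr, here is a minimal self-contained sketch of the same ascending sort. It assumes dplyr is installed and builds a small hypothetical tibble, `votes`, from six rows that appear in the printed outputs in this article. It is not the full data set.

```r
library(dplyr)

# Hypothetical toy tibble: six rows copied from the printed pres_results_2016
# output in this article (year and total_votes omitted for brevity)
votes <- tibble(
  state = c("WV", "DC", "OK", "HI", "WY", "VT"),
  dem   = c(0.265, 0.905, 0.289, 0.610, 0.216, 0.557),
  rep   = c(0.686, 0.0407, 0.653, 0.294, 0.674, 0.298)
)

# Ascending order is the default: smallest Republican share first
by_rep <- arrange(votes, rep)
by_rep$state
#> [1] "DC" "HI" "VT" "OK" "WY" "WV"
```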

As you can see in the output, the data frame is sorted in ascending order of the `rep` column. However, we would prefer descending order, so that we can instantly see the `state` with the highest `rep` share. To sort a column in descending order, wrap it in the `desc()` function inside `arrange()`:

```r
arrange(pres_results_2016, desc(rep))
#> # A tibble: 51 x 6
#>    year state total_votes   dem   rep  other
#>   <dbl> <chr>       <dbl> <dbl> <dbl>  <dbl>
#> 1  2016 WV         713051 0.265 0.686 0.0489
#> 2  2016 WY         258788 0.216 0.674 0.0830
#> 3  2016 OK        1452992 0.289 0.653 0.0575
#> # … with 48 more rows
```
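The sketch below, again on a small hypothetical tibble, shows two details of `desc()` that are worth knowing. For numeric columns, a leading minus sign (`-rep`) gives the same order. Missing values are placed last even when sorting in descending order, which is how dplyr documents its `NA` handling for local data frames. The state code `"XX"` and its missing value are made up for illustration.

```r
library(dplyr)

# Hypothetical toy tibble: three rows from the output above plus a made-up
# state "XX" with a missing value, to show where NA ends up
votes <- tibble(
  state = c("OK", "XX", "WV", "WY"),
  rep   = c(0.653, NA, 0.686, 0.674)
)

# desc() reverses the order for this column; NA still goes last
top_rep <- arrange(votes, desc(rep))
top_rep$state
#> [1] "WV" "WY" "OK" "XX"

# For numeric columns, a leading minus sign gives the same result
identical(top_rep, arrange(votes, -rep))
#> [1] TRUE
```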

Arranging works not only on numeric values but also on character values. In that case, dplyr sorts the rows in alphabetical order. We can arrange character columns just like numeric ones:

```r
arrange(pres_results_2016, state)
#> # A tibble: 51 x 6
#>    year state total_votes   dem   rep  other
#>   <dbl> <chr>       <dbl> <dbl> <dbl>  <dbl>
#> 1  2016 AK         318608 0.366 0.513 0.0928
#> 2  2016 AL        2123372 0.344 0.621 0.0254
#> 3  2016 AR        1130635 0.337 0.606 0.0577
#> # … with 48 more rows
```
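Here is a self-contained sketch with a hypothetical three-row tibble, `states`, built from the codes and vote totals shown above. One caveat, to our understanding: since dplyr 1.1.0, `arrange()` sorts character columns in the C locale by default (its `.locale` argument changes this), so uppercase letters sort before lowercase ones. The all-uppercase state codes used here sort the same way in any locale.

```r
library(dplyr)

# Hypothetical toy tibble with three state codes from the output above,
# deliberately listed out of order
states <- tibble(
  state       = c("AR", "AK", "AL"),
  total_votes = c(1130635, 318608, 2123372)
)

# Character columns are sorted alphabetically
by_state <- arrange(states, state)
by_state$state
#> [1] "AK" "AL" "AR"

# desc() works on character columns too
arrange(states, desc(state))$state
#> [1] "AR" "AL" "AK"
```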

## Exercise: Use arrange() based on a single column

The `gapminder_2007` dataset contains economic and demographic data about various countries for the year 2007. Arrange the tibble and inspect which country had the lowest life expectancy (`lifeExp`) in 2007! The dplyr package is already loaded.

1. Apply the `arrange()` function on the `gapminder_2007` tibble
2. Order the tibble based on the `lifeExp` column

## Exercise: Use arrange() in combination with desc()

The `gapminder_2007` dataset contains economic and demographic data about various countries for the year 2007. Arrange the tibble and inspect which countries had the largest population in 2007! The dplyr package is already loaded.

1. Apply the `arrange()` function on the `gapminder_2007` tibble.
2. Sort the tibble in descending order based on the `pop` column.

## The arrange() function with multiple columns

We can also use the `arrange()` function with multiple columns. In this case, the order of the columns in the function call sets a hierarchy of ordering. The function first orders the rows by the first column given. Where several rows share the same value in that column, their order is decided by the second column. If rows are still tied, the third column (if given) decides, and so on.

In the following example, we use the `pres_results_subset` data frame, which contains election results only for the states `"TX"` (Texas), `"UT"` (Utah) and `"FL"` (Florida). First, we sort the data frame in ascending order by the `year` column. Then we add a second level and order the rows by the `dem` column:

```r
arrange(pres_results_subset, year, dem)
#> # A tibble: 33 x 6
#>    year state total_votes   dem   rep   other
#>   <dbl> <chr>       <dbl> <dbl> <dbl>   <dbl>
#> 1  1976 UT         541218 0.336 0.624 0.0392
#> 2  1976 TX        4071884 0.511 0.480 0.00817
#> 3  1976 FL        3150631 0.519 0.466 0.0143
#> # … with 30 more rows
```

As you can see in the output, the rows are ordered primarily by the `year` column. Within the same `year`, the order of the rows is decided by the `dem` column.
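To see the tie-breaking hierarchy in isolation, here is a sketch on a hypothetical tibble, `scores`, whose `group` column contains ties. It also shows that each column can be given its own direction, the same pattern the next exercise asks for with `continent` and `lifeExp`.

```r
library(dplyr)

# Hypothetical toy tibble with ties in the first sort column
scores <- tibble(
  group = c("b", "a", "b", "a", "a"),
  value = c(2, 3, 1, 1, 2)
)

# group decides first; value only breaks ties within the same group
asc_both <- arrange(scores, group, value)
paste(asc_both$group, asc_both$value)
#> [1] "a 1" "a 2" "a 3" "b 1" "b 2"

# Each column has its own direction: group ascending, value descending
mixed <- arrange(scores, group, desc(value))
paste(mixed$group, mixed$value)
#> [1] "a 3" "a 2" "a 1" "b 2" "b 1"
```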

## Exercise: Use arrange() based on multiple columns

The `gapminder_2007` tibble contains economic and demographic data about various countries for the year 2007. Arrange the tibble and inspect, for each continent, which countries had the highest life expectancy in 2007! The dplyr package is already loaded.

1. Apply the `arrange()` function on the `gapminder_2007` tibble.
2. Order the tibble based on the `continent` column!
3. If several rows have the same `continent`, sort them in descending order based on the `lifeExp` column!

## Quiz: arrange() Function

Which of the following statements are true about the `arrange()` function?
• The `arrange()` function orders the rows of a data frame.
• To `arrange()` the values of a column in ascending order, we need to use the `asc()` function.
• To `arrange()` the values of a column in descending order, we need to use the `desc()` function.
• You can only `arrange()` a data frame based on one column.

*Sort data frames by columns* is an excerpt from the course Introduction to R, which is available for free at quantargo.com.
