Code snippets – first(), last() and nth() {dplyr} functions
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Inspired by conversations on the NHS-R Slack where code answers are lost over time (it’s not a paid account), and also for those times when a detailed comment in code isn’t appropriate but would be really useful, this blog is part of a series of code snippet explanations.
Where this code snippet comes from
This blog comes from small part of code shared as part of a larger piece of analysis from the Strategy Unit and the British Heart Foundation to visualise socio-economic inequalities in the Coronary Heart Disease (CHD) pathway. The report and analysis was presented at a Midlands Analyst Huddle in January. If you would like to know more about the report and the code I’ll be referring to, it is published on GitHub.
The code is in two parts with the first being data formatting and the second part being the statistics for relative index of inequality (RII).
Thanks to Jacqueline Grout, Senior Healthcare Analyst and Tom Jemmett, Senior Data Scientist of the Strategy Unit.
Creating a column with the first()
or the last()
data in a group
In the analysis example these functions are used to repeat the highest and lowest population count per Indices of Multiple Deprivation (IMD) decile for each GP practice area1.
The best way to show what is happening with the two {dplyr} functions first()
and last()
is to show it with an even simpler dummy data set:
library(dplyr, warn.conflicts = FALSE) # There are two patients A and B with A having a numbers 1:10 and patient B has numbers 11:20 data <- tibble(id = rep(c("PatientA", "PatientB"), 10)) |> group_by(id) |> mutate(number = ifelse(id == "PatientA", 1:10, 11:20)) |> arrange(number) |> mutate( max_number = last(number), min_number = first(number) ) |> ungroup() # persistent grouping is needed in the code but removed at the end for good practice data # A tibble: 20 × 4 id number max_number min_number <chr> <int> <int> <int> 1 PatientA 1 10 1 2 PatientA 2 10 1 3 PatientA 3 10 1 4 PatientA 4 10 1 5 PatientA 5 10 1 6 PatientA 6 10 1 7 PatientA 7 10 1 8 PatientA 8 10 1 9 PatientA 9 10 1 10 PatientA 10 10 1 11 PatientB 11 20 11 12 PatientB 12 20 11 13 PatientB 13 20 11 14 PatientB 14 20 11 15 PatientB 15 20 11 16 PatientB 16 20 11 17 PatientB 17 20 11 18 PatientB 18 20 11 19 PatientB 19 20 11 20 PatientB 20 20 11
Taking the min and the max numbers for a patient is dependent upon a persisent group_by()
and an arrange()
and, if you are familiar with SQL, is similar to the Windows partitions MIN
and MAX
and would be written for this example as MIN(number) OVER(PARTITION BY id ORDER BY number)
. What SQL can’t do so easily though is to select the nth()
number for example:
library(dplyr) # There are two patients A and B with A having a numbers 1:10 and patient B has numbers 11:20 and letters from data <- tibble(id = rep(c("PatientA", "PatientB"), 10)) |> mutate(letters = letters[1:20]) |> # occurs before grouping so that the letters don't get restricted to the 10 rows in a group group_by(id) |> mutate(number = ifelse(id == "PatientA", 1:10, 11:20)) |> arrange(number) |> mutate(sixth_number = nth(letters, 6)) data # A tibble: 20 × 4 # Groups: id [2] id letters number sixth_number <chr> <chr> <int> <chr> 1 PatientA a 1 k 2 PatientA c 2 k 3 PatientA e 3 k 4 PatientA g 4 k 5 PatientA i 5 k 6 PatientA k 6 k 7 PatientA m 7 k 8 PatientA o 8 k 9 PatientA q 9 k 10 PatientA s 10 k 11 PatientB b 11 l 12 PatientB d 12 l 13 PatientB f 13 l 14 PatientB h 14 l 15 PatientB j 15 l 16 PatientB l 16 l 17 PatientB n 17 l 18 PatientB p 18 l 19 PatientB r 19 l 20 PatientB t 20 l
and in this example the 6th character for PatientA is k and for PatientB (row 16) is l.
Use case
Another potential use case for the nth()
character selection is where analysis is looking for the last but one appointment date. For example, if a patient had appointments:
appts <- tibble(id = rep(c("PatientA", "PatientB"), 3)) |> arrange(id) |> mutate( appointments = rep(c("2023-01-01", "2023-02-01", "2023-03-01"), 2), team = rep(c("teamA", "teamB", "teamC"), 2) ) appts # A tibble: 6 × 3 id appointments team <chr> <chr> <chr> 1 PatientA 2023-01-01 teamA 2 PatientA 2023-02-01 teamB 3 PatientA 2023-03-01 teamC 4 PatientB 2023-01-01 teamA 5 PatientB 2023-02-01 teamB 6 PatientB 2023-03-01 teamC
and we needed to know the details from the second to last appointment to see who they had been seen before teamC:
appts |> filter(appointments == nth(appointments, n() - 1), .by = id) # A tibble: 2 × 3 id appointments team <chr> <chr> <chr> 1 PatientA 2023-02-01 teamB 2 PatientB 2023-02-01 teamB
The nth()
function needs to know which column to look at, appointments, and then which number to select. Instead of hard coding this as 2 as that will only be useful in this very small data set, the code uses n()
to count all the rows and then minus 1. The reason why this is an operation applied to both patients is because of the .by = id
which is new feature of {dplyr} v1.1.0. Code using the group_by()
function will do the same thing but is just an extra two lines as it will also require ungroup()
to remove:
appts |> group_by(id) |> filter(appointments == nth(appointments, n() - 1)) |> ungroup() # A tibble: 2 × 3 id appointments team <chr> <chr> <chr> 1 PatientA 2023-02-01 teamB 2 PatientB 2023-02-01 teamB
Getting involved
If you need any help with {dplyr} or would like to share your own use cases feel free to share them in the NHS-R Slack or submit a blog for this series.
NHS-R Community also have a repository for demos and how tos which people are welcome to contribute code to either through pull requests or issues.
Footnotes
- I’ve written more about IMD in a blog for the CDU Data Science Team.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.