more .I in data.table

[This article was first published on Data By John, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Following on from my last post, here is a bit more about the use of .I in data.table.

Scenario : you want to obtain either the first, or last row, from a set of rows that belong to a particular group.

For example, for a patient admitted to hospital, you may want to capture their first admission, or the entire time they were in a specific hospital (hospital stay), or their journey across multiple hospitals and deparments (Continuous Stay). The key point is that these admissions have a means of identifying the patient, and the stay itself, and that there will likely be several rows of data for each.

With data.table’s .I syntax, we can grab the first row using .I[1], and the last row, regardless of how many there are, using .I[.N] See the example function below.

At patient level, I want the first record in the admission, so I can count unique admissions.

.dt[.dt[,.I[1], idcols]$V1][,.SD, .SDcols = vars][]

This retrieves the first row using the identity column, and joins back to the original dataset, returning the ID and any other supplied columns ( which are passed to the ... argument)

If I want to grab the last row, I switch to the super handy .N function:

.dt[.dt[,.I[.N], idcols]$V1][,.SD, .SDcols = vars][]

This retrieves the last row using the specified identity column(s), joins back to the original data and retrieves any other required columns.

Of course, this is lightning quick, rock solid, and reliable.

get_records <- function(.dt,
                        position = c("first", "last"),
                        type = c("patient", "stays" ,"episodes"),
                        ...) {

  if (type == "patient") {
    idcols <- "PatId"
  }


  if (type == "stays") {
    idcols <- c("PatId", "StayID")
  }

  if (type == "episodes") {
    idcols <- c("PatId", "StayID", "GUID")
  }


  vars <-  eval(substitute(alist(...)), envir = parent.frame())
  vars <- sapply(as.list(vars), deparse)
  vars <- c(idcols, vars)

  if (position == "first") {
    res <- .dt[.dt[,.I[1], idcols]$V1][,.SD, .SDcols = vars][]
  }

  if (position == "last") {
    res <- .dt[.dt[,.I[.N], idcols]$V1][,.SD, .SDcols = vars][]
  }

  res
}

data.table has lots of useful functionality hidden away, so hopefully this shines a light on some of it, and encourages some of you to investigate it for yourself. ```

To leave a comment for the author, please follow the link and comment on their blog: Data By John.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)