# Locating parts of a string with `stringr`

This article was first published on **R on Jorge Cimentada** and kindly contributed to R-bloggers.


I was wandering the realms of Stack Overflow answering some questions when I encountered a question about extracting parts of a string based on a regex. I thought I knew how to do this with the package `stringr` using, for example, `str_sub`, but I found it a bit difficult to map how `str_locate` complements `str_sub`.

`str_locate` and `str_locate_all` give back the locations of your regex inside the desired string as a `matrix` or a `list`, respectively. That didn’t look very intuitive to pass on to `str_sub`, which (I thought) only accepted numeric vectors with the start and end indices of the parts of the strings that you want to extract. To my surprise, `str_sub` accepts not only numeric vectors but also a matrix with two columns, precisely the result of `str_locate`.
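To make the two calling styles concrete, here is a minimal sketch on a toy vector. The input `x` and the digit pattern are hypothetical and only serve the illustration; the point is that `str_sub` gives the same answer whether it receives the start and end columns separately or the whole two-column matrix.

```r
library(stringr)

# Hypothetical toy input, not taken from the post
x <- c("abc123def", "no digits", "x45")

# One row per string: start and end of the first run of digits
loc <- str_locate(x, "[0-9]+")

# Passing start and end as separate numeric vectors...
by_vectors <- str_sub(x, loc[, "start"], loc[, "end"])

# ...or passing the two-column matrix directly
by_matrix <- str_sub(x, loc)

# Both approaches should agree, with NA for the string without a match
identical(by_vectors, by_matrix)
```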

Let’s create a set of random strings from which we want to extract the word `special*word`, where `*` represents a random number.

```
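library(stringr)

# 100 random strings; each draws 15 items without replacement from the
# letters plus a single "special<n>word" token, with n between 1 and 10
test_string <-
  replicate(
    100,
    paste0(
      sample(c(letters, LETTERS, paste0("special", sample(1:10, 1), "word")), 15),
      collapse = ""
    )
  )

head(test_string)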
```

```
## [1] "pZTQHcDVObnaCFS" "qBxfbIHjauyEmgspecial10word"
## [3] "TKgbmQAEFoJHOVh" "VoBdUAuzfPrmCGX"
## [5] "dGgJOspecial5wordiFpbvXzUD" "WOfLjNospecial4wordEeGkyTA"
```

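`str_locate` returns a matrix with the position of the first match in **every string**; strings without a match get a row of `NA`. Functions that operate on every element of a vector like this are called **vectorised** in R. Note that `[0-9]` matches exactly one digit, so `special10word` in the second string is not matched.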

```
location_matrix <-
  str_locate(test_string, pattern = "special[0-9]word")

head(location_matrix)
```

```
## start end
## [1,] NA NA
## [2,] NA NA
## [3,] NA NA
## [4,] NA NA
## [5,] 6 17
## [6,] 8 19
```
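The second row above is `NA` even though that string contains `special10word`. A small sketch of why, using that same string from the sample output: `[0-9]` consumes a single digit, whereas `[0-9]+` accepts one or more. The `+` variant is a suggested alternative, not the pattern used in the post, so the outputs shown in the rest of this article would differ if you switched to it.

```r
library(stringr)

# The second string from the sample output above
s <- "qBxfbIHjauyEmgspecial10word"

# Single-digit pattern from the post: "10" is two digits, so no match
loc_single <- str_locate(s, "special[0-9]word")

# Suggested variant (an assumption, not the post's pattern): one or more digits
loc_multi <- str_locate(s, "special[0-9]+word")

loc_single
loc_multi
```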

Each string here contains at most one match, so it isn’t strictly needed for this example, but I was also interested in checking how the result of `str_locate_all` would fit in this workflow. `str_locate_all` is the same as `str_locate`, but since it can find **more** than one match per string, it returns a list with one slot per string in `test_string`, each holding a matrix with the indices of the matches. Since many of the strings in `test_string` do not contain `special*word`, their slots hold a matrix with zero rows, and we need to replace those with a row of `NA` before binding everything into a single matrix:

```
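location_list <-
  str_locate_all(test_string, pattern = "special[0-9]word") %>%
  # all(is.na(.x)) is TRUE for a zero-row (no match) matrix; swap in an NA row
  lapply(function(.x) if (all(is.na(.x))) matrix(c(NA, NA), ncol = 2) else .x) %>%
  # stack the per-string matrices into a single two-column matrix
  {do.call(rbind, .)}

head(location_list)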
```

```
## start end
## [1,] NA NA
## [2,] NA NA
## [3,] NA NA
## [4,] NA NA
## [5,] 6 17
## [6,] 8 19
```
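To see the list structure that `str_locate_all` produces before any reshaping, here is a small sketch on a hypothetical toy vector (not from the post) with two matches, no match, and one match. It also shows that a single slot's matrix can be handed to `str_sub` together with the corresponding string.

```r
library(stringr)

# Hypothetical toy input: two matches, no match, one match
x <- c("a1b22", "none", "7")

# A list with one matrix per string, one row per match
all_locs <- str_locate_all(x, "[0-9]+")

# Number of matches found in each string (zero rows means no match)
n_matches <- vapply(all_locs, nrow, integer(1))

# A slot's matrix can be passed straight to str_sub for that one string
first_string_matches <- str_sub(x[1], all_locs[[1]])

n_matches
first_string_matches
```

Binding the slots row-wise, as done above, only lines up with the original strings when every slot contributes exactly one row, which is why the zero-row slots have to be padded first.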

Now that we have everything ready, `str_sub` can give us the desired results using both numeric vectors and the entire matrix:

```
# Using numeric vectors from str_locate
str_sub(test_string, location_matrix[, 1], location_matrix[, 2])
```

```
## [1] NA NA NA NA "special5word"
## [6] "special4word" NA NA "special5word" NA
## [11] NA NA NA NA NA
## [16] NA NA NA NA NA
## [21] NA NA NA "special5word" "special6word"
## [26] NA NA NA NA NA
## [31] "special4word" NA NA NA NA
## [36] NA NA NA "special7word" NA
## [41] NA NA NA NA NA
## [46] NA NA NA NA NA
## [51] NA NA NA NA NA
## [56] NA NA NA NA NA
## [61] NA NA "special4word" NA NA
## [66] NA NA NA NA NA
## [71] NA NA NA "special7word" "special9word"
## [76] NA NA NA NA NA
## [81] "special4word" NA NA "special5word" NA
## [86] NA NA NA "special9word" "special9word"
## [91] NA NA NA NA NA
## [96] "special6word" NA NA "special3word" "special1word"
```

```
# Using numeric vectors from str_locate_all
str_sub(test_string, location_list[, 1], location_list[, 2])
```

```
## [1] NA NA NA NA "special5word"
## [6] "special4word" NA NA "special5word" NA
## [11] NA NA NA NA NA
## [16] NA NA NA NA NA
## [21] NA NA NA "special5word" "special6word"
## [26] NA NA NA NA NA
## [31] "special4word" NA NA NA NA
## [36] NA NA NA "special7word" NA
## [41] NA NA NA NA NA
## [46] NA NA NA NA NA
## [51] NA NA NA NA NA
## [56] NA NA NA NA NA
## [61] NA NA "special4word" NA NA
## [66] NA NA NA NA NA
## [71] NA NA NA "special7word" "special9word"
## [76] NA NA NA NA NA
## [81] "special4word" NA NA "special5word" NA
## [86] NA NA NA "special9word" "special9word"
## [91] NA NA NA NA NA
## [96] "special6word" NA NA "special3word" "special1word"
```

```
# Using the entire matrix
str_sub(test_string, location_matrix)
```

```
## [1] NA NA NA NA "special5word"
## [6] "special4word" NA NA "special5word" NA
## [11] NA NA NA NA NA
## [16] NA NA NA NA NA
## [21] NA NA NA "special5word" "special6word"
## [26] NA NA NA NA NA
## [31] "special4word" NA NA NA NA
## [36] NA NA NA "special7word" NA
## [41] NA NA NA NA NA
## [46] NA NA NA NA NA
## [51] NA NA NA NA NA
## [56] NA NA NA NA NA
## [61] NA NA "special4word" NA NA
## [66] NA NA NA NA NA
## [71] NA NA NA "special7word" "special9word"
## [76] NA NA NA NA NA
## [81] "special4word" NA NA "special5word" NA
## [86] NA NA NA "special9word" "special9word"
## [91] NA NA NA NA NA
## [96] "special6word" NA NA "special3word" "special1word"
```

A much easier approach to doing the above (which is cumbersome and verbose) is to use `str_extract`:

`str_extract(test_string, "special[0-9]word")`

```
## [1] NA NA NA NA "special5word"
## [6] "special4word" NA NA "special5word" NA
## [11] NA NA NA NA NA
## [16] NA NA NA NA NA
## [21] NA NA NA "special5word" "special6word"
## [26] NA NA NA NA NA
## [31] "special4word" NA NA NA NA
## [36] NA NA NA "special7word" NA
## [41] NA NA NA NA NA
## [46] NA NA NA NA NA
## [51] NA NA NA NA NA
## [56] NA NA NA NA NA
## [61] NA NA "special4word" NA NA
## [66] NA NA NA NA NA
## [71] NA NA NA "special7word" "special9word"
## [76] NA NA NA NA NA
## [81] "special4word" NA NA "special5word" NA
## [86] NA NA NA "special9word" "special9word"
## [91] NA NA NA NA NA
## [96] "special6word" NA NA "special3word" "special1word"
```
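As a quick sanity check that the two routes agree, the sketch below compares them on two strings copied from the sample output above. The variable names are only illustrative.

```r
library(stringr)

# Two strings from the sample output: one with a match, one without
x <- c("dGgJOspecial5wordiFpbvXzUD", "pZTQHcDVObnaCFS")
pattern <- "special[0-9]word"

# Route 1: locate the match, then cut it out with str_sub
via_locate <- str_sub(x, str_locate(x, pattern))

# Route 2: extract the first match directly
via_extract <- str_extract(x, pattern)

identical(via_locate, via_extract)
```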

However, the whole objective behind this exercise was to clearly map out how to connect `str_locate` to `str_sub`, and it’s much clearer if you can pass the entire matrix. Converting the result of `str_locate_all` is still a bit tricky.
