stringfix: a new R package for string manipulation in a %>% way

[This article was first published on Guillaume Pressiat, and kindly contributed to R-bloggers.]

I usually write here in French, mainly about French hospital data management and the statistical tasks it involves. As today's post is about a new package I have created, I'll be writing in English. The package is called stringfix because it uses infix operators to manipulate character strings.

This post is an update of a December 2018 post.

Introduction

In Python, the + operator concatenates two character strings. For example, 'Hello ' + 'world' gives 'Hello world'. Building sentences from words and arithmetic-like symbols is a very pleasant way to write. In R, paste is an ordinary function that must be called with parentheses, so consecutive (nested) calls quickly become hard to read.

+ is a nice operator, and we can get almost the same syntax in R by defining a custom infix operator:

    `%+%` <- function(x, y) {
      paste0(x, y)
    }
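
Because paste0 is vectorised, an operator defined this way also works element-wise on vectors and can be chained from left to right. A minimal sketch, repeating the definition so that it runs on its own:

```r
# Same definition as above, repeated so the snippet is self-contained
`%+%` <- function(x, y) paste0(x, y)

# Chaining reads left to right instead of inside out
nested  <- paste0(paste0("Hello", " "), "world")
chained <- "Hello" %+% " " %+% "world"
identical(nested, chained)
#> [1] TRUE

# paste0 is vectorised, so the operator is too
c("x", "y") %+% 1:2
#> [1] "x1" "y2"
```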

ggplot2 already has an operator with the same name, but it is used to override the data in a ggplot call, not to paste character strings (see the ggplot2 documentation). When the tidyverse is loaded, ggplot2's %+% can mask the paste0 version, which prevents us from using it. The idea of pasting character strings with an infix operator is also hinted at in the Advanced R book.

To build a toolbox around paste0's %+%, I started collecting other infix functions for character string manipulation. The main question was: which of the functions I use very often, and that are normally called from right to left, could be reordered to fit into %>% code? Here is the little family I have built since: paste, grepl, substring, count and padding. The goal of this package is to use stringr or base functions as a backend, as a starting point for an alternative way of manipulating character strings in R.

This package is still at a very early stage (kind of a draft for me!), but I thought other people might enjoy it and might even wish to contribute.

Presentation

    "In a manner of coding, I just want to say..." % % "Nothing."
    #> [1] "In a manner of coding, I just want to say... Nothing."

Examples

paste

    'Hello ' %+% 'world'
    #> [1] "Hello world"
    'Your pastas taste like ' %+% '%>%'
    #> [1] "Your pastas taste like %>%"
    'coco' %+% 'bolo'
    #> [1] "cocobolo"

    'Hello' % % 'world'
    #> [1] "Hello world"
    'Your pastas taste like' % % '%>%'
    #> [1] "Your pastas taste like %>%"
    'Hello' %,% 'world...'
    #> [1] "Hello, world..."
    'Your pastas taste like ' %+% '%>%...' %,% 'or %>>%...'
    #> [1] "Your pastas taste like %>%..., or %>>%..."
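
The outputs above are consistent with % % and %,% being paste with a fixed separator. The definitions below are an assumption made for illustration; the package's actual source may differ:

```r
# Assumed definitions: paste() with a fixed separator
`% %` <- function(x, y) paste(x, y)              # single space
`%,%` <- function(x, y) paste(x, y, sep = ", ")  # comma followed by a space

"Hello" % % "world"
#> [1] "Hello world"
"Hello" %,% "world..."
#> [1] "Hello, world..."
```

All %...% operators share the same precedence in R and are evaluated from left to right, which is why they can be mixed in a single sentence without parentheses.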

grepl

Case sensitive

    'pig' %g% 'The pig is in the cornfield'
    #> [1] TRUE
    'Pig' %g% 'The pig is in the cornfield'
    #> [1] FALSE

Case insensitive (ignore.case)

    'pig' %gic% 'The pig is in the cornfield'
    #> [1] TRUE
    'PIG' %gic% 'The PiG is in the cornfield'
    #> [1] TRUE
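
Operators of this kind can be sketched in base R by putting the pattern on the left-hand side of grepl. The operator names below are hypothetical, chosen so that they do not clash with stringfix, and this is not the package's implementation:

```r
# Hypothetical operators: pattern on the left, text to search on the right
`%found_in%`    <- function(pattern, x) grepl(pattern, x)
`%found_in_ic%` <- function(pattern, x) grepl(pattern, x, ignore.case = TRUE)

"Pig" %found_in% "The pig is in the cornfield"
#> [1] FALSE
"Pig" %found_in_ic% "The pig is in the cornfield"
#> [1] TRUE
```

In this sketch the pattern is treated as a regular expression, as grepl does by default; passing fixed = TRUE would match it literally instead.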

substring

    'NFKA008' %s% '1.4'
    #> [1] "NFKA"
    'NFKA008' %s% .4
    #> [1] "NFKA"
    'where is' % % ('the pig is in the cornfield' %s% '1.7') %+% '?'
    #> [1] "where is the pig?"
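
The right-hand side reads as "start.stop". One way such a specification could be parsed in base R is sketched below; sub_spec is a hypothetical helper and not the package's code:

```r
# Hypothetical helper: x is the string, spec is "start.stop" (or a number such as .4)
sub_spec <- function(x, spec) {
  # as.character(.4) is "0.4", so a numeric spec becomes start 0, stop 4
  parts <- strsplit(as.character(spec), ".", fixed = TRUE)[[1]]
  # substr() treats a start position below 1 as 1
  substr(x, as.integer(parts[1]), as.integer(parts[2]))
}

sub_spec("NFKA008", "1.4")
#> [1] "NFKA"
sub_spec("NFKA008", .4)
#> [1] "NFKA"
```

In this sketch a numeric specification loses trailing zeros (as.character(1.10) is "1.1"), so writing it as a string is the safer choice for stop positions such as 10 or 20.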

count

    fruit <- c("apple", "banana", "pear", "pineapple")
    "a" %count% fruit
    #> [1] 1 3 1 1
    c("a", "b", "p", "p") %count% fruit
    #> [1] 1 1 1 3
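
The same counts can be reproduced in base R. The sketch below uses a hypothetical function, count_fixed, that matches the pattern literally; it illustrates the idea and is not how the package does it:

```r
# Hypothetical helper: count literal occurrences of each pattern in each string
count_fixed <- function(pattern, x) {
  mapply(function(p, s) {
    m <- gregexpr(p, s, fixed = TRUE)[[1]]  # match positions, or -1 if none
    if (m[1] == -1L) 0L else length(m)
  }, pattern, x, USE.NAMES = FALSE)
}

fruit <- c("apple", "banana", "pear", "pineapple")
count_fixed("a", fruit)                    # one pattern, recycled over fruit
#> [1] 1 3 1 1
count_fixed(c("a", "b", "p", "p"), fruit)  # one pattern per string
#> [1] 1 1 1 3
```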

pad, lpad and rpad

    5 %lpad% '0.5'
    #> [1] "00005"
    5 %lpad%   .5
    #> [1] "00005"
    5 %lpad%  '.5'
    #> [1] "    5"
    5 %lpad% '2.5'
    #> [1] "22225"
    'é' %lpad% 'é.5'
    #> [1] "ééééé"
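
Here the right-hand side reads as "padding character.width", with a space used when no character is given. A base R sketch that reproduces the outputs above follows; lpad_sketch is a hypothetical function and the parsing rule is my reading of the examples, not the package's source:

```r
# Hypothetical helper: spec is "<pad character>.<width>", e.g. "0.5"
lpad_sketch <- function(x, spec) {
  # as.character(.5) is "0.5", so the numeric form pads with zeros
  parts <- strsplit(as.character(spec), ".", fixed = TRUE)[[1]]
  pad   <- if (nzchar(parts[1])) parts[1] else " "  # default to a space
  width <- as.integer(parts[2])
  x     <- as.character(x)
  n     <- pmax(width - nchar(x), 0L)               # never a negative count
  paste0(strrep(pad, n), x)
}

lpad_sketch(5, "0.5")
#> [1] "00005"
lpad_sketch(5, ".5")
#> [1] "    5"
lpad_sketch(5, "2.5")
#> [1] "22225"
```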

names of tibbles : tolower and toupper

I have added two functions that I use really often.

    library(magrittr)   # provides %>%
    library(stringfix)  # provides toupper_names
    iris %>% toupper_names %>% head
    #>   SEPAL.LENGTH SEPAL.WIDTH PETAL.LENGTH PETAL.WIDTH SPECIES
    #> 1          5.1         3.5          1.4         0.2  setosa
    #> 2          4.9         3.0          1.4         0.2  setosa
    #> 3          4.7         3.2          1.3         0.2  setosa
    #> 4          4.6         3.1          1.5         0.2  setosa
    #> 5          5.0         3.6          1.4         0.2  setosa
    #> 6          5.4         3.9          1.7         0.4  setosa
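
A helper of this kind is small enough to sketch in base R. The name upper_names is hypothetical, and the package's own function may be written differently:

```r
# Hypothetical helper: return the data frame with upper-cased column names
upper_names <- function(df) {
  names(df) <- toupper(names(df))
  df
}

names(upper_names(iris))
#> [1] "SEPAL.LENGTH" "SEPAL.WIDTH"  "PETAL.LENGTH" "PETAL.WIDTH"  "SPECIES"
```

Because it takes the data frame as its first argument and returns a data frame, such a function slots directly into a %>% chain.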

Finally, I also want to point out that the %out% function from the rmngb package, the negation of %in%, is very handy: it saves typing !x %in% y (you just type x %out% y instead). This is why I have included it in this package!

More functions and information are available at https://github.com/GuillaumePressiat/stringfix
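
A negated %in% can be sketched in one line of base R with Negate. The operator name %not_in% is hypothetical, used here so that the snippet does not depend on rmngb or stringfix:

```r
# Hypothetical operator: Negate() wraps %in% and flips its result
`%not_in%` <- Negate(`%in%`)

c(1, 5) %not_in% 1:3
#> [1] FALSE  TRUE

# Same result as the usual idiom, which R parses as !(x %in% y)
identical(c(1, 5) %not_in% 1:3, !c(1, 5) %in% 1:3)
#> [1] TRUE
```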

