stringfix: a new R package for string manipulation in a %>% way

[This article was first published on R posts, and kindly contributed to R-bloggers].

I usually write here in French, mainly about French hospital data management and the statistical tasks it involves. As today’s post is about a new package I have created, I’ll be writing in English. The package is called stringfix because it uses infix operators to manipulate character strings.

Introduction

In Python, the operator + is used to paste two character strings together. For example, 'Hello ' + 'world' gives 'Hello world'. Building sentences from words and arithmetic symbols is a very pleasant way to write. In R, paste is an ordinary function that has to be called with parentheses, so chaining several calls quickly makes the code hard to read.

+ is a nice operator, and we can use it in R almost as it is used in Python by creating an infix operator.

```r
`%+%` <- function(x, y) {
  paste0(x, y)
}
```
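
To make the readability argument concrete, here is a small sketch comparing nested paste0 calls with the infix version. The operator is redefined so the snippet runs on its own, and the variables `first` and `last` are made up for the illustration.

```r
# Same definition as in the post, repeated so the snippet is self-contained
`%+%` <- function(x, y) paste0(x, y)

first <- "Ada"   # illustrative values only
last  <- "Lovelace"

# Nested calls have to be read inside-out
nested <- paste0(paste0("Dear ", first), paste0(" ", last))

# The infix version reads left to right
chained <- "Dear " %+% first %+% " " %+% last

identical(nested, chained)
#> [1] TRUE

# paste0 is vectorised, so the operator is too
"file_" %+% 1:3 %+% ".csv"
#> [1] "file_1.csv" "file_2.csv" "file_3.csv"
```

Note that `:` binds more tightly than any `%op%` operator, which is why `1:3` needs no parentheses here.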

ggplot2 already exports an operator named %+%, but it is used to replace the data in a ggplot call, not to paste character strings. When the tidyverse is loaded, ggplot2’s %+% is the one that gets called, which prevents us from using the paste0 version. The Advanced R book also hints at this way of pasting character strings.

To build a toolbox around the paste0-based %+%, I started collecting other infix functions for character string manipulation. The main question was: which of the functions I use most often, and that are normally called from right to left, could be reordered to fit into %>% code? Here is the little family I have built so far: paste, grepl, substring, count and padding. The goal of this package is to use stringr or base functions as a backend, as a starting point for an alternative way of manipulating character strings in R.

This package is still in its early stages (kind of a draft for me!), but I thought other people might enjoy it and may even wish to contribute.

Presentation

```r
"In a manner of coding, I just want to say..." % % "Nothing."
#> [1] "In a manner of coding, I just want to say... Nothing."
```

Examples

paste

```r
'Hello ' %+% 'world'
#> [1] "Hello world"
'Your pastas taste like ' %+% '%>%'
#> [1] "Your pastas taste like %>%"
'coco' %+% 'bolo'
#> [1] "cocobolo"
```

```r
'Hello' % % 'world'
#> [1] "Hello world"
'Your pastas taste like' % % '%>%'
#> [1] "Your pastas taste like %>%"
'Hello' %,% 'world...'
#> [1] "Hello, world..."
'Your pastas taste like ' %+% '%>%...' %,% 'or %>>%...'
#> [1] "Your pastas taste like %>%..., or %>>%..."
```
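
The results above are consistent with thin wrappers around base paste(). The following is a minimal sketch under that assumption, not the package's actual source. The separators (a space for % %, a comma followed by a space for %,%) are inferred from the printed output.

```r
# Hypothetical re-implementations, for illustration only
`% %` <- function(x, y) paste(x, y, sep = " ")
`%,%` <- function(x, y) paste(x, y, sep = ", ")

"Hello" % % "world"
#> [1] "Hello world"
"Hello" %,% "world..."
#> [1] "Hello, world..."

# Custom infix operators share one precedence level and group left to right
"a" %,% "b" % % "c"
#> [1] "a, b c"
```

Because all `%op%` operators have the same precedence, a chain is always evaluated from left to right, which is what makes mixing them in one sentence predictable.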

grepl

Case sensitive

```r
'pig' %g% 'The pig is in the cornfield'
#> [1] TRUE
'Pig' %g% 'The pig is in the cornfield'
#> [1] FALSE
```

Case insensitive (ignore.case)

```r
'pig' %gic% 'The pig is in the cornfield'
#> [1] TRUE
'PIG' %gic% 'The PiG is in the cornfield'
#> [1] TRUE
```
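
As a rough idea of how such operators can be built, here is a sketch on top of base grepl(). It assumes the pattern goes on the left and the text on the right, as in the examples. It is not the package's actual code, and the `animals` vector is made up for the illustration.

```r
# Hypothetical re-implementations, for illustration only
`%g%`   <- function(pattern, x) grepl(pattern, x)
`%gic%` <- function(pattern, x) grepl(pattern, x, ignore.case = TRUE)

animals <- c("The pig is in the cornfield", "The cow is in the barn")

"pig" %g% animals
#> [1]  TRUE FALSE
"PIG" %gic% animals
#> [1]  TRUE FALSE

# The logical vector can be used directly for subsetting
animals["cow" %g% animals]
#> [1] "The cow is in the barn"
```

Since grepl() is vectorised over the text, the operator works on a whole column as well as on a single string.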

substring

```r
'NFKA008' %s% '1.4'
#> [1] "NFKA"
'NFKA008' %s% .4
#> [1] "NFKA"
'where is' % % ('the pig is in the cornfield' %s% '1.7') %+% '?'
#> [1] "where is the pig?"
```
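
The right-hand side reads as "start.end". A minimal sketch of that idea with base substr() could look like the following. It only handles the string form of the specification and is an assumption about the behaviour, not the package's implementation.

```r
# Hypothetical re-implementation, for illustration only.
# `spec` is a string such as "1.4": start position, a dot, end position.
`%s%` <- function(x, spec) {
  bounds <- as.integer(strsplit(spec, ".", fixed = TRUE)[[1]])
  substr(x, bounds[1], bounds[2])
}

"NFKA008" %s% "1.4"
#> [1] "NFKA"
"the pig is in the cornfield" %s% "5.7"
#> [1] "pig"
```

Keeping the specification as a string is a deliberate choice in this sketch: a number such as 1.10 is stored as 1.1, so the intended end position would be lost.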

count

```r
fruit <- c("apple", "banana", "pear", "pineapple")
"a" %count% fruit
#> [1] 1 3 1 1
c("a", "b", "p", "p") %count% fruit
#> [1] 1 1 1 3
```
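
For readers without stringr at hand, the same counting can be sketched in base R. This is a hypothetical re-implementation that matches fixed strings only, whereas stringr's str_count() interprets the pattern as a regular expression by default.

```r
# Hypothetical base R re-implementation, for illustration only.
# Counts fixed (non-regex) occurrences of `pattern` in each element of `x`.
`%count%` <- function(pattern, x) {
  count_one <- function(p, s) {
    lengths(regmatches(s, gregexpr(p, s, fixed = TRUE)))
  }
  # mapply recycles a single pattern over all strings, or pairs them up
  mapply(count_one, pattern, x, USE.NAMES = FALSE)
}

fruit <- c("apple", "banana", "pear", "pineapple")
"an" %count% fruit
#> [1] 0 2 0 0
c("p", "a", "r", "e") %count% fruit
#> [1] 2 3 1 2
```

With a vector on the left, the patterns are paired element by element with the strings, which matches the second example in the post.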

pad, lpad and rpad

```r
5 %lpad% '0.5'
#> [1] "00005"
5 %lpad%   .5
#> [1] "00005"
5 %lpad%  '.5'
#> [1] "    5"
5 %lpad% '2.5'
#> [1] "22225"
'é' %lpad% 'é.5'
#> [1] "ééééé"
```
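
The specification on the right appears to read as "pad character.total width", with an empty pad character meaning a space. Under that assumption, left padding can be sketched in base R as follows. This is a hypothetical re-implementation, not the package's code.

```r
# Hypothetical re-implementation, for illustration only.
# `spec` reads "<pad character>.<total width>"; an empty pad character means a space.
`%lpad%` <- function(x, spec) {
  parts <- strsplit(as.character(spec), ".", fixed = TRUE)[[1]]
  pad   <- if (nzchar(parts[1])) parts[1] else " "
  width <- as.integer(parts[2])
  x     <- as.character(x)
  # Repeat the pad character as many times as needed, never a negative count
  paste0(strrep(pad, pmax(width - nchar(x), 0)), x)
}

5 %lpad% "0.5"
#> [1] "00005"
5 %lpad% ".5"
#> [1] "    5"
c(7, 42, 123) %lpad% "0.3"
#> [1] "007" "042" "123"
```

In this sketch the number .5 is converted to the string "0.5" by as.character(), so it pads with zeros, while the string '.5' has an empty pad character and pads with spaces.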

names of tibbles : tolower and toupper

I have also added two helper functions that I use very often, to convert the column names of a table to lower or upper case.

```r
library(magrittr)
iris %>% toupper_names %>% head
#>   SEPAL.LENGTH SEPAL.WIDTH PETAL.LENGTH PETAL.WIDTH SPECIES
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
```
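
Helpers of this kind are only a couple of lines long. The sketch below shows one possible way to write them in base R. The name `tolower_names` is an assumption based on the section title, and neither function is copied from the package.

```r
# Hypothetical re-implementations, for illustration only
toupper_names <- function(df) {
  names(df) <- toupper(names(df))
  df
}

# Name assumed by symmetry with toupper_names
tolower_names <- function(df) {
  names(df) <- tolower(names(df))
  df
}

# Only the names change, the data are untouched
identical(names(toupper_names(iris)), toupper(names(iris)))
#> [1] TRUE
identical(unname(as.list(tolower_names(iris))), unname(as.list(iris)))
#> [1] TRUE
```

Returning the modified data frame, rather than modifying it in place, is what lets these functions sit in the middle of a %>% chain.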

Finally, I also want to point out the %out% operator from the rmngb package. It is the negation of %in% and saves typing !x %in% y (you can just type x %out% y instead). This is why I have included it in this package!
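
For reference, a negated %in% can be sketched in one line of base R. This is a hypothetical re-implementation for illustration, not the code from rmngb or stringfix.

```r
# Hypothetical re-implementation, for illustration only
`%out%` <- function(x, table) !(x %in% table)

c("a", "b", "z") %out% letters[1:5]
#> [1] FALSE FALSE  TRUE

# Equivalent to the usual spelling, which parses as !(x %in% y)
identical(c("a", "b", "z") %out% letters[1:5], !c("a", "b", "z") %in% letters[1:5])
#> [1] TRUE
```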

More information here: https://github.com/GuillaumePressiat/stringfix

