stringfix: a new R package for string manipulation in a %>% way
I usually write here in French, mainly about French hospital data management and the statistical tasks it involves. As today's post is about a new package I have created, I'll be writing in English. The package is called stringfix because it uses infix operators to manipulate character strings.
Introduction
In Python, the + operator is used to paste two character strings together. For example, 'Hello ' + 'world' gives 'Hello world'. Building sentences out of words and arithmetic-like symbols is a very pleasant way to write. In R, paste is an ordinary function that has to be called with parentheses, so a series of nested calls can become hard to read.
+ is a nice operator, and we can use it in R almost as it is used in Python by creating an infix operator.
```r
`%+%` <- function(x, y) {
  paste0(x, y)
}
```
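To make the readability argument concrete, here is a small comparison of nested paste0 calls with the same result written as a chain. It is only a sketch: %+% is redefined locally so that the snippet runs without loading any package, and the variable names are made up for the illustration.

```r
# Local definition, so the snippet runs without loading stringfix
`%+%` <- function(x, y) {
  paste0(x, y)
}

# Nested calls have to be read from the inside out
nested <- paste0(paste0("Hello", " "), "world")

# Custom %op% operators are left-associative, so a chain reads left to right
chained <- "Hello" %+% " " %+% "world"

identical(nested, chained)
#> [1] TRUE

# paste0 is vectorised, and so is the operator
c("a", "b") %+% "_id"
#> [1] "a_id" "b_id"
```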
ggplot2 already exports a %+% function, but it is used to override the data in a ggplot call, not to paste character strings, as described in the ggplot2 documentation. When the tidyverse is loaded, ggplot2's %+% can mask the paste0 version and prevent us from using it. For another hint at pasting character strings with an infix operator, see the Advanced R book.
To build a toolbox around the paste0-based %+%, I started collecting other infix functions for character string manipulation. The main question was: which of the functions I use very often, and which are normally called from right to left, could be reordered to fit into %>%-style code? Here is the little family I have built so far: paste, grepl, substring, count and padding. The package uses stringr or base functions as a backend, as a starting point for an alternative way to manipulate character strings in R.
This package is still at a very early stage (kind of a draft for me!), but I thought some other people might enjoy it and may even wish to contribute.
Presentation
```r
library(stringfix)

"In a manner of coding, I just want to say..." % % "Nothing."
#> [1] "In a manner of coding, I just want to say... Nothing."
```
Examples
paste
```r
'Hello ' %+% 'world'
#> [1] "Hello world"
'Your pastas taste like ' %+% '%>%'
#> [1] "Your pastas taste like %>%"
'coco' %+% 'bolo'
#> [1] "cocobolo"
```
```r
'Hello' % % 'world'
#> [1] "Hello world"
'Your pastas taste like' % % '%>%'
#> [1] "Your pastas taste like %>%"
'Hello' %,% 'world...'
#> [1] "Hello, world..."
'Your pastas taste like ' %+% '%>%...' %,% 'or %>>%...'
#> [1] "Your pastas taste like %>%..., or %>>%..."
```
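For readers curious about what sits behind these operators, the three paste variants can be approximated with base R alone. The definitions below are hypothetical stand-ins written to reproduce the outputs shown above; the package's own source may differ.

```r
# Hypothetical stand-ins, not the package's own source
`%+%` <- function(x, y) paste0(x, y)             # no separator
`% %` <- function(x, y) paste(x, y)              # single space
`%,%` <- function(x, y) paste(x, y, sep = ", ")  # comma and space

# All %op% operators share one precedence level and associate left to right,
# so this is evaluated as ('Your ...' %+% '%>%...') %,% 'or %>>%...'
'Your pastas taste like ' %+% '%>%...' %,% 'or %>>%...'
#> [1] "Your pastas taste like %>%..., or %>>%..."
```

Because the operators associate from left to right, parentheses are only needed when the right-hand side has to be built first.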
grepl
Case sensitive
```r
'pig' %g% 'The pig is in the cornfield'
#> [1] TRUE
'Pig' %g% 'The pig is in the cornfield'
#> [1] FALSE
```
Case insensitive (ignore.case)
```r
'pig' %gic% 'The pig is in the cornfield'
#> [1] TRUE
'PIG' %gic% 'The PiG is in the cornfield'
#> [1] TRUE
```
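Both operators put the pattern on the left and the text on the right, which mirrors the argument order of base grepl. A minimal sketch, assuming they are thin wrappers around grepl (the definitions below are an assumption and are not taken from the package source):

```r
# Assumed definitions: pattern on the left, text to search on the right
`%g%`   <- function(pattern, x) grepl(pattern, x)
`%gic%` <- function(pattern, x) grepl(pattern, x, ignore.case = TRUE)

'Pig' %g% 'The pig is in the cornfield'
#> [1] FALSE
'Pig' %gic% 'The pig is in the cornfield'
#> [1] TRUE

# grepl is vectorised over the text, so the operators are too
'pig' %g% c('The pig is in the cornfield', 'No animal here')
#> [1]  TRUE FALSE
```

With these definitions the left-hand side is treated as a regular expression, so characters such as . or ( would need escaping to be matched literally.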
substring
```r
'NFKA008' %s% '1.4'
#> [1] "NFKA"
'NFKA008' %s% .4
#> [1] "NFKA"
'where is' % % ('the pig is in the cornfield' %s% '1.7') %+% '?'
#> [1] "where is the pig?"
```
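The right-hand side encodes the start and stop positions as 'start.stop'. One way such a specification could be parsed is sketched below with base R; this is an illustration of the idea under that assumption, not the package's implementation.

```r
# Hypothetical stand-in for %s%, assuming the right-hand side is "start.stop"
`%s%` <- function(x, y) {
  # as.character(.4) is "0.4", so the numeric shortcut becomes start 0, stop 4
  bounds <- as.integer(strsplit(as.character(y), ".", fixed = TRUE)[[1]])
  # substr() treats a start below 1 as 1
  substr(x, bounds[1], bounds[2])
}

'NFKA008' %s% '1.4'
#> [1] "NFKA"
'NFKA008' %s% .4
#> [1] "NFKA"
'NFKA008' %s% '5.7'
#> [1] "008"
```

In this sketch a numeric specification such as 1.10 would be converted to "1.1" and read as stop 1, so the character form is the safer choice when the stop position ends in zero.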
count
```r
fruit <- c("apple", "banana", "pear", "pineapple")
"a" %count% fruit
#> [1] 1 3 1 1
c("a", "b", "p", "p") %count% fruit
#> [1] 1 1 1 3
```
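The same counts can be obtained with base R only. The sketch below is an assumption about how a pattern-counting operator could be written with gregexpr and regmatches; the post does not say which backend function the package actually calls.

```r
# Hypothetical base R stand-in for %count%
`%count%` <- function(pattern, x) {
  count_one <- function(p, s) {
    # gregexpr() locates every match, regmatches() extracts them
    lengths(regmatches(s, gregexpr(p, s)))
  }
  # mapply() recycles a single pattern along x, or pairs them element-wise
  mapply(count_one, pattern, x, USE.NAMES = FALSE)
}

fruit <- c("apple", "banana", "pear", "pineapple")
"a" %count% fruit
#> [1] 1 3 1 1
c("a", "b", "p", "p") %count% fruit
#> [1] 1 1 1 3
```

When the pattern is a vector of the same length as the text, each pattern is counted in the element at the same position, which is why the last result is 1 1 1 3.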
pad, lpad and rpad
```r
5 %lpad% '0.5'
#> [1] "00005"
5 %lpad% .5
#> [1] "00005"
5 %lpad% '.5'
#> [1] "    5"
5 %lpad% '2.5'
#> [1] "22225"
'é' %lpad% 'é.5'
#> [1] "ééééé"
```
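Here the right-hand side reads as 'pad.width': the padding character, a dot, then the target width. A minimal base R sketch of left padding under that assumption follows; the parsing rules, including the fallback to a space when no padding character is given, are inferred from the outputs above and not taken from the package source.

```r
# Hypothetical stand-in for %lpad%, assuming the right-hand side is "pad.width"
`%lpad%` <- function(x, y) {
  # as.character(.5) is "0.5", which is why the numeric shortcut pads with "0"
  parts <- strsplit(as.character(y), ".", fixed = TRUE)[[1]]
  pad   <- if (nzchar(parts[1])) parts[1] else " "  # empty pad means a space
  width <- as.integer(parts[2])
  x <- as.character(x)
  # Never remove characters: strings already wide enough are left alone
  missing <- pmax(width - nchar(x), 0)
  paste0(strrep(pad, missing), x)
}

5 %lpad% '0.5'
#> [1] "00005"
5 %lpad% '.5'
#> [1] "    5"
c(5, 42, 123456) %lpad% '0.5'
#> [1] "00005"  "00042"  "123456"
```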
names of tibbles : tolower and toupper
I have added two functions that I use really often.
```r
library(magrittr)
iris %>% toupper_names %>% head
#>   SEPAL.LENGTH SEPAL.WIDTH PETAL.LENGTH PETAL.WIDTH SPECIES
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
```
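A helper like this is short to write by hand. The sketch below shows one plausible definition; toupper_names is the name used above, while tolower_names is assumed here by analogy with the section title and may not match the package exactly.

```r
# Plausible definitions, written for illustration only
toupper_names <- function(df) {
  names(df) <- toupper(names(df))
  df
}

# Name assumed by analogy with toupper_names
tolower_names <- function(df) {
  names(df) <- tolower(names(df))
  df
}

names(toupper_names(iris))[1:2]
#> [1] "SEPAL.LENGTH" "SEPAL.WIDTH"
names(tolower_names(iris))[1:2]
#> [1] "sepal.length" "sepal.width"
```

Because each function takes the data as its first argument and returns the modified data, it slots into a %>% chain without any extra arguments.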
Finally, I also want to point out that the %out% function from the rmngb package, the negation of %in%, can be very useful to avoid typing ! x %in% y (you can just type x %out% y instead). This is why I have included it in this package!
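As a quick illustration, the negation can be written in one line of base R. The definition below is a sketch of the idea and is not copied from rmngb or stringfix; the vectors x and y are made up for the example.

```r
# Sketch of a "not in" operator
`%out%` <- function(x, y) !(x %in% y)

x <- c("a", "b", "c")
y <- c("b", "d")

x %out% y
#> [1]  TRUE FALSE  TRUE

# %in% binds more tightly than !, so both spellings give the same result
identical(!x %in% y, x %out% y)
#> [1] TRUE

# Handy for filtering
x[x %out% y]
#> [1] "a" "c"
```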
More information here: https://github.com/GuillaumePressiat/stringfix