How to Conditionally Remove a Character from a Vector Element in R

This article was first published on R – The Hack-R Blog, and kindly contributed to R-bloggers.

I have (sometimes incomplete) data on addresses that looks like this:

data <- c("1600 Pennsylvania Avenue, Washington DC",
          ",Siem Reap,FC,", "11 Wall Street, New York, NY", ",Addis Ababa,FC,")

where I need to remove the first and/or last character of each element if that character is a comma.

Avinash Raj was able to help me with this on Stack Overflow, and the question turned out to be a popular one, so I’ll show the solution here:

> data <- c("1600 Pennsylvania Avenue, Washington DC",
+           ",Siem Reap,FC,", "11 Wall Street, New York, NY", ",Addis Ababa,FC,")
> gsub("(?<=^),|,(?=$)", "", data, perl=TRUE)
[1] "1600 Pennsylvania Avenue, Washington DC"
[2] "Siem Reap,FC"
[3] "11 Wall Street, New York, NY"
[4] "Addis Ababa,FC"
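
If the cleanup is needed in more than one place, the call above can be wrapped in a small helper. What follows is only a minimal sketch: the function name strip_edge_commas and the test strings are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper (the name is an assumption, not from the original answer):
# removes one leading and/or one trailing comma from each element of x.
strip_edge_commas <- function(x) {
  gsub("(?<=^),|,(?=$)", "", x, perl = TRUE)
}

# Made-up inputs to show which commas are touched
examples <- c(",leading only", "trailing only,", "in,the,middle", NA)
cleaned <- strip_edge_commas(examples)
print(cleaned)
```

Commas in the middle of a string are left alone, and missing values stay missing, because gsub() returns NA for NA input.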

Pattern explanation:

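  • (?<=^), The (?<=...) construct is a positive lookbehind. Here it asserts that what precedes the comma is the start of the string, ^, so this alternative matches a leading comma.
  • | The alternation (logical OR) operator, which combines the two patterns so that either one can match.
  • ,(?=$) The (?=...) construct is a positive lookahead. Here it asserts that what follows the comma is the end of the string, $, so this alternative matches a trailing comma.

Lookarounds need perl=TRUE, because R's default regex engine does not support them.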
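
Since ^ and $ are already zero-width anchors, the lookarounds are not strictly required here. As a sketch (this simplification is my own and not part of the original answer), the plain pattern ^,|,$ should give the same result on this data, and it also works without perl=TRUE:

```r
data <- c("1600 Pennsylvania Avenue, Washington DC",
          ",Siem Reap,FC,", "11 Wall Street, New York, NY", ",Addis Ababa,FC,")

# Pattern from the answer: lookarounds wrapped around the anchors
with_lookarounds <- gsub("(?<=^),|,(?=$)", "", data, perl = TRUE)

# Simpler variant: the anchors alone pin the comma to either end of the string
anchors_only <- gsub("^,|,$", "", data)

# Compare the two results element by element
print(identical(with_lookarounds, anchors_only))
```

Either form removes at most one comma from each end; a pattern such as ^,+|,+$ would be needed to strip runs of several commas.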
