Basic text string functions in R

May 18, 2015

(This article was first published on lukemiller.org » R-project, and kindly contributed to R-bloggers)

 

To get the length of a text string (i.e. the number of characters in the string):

nchar()

Using length() instead would return the number of elements in the vector that holds the string, which is 1 when the vector contains a single string.
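As a quick illustration of the difference, here is a minimal sketch; the strings and variable names are made up for the example and do not come from the original post.

```r
# Hypothetical example strings
x <- "hello world"
n_chars <- nchar(x)    # number of characters in the string: 11
n_elems <- length(x)   # number of elements in the vector: 1

# nchar() is vectorized, so it returns one count per element
v <- c("a", "bcd", "")
v_chars <- nchar(v)    # 1 3 0
```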

To get the position of regular expression matches in a text string x:

pos = regexpr('pattern', x) # Returns position of 1st match in a string (-1 if there is no match)
pos = gregexpr('pattern', x) # Returns positions of every match in a string, as a list with one element per string
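The following sketch shows what these two functions return for a small, made-up input. The string "banana" and the variable names are assumptions chosen for illustration.

```r
x <- "banana"   # hypothetical input

# regexpr(): position of the first match, with the match length as an attribute
first <- regexpr("an", x)
first_pos <- as.integer(first)             # 2
first_len <- attr(first, "match.length")   # 2

# gregexpr(): a list with one element per input string;
# [[1]] holds every match position in the first string
all_pos <- as.integer(gregexpr("an", x)[[1]])   # 2 4

# Both functions report -1 when the pattern is not found
no_match <- as.integer(regexpr("z", x))         # -1
```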

To find which elements of a vector x of text strings match a regular expression (this returns the indices of the matching strings in the vector, not the position of the match within each string):

pos = grep('pattern', x)
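A short sketch of grep() on a vector, using made-up file names. The value = TRUE argument and the related grepl() function are not mentioned in the original text; they are shown here only as commonly used variants.

```r
x <- c("apple.csv", "notes.txt", "pear.csv")   # hypothetical vector

idx   <- grep("\\.csv$", x)                 # indices of matching elements: 1 3
hits  <- grep("\\.csv$", x, value = TRUE)   # the matching strings themselves
flags <- grepl("\\.csv$", x)                # logical vector, one entry per element
```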

To extract part of a text string based on position in the text string, where first and last are the locations in the text string, usually found by the regexpr() function:

keep = substr(x, first, last)
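One way to combine regexpr() and substr() is sketched below. The input string is hypothetical, and computing last from the "match.length" attribute is one reasonable approach rather than the only one.

```r
x <- "sample_0427_run"   # hypothetical input

m <- regexpr("[0-9]+", x)                      # first run of digits
first <- as.integer(m)                         # start position: 8
last  <- first + attr(m, "match.length") - 1   # end position: 11

keep <- substr(x, first, last)                 # "0427"
```

The built-in regmatches(x, m) returns the same substring without the position arithmetic.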

To replace part of a text string with some other text:

sub('pattern', replacement, input) # Changes only the 1st pattern match per string
gsub('pattern', replacement, input) # Changes every occurrence of a pattern match
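The difference between the two functions is easiest to see side by side. The input vector below is made up for the example.

```r
input <- c("a-b-c", "no dashes")   # hypothetical input

once  <- sub("-", "_", input)    # only the first match per string: "a_b-c" "no dashes"
every <- gsub("-", "_", input)   # every match:                     "a_b_c" "no dashes"
```

Strings with no match are returned unchanged by both functions.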

The pattern argument in the various regular expression functions can include character classes enclosed in square brackets. See ?regex for an explanation of regular expressions. For example, to match any numerical digit, you could use '[0-9]' as the pattern argument. You may also use several predefined character classes such as '[[:digit:]]', which matches any numerical digit, the same as the [0-9] pattern. Note that the predefined class name [:digit:] must itself sit inside an outer pair of square brackets.
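A small sketch comparing the two ways of matching digits, using a made-up string. In R the predefined classes are written inside an outer pair of brackets, as '[[:digit:]]', because the inner brackets are part of the class name (see ?regex).

```r
x <- "Room 101, floor 3"   # hypothetical input

a <- gsub("[0-9]", "#", x)         # explicit range of digits
b <- gsub("[[:digit:]]", "#", x)   # predefined POSIX class inside a bracket expression

same <- identical(a, b)            # both give "Room ###, floor #"
```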

File name stuff

To get a list of file names (and paths) in a directory:

fnames = dir("./path/to/my/data", full.names=TRUE)
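To try this without touching real data, the sketch below builds a throwaway directory under R's temporary folder. The file names are invented, and the pattern argument (which filters names by regular expression) is an addition not covered in the text above.

```r
# Create a scratch directory with a few empty, hypothetical files
d <- tempfile("dir_demo")
dir.create(d)
file.create(file.path(d, c("a.csv", "b.csv", "notes.txt")))

# List only the .csv files, with their full paths
fnames <- dir(d, pattern = "\\.csv$", full.names = TRUE)
short_names <- basename(fnames)   # "a.csv" "b.csv"

# Clean up the scratch directory
unlink(d, recursive = TRUE)
```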

To extract just the filename from a full path:

fname = basename(path)

To extract the directory path from a file path:

directory = dirname(path)
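A brief sketch of both functions applied to one hypothetical path. Neither function checks whether the file exists, so the path below is only a string. The sub() and file.path() steps are extras shown for convenience.

```r
path <- "./path/to/my/data/site1.csv"   # hypothetical file path

fname     <- basename(path)   # "site1.csv"
directory <- dirname(path)    # "./path/to/my/data"

stem    <- sub("\\.csv$", "", fname)     # file name without the extension: "site1"
rebuilt <- file.path(directory, fname)   # puts the two pieces back together
```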

If you have a text string assigned to a variable in the R workspace, and you want to parse it using various other functions, you can use the textConnection() function to feed your string to the other function.

mydataframe = read.csv(textConnection(myString)) # If myString contained comma-separated-values, this would convert them to a data frame.
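Here is a self-contained sketch of the same idea, with made-up CSV content. Storing the connection in a variable so it can be closed explicitly is a choice made for this example. The text argument of read.csv() is an alternative that skips the connection entirely.

```r
# Hypothetical comma-separated text held in a variable
myString <- "id,value\n1,10.5\n2,20.25"

con <- textConnection(myString)   # treat the string as if it were a file
mydataframe <- read.csv(con)
close(con)                        # release the connection when done

# Alternative without an explicit connection
same_df <- read.csv(text = myString)
```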
