I recently needed to stem every word in a block of text, i.e. reduce each word to its root form.


The stemmer I was using treats each string as a single word, so it only stemmed the last word in each block of text, e.g.


wordStem('walk walks walked walking walker walkers', language = 'en')
# [1] "walk walks walked walking walker walk"


So I wrote a function that splits a block of text into individual words, stems each word, and then recombines the stemmed words into a block of text:

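library(SnowballC)  # assumed source of wordStem()
library(parallel)   # provides mclapply()

stem_text <- function(text, language = "porter", mc.cores = 1) {
  # stem each word in a block of text
  stem_string <- function(str, language) {
    # split on whitespace (the backslash must be escaped in an R string)
    str <- strsplit(x = str, split = "\\s")
    str <- wordStem(unlist(str), language = language)
    str <- paste(str, collapse = " ")
    return(str)
  }
  # stem each text block in turn (mc.cores > 1 is not supported on Windows)
  x <- mclapply(X = text, FUN = stem_string, language, mc.cores = mc.cores)
  # return stemmed text blocks as a character vector
  return(unlist(x))
}
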
This works under the assumption that each block contains only words and whitespace (i.e. it has been appropriately pre-processed).
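
One way to meet that assumption is to strip punctuation and normalise whitespace before stemming. The following is only a minimal sketch in base R; `clean_text` is a hypothetical helper name, not something from the stemming package, and real pre-processing may need more care (for example, it turns "don't" into "don t").

```r
# Hypothetical helper: reduce a block of text to words separated by single spaces
clean_text <- function(text) {
  text <- gsub("[[:punct:]]+", " ", text)  # replace runs of punctuation with a space
  text <- gsub("\\s+", " ", text)          # collapse runs of whitespace into one space
  trimws(text)                             # drop leading and trailing whitespace
}

clean_text("Never ignore coincidence, unless, of course, you are busy.")
# [1] "Never ignore coincidence unless of course you are busy"
```

The cleaned text can then be stemmed block by block as before.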

# Blocks of text
sentences <- c('walk walks walked walking walker walkers',
               'Never ignore coincidence unless of course you are busy In which case always ignore coincidence')

# Stem blocks of text
stem_text(sentences, language = 'en', mc.cores = 2)

# [1] "walk walk walk walk walker walker"
# [2] "Never ignor coincid unless of cours you are busi In which case alway ignor coincid"

