Return a Vector of Each Word Found Before the End of a Sentence

April 6, 2017

(This article was first published on R Language Programming, and kindly contributed to R-bloggers)

This little function returns a vector containing each word found just before the end of a sentence. I wrote it for a pet project, to help with the babble() function in the ngram R package.

It can be used to find the best spot to terminate sentences in the resulting babbles, and it can be adjusted to fit your needs.
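A minimal base R sketch of such a function follows. It assumes sentences end in `.`, `!` or `?` and that words are separated by whitespace. The name `words_before_stop` and its `terminators` argument are hypothetical, not the author's code.

```r
# Hypothetical helper (not the author's original function): return each
# word that immediately precedes a sentence terminator (., ! or ?).
words_before_stop <- function(text, terminators = "[.!?]") {
  # Split the text into whitespace-separated tokens, dropping empty ones
  tokens <- unlist(strsplit(text, "\\s+"))
  tokens <- tokens[nzchar(tokens)]
  # Keep tokens whose final character is a terminator, e.g. "sat."
  ends_sentence <- grepl(paste0(terminators, "$"), tokens)
  # Strip trailing punctuation and lower-case so counts are not split by case
  words <- gsub("[[:punct:]]+$", "", tokens[ends_sentence])
  tolower(words[nzchar(words)])
}

words_before_stop("The cat sat. Did the dog bark? It did!")
# returns c("sat", "bark", "did")
```

Abbreviations such as "Mr." are also treated as sentence ends, which is one reason to keep only the words that occur frequently.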

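# Count how often each sentence-final word occurs
# (termination_words is the vector of words returned by the function)
stops <- data.frame(table(termination_words))
stops <- stops[which(stops$Freq > 10), ]  # keep words seen more than 10 times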
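One way the `stops` table might then be used, sketched under assumptions: the hypothetical helper `truncate_at_stop` below takes a babble string (for example, output of ngram's babble()) and a character vector of stop words, such as `as.character(stops$termination_words)`, and cuts the babble right after the last token found in that set.

```r
# Hypothetical sketch: cut a generated babble right after the last word
# that appears in a set of frequent sentence-ending words, then add a period.
truncate_at_stop <- function(babble, stop_words) {
  tokens <- unlist(strsplit(trimws(babble), "\\s+"))
  # Positions of tokens that are known sentence-ending words
  hits <- which(tolower(tokens) %in% stop_words)
  if (length(hits) == 0) return(NA_character_)  # no good place to stop
  paste0(paste(tokens[seq_len(max(hits))], collapse = " "), ".")
}

stop_words <- c("sat", "bark", "did")
truncate_at_stop("the cat sat on the mat and the dog did run", stop_words)
# returns "the cat sat on the mat and the dog did."
```

Cutting at the last match keeps as much of the babble as possible; cutting at the first match would give shorter but possibly cleaner sentences.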

The stringr package also has similar functionality, but I couldn't get str_extract to work the way I wanted. Probably just me, though.
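For comparison, a regex-based sketch: a lookahead pattern matches each word directly followed by a terminator without consuming the punctuation. It uses base R's `regmatches()` and `gregexpr()` with `perl = TRUE`; the function name `words_before_stop_regex` is hypothetical. stringr's `str_extract_all()` uses ICU regular expressions, which also support lookahead, so the same pattern should work there, though that variant is only noted in a comment.

```r
# Hypothetical regex variant: match each run of word characters that is
# immediately followed by ., ! or ?. The lookahead (?=...) checks for the
# terminator without including it in the match.
words_before_stop_regex <- function(text) {
  pattern <- "\\w+(?=[.!?])"
  # perl = TRUE is needed for lookahead in base R regular expressions
  tolower(unlist(regmatches(text, gregexpr(pattern, text, perl = TRUE))))
}

words_before_stop_regex("The cat sat. Did the dog bark? It did!")
# returns c("sat", "bark", "did")

# A stringr version would look like:
# stringr::str_extract_all(text, "\\w+(?=[.!?])")[[1]]
```

This also treats abbreviations such as "Dr." and the "3" in "3.14" as sentence ends, so a frequency cut-off like the one above is still useful.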
