Hunspell 2.0: High-Performance Stemmer, Tokenizer, and Spell Checker for R

September 11, 2016

(This article was first published on rOpenSci Blog - R, and kindly contributed to R-bloggers)

A new version of the rOpenSci hunspell package has been released to CRAN. Hunspell is the spell checker library used by LibreOffice, OpenOffice, Mozilla Firefox, Google Chrome, Mac OS X, InDesign, Opera, RStudio and many others. It provides a system for tokenizing, stemming and spell checking in almost any language or alphabet. The R package exposes both the high-level spell checker and the low-level stemmers and tokenizers, which analyze or extract individual words from various formats (text, HTML, XML, LaTeX).
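As a rough illustration of those three layers, the sketch below calls the spell checker, the stemmer and the tokenizer on a few toy inputs. It assumes the package is installed and relies on its bundled default English dictionary; the sample words and the HTML fragment are made up for the example.

```r
library(hunspell)

# High-level spell checking: one logical value per word
words <- c("beer", "wiskey", "wine")
correct <- hunspell_check(words)

# Suggested replacements for the words that failed the check
suggestions <- hunspell_suggest(words[!correct])

# Low-level stemming: returns a list with the candidate stems for each word
stems <- hunspell_stem(c("love", "loving", "loved"))

# Tokenizing: extract the words from an HTML fragment, skipping the markup
tokens <- hunspell_parse("<p>The <b>quick</b> brown fox</p>", format = "html")
```

Each function is vectorized over its input, so the same calls work on a whole column of words or documents.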

New Vignette

This version includes a beautiful new vignette that gives an overview of the main functionality to get you started! It demonstrates the tokenizer, stemmer and spell checker, and includes an example of how to use the stemmer and tokenizer to create a word cloud from a large body of text.
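The counting step behind such a word cloud can be sketched as follows. This is not the vignette's code: the two sample sentences are invented, keeping only the first candidate stem is a simplification chosen for this example, and the drawing step is left out so the sketch needs no plotting package.

```r
library(hunspell)

# Toy input standing in for a large body of text (made up for this sketch)
text <- c("The cats were chasing other cats.",
          "A cat chased the dogs and a dog barked.")

# Tokenize each string into words, then flatten into a single vector
words <- unlist(hunspell_parse(text, format = "text"))

# Stem every word; a word the dictionary does not recognise has no stems,
# so fall back to the word itself in that case
stems <- hunspell_stem(words)
first_stem <- vapply(seq_along(words), function(i) {
  if (length(stems[[i]]) > 0) stems[[i]][1] else words[i]
}, character(1))

# Count how often each (lower-cased) stem occurs, most frequent first
freq <- sort(table(tolower(first_stem)), decreasing = TRUE)
head(freq)
```

The resulting frequency table is the kind of input a word cloud package typically expects: one row per term with its count.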


Installing and Updating

The package is most easily installed from CRAN:
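A minimal sketch, assuming the package is published on CRAN under the name `hunspell`:

```r
# Install the released version from CRAN
# (assumes the CRAN package name is "hunspell")
install.packages("hunspell")
```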


Or to get the latest version from GitHub:
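A sketch of one common way to do this, assuming the development repository is `ropensci/hunspell` and that the `devtools` package is already installed (neither detail is stated in this post):

```r
# Install the development version from GitHub
# (the repository path "ropensci/hunspell" is an assumption)
devtools::install_github("ropensci/hunspell")
```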


This package does not require any system dependencies (libhunspell is now bundled with the package).
