kgrams v0.1.2 on CRAN

[This article was first published on Valerio Gherardi, and kindly contributed to R-bloggers.]


Version 0.1.2 of my R package kgrams was just accepted by CRAN. The package provides tools for training and evaluating k-gram language models in R, and supports several probability smoothing techniques, perplexity computations, random text generation and more.
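For readers new to the terminology: perplexity is the exponential of the average negative log-probability that a model assigns to each word of a text, so lower is better. The minimal base-R sketch below computes it from a vector of per-word probabilities. The helper perplexity_from_probs() is hypothetical and not part of the kgrams API, and the probabilities are made-up illustration data, not kgrams output.

```r
# Perplexity from per-word probabilities (hypothetical helper, not part of kgrams).
# probs: probabilities a model assigned to each word of a text, in order.
perplexity_from_probs <- function(probs) {
  stopifnot(is.numeric(probs), all(probs > 0), all(probs <= 1))
  # Average negative log-likelihood per word, then exponentiate
  exp(-mean(log(probs)))
}

# Made-up probabilities for a 4-word text
p <- c(0.1, 0.2, 0.05, 0.25)
perplexity_from_probs(p)  # equals (0.1 * 0.2 * 0.05 * 0.25)^(-1/4), about 7.95

# A model spreading probability uniformly over 50 words has perplexity 50
perplexity_from_probs(rep(1 / 50, 10))
```

The uniform case is a handy sanity check: perplexity can be read as the size of an "effective vocabulary" the model is choosing from at each step.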

Short demo

# Get k-gram frequency counts from Shakespeare's "Much Ado About Nothing"
freqs <- kgram_freqs(kgrams::much_ado, N = 4)

# Build a modified Kneser-Ney 4-gram model with discount parameters D1, D2, D3
mkn <- language_model(freqs, smoother = "mkn", D1 = 0.25, D2 = 0.5, D3 = 0.75)

# Sample sentences from the language model at different temperatures
sample_sentences(model = mkn, n = 3, max_length = 10, t = 1)
[1] "i have studied eight or nine truly by your office [...] (truncated output)"
[2] "ere you go : <EOS>"                                                        
[3] "don pedro welcome signior : <EOS>"                                         
sample_sentences(model = mkn, n = 3, max_length = 10, t = 0.1)
[1] "i will not be sworn but love may transform me [...] (truncated output)" 
[2] "i will not fail . <EOS>"                                                
[3] "i will go to benedick and counsel him to fight [...] (truncated output)"
sample_sentences(model = mkn, n = 3, max_length = 10, t = 10)
[1] "july cham's incite start ancientry effect torture tore pains endings [...] (truncated output)"   
[2] "lastly gallants happiness publish margaret what by spots commodity wake [...] (truncated output)"
[3] "born all's 'fool' nest praise hurt messina build afar dancing [...] (truncated output)"          
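The outputs above show the effect of the temperature t: at t = 0.1 the sampler keeps picking high-probability continuations ("i will ..."), while at t = 10 it produces near-random word salad. A common way to implement temperature, and one consistent with this behaviour, is to raise each next-word probability to the power 1/t and renormalize. The base-R sketch below illustrates that idea on a toy distribution; it is an assumption about the general technique, not the package's internal code (see ?sample_sentences for the exact definition used by kgrams), and apply_temperature() is a hypothetical name.

```r
# Temperature adjustment of a discrete distribution: p_i^(1/t) / sum_j p_j^(1/t).
# Illustrative sketch, not kgrams internals. probs: next-word probabilities; t > 0.
apply_temperature <- function(probs, t) {
  stopifnot(t > 0, all(probs >= 0))
  # Work on the log scale to avoid underflow when t is small
  logp <- log(probs) / t
  w <- exp(logp - max(logp))
  w / sum(w)
}

next_word_probs <- c(the = 0.5, a = 0.3, dog = 0.15, zebra = 0.05)  # toy distribution
round(apply_temperature(next_word_probs, t = 1), 3)    # unchanged
round(apply_temperature(next_word_probs, t = 0.1), 3)  # nearly all mass on "the"
round(apply_temperature(next_word_probs, t = 10), 3)   # close to uniform

# Draw a few words from the flattened (high-temperature) distribution
set.seed(1)
sample(names(next_word_probs), size = 5, replace = TRUE,
       prob = apply_temperature(next_word_probs, t = 10))
```

With t = 1 the model distribution is used as is; t < 1 sharpens it towards the most likely word and t > 1 flattens it towards uniform, which matches the three sets of sampled sentences.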


Overall Software Improvements

  • The package’s test suite has been greatly extended.
  • Improved error and warning conditions for invalid arguments.
  • Re-enabled compiler diagnostics as per CRAN policy (#19).

API Changes

  • verbose arguments now default to FALSE.
  • probability(), perplexity() and sample_sentences() now accept only objects of class language_model as their model argument.

New features

  • as_dictionary(NULL) now returns an empty dictionary.

Bug Fixes

  • Fixed a bug causing the .preprocess and .tknz_sent arguments of process_sentences() to be ignored.
  • Fixed incorrect defaults for the max_lines and batch_size arguments of kgram_freqs.connection().
  • Added a print method for class dictionary.
  • Fixed a bug causing invalid results in dictionary() when batch processing was combined with non-trivial constraints on vocabulary size.
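As a general illustration of why batch processing and vocabulary size constraints interact (a sketch of the principle, not the package's code and not a description of what the actual bug was), a "keep the n most frequent words" vocabulary has to be computed from counts accumulated over all batches; truncating each batch separately can give a different answer. The function top_n_vocab() below is hypothetical.

```r
# Size-constrained vocabulary from text processed in batches.
# Hypothetical base-R sketch: counts are merged across batches, and the
# top-n truncation is applied only once, after every batch has been seen.
top_n_vocab <- function(batches, n) {
  counts <- integer(0)  # named integer vector: word -> running count
  for (batch in batches) {
    words <- unlist(strsplit(batch, "\\s+"))
    words <- words[words != ""]
    batch_counts <- table(words)
    # Merge this batch's counts into the running totals
    all_words <- union(names(counts), names(batch_counts))
    merged <- setNames(integer(length(all_words)), all_words)
    merged[names(counts)] <- counts
    merged[names(batch_counts)] <- merged[names(batch_counts)] + as.integer(batch_counts)
    counts <- merged
  }
  # Truncate only at the end
  names(sort(counts, decreasing = TRUE))[seq_len(min(n, length(counts)))]
}

batches <- c("a a a b b c", "d d d b b c")
top_n_vocab(batches, n = 1)  # "b"
```

Here "b" is the most frequent word overall (4 occurrences) but never the most frequent word within a single batch, so truncating each batch to its top word before merging would lose it.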


  • Maintainer’s email updated
