Version 0.1.2 of my R package kgrams has just been accepted by CRAN. The package provides tools for training and evaluating k-gram language models in R, with support for several probability smoothing techniques, perplexity computations, random text generation and more.
    library(kgrams)

    # Get k-gram frequency counts from Shakespeare's "Much Ado About Nothing"
    freqs <- kgram_freqs(kgrams::much_ado, N = 4)

    # Build modified Kneser-Ney 4-gram model, with discount parameters D1, D2, D3.
    mkn <- language_model(freqs, smoother = "mkn", D1 = 0.25, D2 = 0.5, D3 = 0.75)

    # Sample sentences from the language model at different temperatures
    set.seed(840)

    sample_sentences(model = mkn, n = 3, max_length = 10, t = 1)
    #> "i have studied eight or nine truly by your office [...] (truncated output)"
    #> "ere you go : <EOS>"
    #> "don pedro welcome signior : <EOS>"

    sample_sentences(model = mkn, n = 3, max_length = 10, t = 0.1)
    #> "i will not be sworn but love may transform me [...] (truncated output)"
    #> "i will not fail . <EOS>"
    #> "i will go to benedick and counsel him to fight [...] (truncated output)"

    sample_sentences(model = mkn, n = 3, max_length = 10, t = 10)
    #> "july cham's incite start ancientry effect torture tore pains endings [...] (truncated output)"
    #> "lastly gallants happiness publish margaret what by spots commodity wake [...] (truncated output)"
    #> "born all's 'fool' nest praise hurt messina build afar dancing [...] (truncated output)"
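The three sampling calls above differ only in the temperature `t`. As a rough illustration of why a low temperature yields safe, high-probability sentences while a high temperature yields near-random word sequences, here is a minimal base-R sketch. It assumes the common convention that each next-word probability is rescaled as p^(1/t) and then renormalized. The helper `apply_temperature()` and the toy distribution are hypothetical and are not part of kgrams.

```r
# Rescale a probability vector p by temperature t, so that the result is
# proportional to p^(1/t). Hypothetical helper, not a kgrams function.
apply_temperature <- function(p, t) {
  stopifnot(all(p > 0), t > 0)
  logits <- log(p) / t
  logits <- logits - max(logits)  # subtract the maximum for numerical stability
  w <- exp(logits)
  w / sum(w)                      # renormalize so the weights sum to 1
}

# Toy next-word distribution (made-up numbers, for illustration only)
p <- c(the = 0.6, a = 0.3, zebra = 0.1)

p_cold <- apply_temperature(p, t = 0.1)  # mass concentrates on the likeliest word
p_same <- apply_temperature(p, t = 1)    # distribution is left unchanged
p_hot  <- apply_temperature(p, t = 10)   # distribution flattens towards uniform

print(round(rbind(p_cold, p_same, p_hot), 3))
```

Under this assumption, `t = 1` samples from the model's own distribution, `t` close to 0 approaches always picking the most likely word, and large `t` approaches uniform sampling over the vocabulary, which matches the qualitative behaviour of the three outputs above.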
Overall Software Improvements
- The package’s test suite has been greatly extended.
- Improved error/warning conditions for wrong arguments.
- Re-enabled compiler diagnostics as per CRAN policy (#19).
- `verbose` arguments now default to [...]
- [...] `sample_sentences()` are restricted to accept only `language_model` class objects as their [...]
- `as_dictionary(NULL)` now returns an empty [...]
- Fixed bug causing [...] `.tknz_sent` arguments to be ignored in [...]
- Fixed previously wrong defaults for [...]
- Added print method for class [...]
- Fixed bug causing invalid results in `dictionary()` with batch processing and non-trivial size constraints on vocabulary size.
- Maintainer’s email updated.