kgrams v0.1.2 on CRAN

Posted on November 11, 2021 by vgherard in R bloggers

[This article was first published on Valerio Gherardi, and kindly contributed to R-bloggers].

Summary

Version 0.1.2 of my R package kgrams has just been accepted by CRAN. The package provides tools for training and evaluating k-gram language models in R, and supports several probability smoothing techniques, perplexity computations, random text generation and more.
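
For readers new to the term, here is a minimal base-R sketch of what a perplexity computation measures: the exponential of the average negative log-probability a model assigns to each token. The helper name perplexity_from_probs and the probabilities below are hypothetical and only for illustration; this is not the package's implementation, and details such as how end-of-sentence tokens are counted are not covered.

```r
# Perplexity from per-token probabilities:
# exp of the average negative log-probability (lower is better).
# NOTE: hypothetical helper for illustration, not part of kgrams.
perplexity_from_probs <- function(token_probs) {
  stopifnot(length(token_probs) > 0, all(token_probs > 0))
  exp(-mean(log(token_probs)))
}

# Hypothetical probabilities a model might assign to four tokens.
probs <- c(0.2, 0.5, 0.1, 0.25)
perplexity_from_probs(probs)

# Sanity check: a model that is uniform over 4 words has perplexity 4
# (up to floating-point error).
perplexity_from_probs(rep(1 / 4, 10))
```

Intuitively, a perplexity of 4 means the model is, on average, as uncertain as if it were choosing uniformly among 4 words at each step.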

Short demo

library(kgrams)
# Get k-gram frequency counts from Shakespeare's "Much Ado About Nothing"
freqs <- kgram_freqs(kgrams::much_ado, N = 4)

# Build modified Kneser-Ney 4-gram model, with discount parameters D1, D2, D3.
mkn <- language_model(freqs, smoother = "mkn", D1 = 0.25, D2 = 0.5, D3 = 0.75)

# Sample sentences from the language model at different temperatures
set.seed(840)
sample_sentences(model = mkn, n = 3, max_length = 10, t = 1)
#> [1] "i have studied eight or nine truly by your office [...] (truncated output)"
#> [2] "ere you go : <EOS>"
#> [3] "don pedro welcome signior : <EOS>"
sample_sentences(model = mkn, n = 3, max_length = 10, t = 0.1)
#> [1] "i will not be sworn but love may transform me [...] (truncated output)"
#> [2] "i will not fail . <EOS>"
#> [3] "i will go to benedick and counsel him to fight [...] (truncated output)"
sample_sentences(model = mkn, n = 3, max_length = 10, t = 10)
#> [1] "july cham's incite start ancientry effect torture tore pains endings [...] (truncated output)"
#> [2] "lastly gallants happiness publish margaret what by spots commodity wake [...] (truncated output)"
#> [3] "born all's 'fool' nest praise hurt messina build afar dancing [...] (truncated output)"
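
The three calls above differ only in the temperature t, and the samples range from repetitive (t = 0.1) to nearly random (t = 10). The base-R sketch below illustrates why, assuming the common convention that temperature raises each probability to the power 1/t and renormalizes. The helper apply_temperature and the toy distribution are hypothetical, and this is not a description of the package's internals.

```r
# Rescale a probability vector with temperature t.
# ASSUMPTION: the common convention p^(1/t), renormalized; this helper is
# hypothetical and is not taken from the kgrams source.
apply_temperature <- function(p, t) {
  stopifnot(t > 0, all(p >= 0), sum(p) > 0)
  # Work in log space for numerical stability; zero probabilities stay zero.
  logw <- log(p) / t
  w <- exp(logw - max(logw))
  w / sum(w)
}

# Toy next-word distribution (hypothetical numbers, not model output).
p <- c(will = 0.6, have = 0.3, july = 0.1)

round(apply_temperature(p, t = 1), 3)    # distribution is unchanged
round(apply_temperature(p, t = 0.1), 3)  # mass concentrates on the most likely word
round(apply_temperature(p, t = 10), 3)   # distribution flattens toward uniform
```

Under this convention, low temperatures make sampling almost greedy, while high temperatures give rare words nearly the same chance as frequent ones.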

NEWS

Overall Software Improvements

The package’s test suite has been greatly extended.
Improved error/warning conditions for wrong arguments.
Re-enabled compiler diagnostics as per CRAN policy (#19).

API Changes

verbose arguments now default to FALSE.
probability(), perplexity() and sample_sentences() are restricted to accept only language_model class objects as their model argument.

New features

as_dictionary(NULL) now returns an empty dictionary.

Bug Fixes

Fixed bug causing .preprocess and .tknz_sent arguments to be ignored in process_sentences().
Fixed previously wrong defaults for max_lines and batch_size arguments in kgram_freqs.connection().
Added print method for class dictionary.
Fixed bug causing invalid results in dictionary() with batch processing and non-trivial constraints on vocabulary size.

Other

Maintainer’s email updated.
