qdap 1.1.0 Released on CRAN

February 23, 2014
(This article was first published on TRinker's R Blog » R, and kindly contributed to R-bloggers)

We’re very pleased to announce the release of qdap 1.1.0.

This is the fourth installment of the qdap package available at CRAN. Major development has taken place since the last CRAN update.

The qdap package automates many of the tasks associated with quantitative discourse analysis of transcripts, including frequency counts of sentence types, words, sentences, and turns of talk, as well as syllable counts and other assorted analysis tasks. The package provides parsing tools for preparing transcript data but may also be useful for many other natural language processing tasks. Many functions enable the user to aggregate data by any number of grouping variables, providing analysis and seamless integration with other R packages that undertake higher-level analysis and visualization of text.

This version is a major overhaul of the qdap package. The word lists and dictionaries in qdap have been moved to qdapDictionaries. Additionally, many functions have been renamed with underscores instead of the former period separators. These changes break backward compatibility. Thus this is a major release (ver. 1.0.0). It is general practice to deprecate functions within a package before removing them; however, the number of necessary changes, in light of qdap being relatively new to CRAN, made these changes sensible at this point.

To install use:

install.packages("qdap")

Some of the changes in version 1.1.0 include:


PACKAGE VIGNETTE

qdap gains an HTML package vignette to better explain the intended workflow and function use for the package. This is not currently a part of the build but can be accessed via:

http://htmlpreview.github.io/?https://github.com/trinker/qdap/blob/master/vignettes/qdap_vignette.html

tm PACKAGE COMPATIBILITY

qdap 1.1.0 attempts to gain compatibility with the tm package. This enables data structures from tm to be used with qdap functions and, conversely, qdap data structures to be used with functions intended for tm data sets. Some of the following changes have been made to gain tm compatibility:

  • tdm and dtm are now truly compatible with the tm package. tdm and dtm produce outputs of the class "TermDocumentMatrix" and "DocumentTermMatrix" respectively. This change (coupled with the renaming of stopwords to rm_stopwords) should make the two packages logical companions and further extend the qdap package to integrate with the many packages that already handle "TermDocumentMatrix" and "DocumentTermMatrix".
  • tm2qdap, a function to convert "TermDocumentMatrix" and "DocumentTermMatrix" to a wfm, added to allow easier integration with the tm package.
  • apply_as_tm, a function that allows functions intended for the tm package’s TermDocumentMatrix to be applied to a wfm object.
  • tm_corpus2df and df2tm_corpus added to convert a tm package corpus to a dataframe for use in qdap or vice versa.
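
To make the shared data shape concrete, here is a minimal base R sketch of the kind of table these structures describe: counts of each term per grouping variable. It does not call qdap or tm and is not how either package builds its objects; the toy data frame and the names transcript, freq_matrix and doc_term are assumptions made up for illustration.

```r
# Toy transcript: one row per turn of talk (hypothetical data).
transcript <- data.frame(
  person = c("sam", "greg", "sam", "greg"),
  state  = c("the dog ran", "the cat sat", "the dog sat", "a cat ran"),
  stringsAsFactors = FALSE
)

# Lower-case each turn and split it on whitespace.
words <- strsplit(tolower(transcript$state), "\\s+")

# Repeat each speaker once per word so the two vectors line up.
speaker <- rep(transcript$person, lengths(words))

# Cross-tabulate: rows are terms, columns are speakers
# (the terms-by-documents orientation).
freq_matrix <- unclass(table(term = unlist(words), person = speaker))

# Transposing gives the documents-by-terms orientation.
doc_term <- t(freq_matrix)

print(freq_matrix)
```

The two orientations hold the same counts, which is why converting between a terms-by-groups table and a groups-by-terms table is a cheap operation.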

NEW FEATURES

  • hash_look (and %ha%), a counterpart to hash, added to allow quick access to a hash table. It is intended for use within functions or for repeated lookups against the same hash table, whereas lookup is intended for a single external (non-function) use, which is more convenient but could be slower.
  • word_cor added to find words within grouping variables that are associated based on correlation.
  • dispersion_plot added to enable viewing of word dispersion through discourse.
  • word_proximity added to complement the dispersion_plot and word_cor functions. word_proximity gives the average distance between words, measured in sentences.
  • boolean_search, a Boolean term search function, added to allow for indexed searches of Boolean terms.
  • wfm now uses mtabulate and is ~10x faster.
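
The build-once, look-up-many-times pattern behind a hash table can be sketched in base R with an environment. This is only an illustration of the idea, not qdap's implementation of hash or hash_look; the helper names make_hash and look_up, and the small syllable table, are hypothetical.

```r
# Hypothetical key/value table: word -> syllable count.
key <- data.frame(
  word      = c("dog", "table", "banana"),
  syllables = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Build the hash table once: a hashed environment maps names to values.
make_hash <- function(keys, values) {
  env <- new.env(hash = TRUE, parent = emptyenv())
  for (i in seq_along(keys)) assign(keys[i], values[i], envir = env)
  env
}

# Look up many terms; terms not in the table come back as `missing`.
look_up <- function(terms, env, missing = NA_real_) {
  vapply(terms, function(term) {
    if (exists(term, envir = env, inherits = FALSE)) {
      get(term, envir = env, inherits = FALSE)
    } else {
      missing
    }
  }, numeric(1), USE.NAMES = FALSE)
}

syllable_hash <- make_hash(key$word, key$syllables)
result <- look_up(c("banana", "dog", "zebra"), syllable_hash)
print(result)
```

Building the environment is the expensive step, so reusing syllable_hash across calls is where the pattern pays off; a one-off lookup gains little from it.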

PLOTTING

Several plotting functions have been added to qdap. Many existing functions pick up a corresponding plotting method as well.


This version of qdap has seen some exciting changes. We look forward to continued development. In the future we plan to:

  • Further develop the new_report function to better incorporate the reports package and smooth workflow.
  • Incorporate the dplyr package to gain speed boosts in some of qdap’s functions.

For a complete list of changes, see qdap’s NEWS.md.

Development Version: GitHub

