RTextTools v1.3.5: Saving models, text labels, and a game plan for 2012

(This article was first published on RTextTools: a machine learning library for text classification - Blog, and kindly contributed to R-bloggers)

RTextTools v1.3.5 addresses several concerns raised in recent months. Many of the algorithms used in RTextTools require that any new data presented to a trained classifier contain the same features as the original document-term matrix. Since this rarely (if ever) happens in the real world, I have added an originalMatrix parameter to the create_matrix() function that adjusts a new document-term matrix to contain the same terms as the original training matrix. Although this is a rather quirky workaround, it lets users save trained models and classify new data easily. Example scripts are available in the /inst/examples/ directory of the RTextTools source code.
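To make the idea behind this adjustment concrete, here is a minimal sketch in plain Python. It is not RTextTools code and does not reproduce how create_matrix() handles originalMatrix internally; the function names build_vocabulary and to_matrix and the sample documents are hypothetical. It only illustrates the general technique: terms in new documents that were not seen in training are dropped, and training terms missing from the new documents become zero columns, so both matrices share the same columns.

```python
from collections import Counter


def build_vocabulary(documents):
    """Return the sorted list of distinct terms seen in the training documents."""
    return sorted({term for doc in documents for term in doc.lower().split()})


def to_matrix(documents, vocabulary):
    """Count terms per document, keeping only the columns listed in `vocabulary`."""
    rows = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        # Counter returns 0 for terms the document does not contain,
        # and terms outside the vocabulary are never looked up.
        rows.append([counts[term] for term in vocabulary])
    return rows


# Hypothetical training and new documents, for illustration only.
train_docs = ["budget vote passes", "health bill vote"]
new_docs = ["health budget debate"]

vocabulary = build_vocabulary(train_docs)
train_matrix = to_matrix(train_docs, vocabulary)

# "debate" was never seen in training, so it is dropped; "bill", "passes",
# and "vote" are absent from the new document, so they become zeros.
new_matrix = to_matrix(new_docs, vocabulary)

print(vocabulary)
print(new_matrix)
```

Because the new matrix has exactly the same columns as the training matrix, a classifier trained earlier and saved to disk can score it without retraining.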

Since RTextTools was introduced at the 2011 Comparative Agendas Project Conference in Catania, Italy, the team has refined the API and implemented a number of features, including n-gram analysis, text labels, comprehensive analytics, and a streamlined interface. Our plan for the year ahead includes a major overhaul of the nine algorithms to facilitate low-memory ensemble classification. However, this goal involves more than just the RTextTools team; it requires the R machine learning community to strive for efficient supervised learning algorithms. Many R packages do not use compressed sparse matrices and are therefore limited in their applications for large-N datasets. We aim to promote efficient practices among package developers and to write several implementations of our own to push the capabilities of statistical computing in R.
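As a rough illustration of why compressed sparse storage matters for document-term matrices, the following sketch converts a small dense matrix to compressed sparse row (CSR) form in plain Python. The to_csr helper and the sample matrix are hypothetical and are not taken from RTextTools or any R package; the point is only that CSR stores the nonzero entries plus index arrays rather than every cell.

```python
def to_csr(dense_rows):
    """Convert a list of dense rows into CSR arrays (values, col_indices, row_ptr)."""
    values, col_indices, row_ptr = [], [], [0]
    for row in dense_rows:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)       # the nonzero count itself
                col_indices.append(j)  # which term (column) it belongs to
        # row_ptr[i]:row_ptr[i + 1] delimits the entries of document i
        row_ptr.append(len(values))
    return values, col_indices, row_ptr


# Hypothetical 3-document, 5-term matrix: mostly zeros, as is typical for text.
dense = [
    [0, 2, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 3, 0],
]

values, col_indices, row_ptr = to_csr(dense)

dense_cells = sum(len(row) for row in dense)
print(values, col_indices, row_ptr)
print(f"dense cells: {dense_cells}, stored nonzeros: {len(values)}")
```

In a real corpus each document contains only a small fraction of the vocabulary, so the gap between the number of dense cells and the number of stored nonzeros grows with both the number of documents and the number of terms.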

Thank you for all your feedback and support as we look forward to another productive year in 2012!
