Preparing RTextTools Beta Release for Catania 2011

[This article was first published on RTextTools: a machine learning library for text classification - Blog, and kindly contributed to R-bloggers.]

Right now our development team is busy preparing a conference release of RTextTools for the 4th Annual Conference of the Comparative Policy Agendas Project at the University of Catania in Sicily. One of the key issues we’ve had thus far is memory consumption with very large datasets.

In the past week we’ve pushed out a slew of updates that allow the support vector machine (SVM) and maximum entropy (maxent) algorithms to run with low memory requirements, even on very large datasets. Unfortunately, not all of the algorithms in RTextTools support these changes, which leaves us with a two-algorithm ensemble for low-memory classification. However, SVM and maxent tend to be the most accurate algorithms in our tests, so a large ensemble isn’t necessary to get high consensus accuracy.
