Preparing RTextTools Beta Release for Catania 2011

(This article was first published on RTextTools: a machine learning library for text classification - Blog, and kindly contributed to R-bloggers)

Our development team is busy preparing a conference release of RTextTools for the 4th Annual Conference of the Comparative Policy Agendas Project at the University of Catania in Sicily. One of the key issues we’ve had so far is memory consumption with very large datasets.

In the past week we’ve pushed out a slew of updates that allow the support vector machine (SVM) and maximum entropy (maxent) algorithms to run with low memory requirements, even on very large datasets. Unfortunately, not all the algorithms used in RTextTools support the changes we’ve made, so this leaves us with a two-algorithm ensemble for low-memory classification. However, SVM and maxent tend to be the most accurate algorithms in our tests, so a large ensemble isn’t necessary to get high consensus accuracy.
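To make "consensus accuracy" concrete, here is a minimal sketch in base R of how agreement between two classifiers could be scored. It does not call RTextTools; the function name `consensus_summary`, the label vectors, and the topic codes are all hypothetical and chosen only for illustration. It assumes consensus accuracy means accuracy measured on the documents where both algorithms predict the same label.

```r
# Hypothetical predictions from two classifiers on the same six documents.
# These values are made up for illustration; they are not RTextTools output.
truth  <- c("econ", "health", "econ",    "defense", "health", "econ")
svm    <- c("econ", "health", "econ",    "health",  "health", "defense")
maxent <- c("econ", "health", "defense", "health",  "health", "econ")

# Summarise a two-algorithm ensemble:
#   coverage           = share of documents on which both algorithms agree
#   consensus_accuracy = accuracy restricted to those agreed-upon documents
consensus_summary <- function(pred_a, pred_b, truth) {
  stopifnot(length(pred_a) == length(pred_b),
            length(pred_a) == length(truth))
  agree <- pred_a == pred_b
  list(
    coverage = mean(agree),
    consensus_accuracy = if (any(agree)) {
      mean(pred_a[agree] == truth[agree])
    } else {
      NA_real_  # no agreed documents, so accuracy is undefined
    }
  )
}

result <- consensus_summary(svm, maxent, truth)
print(result)
```

In this toy data the two algorithms agree on four of the six documents, and three of those four agreed labels are correct, so coverage is 4/6 and consensus accuracy is 0.75. With only two algorithms, agreement is all-or-nothing per document, which is why the accuracy of each individual algorithm matters more than the size of the ensemble.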
