Blog Archives

Classifying Breast Cancer as Benign or Malignant Using RTextTools

RTextTools has largely been used for topic classification in the social sciences. However, recent discussions with researchers at various universities have demonstrated that the package can be applied to a host of problems in the natural sciences as well. One such application is using text classification to identify breast cancer masses as benign or malignant. Using the Wisconsin Diagnostic Breast Cancer...

RTextTools Short Course Materials

Attached are some of the materials from the recent short course at UNC. For confidentiality reasons, we are unable to present all of the materials, but this is enough to get someone started. 1. Lecture; 2. Intro to R; 3. NY Times; 4.

Successful Two-Day Workshop at UNC-Chapel Hill

This week the Odum Institute at UNC held a two-day short course on text classification with RTextTools. The workshop, led by Loren Collingwood, covered the basics of content analysis, supervised learning and text classification, an introduction to R, and how to use RTextTools. Participants brought in their own data on the second day, which the instructor helped them classify....

RTextTools v1.3.5: Saving models, text labels, and a game plan for 2012

RTextTools v1.3.5 addresses some key concerns that have been raised in recent months. Many of the algorithms used in RTextTools require that any new data presented to a trained classifier contain the same features as the original document-term matrix. Since this rarely (if ever) happens in the real world, I have added an originalMatrix parameter to the create_matrix() function...
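The feature-matching requirement described above can be illustrated with a small sketch. This is plain Python rather than R, and it does not reproduce the RTextTools API or the behavior of the originalMatrix parameter; the function names build_vocabulary and to_matrix and the sample documents are hypothetical. It only shows the general idea of counting new documents against the vocabulary of the original document-term matrix, so that unseen terms are dropped and missing terms become zeros.

```python
from collections import Counter


def build_vocabulary(documents):
    """Return a sorted list of every distinct lowercase token in the documents."""
    return sorted({token for doc in documents for token in doc.lower().split()})


def to_matrix(documents, vocabulary):
    """Build one row of term counts per document, using only the given vocabulary."""
    rows = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        # Counter yields 0 for terms the document lacks; terms outside the
        # vocabulary are never looked up, so they are silently dropped.
        rows.append([counts[term] for term in vocabulary])
    return rows


# Hypothetical training and new documents, for illustration only.
train_docs = ["budget vote passes senate", "senate debates health bill"]
new_docs = ["health budget vote delayed"]

vocabulary = build_vocabulary(train_docs)       # columns fixed at training time
train_matrix = to_matrix(train_docs, vocabulary)
new_matrix = to_matrix(new_docs, vocabulary)    # same columns as train_matrix
```

Here the token "delayed" never appeared in the training documents, so it has no column in new_matrix, while training terms absent from the new document are counted as zero. Both matrices therefore have the same width, which is what a trained classifier of this kind needs.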

RTextTools v1.3.2 Released

RTextTools was updated to version 1.3.2 today, adding support for n-gram token analysis, a faster maximum entropy algorithm, and numerous bug fixes. The source code has been synced with the Google Code repository, so please feel free to check out a copy and add your own features! With the core feature set of RTextTools finalized, the next major releas

The Problems with Pairing R + Java

A core focus of the RTextTools project has been to make the package as accessible and user-friendly as possible. In its early iterations, the package contained dependencies such as RWeka, openNLP, and

Getting Started with Latent Dirichlet Allocation using RTextTools + topicmodels

RTextTools bundles a host of functions for performing supervised learning on your data, but what about other methods like latent Dirichlet allocation? With some help from the topicmodels package, we can get started with LDA in just five steps. Text in

RTextTools v1.3 Released + Rstem Now Available on CRAN

RTextTools v1.3 was released on August 21, and the package binaries are now available on CRAN. This update fixes a major bug with the stemmers, and it is highly recommended that you upgrade to the latest version. Other changes include optimization of existing functions and improvements to the documentation. Additionally, Duncan Temple Lang has graciously

RTextTools v1.2 Available on CRAN + useR! 2011 Kaleidoscope Session

RTextTools v1.2 was released today, and we're pleased to announce that the package is finally available on CRAN. Additionally, this update brings minor changes to the API, improvements to the GLMNET algorithm, and more comprehensive documentation. Get started by following our installatio

Amazon Machine Image Created With RTextTools Pre-installed

We recently created an AMI for Amazon's EC2 cloud computing service. Users with AWS accounts can access the public AMI by searching for ami-817eb8e8. The AMI is based on Drew Conway's excellent AMI, but with R 2.13 loaded and RTextTools and
