Kaleidoscope IIb (useR! 2011)

August 17, 2011

L Collingwood – RTextTools

RTextTools is a machine learning library for automated text classification. The package builds on earlier packages such as tm and randomForest. Use case: an undergrad labels congressional bills but then quits; using the previously labelled data, the remaining documents are classified automatically. The speaker gave a nice overview of machine learning techniques, but I was already familiar with them so didn’t bother making notes.


  1. Read data;
  2. Missed opps;
  3. Create Corpus;
  4. Train Models – SVM, SLDA, TREE, etc;
  5. Classify the remaining documents with the trained models;
  6. Analyze data.
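
To make the train-then-classify idea concrete, here is a minimal sketch in Python rather than R. It does not use RTextTools or its API: a hand-rolled naive Bayes classifier stands in for the SVM, SLDA and tree models mentioned in the talk. The bill titles, the labels and the function names (`tokenize`, `train`, `classify`) are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case and split on whitespace; a crude stand-in for a document-term matrix."""
    return text.lower().split()


def train(labelled):
    """Collect per-label word counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of documents
    vocab = set()
    for text, label in labelled:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability (Laplace smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            count = word_counts[label][word]  # 0 for unseen words
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical data: the bills labelled before the undergrad quit ...
labelled = [
    ("a bill to fund rural hospitals", "health"),
    ("a bill to expand medicare coverage", "health"),
    ("a bill to cut income tax rates", "taxation"),
    ("a bill to reform corporate tax credits", "taxation"),
]
# ... and the ones still waiting for a label.
unlabelled = ["a bill to lower tax rates", "a bill to fund medicare hospitals"]

model = train(labelled)
predictions = [classify(model, text) for text in unlabelled]
print(predictions)
```

With a real corpus you would hold back some of the labelled documents to check accuracy before trusting the predictions, which is presumably what the final "analyze" step is for.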

Jason Waddell – The Role of R in Lab Automation

License: free as in free beer and speech!

Summary: a scientist repeats the same experiment multiple times. How can we automate the analysis?

The R Service Bus allows a scientist to email or upload data, and the results are generated automatically.

High level view

Various inputs are supported, such as POP, XML and REST web services. Each input is added to a queue, and a pool of R servers handles the jobs. A simple configuration file handles the set-up.
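
The queue-plus-pool pattern can be sketched in a few lines of Python. This is not the R Service Bus itself or its configuration format: the worker threads stand in for the R servers, `analyse` is a placeholder for whatever R script would actually run, and the job fields (`source`, `values`) and `pool_size` are invented for the example.

```python
import queue
import threading


def analyse(job):
    """Placeholder for the analysis an R server would run; here it averages the readings."""
    return {"source": job["source"], "mean": sum(job["values"]) / len(job["values"])}


def worker(jobs, results):
    """Pull jobs off the shared queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        results.put(analyse(job))


def run(submissions, pool_size=2):
    """Queue every submission, let the pool work through them, and collect the results."""
    jobs, results = queue.Queue(), queue.Queue()
    pool = [threading.Thread(target=worker, args=(jobs, results)) for _ in range(pool_size)]
    for thread in pool:
        thread.start()
    for job in submissions:
        jobs.put(job)
    for _ in pool:
        jobs.put(None)  # one stop signal per worker
    for thread in pool:
        thread.join()
    collected = []
    while not results.empty():
        collected.append(results.get())
    # Workers finish in no fixed order, so sort for a stable report.
    return sorted(collected, key=lambda r: r["source"])


# Hypothetical submissions arriving through different inputs.
submissions = [
    {"source": "email", "values": [1.0, 2.0, 3.0]},
    {"source": "rest", "values": [10.0, 20.0]},
    {"source": "upload", "values": [4.0]},
]
report = run(submissions)
for line in report:
    print(line)
```

The point of the design is that the inputs never talk to a worker directly: they only add to the queue, so the pool size can change without touching the input side.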


