The Knoxville R Users Group is presenting a workshop on text analysis using R by Bob Muenchen. The workshop is free and open to the public. You can join the group at https://www.meetup.com/Knoxville-R-Users-Group. A description of the workshop follows.
R for Text Analysis
When analyzing text using R, it’s hard to know where to begin. There are 37 packages available and there is quite a lot of overlap in what they can do. This workshop will demonstrate how to do three popular approaches: dictionary-based content analysis, latent semantic analysis, and latent Dirichlet allocation. We will spend much of the time on the data preparation steps that are important to all text analysis methods including data acquisition, word stemming/lemmatization, removal of punctuation and other special characters, phrase discovery, tokenization, and so on. While the examples will focus on the automated extraction of topics in the text files, we will also briefly cover the analysis of sentiment (e.g. how positive is customer feedback?) and style (who wrote this? are they telling the truth?)
The results of each text analysis approach will be the topics found, and a numerical measure of each topic in each document. We will then merge that with numeric data and do analyses combining both types of data.
The R packages used include quanteda, lsa, topicmodels, tidytext and wordcloud; with brief coverage of tm and SnowballC. While the workshop will not be hands-on due to time constraints, the programs and data files will be available afterwards.
Where: University of Tennessee Humanities and Social Sciences Building, room 201. If the group gets too large, the location may move and a notice will be sent to everyone who RSVPs on Meetup.com or who registers at the UT workshop site below. You can also verify the location the day before via email with Bob at [email protected].
When: 9:05-12:05 Friday 1/27/17
Prerequisite: R language basics
Members of UT Community register at: http://workshop.utk.edu under Researcher Focused
Members of other groups please RSVP on your respective sites so I can bring enough handouts.