Plot textual differences in Shiny

February 21, 2013

(This article was first published on Quantifying Memory, and kindly contributed to R-bloggers)

Wordclouds such as Wordle are pretty rubbish, so I thought I’d try to make a better one, one that actually produces (statistically) meaningful results. I was so happy with the outcome I decided to make it interactive, so go on, have a play!

Compare any two texts (it turns out file uploading in Shiny is pretty experimental/dysfunctional), and graphically map the differences between them. The application will stem the text, remove stop words, and calculate statistical significance, all in a few clicks. Using the controls below you can also change the text size, the plot title, and the positioning of the terms (to avoid overlap), add transparency, and change the number of words plotted.
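To make those steps concrete, here is a minimal sketch of the same kind of pipeline. It is written in Python for illustration and is not the app's R code. The suffix stripper, the tiny stop list, the log-likelihood (G2) test, and the 3.84 cutoff (the chi-squared critical value at p = 0.05 with one degree of freedom) are all assumptions of mine; the post does not say which stemmer or significance test the app uses.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list; a real one is much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "was"}


def crude_stem(word):
    """Rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_counts(text):
    """Lower-case, tokenise on letters, drop stop words, stem, count."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)


def log_likelihood(a, b, total_a, total_b):
    """G2 statistic for one term seen a times in text A and b times in text B."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_a)
    if b > 0:
        g2 += b * math.log(b / expected_b)
    return 2 * g2


def compare_texts(text_a, text_b, critical=3.84):
    """Label each term 'left', 'right' or 'shared', with its G2 score."""
    counts_a, counts_b = term_counts(text_a), term_counts(text_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both texts need at least one non-stop word")
    result = {}
    for term in set(counts_a) | set(counts_b):
        a, b = counts_a[term], counts_b[term]
        g2 = log_likelihood(a, b, total_a, total_b)
        if g2 < critical:
            side = "shared"  # not significantly more present in either text
        elif a / total_a > b / total_b:
            side = "left"
        else:
            side = "right"
        result[term] = (side, g2)
    return result
```

Calling `compare_texts(text_a, text_b)` returns a dictionary keyed by stemmed term. A word that is equally frequent in both texts scores a G2 of zero and is labelled "shared", which corresponds to the middle band of the plot; a word that appears in only one text gets pushed to that text's side.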

The sample image included to the left shows differences between my undergraduate thesis about Richard Pipes as a figure of ridicule in Russian media (on the left) and my MPhil thesis about Katyn in Polish and Russian media (on the right). I think the plot makes the differences in emphasis pretty obvious. The words in light blue in the middle are terms that feature strongly in both texts and are not significantly more present in one or the other.

I’ve presented the code and the logic behind the application elsewhere, so here I include only basic instructions: select two files to compare. Comparisons work best for medium-sized files: too small and there will be no significant differences, too large and processing time will become a bottleneck. If you are trying to do anything big, I strongly recommend running the R script locally.

Any language should work, but you may need to find your own stoplist (and stem it!) to get meaningful results. My Russian stoplist may be downloaded from here. UPDATE: the Russian stoplist has been hardcoded into the app. English and, I think, German are also supported natively, but for other languages you will need to run the programme yourself with a custom-made stoplist.
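Why stem the stoplist? A short Python sketch, assuming stop words are filtered after the text has been stemmed (which the advice above suggests, though the post does not spell out the order). The `crude_stem` function and the three-word stoplist are hypothetical stand-ins, not the app's actual stemmer or list.

```python
def crude_stem(word):
    """Rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_stoplist(stop_words, stem=crude_stem):
    """Run the stoplist through the same stemmer as the text."""
    return {stem(w) for w in stop_words}


def remove_stop_words(stemmed_tokens, stoplist):
    """Keep only the tokens that are not in the stoplist."""
    return [t for t in stemmed_tokens if t not in stoplist]


# Hypothetical stoplist and input text.
raw_stoplist = {"having", "was", "others"}
tokens = [crude_stem(t) for t in "others having was media".split()]

# With the raw stoplist, the stemmed forms "other" and "hav" slip through.
leaky = remove_stop_words(tokens, raw_stoplist)

# With the stemmed stoplist, only the content word survives.
clean = remove_stop_words(tokens, stem_stoplist(raw_stoplist))
```

The point is only that the stoplist and the text must pass through the same stemmer; otherwise stemmed stop words no longer match their entries in the list and end up in the plot.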

I’ve embedded the app below, but a more user-friendly version can be accessed here.

UPDATE: file upload is not working at the moment, so text needs to be pasted in. This will only work for small to medium-sized documents.
