Text Mining with R: Top Keywords of the useR! 2016

June 24, 2016
(This article was first published on eoda english R news, and kindly contributed to R-bloggers)

From June 27-30 the international R user and developer community will meet in Stanford, California for the useR! 2016 Conference. Right in the heart of Silicon Valley, gripping presentations and talks will cover a broad range of topics from R-related computing issues to general statistical topics. In case you are wondering what the most popular topics will be at this year’s useR!, we can help you out.

Growing anticipation and curiosity prompted us to run a brief text mining analysis to identify the top keywords of the useR! 2016. We examined the abstracts of the contributed talks (http://schedule.user2016.org/) in R and created two different word clouds illustrating the popularity and significance of the terms they contain.

Unweighted Word Cloud with the top Keywords of the useR! 2016

The first word cloud is based only on how often specific terms are mentioned. R is one of the most frequently used words in the abstracts, which is hardly surprising for this kind of event.
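
The frequency counting behind such a word cloud can be sketched in a few lines of base R. This is only a minimal illustration, not the script used for the analysis: the three `abstracts`, the `tokenize` helper and the short `stop_words` list are made-up stand-ins.

```r
# Hypothetical stand-in abstracts; the actual analysis used the useR! 2016 talk abstracts.
abstracts <- c(
  "R package for interactive data visualization",
  "Fitting statistical models in R with a new package",
  "Scaling R to big data with Spark"
)

# Lowercase the text, split on anything that is not a letter, drop empty tokens.
tokenize <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z]+"))
  tokens[nzchar(tokens)]
}

# A tiny illustrative stop word list (an assumption, not the one used in the post).
stop_words <- c("a", "for", "in", "the", "to", "with")

tokens <- unlist(lapply(abstracts, tokenize))
tokens <- tokens[!tokens %in% stop_words]

# Count occurrences and sort from most to least frequent.
term_freq <- sort(table(tokens), decreasing = TRUE)
print(head(term_freq, 5))
```

In this toy corpus the term "r" occurs in every abstract and therefore ends up at the top of the frequency table, just as it dominates the unweighted word cloud.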

Weighted Wordcloud with the Keywords of the useR! 2016

The second word cloud additionally applies a tf-idf weighting, which reflects the importance of a word by, roughly speaking, taking into account the fraction of abstracts in which the term occurs: a term that appears in almost every abstract is weighted down. Therefore, the term R appears much smaller in the weighted word cloud than in the unweighted one.
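
To make the effect of the weighting concrete, here is a minimal base R sketch. It assumes one common tf-idf variant (raw term count multiplied by log2(N / df), where N is the number of abstracts and df the number of abstracts containing the term); the variant used in the original script may differ, and the `abstracts`, `tokenize` and `stop_words` below are again hypothetical.

```r
# Hypothetical stand-in abstracts, not the real useR! 2016 data.
abstracts <- c(
  "R package for interactive data visualization",
  "Fitting statistical models in R with a new package",
  "Scaling R to big data with Spark"
)
stop_words <- c("a", "for", "in", "the", "to", "with")  # illustrative list only

# Lowercase, split on non-letters, drop empty tokens and stop words.
tokenize <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z]+"))
  tokens[nzchar(tokens) & !tokens %in% stop_words]
}

docs  <- lapply(abstracts, tokenize)
vocab <- sort(unique(unlist(docs)))

# Term-document count matrix: one row per term, one column per abstract.
tf <- vapply(
  docs,
  function(d) as.integer(table(factor(d, levels = vocab))),
  integer(length(vocab))
)
rownames(tf) <- vocab

# Document frequency: in how many abstracts does each term occur?
df  <- rowSums(tf > 0)
idf <- log2(length(docs) / df)

# Multiply every row of counts by that term's idf, then sum over abstracts.
tfidf   <- tf * idf
weights <- sort(rowSums(tfidf), decreasing = TRUE)
print(round(weights, 3))
```

Because "r" occurs in all three toy abstracts, its idf is log2(3 / 3) = 0 and its weight drops to zero, while terms that occur in only one abstract receive the largest idf. In the real abstracts R does not occur in every document, so it shrinks rather than vanishes.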

We also generated two bar charts illustrating the findings:

Top 10 most frequent terms from useR! 2016 abstracts.
Top 10 most frequent terms from useR! 2016 abstracts (tf-idf weighted)

This text mining analysis provides a good overview of the top terms at the upcoming useR! 2016. The results might not be surprising. However, they might make you even more excited about the approaching start of this event.

If you want to take a look at the script and the data set used for the analysis, visit GitHub.
