German election: Election promises visualized with R

September 13, 2013

(This article was first published on eoda, R und Datenanalyse » eoda english R news, and kindly contributed to R-bloggers)

Data analysis mostly focuses on structured and standardized data, e.g. data from databases, because such data can be analyzed easily. Nevertheless, unstructured data also offers opportunities to gain an advantage. Concrete applications such as content analysis or sentiment detection are being discussed more and more frequently.

Of course, there are still limits to what qualitative data analysis can do. The automated recognition of moods, for instance, is limited when it comes to ambiguous statements. But the wide availability of digital texts and documents shows that the analysis of unstructured data is justified and useful. Unstructured data exists in plenty of forms, from e-mail histories to scientific papers. Analyzing such texts is complex because of the extensive data volume, the differing formats and the different types of problems involved.

The free statistical programming language R is one of the leading solutions for this kind of problem. R offers a very wide range of tools for statistical problems. For example, the add-on package tm provides functions for managing text documents and makes it easier to work with heterogeneous text formats, which makes it useful for text mining tasks. A multitude of sources and formats, such as e-mails, RSS feeds, HTML, CSV and PDF, can be read into R. The data structures as well as the algorithms can be adjusted to personal needs, because tm’s developers created a modular concept that supports integration, transformation and filtering. These options allow texts to be filtered according to defined criteria. The advantage of R in this case is that the results can be used directly for further analysis, with R serving as a statistical language.
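The import, transformation and filtering steps mentioned above could look roughly like the following sketch. It assumes a reasonably recent version of tm (one that provides content_transformer) is installed; the three short texts and the labels party_a to party_c are invented placeholders, not real election texts, and the helper has_bildung is a hypothetical name chosen for illustration.

```r
# Minimal sketch of a tm workflow: import, transform, filter.
# The texts are invented placeholders, not real election texts.
library(tm)

texts <- c(
  party_a = "Wir wollen mehr Bildung und bessere Schulen.",
  party_b = "Die Energiewende schafft Arbeit und Wohlstand.",
  party_c = "Mehr Arbeit und gute Bildung sind wichtig."
)

# Import: build a corpus from a character vector
corpus <- VCorpus(VectorSource(texts))

# Transformation: lower case, strip punctuation, drop German stop words
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("german"))
corpus <- tm_map(corpus, stripWhitespace)

# Filtering: keep only documents that mention "bildung" (education)
# has_bildung is a hypothetical helper, not part of tm
has_bildung <- function(doc) any(grepl("bildung", content(doc), fixed = TRUE))
education <- tm_filter(corpus, FUN = has_bildung)

# Number of documents that passed the filter
length(education)
```

For real documents, VectorSource would be replaced by one of tm's other sources, for example DirSource for a folder of text files, together with a reader that matches the file format.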

The following graphic, “election promises Germany 2013 frequent words”, was created to illustrate the possibilities of data mining with R. The upcoming Bundestag election in Germany was chosen as an interesting example. Frequent words were extracted from the election promises of the five most popular German parties to demonstrate R’s ability to cope with unstructured data. The graphic shows how often each frequent word is used in each party’s election promises; high counts can be interpreted as that party’s special topics of interest. The color of a section encodes the frequency: dark red sections mark words that appear very frequently in a party’s election promises, while lighter sections mark words that are used less often.

Figure: election promises Germany 2013, frequent words
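A graphic of this kind can be approximated with a few lines of base R. The sketch below is not the code behind the original figure: the party labels, the toy texts and the threshold of three occurrences are assumptions made purely for illustration. It counts words per party, keeps the frequent ones and draws them as a heat map in which darker red means a higher count.

```r
# Toy data: party labels and texts are invented for illustration only.
programs <- c(
  party_a = "bildung arbeit bildung familie",
  party_b = "arbeit energie arbeit arbeit",
  party_c = "energie familie bildung energie"
)

# Split each text into single words
words <- strsplit(programs, " ", fixed = TRUE)

# Common vocabulary across all parties
vocab <- sort(unique(unlist(words)))

# Count matrix: one row per word, one column per party
counts <- vapply(
  words,
  function(w) as.integer(table(factor(w, levels = vocab))),
  integer(length(vocab))
)
rownames(counts) <- vocab

# Keep words that occur at least 3 times overall (assumed threshold)
frequent <- counts[rowSums(counts) >= 3, , drop = FALSE]

# Heat map without clustering or scaling: white = rare, dark red = frequent
heatmap(
  frequent,
  Rowv = NA, Colv = NA, scale = "none",
  col = colorRampPalette(c("white", "darkred"))(20)
)
```

With real texts, the count matrix would more naturally come from a document-term matrix built with tm after stop-word removal; the plotting step stays the same.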

eoda’s R-Academy offers a course named “qualitative data analysis” from November 18th to November 19th that covers text mining as one method of qualitative data analysis.
