Polarisation and Mobilisation indicators

[This article was first published on Quantifying Memory, and kindly contributed to R-bloggers.]

This blog post makes available a set of indicators discussed in a forthcoming edition of Digital Icons. In brief, the script takes a text input and calculates polarisation and mobilisation indices based on the frequency of pronouns in the text.

The hypothesised relationship between pronouns and polarisation has been discussed extensively by critical discourse analysts, social psychologists, and practitioners of critical linguistics. The main innovation here is the suggestion that two sets of texts, divided by focus, publishing house, political orientation, date, or similar, may exhibit varying degrees of engagement with a topic.

Polarisation suggests a high incidence of ‘we’ – ‘they’ statements, a sign of othering. In brief, frequent statements praising the self and criticising the other may be indicative of identity-forming rhetoric.

The mobilisation indicator measures the rate of references to the second person, and may point to topics associated with (political) mobilisation, especially when the sources are blogs.
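The published tool targets Russian texts, but the core idea can be sketched in a few lines of R. The function below is a minimal illustration, not the author's actual Shiny code: it uses hypothetical English pronoun sets and expresses each indicator as a rate per 1,000 words.

```r
# Minimal sketch of the two indicators (assumed, simplified pronoun sets).
pronoun_rates <- function(text) {
  # 'We'/'they' forms feed the polarisation rate; second-person forms
  # feed the mobilisation rate. These English lists are illustrative only.
  we_they <- c("we", "us", "our", "ours", "they", "them", "their", "theirs")
  second  <- c("you", "your", "yours")

  # Tokenise on runs of non-letters and drop empty strings.
  words <- tolower(unlist(strsplit(text, "[^[:alpha:]]+")))
  words <- words[words != ""]
  n <- length(words)

  # Rates per 1,000 words.
  c(polarisation = 1000 * sum(words %in% we_they) / n,
    mobilisation = 1000 * sum(words %in% second)  / n)
}

pronoun_rates("We know what they want, and you should too.")
```

In practice one would compare these rates across samples of texts rather than read them off a single sentence, as the notes below stress.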

Scroll below the application for technical details, usage notes, limitations, etc.

The application above is very much in beta. The following should be considered:
  1. The application has only been tested on Russian texts. English-language support is experimental, under-theorised, and really only included to give an idea to those not specialising in Russia.
  2. Samples of texts should be compared, not individual texts. A sample united by the aspect studied, but written by a range of authors, at different times, about different topics, etc., will greatly reduce errors due to personal stylistic preferences (a fondness for rhetorical questions, say), genre (interviews, for instance, often contain a vastly larger number of personal pronouns), and text length. If individual texts are compared, special care should be taken to examine the sentences identified in ‘examples’.
  3. There is an option to limit texts to n sentences around a search term. This is a useful feature for larger datasets where polarisation around a theme/event/person is studied. For individual texts it is likely to give only insignificant results.
  4. In the graphs, the black point is the sample analysed; the blue point is the reference category.
  5. The x-axis is a scale from -100 to 100, where the extremes represent infinite difference from a reference point. I used the formula for the RSI index as a starting point, but believe this is a generic type of index.
  6. The presence of a linguistic feature (pronouns) does not necessarily mean the text(s) function in the hypothesised way. One should of course examine the fragments identified as being polarising to ascertain whether in fact they are. See the ‘examples’ tab for this.
  7. The merits of quantifying pronouns are discussed elsewhere in the literature. Suffice it to say this is an innovation in method and accessibility, not in theory.
  8. Results are likely to be insignificant for texts containing fewer than 10 pronouns and 1000 words.
  9. Results are significant when there is no overlap between the confidence intervals plotted, or, when using the reference set, when the confidence interval does not cross 0.
  10. Significance is calculated with a binomial confidence interval (95%). This prevents texts (especially short texts with few pronouns) from emerging as false positives.
  11. The reference category provided is based on a dataset I use in my thesis, consisting of upwards of 50,000 texts, drawn from the gamut of Russian newspapers over the last 12 years. They are not representative, however, as the search terms used to create the archive selected events with memory potential, such as the Beslan tragedy, the transition from Communism, racially motivated killings, etc. Consequently there is every likelihood that the reference values are higher than those of the average Russian newspaper text. The reference category is useful, though, because it is so very large; errors are consequently small, meaning significant results are possible.
  12. The code is written in R and made available via Shiny. I am happy to share the script, though it is not at present polished sufficiently for me to want to open-source it. 
  13. Anyone may use the script, but please cite something to the effect of: Fredheim, R (2013) ‘Quantifying Polarisation in Media Coverage of the 2011-12 Protests in Russia’, Digital Icons [forthcoming]
  14. Please email me about any bugs.
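Points 5 and 10 above can be illustrated with a short sketch. The exact formulas in the app are not published here, so this is my reading of them: an RSI-style index bounded at ±100, where the sign shows direction relative to the reference, and a 95% binomial interval for a pronoun rate, using base R's `binom.test`.

```r
# An index from -100 to 100: 0 means the sample matches the reference;
# -100 and 100 correspond to infinite difference in either direction.
# (Assumed form, consistent with the description in point 5.)
scaled_index <- function(rate, ref_rate) {
  100 * (rate - ref_rate) / (rate + ref_rate)
}

# 95% binomial confidence interval for a rate of k pronouns in n words
# (the significance check described in point 10).
rate_ci <- function(k, n) {
  binom.test(k, n, conf.level = 0.95)$conf.int
}

scaled_index(20, 10)   # sample rate double the reference
scaled_index(0, 10)    # no pronouns at all: the -100 extreme
rate_ci(12, 900)       # e.g. 12 second-person pronouns in 900 words
```

With a large fixed reference set, the reference error is negligible, so in practice significance comes down to whether the sample's interval, mapped onto this scale, crosses 0.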
