Few people expect politicians to write every word they utter themselves; reliance on speechwriters and spokespersons is a long-established political practice. Still, it's interesting to know which statements are truly the politician's own words, and which are driven primarily by advisors or influencers.
Recently, David Robinson established a way of figuring out which tweets from Donald Trump's Twitter account came from him personally rather than from campaign staff, which he verified by comparing the sentiment of tweets sent from Android and iPhone devices. Now, Ali Arsalan Kazmi has used stylometric analysis to investigate the provenance of speeches by the Prime Minister of Pakistan.
By looking at aspects of linguistic style (word and sentence length, frequency of word pairings, use of punctuation, etc.) in the speeches of Prime Minister Nawaz Sharif, Ali found suggestions of at least two authors (and possibly more) behind the speeches. This is particularly apparent in this consensus network of appearances of 4-character sequences in the speeches, which divides them into two clusters (of possibly differing authorship).
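To make the idea of stylistic features concrete, here is a minimal sketch of how such measurements could be computed. It is written in Python for illustration and is not Ali's R code; the function names (`style_features`, `char_ngram_profile`, `cosine_distance`), the tokenization rules, and the choice of cosine distance are all assumptions made for this example.

```python
import re
from collections import Counter
from math import sqrt


def style_features(text):
    """Return a few simple style measurements for one text (illustrative only)."""
    words = re.findall(r"[A-Za-z']+", text)
    # Naive sentence split on terminal punctuation; real tools are more careful.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    punctuation = re.findall(r"[,;:!?.]", text)
    n_words = max(len(words), 1)          # avoid dividing by zero on empty input
    n_sentences = max(len(sentences), 1)
    return {
        "mean_word_length": sum(len(w) for w in words) / n_words,
        "mean_sentence_length": len(words) / n_sentences,
        "punctuation_per_word": len(punctuation) / n_words,
    }


def char_ngram_profile(text, n=4):
    """Relative frequencies of overlapping character n-grams (default: 4)."""
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    grams = [normalized[i:i + n] for i in range(len(normalized) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def cosine_distance(profile_a, profile_b):
    """1 - cosine similarity between two n-gram profiles (an assumed metric)."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[g] * profile_b[g] for g in shared)
    norm_a = sqrt(sum(v * v for v in profile_a.values()))
    norm_b = sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty profile as maximally distant
    return 1.0 - dot / (norm_a * norm_b)
```

Computing `cosine_distance` between the 4-gram profiles of every pair of speeches would give a distance matrix, which is the kind of input a clustering step needs. Speeches written by the same hand would be expected to sit closer together than speeches by different authors.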
Ali used R and several packages to perform the analysis. These included the openNLP package to extract attributes from the speech data, the stylo package for stylometric analysis, the fpc package for the clustering, and the igraph package to visualize the clusters. The complete R script used for the analysis is available on GitHub.
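The clustering step can be illustrated with a small sketch as well. The fpc package offers several clustering routines, and which one Ali used is not stated here, so the example below swaps in a plain average-linkage agglomerative clustering written in Python. The function name `average_linkage_clusters` and the sample distance matrix are hypothetical and serve only to show the idea of grouping speeches by pairwise stylistic distance.

```python
def average_linkage_clusters(distances, k=2):
    """Group items into k clusters from a symmetric distance matrix.

    Starts with every item in its own cluster and repeatedly merges the two
    clusters with the smallest average pairwise distance (average linkage).
    This is a stand-in technique, not the method from the original analysis.
    """
    clusters = [[i] for i in range(len(distances))]

    def linkage(a, b):
        # Mean distance over all cross-cluster pairs.
        return sum(distances[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected

    return [sorted(c) for c in clusters]


# Made-up distances between four speeches: 0 and 1 are similar, as are 2 and 3.
example_distances = [
    [0.00, 0.10, 0.90, 0.80],
    [0.10, 0.00, 0.85, 0.90],
    [0.90, 0.85, 0.00, 0.20],
    [0.80, 0.90, 0.20, 0.00],
]
print(average_linkage_clusters(example_distances, k=2))
```

With real data, the number of clusters would not be known in advance, which is one reason a package with tools for choosing and validating cluster counts is useful for this kind of analysis.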
For an overview of the analysis, check out this slide presentation by Ali, and for the complete details, take a look at the blog post linked below.
A Blog On Data Analytics: How many Authors does the Prime Minister have for his speeches: A Stylometric Analysis