Finding the dramatic arc of novels with sentiment analysis

February 6, 2015

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Sentiment analysis has been widely used to infer the mood of customers in emails, tweets and other short communications. The underlying assumption is that sentiment is a fixed value: an email is either angry or happy, positive or negative. But in longer writings like a novel, we naturally expect the sentiment to vary over time. Can we apply sentiment analysis over the course of a long text, and thereby see the dramatic arc of a story as it flows from comedy to tragedy and maybe back again?

That's what Matthew Jockers has done with his R package "syuzhet". Inspired by the Russian formalist Vladimir Propp, the package analyzes the sentences of a novel and "reveals the emotional and affectual shifts that serve as proxies for the narrative movement between conflict and conflict resolution". Once that analysis is done, you can plot how the sentiment of the writing changes page by page. Here, for example, is the chart for Oscar Wilde's The Picture of Dorian Gray:

(Figure: sentiment arc of The Picture of Dorian Gray)

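Matthew describes how such curves match the plots of the novels (the first sentence below refers to James Joyce's A Portrait of the Artist as a Young Man, the second to Dorian Gray):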

Young Stephen reaches a low point during and just after the sermon on hell which occurs midway through the narrative. Dorian’s life takes a dark turn as the reality of the portrait becomes apparent.
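The basic idea of scoring each sentence and following those scores through the book can be sketched in a few lines. This is a minimal Python sketch of the general technique, not syuzhet's code: `TOY_LEXICON` is a hypothetical eight-word dictionary (real tools use lexicons with thousands of scored words, and syuzhet ships its own), and the centered moving average is an assumed stand-in for whatever smoothing you prefer; see the package documentation for syuzhet's own transformation options.

```python
import re
from typing import Dict, List

# Hypothetical toy lexicon, for illustration only.
TOY_LEXICON: Dict[str, float] = {
    "happy": 1.0, "love": 1.0, "beautiful": 1.0, "joy": 1.0,
    "sad": -1.0, "hell": -1.0, "dark": -1.0, "death": -1.0,
}

def sentence_score(sentence: str, lexicon: Dict[str, float] = TOY_LEXICON) -> float:
    """Sum the lexicon values of every word in one sentence (unknown words score 0)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(lexicon.get(w, 0.0) for w in words)

def rolling_mean(values: List[float], window: int) -> List[float]:
    """Centered moving average (window assumed odd); the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def sentiment_arc(sentences: List[str], window: int = 3) -> List[float]:
    """Score each sentence in narrative order, then smooth the raw scores into a curve."""
    raw = [sentence_score(s) for s in sentences]
    return rolling_mean(raw, window)

# Usage on a tiny made-up "story" of five sentences.
story = [
    "It was a happy and beautiful morning.",
    "She felt joy.",
    "Then came the sermon on hell.",
    "Everything turned dark and sad.",
    "In the end, love returned.",
]
arc = sentiment_arc(story, window=3)
print(arc)
```

The raw scores here are 2, 1, -1, -2, 1; smoothing turns them into a curve that dips in the middle and recovers at the end. A larger window flattens more sentence-level noise, at the cost of blurring short emotional turns.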

You can try the analysis yourself in R. All you need is the text of a novel as an R character vector; an easy way to get one is to point the function get_text_as_string at a text file from Project Gutenberg. The package vignette is a good place to start, and Lincoln Mullen provides several worked examples. Note that you do need Java installed to run the analysis functions.
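For readers working outside R, here is a rough Python analog of those first two steps: loading a text file as one long string, then breaking it into sentences. The function names `text_as_string` and `naive_sentences` are hypothetical stand-ins, not syuzhet's API, and the regex splitter is an assumption that is far cruder than a trained sentence tokenizer.

```python
import os
import re
import tempfile
from pathlib import Path
from typing import List

def text_as_string(path: str) -> str:
    """Read a plain-text file and collapse all whitespace
    (including line breaks) into single spaces."""
    raw = Path(path).read_text(encoding="utf-8")
    return " ".join(raw.split())

def naive_sentences(text: str) -> List[str]:
    """Split after '.', '!' or '?' when followed by whitespace.
    Crude: abbreviations such as "Mr." cause false splits."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

# Usage: write a small sample file, load it, and split it into sentences.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as f:
    f.write("It was a bright\nmorning.  Then the storm came!\n\nWhy?\n")
    sample_path = f.name

sentences = naive_sentences(text_as_string(sample_path))
os.remove(sample_path)
print(sentences)
```

Project Gutenberg files also carry licence text at the start and end, which you would likely want to trim before scoring so that it does not distort the arc.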

For more on the methodology, read Matthew's blog post linked below.

Matthew L. Jockers: Revealing Sentiment and Plot Arcs with the Syuzhet Package
