Sentiment analysis has been widely used to infer the mood of customers in emails, tweets and other short communications. The basic assumption is that the sentiment is a fixed value: the email is either angry or happy, positive or negative. But in longer writings like a novel, we naturally expect the sentiment to vary over time. Can we apply sentiment analysis over the course of a long text, and thereby see the dramatic arc of a story as it flows from comedy to tragedy and maybe back again?
That's what Matthew Jockers has done with his R package "syuzhet". Inspired by the Russian formalist Vladimir Propp, the package analyzes the sentences of a novel and "reveals the emotional and affectual shifts that serve as proxies for the narrative movement between conflict and conflict resolution". Once that analysis is done, you can plot how the sentiment of the writing changes page by page. Here, for example, is the chart for Oscar Wilde's The Picture of Dorian Gray:
Matthew describes how the curve fits to the plot of the novel:
Young Stephen reaches a low point during and just after the sermon on hell which occurs midway through the narrative. Dorian’s life takes a dark turn as the reality of the portrait becomes apparent.
You can try the analysis yourself in R. All you need is the text of a novel as an R character vector (an easy way is to point the function get_text_as_string to a text file from Project Gutenberg). The package vignette is a good place to start, and Lincoln Mullen provides several worked examples. Note that you do need Java installed to run the analysis functions.
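As a rough illustration of that workflow, here is a minimal sketch, assuming the syuzhet package is installed. The short passage is invented for this example and simply stands in for a full novel, and the file path in the comment is a hypothetical placeholder. The sketch uses the package's lexicon-based "syuzhet" method; other methods may have different requirements.

```r
# install.packages("syuzhet")  # one-time setup
library(syuzhet)

# An invented five-sentence passage stands in for a novel. For a real text,
# get_text_as_string("path/to/novel.txt") (hypothetical path) reads a file
# into a single string in the same way.
novel <- paste(
  "The morning was bright and everyone was happy.",
  "They laughed and sang on the way to the fair.",
  "Then a terrible storm destroyed the village.",
  "People wept over the ruins and feared the worst.",
  "Slowly, hope returned and the town was rebuilt with joy."
)

# Split the text into sentences, then score each sentence.
sentences <- get_sentences(novel)
sentiment <- get_sentiment(sentences, method = "syuzhet")

# Plot the sentence-by-sentence sentiment in narrative order.
plot(sentiment, type = "l",
     xlab = "Narrative time (sentence)", ylab = "Sentiment")
```

With a full-length novel the raw scores are noisy, so the package's smoothing and binning helpers, described in the vignette, are the usual next step before reading a plot arc off the chart.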
For more on the methodology, read Matthew's blog post linked below.
Matthew L. Jockers: Revealing Sentiment and Plot Arcs with the Syuzhet Package