Sermon Sentiment Analysis
Matt Chandler vs. Mark Driscoll
I came across an interesting API from Viral Heat which is capable of “Sentiment Analysis.” This analysis is designed to capture the sentiment of a statement by ranking it on a scale from -1 to 1. For instance, a chipper sentence like “The smell of roses makes me giddy!” is rated a solid 0.82 (a very positive score), while a downer such as “She was distraught by the genocide” gets rated a very negative -0.91. Of course, there are many blips which get rated incorrectly (at least in my mind), but there seems to be some truth underneath the noise. This same sentiment analysis engine was used by OpenBible.info to map out the sentiment of the entire Bible.
I was intrigued by the idea and decided to play with it a bit myself. I sought out some repositories of transcripts (speeches, scripts, etc.) but after a bit of searching, I wasn’t able to find a repository of speeches by single speakers; I settled on using sermon transcripts — at least for some initial experimentation. I grabbed a couple dozen transcripts from Matt Chandler of The Village Church and Mark Driscoll of Mars Hill Church and got to work. I ended up downloading 19 sermons from Matt covering multiple years from 2004 – 2010 and 7 of Mark’s recent sermons (all 2011).
First, I looked at individual sermons to see how sentiment trends over the course of a single sermon. I then looked at which words were most likely to be found in positive-sentiment sentences and which were most likely to be found in negative ones. Here are a couple of examples:
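For anyone who wants to try the single-sermon view themselves, here is a minimal Python sketch of one way to do it: split a transcript into sentences, score each one, smooth the scores with a moving average, and tally the words that show up in positive versus negative sentences. The scorer is passed in as a function; `toy_score` and its tiny lexicon are hypothetical stand-ins for illustration only and have nothing to do with how the Viral Heat engine actually scores text. The sentence splitter and window size are likewise assumptions.

```python
import re
from collections import Counter
from statistics import mean

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def rolling_mean(values, window=3):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed

def sermon_trend(text, score_sentence, window=3):
    """Score every sentence (expected range -1 to 1), then smooth."""
    scores = [score_sentence(s) for s in split_sentences(text)]
    return rolling_mean(scores, window)

def words_by_polarity(text, score_sentence):
    """Count words appearing in positive vs. negative sentences."""
    positive, negative = Counter(), Counter()
    for sentence in split_sentences(text):
        score = score_sentence(sentence)
        words = re.findall(r"[a-z']+", sentence.lower())
        if score > 0:
            positive.update(words)
        elif score < 0:
            negative.update(words)
    return positive, negative

# Hypothetical stand-in scorer: averages a made-up lexicon's values.
TOY_LEXICON = {"giddy": 0.8, "love": 0.6, "distraught": -0.8, "genocide": -0.9}

def toy_score(sentence):
    words = re.findall(r"[a-z']+", sentence.lower())
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return mean(hits) if hits else 0.0
```

Calling `sermon_trend(transcript, toy_score)` returns one smoothed value per sentence, which can then be plotted against sentence position.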
Next, I investigated which words were used most commonly by each pastor (after removing common words like “it,” “me,” etc.). I observed the following lists, showing each word and the average percentage of the time that word is used:
Commonly Used By Both
- Jesus (3.2%)
- God (2.6%)
- People (1.5%)
- Love (0.87%)
- Life (0.82%)
- Bible (0.72%)
- Church (0.64%)
- Time (0.57%)
- World (0.51%)
- Look (0.49%)
- Tell (0.45%)
- Day (0.42%)
Frequently Used by Matt
- Little (0.64%)
- Christ (0.60%)
- Pray (0.50%)
- Heart (0.44%)
- Lot (0.38%)
- Verse (0.36%)
- Week (0.36%)
- Maybe (0.35%)
Frequently Used by Mark*
- Peter (0.81%)
- Luke (0.71%)
- Judas (0.69%)
- Serve (0.54%)
- Sin (0.51%)
- Book (0.47%)
- Times (0.47%)
- Temple (0.46%)
*I’m guessing many of these were present only because he was recently in a series on Luke. A bigger sample may change these numbers significantly.
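A rough sketch of how percentages like these could be computed: strip out common words, work out each remaining word's share of a sermon, then average that share across all of a speaker's sermons. Two assumptions to flag: the stop-word list below is a tiny illustrative one, and the percentages here are taken relative to the words left after stop-word removal, which may not match how the numbers above were calculated.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"it", "me", "the", "a", "and", "of", "to", "is", "in", "that"}

def word_percentages(text, stop_words=STOP_WORDS):
    """Percentage of (non-stop) words in one sermon taken up by each word."""
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in stop_words]
    if not words:
        return {}
    counts = Counter(words)
    return {w: 100.0 * c / len(words) for w, c in counts.items()}

def average_percentages(sermons, stop_words=STOP_WORDS):
    """Average each word's per-sermon percentage across all sermons.

    A word missing from a sermon contributes 0% for that sermon.
    """
    if not sermons:
        return {}
    totals = Counter()
    for text in sermons:
        for word, pct in word_percentages(text, stop_words).items():
            totals[word] += pct
    return {w: t / len(sermons) for w, t in totals.items()}

def top_words(sermons, k=10, stop_words=STOP_WORDS):
    """The k words with the highest average percentage, highest first."""
    averages = average_percentages(sermons, stop_words)
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:k]
```

Running `top_words` separately on each speaker's transcripts, and once on the combined set, would give lists in the same shape as the three above.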
We can also look at how the sentiment of an average sermon for each speaker progresses by aggregating the trends across all of the sermons covered in this analysis. Below are the results with the average trend highlighted in blue.
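The post doesn't spell out how the sermons were aligned before averaging, and they differ in length, so the sketch below makes an assumption: each sermon's sentence scores are resampled onto a fixed number of equal slices of the sermon (first tenth, second tenth, and so on), and the slices are then averaged across sermons. The function names and the default of 10 bins are illustrative choices, not taken from the original code.

```python
from statistics import mean

def resample(scores, bins=10):
    """Average one sermon's scores within `bins` equal slices of its length."""
    n = len(scores)
    if n == 0:
        raise ValueError("cannot resample an empty score list")
    resampled = []
    for b in range(bins):
        lo = b * n // bins
        # Guarantee at least one score per bin when the sermon is short.
        hi = max(lo + 1, (b + 1) * n // bins)
        resampled.append(mean(scores[lo:hi]))
    return resampled

def average_trend(all_scores, bins=10):
    """Average the resampled trends of several sermons, bin by bin."""
    resampled = [resample(scores, bins) for scores in all_scores]
    return [mean(column) for column in zip(*resampled)]
```

Each entry of the result is the mean sentiment at that relative point in the sermon, so plotting it gives a single average trend line per speaker.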
You may notice that Mark’s trendline seems to be higher than Matt’s, implying that, on average, Mark’s sermons carry a more positive sentiment throughout. Indeed, there appears to be a statistically significant difference (p=0.027, ignoring any selection bias) between the two speakers’ sentiments, with Mark’s average sermon having a sentiment of 0.197 and Matt’s having one of 0.096, as shown below.
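The post doesn't say which test produced that p-value. As one way to check a difference like this with nothing beyond the standard library, here is a sketch of a two-sided permutation test on the difference in mean per-sermon sentiment. This is a swapped-in technique, not necessarily the one used above, and the inputs are meant to be one average sentiment score per sermon for each speaker.

```python
import random

def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed difference mean(a) - mean(b), estimated p-value).
    """
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        # Randomly reassign sermons to the two groups.
        rng.shuffle(pooled)
        diff = (sum(pooled[:n_a]) / n_a
                - sum(pooled[n_a:]) / (len(pooled) - n_a))
        # Small tolerance so floating-point ties count as "as extreme".
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (extreme + 1) / (n_perm + 1)
```

With only 7 sermons in one group and 19 in the other, a resampling test like this avoids leaning on distributional assumptions, though it does nothing about the selection bias mentioned above.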
Feel free to comment if you have any thoughts/critiques or any ideas for further analysis!
Edit: I added my preliminary code on GitHub: https://github.com/trestletech/Sermon-Sentiment-Analysis