Sentiment analysis on my girlfriend’s text messages

September 11, 2015

(This article was first published on R –, and kindly contributed to R-bloggers)

When I told my friends that I wanted to give my girlfriend an infographic of us (centered around a sentiment analysis of our texts) as a gift for our first anniversary, most of them told me that was a terrible idea. Yeah… well… CHALLENGE ACCEPTED!! Without further ado, this is what love looks like:


What… um…. what are we looking at?

This is a plot of the aggregate sentiment per day, per person, of our Skype and WhatsApp messages. We don’t SMS, and I left email out because it was:

a) annoying to extract [does anyone know an easy way to get Gmail into R?], and

b) a different kind of data: one email consists of lots of words, sent a few times per day, as opposed to many short messages per day, so I didn’t think it was right to mix it with text messages.

Each message was scored by comparing its words against lists of positive and negative words. In other words: when I say that “we are getting sweeter”, I can actually prove it (within a reasonable margin of error)*.
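A minimal sketch of that word-list comparison in R, just to make the idea concrete. The two tiny word lists and the `score_message` helper are made up for illustration (they are not the real lexicon or the code behind the plot): a message's score is simply the number of positive words minus the number of negative words.

```r
# Placeholder word lists, NOT a real sentiment lexicon
pos_words <- c("love", "sweet", "happy", "nice", "great")
neg_words <- c("sad", "angry", "hate", "terrible", "annoying")

# Hypothetical helper: score = (# positive words) - (# negative words)
score_message <- function(text, pos, neg) {
  # Lowercase, turn anything that is not a-z or a space into a space
  cleaned <- gsub("[^a-z ]", " ", tolower(text))
  # Split on runs of spaces and drop empty tokens
  words <- strsplit(cleaned, " +")[[1]]
  words <- words[words != ""]
  sum(words %in% pos) - sum(words %in% neg)
}

messages <- c("I love you, sweet one!",
              "This traffic is terrible",
              "Did you forget the milk?")

# One score per message
scores <- vapply(messages, score_message, numeric(1),
                 pos = pos_words, neg = neg_words, USE.NAMES = FALSE)
print(scores)
```

With these toy lists the first message scores positive, the second negative, and the milk message lands at zero because none of its words appear in either list.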

How come she didn’t break up with me on the spot?

OK ok ok…. I didn’t just give her this, I made this part of a whole infographic about our year together, with wordclouds and other stuff that made it lighter. I also made it personal by calling out specific individual days from that plot and displaying the text within to add context to each point. So that’s when you realize each point is actually a day from our lives, some good, some bad… it’s like a memory thing. Especially true because those few I called out were special days in which we texted especially nice things. It was sweet… I guess you had to be there. Also, the first year anniversary is paper, so it’s perfectly apropos. Also2, I spent a TON of time on this. Chicks dig it when you spend effort on them. Also3, she’s dating a geek, she expects this kind of thing from me and loves it.

How did I do it?

I typically post my analyses to GitHub, but I won’t this time for obvious reasons… here’s the general flow of how I did it:

  1. Find someone to love. Write lots of text messages to each other for a year.
  2. Get the logs. I used WhatsApp (email yourself the whole log, it’s in the settings) and Skype (at the time I did the analysis you could get up to 6 months of history; just copy and paste it into a text file).
  3. Clean the logs. This part is super annoying. Every time we were texting and someone pasted in something from somewhere else (like a link, or text copied from another conversation), it broke the line-number scheme. There might be better ways to clean it, but for me it was a bit of regex and a LOT of manual cleaning and iterating. There are also a lot of encoding problems if your logs are in more than one language, and lastly, not all emoji translate to text nicely. If they didn’t, I just deleted them… which sucks (this is kind of a big deal since there’s a lot of sentiment in emojis :( :'(. Somebody should come up w/ an emoji sentiment valence table for WhatsApp). What you want in the end is a text file that has 3 columns: timestamp, name, clean text, separated by a unique delimiter, for example “|”. Keep munging till you have that.
  4. Read the WhatsApp log into R.
    1. Realize that the logs within the current year don’t have the year in the timestamp, so add it manually.
  5. Read in the Skype logs and combine them w/ the WhatsApp ones.
  6. Realize that the timestamp formats differ between the logs. Pull your hair out remembering how to deal w/ dates in R, and munge and munge till they are the same (no, I’m not going to learn to use lubridate, I’m not a quitter).
  7. Done! Now start analyzing!
  8. Sentiment analysis: compare each individual text message against the sentiment lexicon from Hu and Liu.
    1. PROTIP – for easy mode, use the score.sentiment function from Jeffrey Breen.
    2. Cap positive sentiment greater than 4 to 4… that’s good enough. I guess you could cap negative sentiment as well, but I didn’t have the need.
    3. To create the chart above, aggregate the sentiment scores PER PERSON PER DAY. Now you have two sets of dots, one for you and one for your lover. Plot those bad boys and add the smoothing!
    4. Now that you have the sentiment analysis of each text message, other fun things you can do (not shown here, but you’ll get the picture):
      1. When and how do we communicate? And at what time of day are we sweetest and least sweet?
      2. Are we sweeter on Skype or WhatsApp?
      3. Sentiment-sensitive wordclouds, etc
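
Since I’m not posting the real code, here is a rough base-R sketch of steps 3 to 8 on made-up data. Everything specific in it is an assumption for illustration: the four sample lines, the column names, the timestamp format, the placeholder scores, and the choice to aggregate with `sum` (a daily mean would be just as defensible). It shows reading the “|”-delimited file, parsing timestamps without lubridate, capping positive scores at 4, and aggregating per person per day.

```r
# Hypothetical cleaned log lines: timestamp | name | clean text
log_lines <- c(
  "2015-03-01 09:15|Me|good morning sweet one",
  "2015-03-01 09:17|Her|morning love",
  "2015-03-01 22:40|Me|sleep with the angels",
  "2015-03-02 18:05|Her|did you forget the milk"
)

# read.table accepts in-memory text via `text =`; with a real file, pass its path instead.
# quote = "" and comment.char = "" stop apostrophes and "#" inside messages from breaking rows.
msgs <- read.table(text = log_lines, sep = "|", quote = "", comment.char = "",
                   col.names = c("timestamp", "name", "text"),
                   stringsAsFactors = FALSE)

# Base-R dates: parse into POSIXct with an explicit format, then derive the calendar day
msgs$timestamp <- as.POSIXct(msgs$timestamp, format = "%Y-%m-%d %H:%M", tz = "UTC")
msgs$day <- as.Date(msgs$timestamp, tz = "UTC")

# Placeholder per-message scores; in the real analysis these come from the lexicon scorer
msgs$score <- c(6, 1, 2, 0)

# Cap positive sentiment greater than 4 to 4
msgs$score <- pmin(msgs$score, 4)

# Aggregate PER PERSON PER DAY (sum is an assumption here)
daily <- aggregate(score ~ name + day, data = msgs, FUN = sum)
print(daily)
```

From a table shaped like `daily`, plotting score against day with one colour per name and a smoother per person (for example `geom_smooth` in ggplot2, or `lowess` in base R) gets you the two sets of dots and trendlines.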

*OK fine, but what does it mean?

OK fine, it don’t mean a goddam thing but it is interesting to analyze anyway! The rise in sweetness halfway through the year is due to the fact that we were apart, and were forced to be sweet by text and calls more than in person. Interesting that the texts stayed sweet after that. Also interesting that the amount of communication since then really increased.

With regard to the future… OF COURSE in a normal relationship, text messages start off like this:

“I’m thinking of you, sleep with the angels sweet one”

and end up like this:

“Did you forget the milk?!”

That’s just what happens in relationships, because we’ve all got stuff to do and when you share your life with someone, you become part of a team, and from time to time, the team needs milk and sometimes that milk is forgotten for extremely valid and completely unavoidable reasons. So eventually we will get less sweet via text message and what will that mean? Probably nothing at all. Anyway, what do I care? At least we’re getting sweeter now. :) I’ll worry bout tomorrow tomorrow.

Joking aside, if nothing else, doing analyses like this forces people like me to TRY EXTRA HARD to be sweet even if it’s not necessary. And intention when text-messaging is important, since there’s NEVER any context to text messages and misunderstandings are common.

So keep up the sweet text-messages, geeks of the world! Don’t want the trendline to go negative, do we?

Edited by Laure Belotti


UPDATE: For more “Love data”, check out other people who analyzed their partners’ text messages (HERE and HERE and HERE), two people who hacked online dating for their own purposes (HERE and HERE), and of course, the mother lode of Love-data: Enjoy!

