How to do Topic Extraction from Customer Reviews in R

[This article was first published on r-bloggers on Programming with R, and kindly contributed to R-bloggers].

Topic extraction is an integral part of Information Extraction (IE): it tells us which key things a text corpus is talking about. This can be done naively by counting unigrams and bigrams, but in this post we'll look at a smarter approach that uses an algorithm called RAKE.
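For comparison, the naive baseline can be sketched in base R. The code lowercases each review, splits it into words, and counts single words (unigrams) and adjacent word pairs (bigrams). The two review strings are made up for illustration.

```r
# Naive baseline: count unigrams and bigrams in two made-up reviews (base R only)
reviews_txt <- c("the app keeps crashing after the latest update",
                 "latest update broke the wish list")

# Lowercase and split each review on whitespace into a vector of words
tokens <- strsplit(tolower(reviews_txt), "\\s+")

# Unigram counts across all reviews
unigrams <- table(unlist(tokens))

# Bigram counts: pair every word with the word that follows it in the same review
bigrams <- table(unlist(lapply(tokens, function(t) {
  if (length(t) < 2) return(character(0))
  paste(head(t, -1), tail(t, -1))
})))

sort(unigrams, decreasing = TRUE)
sort(bigrams, decreasing = TRUE)
```

The most frequent unigram here is the stop word "the". The bigram counts also mix meaningful phrases such as "latest update" with filler such as "the app". Separating the two is what a smarter method like RAKE is for.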

Udpipe

udpipe is an NLP-focused R package created and open-sourced by the organization bnosac. It is often the package that eases the pain of not having a native spaCy for R.

Udpipe – Installation

install.packages("udpipe")

Udpipe – Loading

library("udpipe")

Udpipe – Language Model

An NLP library is only as good as its language model, because the language model contains the recipe for annotating your text corpus. So before we go further, we need to download a language model. We're doing topic extraction on English reviews, so we'll download the English model.

en <- udpipe::udpipe_download_model("english")

The download saves the language model as a file, by default in the current working directory. You can reuse that file in later sessions without downloading it again.

Customer Reviews - Extraction

We’ll use itunesr package to extract reviews of Amazon US App from Apple App Store.

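library(itunesr)

# Fetch the first two pages of US App Store reviews for app ID 297606951 (Amazon)
reviews1 <- getReviews("297606951", "us", 1)

reviews2 <- getReviews("297606951", "us", 2)

# Stack both pages into a single data frame
reviews <- rbind(reviews1, reviews2)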

head(reviews)
##                                    Title
## 1      Fine Anything Easy, Good Policies
## 2                       Customer support
## 3 Uh oh, something went wrong on our end
## 4                        Connection Lost
## 5             Add this app to the I-Pads
## 6                            Wish lists!
##                                        Author_URL           Author_Name
## 1 https://itunes.apple.com/us/reviews/id899889795    KeithAppProgrammer
## 2 https://itunes.apple.com/us/reviews/id978296731             Stormdoll
## 3  https://itunes.apple.com/us/reviews/id33953389             Joker1138
## 4   https://itunes.apple.com/us/reviews/id8865955       Loquacious lair
## 5  https://itunes.apple.com/us/reviews/id43459956               MattC4U
## 6 https://itunes.apple.com/us/reviews/id389452759 Best update ever12345
##   App_Version Rating
## 1     13.15.0      5
## 2     13.15.0      5
## 3     13.15.0      1
## 4     13.15.0      2
## 5     13.15.0      1
## 6     13.15.0      1
##   Review
## 1 We’ve been quite blessed to work with Amazon. Searching for odd items, the App also has some compatibility safeguards. If I need to return something, it really couldn’t be easier.
## 2 I love not having to call if there is an issue. The mobile app has great automated features to reach someone and when there is a problem it’s resolved quickly and in the manner I request instead of just a refund . - meaning I was able to get half of my order refunded and the other half mailed again as my first package was listed lost. The items I needed more quickly than could arrive were swiftly refunded and the other items mailed again without a problem this time - super convenient!
## 3 Constantly getting the above error message combined with random pictures of dogs. Hasn’t been fixed for a couple weeks. Pretty frustrating.
## 4 The app is constantly crashing and telling me that the network connection has been lost even if I have full access to WiFi or data.
## 5 This makes me so mad.
## 6 What did you do Amazon? Changing the way we saved wish list items was a horrible idea. Whoever came up with this heart update instead of holding and dropping needs to be demoted immediately. Please fix this. We also need Amazon smile ability in the app as well.
##                  Date
## 1 2019-08-21 13:54:37
## 2 2019-08-21 11:39:40
## 3 2019-08-21 10:21:20
## 4 2019-08-21 07:11:33
## 5 2019-08-21 05:25:44
## 6 2019-08-21 05:20:25

At this point, we have about 98 reviews of the Amazon iOS app from the US App Store.

Customer Reviews - Only Negative (1 & 2-star)

We'll keep only the negative reviews (1- and 2-star). These show which pain points customers describe when they rate the Amazon app poorly.

reviews_neg <- reviews[reviews$Rating %in% c('1','2'),]

nrow(reviews_neg)
## [1] 68
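The filter above compares Rating with the strings '1' and '2'. This works whether Rating comes back as character or as a number, because `%in%` (via `match()`) coerces both sides to a common type, character here, before comparing. Here is a quick check on a made-up data frame. The column names mirror the scraped data, but the rows are hypothetical.

```r
# Made-up ratings: once stored as numbers, once as character strings
ratings_num <- data.frame(Rating = c(5, 1, 2, 4, 1),
                          Review = c("great", "crashes", "slow", "ok", "broken"),
                          stringsAsFactors = FALSE)
ratings_chr <- transform(ratings_num, Rating = as.character(Rating))

# The same negative-review filter works for both column types
neg_num <- ratings_num[ratings_num$Rating %in% c('1', '2'), ]
neg_chr <- ratings_chr[ratings_chr$Rating %in% c('1', '2'), ]

nrow(neg_num)
nrow(neg_chr)
```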

Customer Reviews - Annotation

We're going to extract topics from the 68 negative reviews selected above. Before we can do any topic analysis, we need to annotate the text with the language model we downloaded earlier.

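# Load the model downloaded earlier. en$file_model holds the path of the downloaded file,
# so this keeps working when a newer model version (with a different file name) is fetched.
model <- udpipe_load_model(file = en$file_model)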

doc <- udpipe::udpipe_annotate(model, reviews_neg$Review)

Let's convert doc to a data frame and look at the columns it contains.

names(as.data.frame(doc))
##  [1] "doc_id"        "paragraph_id"  "sentence_id"   "sentence"     
##  [5] "token_id"      "token"         "lemma"         "upos"         
##  [9] "xpos"          "feats"         "head_token_id" "dep_rel"      
## [13] "deps"          "misc"

This post is about topic analysis, so I'll leave the NLP basics behind these columns (tokens, lemmas, part-of-speech tags, dependency relations) for another post.

Topic Extraction using RAKE

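RAKE stands for Rapid Automatic Keyword Extraction. It treats runs of consecutive relevant words (here, nouns and adjectives) as candidate keywords. Each candidate is scored by how often its words co-occur with other words, relative to how often those words occur overall. For more on the algorithm, see the udpipe documentation for keywords_rake(), the function we'll use for topic extraction.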

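doc_df <- as.data.frame(doc)

# Score keywords within each review (doc_id), using lemmas as terms and
# allowing only nouns and adjectives to form keywords
topics <- keywords_rake(x = doc_df, term = "lemma", group = "doc_id",
                        relevant = doc_df$upos %in% c("NOUN", "ADJ"))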

head(topics)
##         keyword ngram freq     rake
## 1 error message     2    2 2.375000
## 2    new layout     2    2 2.000000
## 3 promo pricing     2    2 2.000000
## 4 latest update     2    2 1.857143
## 5      same app     2    2 1.674242
## 6 multiple item     2    3 1.666667

Voilà! Topics (or, technically, keywords) have been extracted using RAKE. The output also reports a few metrics for each keyword: ngram (the number of words in the keyword), freq (how many times it occurs) and rake (its RAKE score).
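To get a feel for where a rake score comes from, here is a from-scratch sketch of classic RAKE scoring in base R. It runs on a handful of made-up tokens laid out like udpipe's annotation output (doc_id, lemma, upos). Candidate keywords are maximal runs of consecutive nouns and adjectives within a review. Each word is scored as degree / frequency, and a keyword's score is the sum of its word scores. This is an illustration only, not udpipe's implementation. keywords_rake() may aggregate or filter differently, so its numbers need not match this sketch.

```r
# Sketch of classic RAKE scoring in base R (illustration only; this is NOT
# udpipe's keywords_rake() code, and its scores need not match that function).

# Made-up tokens in the long format of as.data.frame(udpipe_annotate(...)):
# one row per token with a document id, a lemma and a UPOS tag.
toks <- data.frame(
  doc_id = c(1, 1, 1, 1, 1,  2, 2, 2, 2, 2,  3, 3, 3),
  lemma  = c("latest", "update", "break", "wish", "list",
             "error", "message", "after", "latest", "update",
             "update", "crash", "app"),
  upos   = c("ADJ", "NOUN", "VERB", "NOUN", "NOUN",
             "NOUN", "NOUN", "ADP", "ADJ", "NOUN",
             "NOUN", "VERB", "NOUN"),
  stringsAsFactors = FALSE
)

# 1. Candidate keywords: maximal runs of consecutive relevant tokens
#    (nouns and adjectives) that do not cross a document boundary.
relevant <- toks$upos %in% c("NOUN", "ADJ")
run_id <- cumsum(c(TRUE, diff(relevant) != 0 | diff(toks$doc_id) != 0))
candidates <- unname(split(toks$lemma[relevant], run_id[relevant]))

# 2. Word score = degree / frequency. freq counts the candidate occurrences
#    containing the word; degree sums the lengths of those candidates.
words  <- unlist(candidates)
lens   <- rep(lengths(candidates), lengths(candidates))
freq   <- tapply(lens, words, length)
degree <- tapply(lens, words, sum)
word_score <- degree / freq

# 3. Keyword score = sum of its word scores; count how often each keyword occurs.
keyword <- vapply(candidates, paste, character(1), collapse = " ")
score   <- vapply(candidates, function(w) sum(word_score[w]), numeric(1))
rake <- unique(data.frame(keyword = keyword, ngram = lengths(candidates),
                          rake = score, stringsAsFactors = FALSE))
rake$freq <- as.vector(table(keyword)[rake$keyword])
rake <- rake[order(-rake$rake), ]
rownames(rake) <- NULL
print(rake)
```

In this toy run, "wish list" and "error message" each score 4. "latest update" scores about 3.67, even though it occurs twice. The reason is that "update" also appears on its own, which lowers its degree/frequency ratio to 5/3 instead of 2. RAKE favours words that mostly occur inside longer phrases.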

Topic Analysis

Let's load the tidyverse to kick-start our analysis

library(tidyverse)

and make a bar chart of the top 10 topics based on the rake score.

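topics %>% 
  arrange(desc(rake)) %>%                          # highest RAKE score first
  head(10) %>%                                     # keep the top 10 topics
  ggplot(aes(x = reorder(keyword, rake), y = rake)) +
  geom_col(fill = "#ff2211") +                     # bar height = RAKE score
  coord_flip() +                                   # horizontal bars keep labels readable
  theme_minimal() +
  labs(title = "Top Topics of Negative Customer Reviews",
       subtitle = "Amazon US iOS App",
       caption = "Apple App Store",
       x = "Keyword", y = "RAKE score")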

That's a nice plot of the top customer pain points. It seems the latest update and its error messages didn't go down well with customers. This is a simple bar plot, but the RAKE output could also be used to plot the rake score against freq. That adds an extra dimension that helps spot the more frequently occurring topics.
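Here is a sketch of that idea using ggplot2 (part of the tidyverse loaded above). To keep the snippet runnable on its own, it uses only the six rows printed by head(topics) earlier, typed in by hand. With the full results you would pass topics itself.

```r
library(ggplot2)

# The six keywords printed by head(topics) above, copied by hand
top6 <- data.frame(
  keyword = c("error message", "new layout", "promo pricing",
              "latest update", "same app", "multiple item"),
  ngram = 2,
  freq  = c(2, 2, 2, 2, 2, 3),
  rake  = c(2.375000, 2.000000, 2.000000, 1.857143, 1.674242, 1.666667),
  stringsAsFactors = FALSE
)

# RAKE score on one axis, frequency on the other: topics towards the
# top right are both distinctive and frequently mentioned
p <- ggplot(top6, aes(x = freq, y = rake, label = keyword)) +
  geom_point(colour = "#ff2211") +
  geom_text(vjust = -0.8, size = 3) +
  theme_minimal() +
  labs(title = "RAKE score vs. frequency",
       x = "Frequency (freq)", y = "RAKE score (rake)")
print(p)
```

With only six keywords, most sharing freq = 2, the labels overlap. The plot becomes more informative once it is fed the full topics data frame.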

Summary

udpipe is a very handy package if you work in NLP and text analytics. Besides English, it also supports many other languages, such as German and French.
