Analysing comments to “Star Wars: The Last Jedi” – part 2

[This article was first published on Johannes Friedrich's R Blog, and kindly contributed to R-bloggers.]

The greatest teacher, failure is. – Yoda

As already mentioned in my first post, I also analysed the user comments on a post at www.starwars-union.de word by word. The figure shows the wordcloud built from all comments (1,728 so far).

[Figure: wordcloud of all comments]

To create such a nice wordcloud, I used the following code. The first part was already explained in my first post.

library(tidyverse)
library(rvest)

site <- seq(0, 1710, 30)

url <- paste0("https://www.starwars-union.de/nachrichten/18973/SWU-Kritiken-Unsere-Gedanken-zu-Star-Wars-Die-letzten-Jedi/k/",site,"/#kommentare")

First I load all necessary packages and build the URLs of all available comment pages.
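To make the paging a bit more tangible, here is a small sketch of how the offsets turn into URLs. The base address `base_url` is a hypothetical placeholder and not the real article URL, and the idea that each page holds 30 comments is only my reading of the step size used above.

```r
# Offsets used for paging: 0, 30, 60, ..., 1710
site <- seq(0, 1710, 30)

# Number of pages that will be requested
n_pages <- length(site)

# Hypothetical placeholder address, used here only for illustration
base_url <- "https://example.org/article/k/"

# paste0() is vectorised, so one call builds one URL per offset
url <- paste0(base_url, site, "/#kommentare")

head(url, 3)
```

Because `paste0()` recycles the fixed parts over the vector `site`, no explicit loop is needed to build the URLs.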

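comments <- lapply(seq_along(url), function(x) {

  # Collect the text of every <p> inside the comment container of page x
  data <- read_html(url[x]) %>%
    html_nodes(xpath = '//*[@id="kommentargesamt"]') %>%
    html_nodes("#kommentar") %>%
    html_nodes("p") %>%
    html_text()

  # Keep only every second paragraph (the even-indexed entries)
  data[seq(2, length(data), 2)]
})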

This is the main part for scraping all the comments: I search each HTML page for the element with id="kommentargesamt" and extract the comments from it. The result is saved in the variable comments, a list with one character vector per page.
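The selection logic can be tried out offline on a small HTML string. The markup below is made up; in particular, the layout with one header paragraph followed by one text paragraph per comment is an assumption I use to illustrate why only every second paragraph is kept.

```r
library(rvest)

# Made-up page fragment (assumption: each comment has a header <p> and a text <p>)
html <- '<div id="kommentargesamt">
  <div id="kommentar"><p>User A, 12:00 Uhr</p><p>Toller Film!</p></div>
  <div id="kommentar"><p>User B, 12:05 Uhr</p><p>Nicht mein Fall.</p></div>
</div>'

# Same chain of selectors as in the scraping code
paragraphs <- read_html(html) %>%
  html_nodes(xpath = '//*[@id="kommentargesamt"]') %>%
  html_nodes("#kommentar") %>%
  html_nodes("p") %>%
  html_text()

# Keep the even-indexed paragraphs, i.e. the comment texts in this toy layout
comment_text <- paragraphs[seq(2, length(paragraphs), 2)]
comment_text
```

Note that `seq(2, length(paragraphs), 2)` assumes at least two paragraphs were found; a page without any matches would need an extra check.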

Now everything is ready for creating the wordcloud. There are many examples of creating a wordcloud with R; I decided to use the following snippet, which I once found on the internet:

library(stringr)
library(tm)
library(SnowballC)
library(wordcloud)
library(RColorBrewer)

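# Flatten the per-page list of comments, then split the text into single words
words <- unlist(str_split(unlist(comments), pattern = " "))

# Lower-case everything, strip punctuation and drop German stop words
# plus a few site-specific words ("zuletzt geändert am ... Uhr")
Corpus <- Corpus(VectorSource(words)) %>% 
  tm_map(content_transformer(tolower)) %>% 
  tm_map(removePunctuation) %>% 
  tm_map(removeWords, c("dass", "zuletzt", "geändert", "am", "uhr", 
                        stopwords('german')))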
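To see what these cleaning steps do to the text, here is a rough approximation in base R. It is not what tm does internally, only a sketch of the same idea; the two comments are invented and the stop word list `stop_words` is a tiny hand-picked stand-in for `stopwords('german')`.

```r
# Two made-up comments standing in for the scraped text
comments <- list(c("Der Film war grossartig!", "Luke war grossartig, Rey auch."))

# Hand-picked mini stop word list (assumption; the real German list is much longer)
stop_words <- c("der", "war", "auch")

# Flatten the list and split into single words
words <- unlist(strsplit(unlist(comments), " ", fixed = TRUE))

# Lower-case and strip punctuation
words <- tolower(words)
words <- gsub("[[:punct:]]+", "", words)

# Drop stop words and empty strings
words <- words[!(words %in% stop_words) & nzchar(words)]

# Count how often each remaining word occurs
freq <- sort(table(words), decreasing = TRUE)
freq
```

A frequency table like `freq` is what finally decides the size of each word in the wordcloud.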
                        

To get a nice graphical output, I recommend saving the wordcloud directly to a file rather than exporting it from the RStudio viewer or something similar.

png(
  filename = "SWU_comments_wordcloud.png",
  width = 500,
  height = 500)

wordcloud(Corpus, 
          scale = c(8, .2), 
          min.freq = 2, 
          max.words = 50,  
          random.order = FALSE, 
          rot.per = .15, 
          colors = brewer.pal(8, "Dark2"))

dev.off()
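The same png() / dev.off() pattern can be tested without scraping anything by passing words and counts directly to wordcloud(). The words and frequencies below are made up, the output path `out_file` is a name I chose for this sketch, and the call to set.seed() is my own addition: the word placement is partly random, so fixing the seed should make the image reproducible.

```r
library(wordcloud)
library(RColorBrewer)

# Made-up words and counts, only for illustration
words <- c("luke", "rey", "film", "snoke", "jedi")
freq  <- c(12, 9, 7, 4, 3)

# Write into the session's temporary directory (hypothetical file name)
out_file <- file.path(tempdir(), "example_wordcloud.png")

set.seed(42)  # fix the random word placement

# Open the PNG device, draw the wordcloud, then close the device
png(filename = out_file, width = 500, height = 500)
wordcloud(words = words, freq = freq,
          scale = c(4, .5), min.freq = 2, max.words = 50,
          random.order = FALSE, rot.per = .15,
          colors = brewer.pal(8, "Dark2"))
dev.off()
```

Nothing is written to disk until dev.off() is called, so that last line should not be forgotten.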
 

And that’s it! I think most of the words are understandable for non-German readers too 😉
