Analysing comments to “Star Wars: The Last Jedi” – part 2

[This article was first published on Johannes Friedrich's R Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The greatest teacher, failure is.

As mentioned in my first post, I also analysed the user comments on a post word by word. The figure shows the ‘wordcloud’ built from all comments (1728 so far).


To create such a nice wordcloud, I used the following code. The first part was already explained in my first post.


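library(rvest)        # read_html(), html_nodes(), html_text(), %>%
library(stringr)      # str_split()
library(tm)           # Corpus(), tm_map(), removeWords(), ...
library(wordcloud)    # wordcloud()
library(RColorBrewer) # brewer.pal()

# Page offsets: the comments are spread over pages of 30 each
site <- seq(0, 1710, 30)

# One URL per comment page (the address of the post goes in the first argument)
url <- paste0("", site, "/#kommentare")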

First I load all necessary packages and create the URLs of all available comment pages.
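To make the URL construction concrete, here is a minimal sketch. The base address `https://example.org/post/` is a placeholder assumption and not the address used in this post; the point is only that `paste0()` is vectorised and returns one URL per offset in `site`.

```r
# Placeholder base address (an assumption for illustration only)
base_url <- "https://example.org/post/"

# Offsets 0, 30, ..., 1710: one page per 30 comments
site <- seq(0, 1710, 30)

# paste0() recycles the base address, giving one URL per offset
url <- paste0(base_url, site, "/#kommentare")

length(url)  # 58 pages
url[1:2]
```

With 1728 comments and 30 comments per page, the last offset needed is 1710, which is why the sequence stops there.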

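comments <- lapply(seq_along(url), function(x) {
  # Collect the text of every <p> inside the comment containers
  data <- read_html(url[x]) %>%
    html_nodes(xpath = '//*[@id="kommentargesamt"]') %>%
    html_nodes("#kommentar") %>%
    html_nodes("p") %>%
    html_text()
  # Keep every second paragraph (the comment text)
  data[seq(2, length(data), 2)]
})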

This is the main part of the scraping: for every page I search the HTML for the element with id="kommentargesamt" and extract the comments from it. The result is stored in the variable comments, a list with one element per page.
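The selection logic can be tried offline on a small HTML fragment. The markup below is a hypothetical stand-in (the real page layout may differ); it only assumes that each comment container holds a metadata paragraph followed by the comment text, so that every second `<p>` is a comment.

```r
library(rvest)

# Hypothetical page fragment, not the real markup of the site
page <- read_html('<div id="kommentargesamt">
  <div id="kommentar"><p>User A, 12:00 Uhr</p><p>Great film!</p></div>
  <div id="kommentar"><p>User B, 12:05 Uhr</p><p>Not my favourite.</p></div>
</div>')

# Same chain of selectors as in the scraping step
paragraphs <- page %>%
  html_nodes(xpath = '//*[@id="kommentargesamt"]') %>%
  html_nodes("#kommentar") %>%
  html_nodes("p") %>%
  html_text()

# Keep every second paragraph, i.e. the comment text in this toy layout
comment_text <- paragraphs[seq(2, length(paragraphs), 2)]
comment_text
```

Testing the selectors on a fragment like this avoids hitting the website repeatedly while the extraction is still being worked out.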

Now everything is ready for creating the wordcloud. There are many examples of building a wordcloud with R; I used the following snippet, which I once found on the internet:


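# Flatten the per-page list, then split the comments into single words
words <- unlist(str_split(unlist(comments), pattern = " "))

# Lower-case, strip punctuation and drop uninformative words
# (the list of words to drop is not complete)
Corpus <- Corpus(VectorSource(words)) %>% 
  tm_map(content_transformer(tolower)) %>% 
  tm_map(removePunctuation) %>% 
  tm_map(removeWords, c("dass", "zuletzt", "geändert", "am", "uhr"))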
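To see what the cleaning steps do to individual words, the same `tm` helpers can be applied directly to a character vector. This is a small sketch with made-up words, not the scraped data, and the final frequency count with `table()` is an addition for illustration.

```r
library(tm)

# Made-up words standing in for the split comments
words <- c("Der", "Film", "war", "super,", "dass", "Film!", "am")

cleaned <- tolower(words)                          # "Film" -> "film"
cleaned <- removePunctuation(cleaned)              # "super," -> "super"
cleaned <- removeWords(cleaned, c("dass", "am"))   # dropped words become ""
cleaned <- cleaned[cleaned != ""]                  # discard the empty strings

# Word frequencies, most frequent first
freq <- sort(table(cleaned), decreasing = TRUE)
freq
```

Lower-casing and punctuation removal matter here because otherwise "Film" and "Film!" would be counted as different words.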

To get a nice graphical output, I recommend saving the wordcloud directly to a file rather than exporting it from the RStudio viewer.

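# Open a PNG device so the plot is written straight to a file
png(
    filename = "SWU_comments_wordcloud.png",
    width = 500,
    height = 500)

wordcloud(Corpus,
           scale = c(8,.2), 
           min.freq = 2, 
           max.words = 50,  
           random.order = FALSE, 
           rot.per = .15, 
           colors = brewer.pal(8,"Dark2"))

# Close the device to finish writing the file
dev.off()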
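The pattern of opening a graphics device, plotting and closing the device can be tested with a few hand-written words. The words, frequencies and output path below are assumptions for illustration; `wordcloud()` is called here with an explicit word and frequency vector instead of a corpus.

```r
library(wordcloud)
library(RColorBrewer)

# Toy input: words and how often they occur (made-up numbers)
toy_words <- c("film", "luke", "rey", "jedi", "snoke")
toy_freq  <- c(10, 7, 6, 4, 2)

# Write to a temporary file so nothing in the working directory is touched
out_file <- file.path(tempdir(), "toy_wordcloud.png")

png(filename = out_file, width = 500, height = 500)
wordcloud(toy_words, toy_freq,
          scale = c(4, .5),
          min.freq = 2,
          max.words = 50,
          random.order = FALSE,
          rot.per = .15,
          colors = brewer.pal(8, "Dark2"))
dev.off()

file.exists(out_file)
```

The file is only complete once `dev.off()` has been called, so forgetting it is a common reason for an empty or missing image.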

And that’s it! I think most of the words are understandable for non-German readers too 😉

