Analysing comments to “Star Wars: The Last Jedi” – part 2

December 20, 2017

(This article was first published on Johannes Friedrich's R Blog, and kindly contributed to R-bloggers)

The greatest teacher, failure is. – Yoda

As mentioned in my first post, I also analysed the user comments on a post word by word. The figure shows the wordcloud built from all comments (1728 so far).


To create such a nice wordcloud, I used the following code. The first part was already explained in my first post.


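library(rvest)      # read_html(), html_nodes(), html_text(), %>%
library(stringr)    # str_split()
library(tm)         # Corpus(), tm_map()
library(wordcloud)  # wordcloud(); also attaches RColorBrewer

site <- seq(0, 1710, 30)  # page offsets: 0, 30, ..., 1710
url <- paste0("", site, "/#kommentare")  # base URL left empty here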

First I load all necessary packages and build the URLs of all comment pages.
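To make the URL construction concrete, here is a minimal sketch. The base URL `https://example.com/post/` is purely hypothetical (the real one is not given in this post), and reading the step of 30 as "comments per page" is my assumption.

```r
# Hypothetical base URL; the real address is not given in this post.
base_url <- "https://example.com/post/"

# Offsets 0, 30, ..., 1710: one entry per comment page.
site <- seq(0, 1710, 30)

# paste0() is vectorised, so this yields one URL per offset.
url <- paste0(base_url, site, "/#kommentare")

length(url)   # number of comment pages
head(url, 2)  # first two URLs
```

With 58 pages of (presumably) 30 comments each, the offsets cover the 1728 comments mentioned above.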

comments <- lapply(seq_along(url), function(x) {
  data <- read_html(url[x]) %>%
    html_nodes(xpath = '//*[@id="kommentargesamt"]') %>%
    html_nodes("#kommentar") %>%
    html_nodes("p") %>%
    html_text()  # extract the text of each <p> node
  # keep only the even-numbered <p> elements
  data[seq(2, length(data), 2)]
})

This is the main part of the scraping: I search each HTML page for the element with id="kommentargesamt" and extract the comments from it. They are stored in the variable comments.
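The selection logic can be tried offline on a small HTML string. The markup below is an assumption made up for illustration (two `<p>` elements per comment, the second holding the text); the real pages may be structured differently.

```r
library(rvest)

# Hypothetical page fragment; the real markup of the comment pages is assumed.
html <- '<html><body><div id="kommentargesamt">
  <div id="kommentar"><p>user1, 20.12.2017</p><p>Great film!</p></div>
  <div id="kommentar"><p>user2, 20.12.2017</p><p>Too long.</p></div>
</div></body></html>'

# Same chain of selectors as above, applied to the string instead of a URL.
paragraphs <- read_html(html) %>%
  html_nodes(xpath = '//*[@id="kommentargesamt"]') %>%
  html_nodes("#kommentar") %>%
  html_nodes("p") %>%
  html_text()

# Keep every second paragraph, i.e. the comment text in this toy layout.
texts <- paragraphs[seq(2, length(paragraphs), 2)]
texts
```

Under this assumed layout, the even-index subsetting drops the header paragraph of each comment and keeps only its text.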

Now everything is prepared for creating the wordcloud. There are many examples of creating a wordcloud with R; I used the following snippet, which I once found on the internet:


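# flatten the list of comments before splitting it into single words
words <- unlist(str_split(unlist(comments), pattern = " "))

# the list of removed words is cut off in the original post after "uhr"
Corpus <- Corpus(VectorSource(words)) %>%
  tm_map(content_transformer(tolower)) %>%
  tm_map(removePunctuation) %>%
  tm_map(removeWords, c("dass", "zuletzt", "geändert", "am", "uhr"))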
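What the cleaning steps do to the text can be illustrated with base R alone. This is only a rough sketch of the same idea (lower-casing, stripping punctuation, dropping unwanted words, counting), not tm's implementation; the toy comments and the short stop-word list are made up.

```r
# Made-up comments, shaped like the scraper output (a list of character vectors).
comments <- list(
  c("Der Film war super!", "Super, dass Luke dabei ist."),
  c("Luke war super.")
)

# Flatten, lower-case and split into single words.
words <- unlist(strsplit(tolower(unlist(comments)), " ", fixed = TRUE))

# Strip punctuation, then drop empty strings and unwanted words.
words <- gsub("[[:punct:]]", "", words)
words <- words[nzchar(words) & !words %in% c("dass", "am", "uhr")]

# Word frequencies, most frequent first; these drive the word sizes in the plot.
freq <- sort(table(words), decreasing = TRUE)
freq
```

The more often a word occurs in `freq`, the larger it would be drawn in the wordcloud.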

For a nice graphical output I recommend saving the wordcloud directly to a file rather than exporting it from the RStudio viewer or similar.

png(filename = "SWU_comments_wordcloud.png",
    width = 500,
    height = 500)

wordcloud(Corpus,
          scale = c(8, .2),
          min.freq = 2,
          max.words = 50,
          random.order = FALSE,
          rot.per = .15,
          colors = brewer.pal(8, "Dark2"))

dev.off()  # close the device so the file is written

And that’s it! I think most of the words are understandable for non-German readers, too 😉
