Sentiment Analysis in R with Custom Lexicon Dictionary using tidytext

Posted on October 7, 2020 by AbdulMajedRaja RS in R bloggers

[This article was first published on r-bloggers on Programming with R, and kindly contributed to R-bloggers.]

In this sentiment analysis tutorial, you’ll learn how to use a custom lexicon (for any language other than English) or a keyword dictionary to perform simple, slightly naive sentiment analysis with R’s tidytext package. Note: this won’t give you the same accuracy as a language model, but it is the fastest route to a working solution, at some cost in accuracy. The example here is a sentiment analysis script for Turkish text. This tutorial doesn’t cover text pre-processing steps, but those are very important for any text analytics / NLP project.

Steps

Read the Input Text as a Dataframe
Load the lexicon / new language dictionary
Select the appropriate columns – in this case, word and polarity
Join the tokenized words from the text dataframe with the lexicon dataframe
Roll up the result dataframe by the grouping variable (the line number created with row_number()) to get a sentence-level aggregated sentiment score
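
Before running this on real files, the steps above can be tried end to end on a tiny in-memory example. This is only a sketch: the data frames `sent_toy` and `lexicon_toy`, the three sample sentences, and the polarity values are invented for illustration and are not taken from any real Turkish lexicon.

```r
library(dplyr)
library(tidytext)

# Hypothetical input text, one sentence per row
# (the column name follows the tutorial's `tweettext`)
sent_toy <- data.frame(
  tweettext = c("bu film harika", "bu film berbat", "harika ama berbat"),
  stringsAsFactors = FALSE
)

# Hypothetical lexicon: words and polarity values are made up
lexicon_toy <- data.frame(
  word = c("harika", "berbat"),
  value = c(1, -1),
  stringsAsFactors = FALSE
)

scores <- sent_toy %>%
  mutate(linenumber = row_number()) %>%    # one id per sentence
  unnest_tokens(word, tweettext) %>%       # one row per word
  inner_join(lexicon_toy, by = "word") %>% # keep only words found in the lexicon
  group_by(linenumber) %>%
  summarise(sentiment = sum(value))        # sum word polarities per sentence

print(scores)
```

With these toy values the first sentence scores 1, the second -1, and the third 0, because its positive and negative words cancel out.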

Code

library(tidyverse)

#install.packages("tidytext")
library(tidytext)

sent <- read.csv('text.csv', stringsAsFactors = FALSE) # keep the text column as character, not factor

lexicon <- read.table("turkish_lexicon.csv",
                      header = TRUE,
                      sep = ';',
                      stringsAsFactors = FALSE)

# keep only the word and polarity columns, renamed to match the tokenized text
lexicon2 <- lexicon %>%
  select(word = WORD, value = POLARITY)

sent %>%
  mutate(linenumber = row_number()) %>% # line number for later sentence grouping
  unnest_tokens(word, tweettext) %>% # tokenization - sentence to words
  inner_join(lexicon2, by = "word") %>% # inner join with our lexicon to get the polarity score
  group_by(linenumber) %>% # group by for sentence polarity
  summarise(sentiment = sum(value)) %>% # final sentence polarity from words
  left_join(
    sent %>%
      mutate(linenumber = row_number()), # get the actual text next to the sentiment value
    by = "linenumber"
  ) %>%
  write.csv("sentiment_output.csv", row.names = FALSE)
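
One consequence of the inner join in the pipeline above: a text with no word in the lexicon drops out before the roll-up, so it never reaches the output file. If you would rather keep such rows with a neutral score, one possible variation is to join the scores back onto the full numbered text and fill the gaps with 0. The sketch below uses invented toy data (`sent_toy`, `lexicon_toy`, with Turkish words written in plain ASCII), and treating "no match" as 0 is an assumption, not part of the original script.

```r
library(dplyr)
library(tidytext)

# Hypothetical data: the second sentence has no word in the lexicon
sent_toy <- data.frame(
  tweettext = c("bu film harika", "bugun hava nasil"),
  stringsAsFactors = FALSE
)
lexicon_toy <- data.frame(
  word = c("harika", "berbat"),
  value = c(1, -1),
  stringsAsFactors = FALSE
)

# Number the sentences once and reuse the result on both sides of the join
numbered <- sent_toy %>%
  mutate(linenumber = row_number())

scores <- numbered %>%
  unnest_tokens(word, tweettext) %>%
  inner_join(lexicon_toy, by = "word") %>%
  group_by(linenumber) %>%
  summarise(sentiment = sum(value))

# Start from the full text so unmatched sentences survive,
# then replace their missing score with 0 (assumed neutral)
all_rows <- numbered %>%
  left_join(scores, by = "linenumber") %>%
  mutate(sentiment = coalesce(sentiment, 0))

print(all_rows)
```

Whether 0 is the right value for unmatched text depends on your use case; leaving it as NA keeps "neutral" and "no lexicon coverage" distinguishable.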
