Combining the power of R and Python with reticulate
R + Py
In the world of R vs. Python fights, this is a simple (you could even call it naive) attempt to show how we can combine the power of Python with R and create a new superpower.
Like this one, if you have watched The Incredibles before!
About this Dataset
This dataset contains tweets posted with the hashtag #JustDoIt after Nike released its ad campaign with Colin Kaepernick, which turned controversial.
Dataset source: https://www.kaggle.com/eliasdabbas/5000-justdoit-tweets-dataset
Superstar – Reticulate
The superstar who’s making this possible is the R package reticulate by RStudio.
Let us start with the code!!
The R Code
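# Load the required R libraries
library(tidyverse)
library(ggthemes)
library(knitr)
# reticulate provides the py object used later to read Python variables from R
library(reticulate)
# Read the tweets and keep only the full tweet text
tweets <- read_csv("https://raw.githubusercontent.com/amrrs/python_plus_r_brug/master/justdoit_tweets_2018_09_07_2.csv")
text <- tweets$tweet_full_text
# Fix the seed so the sample is reproducible
set.seed(123)
# Sample 100 tweets (despite the name text_10, the sample size is 100)
text_10 <- text[sample(1:nrow(tweets), 100)]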
The Python Code
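import spacy
import pandas as pd

# Load spaCy's small English pipeline
nlp = spacy.load('en_core_web_sm')

# r.text_10 is the R vector text_10, exposed to Python by reticulate as a list of strings.
# Join the tweets into one string; str() on the list would also tokenize its brackets and quotes.
doc = nlp(" ".join(r.text_10))

# Collect one row per token (text, part-of-speech tag, lemma), then build the data frame once
rows = [{"text": token.text, "pos": token.pos_, "lemma": token.lemma_} for token in doc]
pos_df = pd.DataFrame(rows, columns = ["text", "pos", "lemma"])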
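How the sampled tweets are turned into a single string before they reach nlp() matters. The sketch below is only an illustration: sample_tweets is a made-up list standing in for r.text_10 (reticulate hands an R character vector with more than one element to Python as a list of strings). It compares str() with " ".join() using nothing but the standard library.

```python
# Hypothetical stand-in for r.text_10: reticulate converts an R character
# vector with more than one element into a Python list of strings.
sample_tweets = ["Just do it", "Believe in something"]

# str() returns the list's printed representation, so the brackets,
# quotes and commas become part of the text that gets tokenized.
as_repr = str(sample_tweets)

# " ".join() concatenates only the tweet text itself.
as_text = " ".join(sample_tweets)

print(as_repr)
print(as_text)
```

With the joined version, the tokenizer sees only tweet text, so punctuation counts in the POS plot are not inflated by list syntax.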
Now, Again The R Code
# Count tokens per part-of-speech tag and plot the counts as horizontal bars
py$pos_df %>%
  count(pos) %>%
  ggplot() +
  geom_bar(aes(pos, n), stat = "identity") +
  coord_flip() +
  theme_minimal() +
  labs(title = "POS Tagging",
       subtitle = "NLP using Python spaCy - Graphics using R ggplot2")

Now, Again The Python Code
# Collect one row per named entity (text and entity label), then build the data frame once
ent_rows = [{"text": ent.text, "label": ent.label_} for ent in doc.ents]
ent_df = pd.DataFrame(ent_rows, columns = ["text", "label"])
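Before switching back to R, the entity labels can be tallied on the Python side as a quick sanity check. This is a minimal sketch using only the standard library; the (text, label) pairs are invented for illustration and are not taken from the dataset. It computes the same per-label counts that count(label) produces in the R step.

```python
from collections import Counter

# Invented (text, label) pairs standing in for the rows of ent_df;
# with spaCy these would come from (ent.text, ent.label_) for ent in doc.ents.
entities = [
    ("Nike", "ORG"),
    ("Colin Kaepernick", "PERSON"),
    ("Nike", "ORG"),
    ("NFL", "ORG"),
]

# Tally how often each label occurs.
label_counts = Counter(label for _, label in entities)

# Print the labels from most to least frequent.
for label, n in label_counts.most_common():
    print(label, n)
```

If the counts printed here disagree with the bars in the R plot, the hand-off through py$ent_df is the first place to look.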
One Final Time, The R Code
# Count entities per label and plot the counts as horizontal bars
py$ent_df %>%
  count(label) %>%
  ggplot() +
  geom_bar(aes(label, n), stat = "identity") +
  coord_flip() +
  # theme_solarized() +
  theme_fivethirtyeight() +
  labs(title = "Entity Recognition",
       subtitle = "NLP using Python spaCy - Graphics using R ggplot2")

Summary
In this post, we learned how to combine the best of R and Python: in this case, R for data analysis and data visualization, and Python for natural language processing with spaCy.
If you liked this, please subscribe to my Language-agnostic Data Science Newsletter and share it with your friends!