Analyze LinkedIn with R

Posted on March 18, 2015 by julianhi in R bloggers

[This article was first published on ThinkToStart » R Tutorials, and kindly contributed to R-bloggers].


If you have any questions about this tutorial or run into problems, please feel free to create a topic in the forum:

http://thinktostart.com/forums/forum/questions-tutorials/analyze-linkedin-with-r/


Some time ago I saw an interesting post in an R-related group on LinkedIn. It was from Michael Piccirilli, who wrote about his new package Rlinkedin. I was really impressed by his work, so I decided to write a blog post about it.

Get the package

At the time of writing, the package is only available via GitHub, but thanks to devtools it is easy to install:

# install.packages("devtools")   # run this once if devtools is not installed yet
require(devtools)

install_github("mpiccirilli/Rlinkedin")

require(Rlinkedin)

Authenticate with LinkedIn

Next, we have to connect to the LinkedIn API via OAuth 2.0. The Rlinkedin package gives you two options.

The simplest option is to call

in.auth <- inOAuth()

which uses a default API key to access LinkedIn data.

Alternatively, you can connect to the API with your own application and its credentials.

For this you have to create an application on LinkedIn. Go to https://www.linkedin.com/secure/developer, log in with your LinkedIn account and click on “Add new Application”.

On the next page you will see the app settings. Fill them in as shown in the following screenshots:

Then click on “Add application” and you will be forwarded to your app's credentials. Switch back to R and set the following variables:

# replace "XXX" with the values shown on your app's credentials page
app_name <- "XXX"
consumer_key <- "XXX"
consumer_secret <- "XXX"

Then you can authenticate with:

in.auth <- inOAuth(app_name, consumer_key, consumer_secret)
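Hard-coding the consumer key and secret in a script makes it easy to leak them when you share your code. A minimal sketch, assuming you store the credentials in environment variables with the hypothetical names LINKEDIN_APP_NAME, LINKEDIN_CONSUMER_KEY and LINKEDIN_CONSUMER_SECRET (for example in your ~/.Renviron file, which R reads at startup). The helper get_linkedin_credentials() is made up for this example and uses only base R's Sys.getenv(); the inOAuth() call in the comment is the one from this tutorial.

```r
# Hypothetical helper (not part of Rlinkedin): read the LinkedIn app
# credentials from environment variables instead of hard-coding them.
get_linkedin_credentials <- function(
    vars = c(app_name        = "LINKEDIN_APP_NAME",
             consumer_key    = "LINKEDIN_CONSUMER_KEY",
             consumer_secret = "LINKEDIN_CONSUMER_SECRET")) {
  values <- Sys.getenv(vars, unset = "")
  absent <- vars[values == ""]
  if (length(absent) > 0) {
    stop("Missing environment variable(s): ", paste(absent, collapse = ", "))
  }
  # Named list with the fields app_name, consumer_key and consumer_secret
  as.list(setNames(values, names(vars)))
}

# creds <- get_linkedin_credentials()
# in.auth <- inOAuth(creds$app_name, creds$consumer_key, creds$consumer_secret)
```

The helper stops with an explicit error instead of silently passing empty strings to the authentication step, which would otherwise fail with a less obvious message.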

After successful authentication you can start retrieving data.

Analyze LinkedIn with R

Michael created a nice overview of the package's functions on its GitHub page, so here I will just show a small sample analysis.

First, let's download all your connections:

my.connections <- getMyConnections(in.auth)

This creates a data frame with all available information about your connections. For our analysis we will use the column “industry”, which contains the industry each person works in, to create a small word cloud. We first collapse all industry values into a single comma-separated string:

text <- toString(my.connections$industry)
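Two details of this text preparation are easy to miss, so here is a small base-R check on a made-up vector of industry labels (the values are invented for illustration). First, toString() joins the values with ", " and turns a missing value into the literal text "NA", so it is cleaner to drop missing and empty entries before joining. Second, once punctuation such as "&" is stripped, two spaces can be left behind; collapsing such runs to a single space keeps the neighbouring words apart, whereas replacing them with an empty string glues them together.

```r
# Made-up example input; in the tutorial this would be my.connections$industry
industries <- c("Internet", NA, "Research & Development", "")

# toString() keeps the NA as the text "NA" and the empty value as a trailing ", "
print(toString(industries))   # [1] "Internet, NA, Research & Development, "

# Drop missing and empty labels before joining them into one string
industries <- industries[!is.na(industries) & industries != ""]
text <- toString(industries)                    # "Internet, Research & Development"

# Removing punctuation leaves a double space where "&" used to be
no_punct <- gsub("[[:punct:]]", "", text)       # "Internet Research  Development"

# Replacing runs of spaces/tabs with "" glues the words together ...
glued <- gsub("[ \t]{2,}", "", no_punct)        # "Internet ResearchDevelopment"

# ... while replacing them with a single space keeps the words apart
collapsed <- gsub("[ \t]{2,}", " ", no_punct)   # "Internet Research Development"
```

With your real data, the filtered version of the first line would read text <- toString(na.omit(my.connections$industry)) if the column contains NA values; whether it does depends on what the API returns for connections without an industry.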

We then clean this text and turn it into a word cloud with some standard text-mining code, which gives a nice overview of the industries:

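clean.text <- function(some_txt)
{
  # Some of these patterns target tweets (retweets, @mentions, links); they do no harm to industry names
  some_txt = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", some_txt, perl = TRUE)  # retweet markers
  some_txt = gsub("@\\w+", "", some_txt)          # @mentions
  some_txt = gsub("[[:punct:]]", "", some_txt)    # punctuation
  some_txt = gsub("[[:digit:]]", "", some_txt)    # digits
  some_txt = gsub("http\\w+", "", some_txt)       # links
  some_txt = gsub("[ \t]{2,}", " ", some_txt)     # collapse runs of spaces/tabs to a single space
  some_txt = gsub("^\\s+|\\s+$", "", some_txt)    # leading/trailing whitespace
  some_txt = gsub("\\bamp\\b", "", some_txt)      # leftover "amp" from the HTML entity "&amp;"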
  # define "tolower error handling" function
  try.tolower = function(x)
  {
    y = NA
    try_error = tryCatch(tolower(x), error=function(e) e)
    if (!inherits(try_error, "error"))
      y = tolower(x)
    return(y)
  }
  
  some_txt = sapply(some_txt, try.tolower)
  some_txt = some_txt[some_txt != ""]
  names(some_txt) = NULL
  return(some_txt)
}

library(tm)   # Corpus(), VectorSource(), TermDocumentMatrix() and stopwords() come from tm

clean_text = clean.text(text)
industry_corpus = Corpus(VectorSource(clean_text))

tdm = TermDocumentMatrix(industry_corpus, control = list(removePunctuation = TRUE, stopwords = stopwords("english"), removeNumbers = TRUE, tolower = TRUE))

#install.packages(c("wordcloud","tm"),repos="http://cran.r-project.org")

library(wordcloud)   # also loads RColorBrewer, which provides brewer.pal()
m = as.matrix(tdm) # convert the term-document matrix to a regular matrix
word_freqs = sort(rowSums(m), decreasing=TRUE) # word frequencies in decreasing order
dm = data.frame(word=names(word_freqs), freq=word_freqs, stringsAsFactors=FALSE) # one row per word
wordcloud(dm$word, dm$freq, random.order=FALSE, colors=brewer.pal(8, "Dark2")) # draw the cloud, most frequent words first

This will create a word cloud like the following:

[Figure: word cloud of the industries of the LinkedIn connections]
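The word cloud splits each industry label into single words, so "Computer Software" counts once for "computer" and once for "software". If you would rather compare whole industry labels, base R's table() and barplot() are enough. Below is a minimal sketch on a made-up stand-in for my.connections (only the "industry" column name comes from this tutorial; the rows are invented). The bar chart is written to a temporary PDF file so the example also runs without a screen device.

```r
# Made-up stand-in for the data frame returned by getMyConnections();
# replace connections_demo with my.connections for your own data
connections_demo <- data.frame(
  industry = c("Computer Software", "Internet", "Computer Software",
               "Higher Education", NA, "Internet", "Computer Software"),
  stringsAsFactors = FALSE
)

# Count each industry label as a whole; table() drops NA values by default
industry_counts <- sort(table(connections_demo$industry), decreasing = TRUE)
print(industry_counts)

# Share of connections per industry in percent (among those with an industry)
industry_share <- round(100 * prop.table(industry_counts), 1)
print(industry_share)

# Horizontal bar chart of up to ten labels, saved to a temporary PDF file
out_file <- file.path(tempdir(), "industries.pdf")
pdf(out_file, width = 7, height = 5)
par(mar = c(5, 10, 4, 2))   # wide left margin for the industry names
# rev() puts the most common industry at the top of the chart
barplot(rev(head(industry_counts, 10)), horiz = TRUE, las = 1,
        xlab = "Number of connections")
invisible(dev.off())
```

A bar chart makes exact counts easier to compare than font sizes in a word cloud, so the two views complement each other.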


The post Analyze LinkedIn with R appeared first on ThinkToStart.
