Analyze LinkedIn with R

March 18, 2015

(This article was first published on ThinkToStart » R Tutorials, and kindly contributed to R-bloggers)


If you have any questions about this tutorial or run into any problems, please feel free to create a topic in the forum:

http://thinktostart.com/forums/forum/questions-tutorials/analyze-linkedin-with-r/


Some time ago I saw an interesting post in an R-related group on LinkedIn. It was by Michael Piccirilli, who wrote about his new package Rlinkedin. I was really impressed by his work, so I decided to write a blog post about it.

Get the package

The package is currently only available on GitHub, but thanks to devtools it is easy to install:

# install.packages("devtools")  # only needed if devtools is not installed yet
library(devtools)

install_github("mpiccirilli/Rlinkedin")

library(Rlinkedin)

Authenticate with LinkedIn

Next, we have to connect to the LinkedIn API via OAuth 2.0. The Rlinkedin package offers two ways to do this.

You can simply call

in.auth <- inOAuth()

to authenticate with the package's default API key.

Alternatively, you can connect to the API with your own application and its credentials.

To do so, first create an application on LinkedIn: go to https://www.linkedin.com/secure/developer, log in with your LinkedIn account, and click “Add new Application”.

[Screenshot: “Add new Application”]

The next page shows the app settings. Fill them in as shown in the following screenshots:

[Screenshots: app settings, parts 1 and 2]

Then click “Add application”, and you will be forwarded to your app's credentials. Switch back to R and set the following variables:

app_name <- "XXX"
consumer_key <- "XXX"
consumer_secret <- "XXX"
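Hard-coding credentials in a script makes them easy to leak when you share the code. As an alternative, here is a minimal sketch that reads each value from an environment variable instead. The helper get_credential() and the variable names LINKEDIN_APP_NAME, LINKEDIN_CONSUMER_KEY and LINKEDIN_CONSUMER_SECRET are hypothetical and not part of Rlinkedin.

```r
# Hypothetical helper (not part of Rlinkedin): read a credential from an
# environment variable and stop with a clear message if it is missing.
get_credential <- function(var) {
  value <- Sys.getenv(var, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", var, " is not set", call. = FALSE)
  }
  value
}

# Usage, with assumed variable names defined e.g. in your ~/.Renviron file:
# app_name        <- get_credential("LINKEDIN_APP_NAME")
# consumer_key    <- get_credential("LINKEDIN_CONSUMER_KEY")
# consumer_secret <- get_credential("LINKEDIN_CONSUMER_SECRET")
```

R reads ~/.Renviron at startup, so the values are available in every session without appearing in the script itself.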

Then you can authenticate with:

in.auth <- inOAuth(app_name, consumer_key, consumer_secret)

After successful authentication you can start retrieving data.

Analyze LinkedIn with R

Michael has put together a nice overview of the package's functions on its GitHub page, so here I will just show a small sample analysis.

First, let's download all your connections:

my.connections <- getMyConnections(in.auth)

This returns a data frame with all available information about your connections. For our analysis we will use the column “industry”, which holds the industry each connection works in, and turn it into a small word cloud. First, we collapse the column into a single string:

text <- toString(my.connections$industry)
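toString() simply pastes the elements of a vector together, separated by a comma and a space. Connections without an industry therefore end up in the text as the literal word "NA", so it can help to drop missing and empty values first. A minimal sketch on a made-up vector (the values are illustrative, not real connection data):

```r
# Illustrative industry values; NA and "" stand for connections
# whose profile has no industry
industry <- c("Internet", "Computer Software", NA, "", "Internet")

# toString() joins the elements with ", " and keeps NA as the text "NA"
toString(industry)
# -> "Internet, Computer Software, NA, , Internet"

# Drop missing and empty values before joining
text <- toString(industry[!is.na(industry) & nzchar(industry)])
text
# -> "Internet, Computer Software, Internet"
```

Applied to the real data, the same filter would be used on my.connections$industry before calling toString().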

We then clean this text and turn it into a nice-looking industry overview with some standard word cloud code:

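clean.text <- function(some_txt)
{
  # remove retweet markers and @mentions (left over from tweet cleaning)
  some_txt = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", some_txt, perl = TRUE)
  some_txt = gsub("@\\w+", "", some_txt)
  # remove punctuation, digits and links
  some_txt = gsub("[[:punct:]]", "", some_txt)
  some_txt = gsub("[[:digit:]]", "", some_txt)
  some_txt = gsub("http\\w+", "", some_txt)
  # collapse runs of spaces/tabs and trim leading/trailing whitespace
  some_txt = gsub("[ \t]{2,}", " ", some_txt)
  some_txt = gsub("^\\s+|\\s+$", "", some_txt)
  # remove the standalone word "amp" (from the HTML entity "&amp;")
  some_txt = gsub("\\bamp\\b", "", some_txt, perl = TRUE)
  # define "tolower error handling" function: returns NA instead of
  # failing, e.g. on invalid multibyte strings
  try.tolower = function(x)
  {
    y = NA
    try_error = tryCatch(tolower(x), error = function(e) e)
    if (!inherits(try_error, "error"))
      y = try_error
    return(y)
  }

  some_txt = sapply(some_txt, try.tolower)
  some_txt = some_txt[some_txt != ""]
  names(some_txt) = NULL
  return(some_txt)
}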

#install.packages(c("tm", "wordcloud"))

library(tm)           # Corpus(), VectorSource(), TermDocumentMatrix(), stopwords()
library(wordcloud)    # wordcloud()
library(RColorBrewer) # brewer.pal()

clean_text = clean.text(text)
industry_corpus = Corpus(VectorSource(clean_text))

tdm = TermDocumentMatrix(industry_corpus, control = list(removePunctuation = TRUE, stopwords = stopwords("english"), removeNumbers = TRUE, tolower = TRUE))

m = as.matrix(tdm) # convert the term-document matrix to a regular matrix
word_freqs = sort(rowSums(m), decreasing=TRUE) # total frequency of each word, in decreasing order
dm = data.frame(word=names(word_freqs), freq=word_freqs) # data set of words and their frequencies
wordcloud(dm$word, dm$freq, random.order=FALSE, colors=brewer.pal(8, "Dark2")) # plot the most frequent words first, in the centre

This will create a word cloud like the following:

[Figure: word cloud of the industries of the LinkedIn connections]
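Behind the cloud, TermDocumentMatrix() and rowSums() simply count how often each word occurs. The sketch below reproduces that idea in base R without tm, which makes it easy to inspect the counts; it is an illustrative alternative, and the sample text and the short stopword vector are assumptions standing in for your connections and tm's stopwords("english"). Note that regex backslashes are doubled inside R strings, so "any run of whitespace" is written "\\s+"; a single backslash, as in "\s", is not a valid escape and raises an error.

```r
# Sample text in the shape produced by toString(); illustrative only
text <- "Internet, Computer Software, Information Technology and Services, Internet"

# Remove punctuation, lower-case, and split on runs of whitespace ("\\s+")
words <- strsplit(tolower(gsub("[[:punct:]]", "", text)), "\\s+")[[1]]

# Assumed minimal stopword list standing in for tm's stopwords("english")
stop_words <- c("and", "the", "of")
words <- words[nzchar(words) & !(words %in% stop_words)]

# Count each word and sort in decreasing order, mirroring
# sort(rowSums(m), decreasing = TRUE) in the tm version
word_freqs <- sort(table(words), decreasing = TRUE)
dm <- data.frame(word = names(word_freqs), freq = as.integer(word_freqs),
                 stringsAsFactors = FALSE)
print(dm)
```

With the sample text, "internet" comes first with a count of 2 and every other word appears once; a data frame of this shape can be passed to wordcloud() in the same way as dm in the tm version.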


The post Analyze LinkedIn with R appeared first on ThinkToStart.
