Gender Analysis of Facebook Post Likes

September 27, 2014
By

(This article was first published on ThinkToStart » R Tutorials, and kindly contributed to R-bloggers)

Facebook Page Analyzer
A lot of people have shown great interest in analyzing Facebook data with R, so I decided to write more tutorials about what you can do with the Rfacebook package created by Pablo Barbera.
This tutorial is about plotting the gender distribution of the users who liked a Facebook page's posts. The Rfacebook package does not include a function for this directly, but it is possible by combining a few of its functions.
If you just want to try it out, take a look at the first BETA of my Facebook Page Analyzer tool, which includes the method described in this tutorial: https://thinktostart.shinyapps.io/FB_page_analyzer/

Authentication:

As always, we first need to go through the authentication process. You can find the steps in the first part of this tutorial:
http://thinktostart.com/analyzing-facebook-with-r/

Gender Analysis of Facebook Post Likes

First we have to install and load the Rfacebook package:
install.packages("Rfacebook")
library(Rfacebook)
If you followed the steps in the tutorial mentioned above, you now have your authentication token:
token <- "XXX"
Next we define the number of posts of the page we want to analyze; these are always the most recent ones. Keep this number small: posts can have a lot of likes, and since we make a separate request for every single like, the analysis can take a long time if you choose too many posts.
number_posts <- 2
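To get a feeling for the runtime, note that the loop further down makes one getPost() call per post plus one getUsers() call per like. A minimal sketch of that arithmetic, assuming you already know each post's likes_count (the function name `estimate_api_calls` is hypothetical, and the result is only a rough estimate):

```r
# Hypothetical helper: rough number of API requests the analysis makes,
# assuming one getPost() call per post and one getUsers() call per like.
estimate_api_calls <- function(likes_count) {
  length(likes_count) + sum(likes_count)
}

# Two posts with 120 and 80 likes: 2 + 200 = 202 requests
estimate_api_calls(c(120, 80))
```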
The last variable we have to define is, of course, the name of the page we want to analyze.
page_name <- "forbes"
In the next step we download the most recent posts of the page with:
page <- getPage(page_name, token, n = number_posts, feed = FALSE)
This returns a data frame with the requested number of posts (or fewer, if the page does not have that many). Each post has the following attributes:
from_id, from_name, message, created_time, type, link, id, likes_count, comments_count, shares_count
For our analysis we only need the id column, which contains a unique identifier for every post, also called the post id.
posts <- page$id
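To see what `page$id` gives you without calling the API, here is a toy data frame with a few of the columns listed above. The values are invented for illustration and are not real Forbes posts:

```r
# Toy stand-in for the data frame returned by getPage(); all values are invented
page <- data.frame(
  from_name = c("Forbes", "Forbes"),
  id = c("111_222", "111_333"),
  type = c("link", "photo"),
  likes_count = c(120, 80),
  stringsAsFactors = FALSE
)

# The post ids are all we need for the per-post lookups
posts <- page$id
posts                   # "111_222" "111_333"

# Total number of likes that will be processed one by one
sum(page$likes_count)   # 200
```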

Get Post Like details

In the next steps two things happen for every post. First, we create a new entry in our final data frame for the post we are currently analyzing. Then we use its id to get more details about the post with the getPost() function.
The returned list contains three elements: post, likes and comments.
Each of them holds further data, but we only need the data stored in the likes section. There we find the fields from_name and from_id for every single like of the post.
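The shape of the object returned by getPost() can be mimicked offline. This is a minimal sketch with invented values, assuming only the elements and fields described above; it also shows why `seq_along()` is safer than `1:length()` for looping over the likes when a post has none:

```r
# Invented stand-in for the list returned by getPost(): post, likes, comments
post <- list(
  post = data.frame(message = "Example post", type = "link",
                    likes_count = 3, stringsAsFactors = FALSE),
  likes = data.frame(from_name = c("A", "B", "C"),
                     from_id = c("101", "102", "103"),
                     stringsAsFactors = FALSE),
  comments = data.frame()
)

# One user id per like
user_ids <- post$likes$from_id
length(user_ids)              # 3

# With zero likes, 1:length(x) is 1:0, i.e. c(1, 0), and the loop runs twice;
# seq_along(x) correctly yields an empty sequence
no_likes <- character(0)
length(seq_along(no_likes))   # 0
length(1:length(no_likes))    # 2
```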
We extract the user id, stored in the field from_id, and get the user details with the getUsers() function. From the returned user data we extract the gender and save it to a temporary gender_frame.
After we have processed all likes of the post and stored the gender of every single like in gender_frame, we divide the likes into three categories: male, female and etc. We count how many people said they are "male" or "female"; etc is simply the total number of likes minus the male and female likes, so every remaining like ends up there.
We then save the results in data_frame_gender and process the remaining posts in the same way.
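The counting step can be isolated in a small function that needs no API access. In this sketch (the name `count_genders` is hypothetical) `genders` is the vector of gender strings collected for one post and `likes_count` is its total number of likes; anything that is neither "male" nor "female", including missing values and likes without a returned gender, lands in etc:

```r
# Hypothetical helper mirroring the male / female / etc split described above
count_genders <- function(genders, likes_count) {
  males <- sum(genders == "male", na.rm = TRUE)
  females <- sum(genders == "female", na.rm = TRUE)
  c(male = males, female = females, etc = likes_count - (males + females))
}

# 5 likes, but only 4 gender values were collected (one of them missing)
count_genders(c("male", "female", "female", NA), likes_count = 5)
# male = 1, female = 2, etc = 2
```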
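# Result table: one row per post (message, male, female, etc, likes, type)
data_frame_gender <- data.frame(message = character(length(posts)),
                                male = numeric(length(posts)),
                                female = numeric(length(posts)),
                                etc = numeric(length(posts)),
                                likes = numeric(length(posts)),
                                type = character(length(posts)),
                                stringsAsFactors = FALSE)

for (i in seq_along(posts)) {
  # Download the post together with its likes
  post <- getPost(posts[i], token)

  data_frame_gender[i, 1] <- post$post$message
  data_frame_gender[i, 5] <- post$post$likes_count
  data_frame_gender[i, 6] <- post$post$type

  # Collect the gender of every user who liked the post
  gender_frame <- data.frame(gender = character(), stringsAsFactors = FALSE)
  likes <- post$likes$from_id

  for (j in seq_along(likes)) {
    user <- getUsers(likes[j], token = token)
    gender_frame[nrow(gender_frame) + 1, ] <- user$gender
  }

  # Count male and female likes; all remaining likes count as "etc"
  number_males <- nrow(subset(gender_frame, gender == "male"))
  number_females <- nrow(subset(gender_frame, gender == "female"))
  number_etc <- data_frame_gender[i, 5] - (number_males + number_females)

  data_frame_gender[i, 2] <- number_males
  data_frame_gender[i, 3] <- number_females
  data_frame_gender[i, 4] <- number_etc
}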
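The inner loop sends one getUsers() request per like, which is what makes popular posts slow. If getUsers() accepts several user IDs at once, as its documentation suggests, one possible speed-up is to look up the IDs in batches. The sketch below only splits the IDs into batches; `chunk_ids` is a hypothetical helper, the batch size of 50 is an assumption rather than a documented limit, and the commented-out line merely indicates where getUsers() would be called:

```r
# Hypothetical helper: split a vector of user ids into batches of at most `size`.
# The default size of 50 is an assumption, not a documented API limit.
chunk_ids <- function(ids, size = 50) {
  if (length(ids) == 0) return(list())
  split(ids, ceiling(seq_along(ids) / size))
}

ids <- sprintf("%03d", 1:120)   # 120 invented user ids
batches <- chunk_ids(ids)
lengths(batches)                # batch sizes 50, 50 and 20

# Inside the post loop, each batch could then be looked up with one request:
# users <- do.call(rbind, lapply(batches, getUsers, token = token))
```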

Plot the data

Plotting the results takes only a few lines.
We sum the male, female and etc counts over all analyzed posts to get the slices of our pie chart and label each slice with its name and percentage.
slices <- c(sum(data_frame_gender$male), sum(data_frame_gender$female), sum(data_frame_gender$etc))

pct <- round(slices / sum(slices) * 100)
lbls <- names(data_frame_gender)[2:4]
lbls <- paste(lbls, pct) # add percents to labels
lbls <- paste(lbls, "%", sep = "") # add % to labels

pie(slices, labels = lbls, main = "Gender Distribution of all analyzed posts")
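The label construction can be wrapped in a function and checked without opening a graphics device. A sketch, assuming a data frame with the male, female and etc columns built above (the function name `pie_labels` and the toy counts are hypothetical). Because each share is rounded separately, the percentages do not always add up to exactly 100; three equal shares, for example, each round to 33%:

```r
# Hypothetical helper: build "male 60%"-style labels from the gender counts
pie_labels <- function(df) {
  slices <- c(sum(df$male), sum(df$female), sum(df$etc))
  pct <- round(slices / sum(slices) * 100)
  paste0(c("male", "female", "etc"), " ", pct, "%")
}

# Invented counts for two posts
toy <- data.frame(male = c(40, 20), female = c(30, 10), etc = c(0, 0))
pie_labels(toy)   # "male 60%" "female 40%" "etc 0%"
```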

You can find the whole code on my github account:

https://github.com/JulianHill/R-Tutorials/blob/master/r_facebook_gender.r 

The post Gender Analysis of Facebook Post Likes appeared first on ThinkToStart.
