How to Visualize Multivariate Data Analysis

[This article was first published on R – Predictive Hacks, and kindly contributed to R-bloggers.]

In this tutorial, we use the factoextra R package to visualize a principal component analysis (PCA) of the Country dataset. We start by loading the package and reading the tab-separated data file:

library(factoextra)

df<-read.csv("DataCountries.txt", sep="\t")

head(df)

PCA Analysis

Now we will run a PCA on our dataset. Note that prcomp accepts only numeric variables, so we first move the Country column into the row names and then drop it from the data frame.

# set as rownames the column Country
rownames(df)<-df$Country

# remove the Country column
df$Country<-NULL

# run a PCA Analysis
dfPCA <- prcomp(df, center = TRUE, scale. = TRUE) 
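If the data frame holds other non-numeric columns besides Country, prcomp will stop with an error. The following is a minimal sketch of a guard that keeps only the numeric columns before running the PCA. The toy data frame and its column names (Region, GDP, LifeExp) are made up for illustration and are not taken from DataCountries.txt.

```r
# Toy stand-in for the country data; all values and column names are hypothetical.
toy <- data.frame(
  Country = c("A", "B", "C", "D", "E"),
  Region  = c("x", "y", "x", "y", "x"),   # non-numeric column that prcomp cannot use
  GDP     = c(1.2, 3.4, 2.2, 5.1, 4.0),
  LifeExp = c(70, 81, 75, 83, 79),
  stringsAsFactors = FALSE
)

# Same preparation as above: Country becomes the row names and is then dropped
rownames(toy) <- toy$Country
toy$Country <- NULL

# Keep only the numeric columns
is_num  <- vapply(toy, is.numeric, logical(1))
toy_num <- toy[, is_num, drop = FALSE]

# Run the PCA on the numeric columns only
toy_pca <- prcomp(toy_num, center = TRUE, scale. = TRUE)
summary(toy_pca)
```

Because the row names are set before the call, they carry through to the scores in toy_pca$x, which is what lets the plotting functions label each point by country.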

Let’s draw the scree plot, which shows the percentage of variance explained by each principal component.

fviz_eig(dfPCA)
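The percentages shown in a scree plot can be derived from the prcomp object itself. The sketch below uses the built-in USArrests dataset as a stand-in, since the country file is not bundled with R, and relies only on base R.

```r
# USArrests is a stand-in dataset; substitute your own numeric data frame.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# The variance of each principal component is the squared standard deviation
pc_var <- pca$sdev^2

# Share of the total variance explained by each component, in percent
var_pct <- 100 * pc_var / sum(pc_var)
names(var_pct) <- paste0("PC", seq_along(var_pct))

round(var_pct, 1)
```

The components come out ordered by decreasing variance, which is why the bars of a scree plot fall from left to right.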

Graph of Individuals

Let’s plot all the countries in two dimensions, coloring each one by its cos2, that is, the quality of its representation on the factor map.

# cos2 = the quality of representation of the individuals on the factor map
# Select and visualize some individuals (ind) with the select.ind argument:
#  - ind with cos2 >= 0.96: select.ind = list(cos2 = 0.96)
#  - top 20 ind according to cos2: select.ind = list(cos2 = 20)
#  - top 20 contributing individuals: select.ind = list(contrib = 20)
#  - ind selected by name: select.ind = list(name = c("23", "42", "119"))

fviz_pca_ind(dfPCA, col.ind = "cos2" , repel = TRUE)    
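To make the cos2 measure concrete, here is a base R sketch of the usual definition: the squared coordinate of an individual on a component divided by its squared distance from the origin in the standardized data. It uses USArrests as a stand-in dataset, and it illustrates the idea rather than reproducing factoextra's internals.

```r
# USArrests is a stand-in dataset; substitute your own numeric data frame.
pca    <- prcomp(USArrests, center = TRUE, scale. = TRUE)
scores <- pca$x                      # coordinates of the individuals on the PCs

# Squared distance of each individual from the origin, using all components
d2 <- rowSums(scores^2)

# cos2 per individual and component; each row sums to 1 across all components
cos2 <- scores^2 / d2

# Quality of representation on the first two components (the 2-D factor map)
quality_12 <- rowSums(cos2[, 1:2])
head(sort(quality_12, decreasing = TRUE))
```

An individual with a value near 1 lies almost entirely in the plotted plane, while a value near 0 means its position on the map says little about where it sits in the full space.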

Graph of Variables

Let’s see how we can represent the variables in two dimensions, coloring each one by its contribution to the principal components.

# To show only the top 15 contributing variables, add: select.var = list(contrib = 15)

fviz_pca_var(dfPCA, col.var = "contrib", repel = TRUE)
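The quantities behind a variable map can be sketched in base R. For a PCA on standardized data, the coordinate of a variable on a component is its loading multiplied by the component's standard deviation, which equals the correlation between the variable and the component. The example below uses USArrests as a stand-in dataset and follows the standard textbook definitions; it is an illustration, not a copy of factoextra's code.

```r
# USArrests is a stand-in dataset; substitute your own numeric data frame.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Variable coordinates: loadings scaled by the standard deviation of each PC
var_coord <- sweep(pca$rotation, 2, pca$sdev, `*`)

# Quality of representation of each variable on each component
var_cos2 <- var_coord^2

# Contribution (in percent) of each variable to each component
var_contrib <- 100 * sweep(var_cos2, 2, colSums(var_cos2), `/`)

round(var_coord, 3)
round(var_contrib, 1)
```

Since the coordinates are correlations, every arrow in the variable plot stays inside the unit circle, and a longer arrow means the variable is better represented by the two plotted components.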

Graph of the Biplot

# Graph of the Biplot
fviz_pca_biplot(dfPCA, repel = TRUE)

Eigenvalues, Variables and Individuals

Let’s see how we can get the eigenvalues, along with statistics for the variables and the individuals: their coordinates, their contributions to the PCs and their quality of representation.

Eigenvalues

# Eigenvalues
eigens_vals <- get_eigenvalue(dfPCA)
eigens_vals 
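As a cross-check on what these eigenvalues mean, the base R sketch below shows that, for a PCA on centered and scaled data, the squared standard deviations returned by prcomp match the eigenvalues of the correlation matrix. USArrests is used as a stand-in dataset.

```r
# USArrests is a stand-in dataset; substitute your own numeric data frame.
X   <- USArrests
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Eigenvalues taken from the PCA object
eig_pca <- pca$sdev^2

# Eigenvalues of the correlation matrix, computed directly
eig_cor <- eigen(cor(X), symmetric = TRUE)$values

# The two agree up to floating-point error
all.equal(eig_pca, eig_cor)

# With standardized variables the eigenvalues sum to the number of variables
sum(eig_pca)
```

Because each standardized variable has variance 1, an eigenvalue above 1 marks a component that carries more variance than any single original variable.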

Variables

# By Variable
by_var <- get_pca_var(dfPCA)
by_var$coord         
by_var$contrib        
by_var$cos2    

Individuals

# By Individual
by_ind <- get_pca_ind(dfPCA)
by_ind$coord         
by_ind$contrib        
by_ind$cos2  
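The contribution of an individual to a component can also be sketched in base R as its squared coordinate divided by the sum of squared coordinates on that component. This is one common definition, shown here on USArrests as a stand-in dataset. The values reported by factoextra may be normalized slightly differently (for example using n rather than n - 1), so treat the numbers as illustrative.

```r
# USArrests is a stand-in dataset; substitute your own numeric data frame.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Squared coordinates of the individuals on each component
sq <- pca$x^2

# Contribution (in percent) of each individual to each component;
# under this definition every column sums to 100
ind_contrib <- 100 * sweep(sq, 2, colSums(sq), `/`)

# The five individuals that contribute most to the first component
top5_pc1 <- head(sort(ind_contrib[, "PC1"], decreasing = TRUE), 5)
round(top5_pc1, 2)
```

Individuals with large contributions are the ones that pull a component in their direction, so they are worth inspecting when a component is hard to interpret.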
