Continuing from my previous post, in this post I discuss the inferential and predictive analysis.

A brief note on the dataset and the problem to solve

The dataset comes from the UCI Machine Learning Repository, and the task is to predict whether a donor donated blood in March 2007 (1 stands for donating blood; 0 stands for not donating blood). There are 776 instances and six variables, and it is a classification problem.

A. Correlation

As a first measure, I check for strongly correlated predictors. The correlation between two variables is a number that indicates how closely their relationship follows a straight line. Here, correlation refers to Pearson's correlation coefficient. A correlation of 1 indicates a perfect positive linear relationship. I notice that the predictors total number of donations and total blood donated in c.c. are linearly correlated. There is only a weak negative linear association between number of donations and months since last donation (corr = -0.159). Next, to visualize the pairwise correlation matrix, I use pairs.panels() from the psych package, which is shown in Fig 1.
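
The correlation check described above can be sketched roughly as follows. This is a minimal illustration and not the code used for this post: the data frame `toy`, its column names, and its values are all made up, and the fixed 250 c.c. per donation is an assumption used only to reproduce a perfectly correlated pair of predictors. The 0.9 cut-off for "strongly correlated" is likewise an arbitrary choice.

```r
# Toy data standing in for the blood donation predictors.
# Column names and values are hypothetical, for illustration only.
set.seed(1)
n <- 100
donations <- rpois(n, lambda = 5) + 1

toy <- data.frame(
  months_since_last = sample(0:40, n, replace = TRUE),
  n_donations       = donations,
  volume_cc         = donations * 250  # assumed fixed volume per donation
)

# Pearson correlation matrix of all predictors
cor_mat <- cor(toy, method = "pearson")
print(round(cor_mat, 3))

# List each pair of predictors once (upper triangle) where |r| > 0.9
idx <- which(abs(cor_mat) > 0.9 & upper.tri(cor_mat), arr.ind = TRUE)
strong <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx]
)
print(strong)

# Scatterplot matrix with correlations, only if psych is installed
if (requireNamespace("psych", quietly = TRUE)) {
  psych::pairs.panels(toy, method = "pearson")
}
```

Because `volume_cc` is an exact multiple of `n_donations` in this toy data, the pair shows up in `strong` with a correlation of 1, which is the kind of redundancy that usually justifies dropping one of the two predictors before modelling.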