# Predict Blood Donation - warmup

August 28, 2016

Continuing from my previous post, this post discusses the inferential and predictive analysis.

About the dataset and the problem to solve: a brief overview

The dataset comes from the UCI Machine Learning Repository, and the task is to predict whether a donor donated blood in March 2007 (1 stands for donating blood; 0 stands for not donating blood). There are 776 instances of 6 variables, and it is a classification problem.

A. Correlation

As a first measure, I check for strongly correlated predictors. The correlation between two variables is a number that indicates how closely their relationship follows a straight line. Here, correlation refers to Pearson’s correlation coefficient. A correlation of 1 indicates a perfect positive linear relationship. I notice that the predictors `total number of donations` and `total blood donated in c.c` are linearly correlated. There is only a weak negative linear association between number of donations and months since last donation (corr = -0.159). Next, to visualize the pairwise correlation matrix, I use `pairs.panels()` from the `psych` package; the result is shown in Fig 1.

```
library(psych)
# Scatterplot matrix with pairwise correlations for the predictors and the response
pairs.panels(train.data[c("Months.since.Last.Donation","Number.of.Donations","Total.Volume.Donated..c.c..","Months.since.First.Donation","Made.Donation.in.March.2007")])
```

Fig 1: Pairwise correlation matrix
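
The numbers behind such a plot can also be inspected directly with base R's `cor()`. The following is a minimal sketch, not the analysis from this post: `toy` is a made-up stand-in for `train.data` that borrows its predictor column names, the assumption that each donation is 250 c.c. is mine, and the 0.9 cutoff is an arbitrary choice for illustration.

```
# Synthetic stand-in for train.data (all values are made up for illustration)
set.seed(42)
n <- 200
donations <- rpois(n, lambda = 5) + 1
toy <- data.frame(
  Months.since.Last.Donation  = sample(0:40, n, replace = TRUE),
  Number.of.Donations         = donations,
  Total.Volume.Donated..c.c.. = donations * 250,  # assumption: 250 c.c. per donation
  Months.since.First.Donation = sample(2:98, n, replace = TRUE)
)

# Pearson correlation matrix (Pearson is also the default method of cor())
cor.mat <- cor(toy, method = "pearson")
print(round(cor.mat, 3))

# List each pair of variables whose absolute correlation exceeds the cutoff;
# upper.tri() keeps every pair once and skips the diagonal
cutoff <- 0.9  # arbitrary threshold for this sketch
idx <- which(abs(cor.mat) > cutoff & upper.tri(cor.mat), arr.ind = TRUE)
high.pairs <- data.frame(
  var1 = rownames(cor.mat)[idx[, "row"]],
  var2 = colnames(cor.mat)[idx[, "col"]],
  corr = cor.mat[idx]
)
print(high.pairs)
```

In this toy data the volume column is an exact multiple of the donation count, so that pair is flagged; a pair like this carries redundant information, which is why checking for it early is useful.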

