**R-exercises**, and kindly contributed to R-bloggers)

We are committed to bringing you 100% authentic exercise sets, and we try to use as many different datasets as possible so that you get a feel for different kinds of problems. No more classifying the Titanic dataset: R ships with tons of datasets, and we encourage you to try as many of them as possible. In this set, we will cover the basics of gauging the accuracy of a model.

It will help to go over Tom Fawcett’s research paper ‘An introduction to ROC analysis’.

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

**Exercise 1**

In pattern recognition and classification, we use precision (the fraction of retrieved instances that are relevant) and recall (the fraction of relevant instances that are retrieved). In other words, precision tells us how many of the items the model flags as positive really are positive, that is, how well the model tells the target items apart from look-alikes. Recall, also called sensitivity, tells us how many of all the actual positives the model manages to identify.

Suppose we design a model to identify iPhones in a video that also contains Android phones. In a scene containing 7 iPhones and some Android phones, the program identifies 5 iPhones: 3 of the identifications are correct, but 2 are actually Android phones.

a. What is the precision of the model?

b. What is the recall of the model?
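Both definitions reduce to simple counts: precision is TP/(TP+FP) and recall is TP/(TP+FN), where TP are correct identifications, FP are wrong identifications and FN are targets the model missed. Below is a minimal R sketch of that arithmetic. The helper name `precision_recall` is hypothetical, and the counts are made up (they are not the exercise's numbers) so the answer is not given away.

```r
# Hypothetical helper: precision and recall from raw counts
# tp = correct detections, fp = wrong detections, fn = missed targets
precision_recall = function(tp, fp, fn) {
  c(precision = tp / (tp + fp),  # share of detections that are correct
    recall    = tp / (tp + fn))  # share of actual targets that were found
}

# Made-up scene: 10 detections, 8 of them correct, out of 12 actual targets,
# so 12 - 8 = 4 targets were missed
precision_recall(tp = 8, fp = 2, fn = 4)
# precision = 8/10 = 0.8, recall = 8/12 (about 0.667)
```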

**Exercise 2**

Suppose we have created a model and want to assess how well it predicts. We compare the actual values from the Test set with the predicted values produced by the model.

| Actual (rows) vs. predicted (columns) | FALSE | TRUE |
|---|---|---|
| 0 | 94 | 23 |
| 1 | 24 | 100 |

Using the confusion matrix above, answer the following questions:

a. Number of true positives?

b. Number of false negatives?

c. Number of true negatives?

d. Number of false positives?
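In R, a confusion matrix like this one is typically produced by `table(actual, predicted)`, which puts the actual classes in the rows and the predicted classes in the columns. The sketch below uses short made-up vectors (the names `actual` and `predicted` are hypothetical, not the exercise data) to show how each of the four counts can be read off by row and column name.

```r
# Made-up actual classes (0/1) and predicted classes (FALSE/TRUE)
actual    = c(0, 0, 0, 1, 1, 1, 1, 0)
predicted = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)

# Rows = actual class, columns = predicted class
cm = table(actual, predicted)
print(cm)

# Index by name so the counts do not depend on row/column order
TP = cm["1", "TRUE"]   # actual 1, predicted TRUE
FN = cm["1", "FALSE"]  # actual 1, predicted FALSE
TN = cm["0", "FALSE"]  # actual 0, predicted FALSE
FP = cm["0", "TRUE"]   # actual 0, predicted TRUE
c(TP = TP, FN = FN, TN = TN, FP = FP)
```

Indexing by name rather than position guards against accidentally swapping rows and columns when reading the table.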

**Exercise 3**

A quick way to gauge the accuracy of the model, assuming the target classes are balanced, is the formula (TN+TP)/N, where N is the total number of observations. What is the accuracy of the model?
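Accuracy is the share of observations that lie on the diagonal of the confusion matrix, i.e. the correct predictions. A minimal R sketch with a made-up 2x2 matrix `cm` (hypothetical numbers, with rows = actual and columns = predicted in the same class order):

```r
# Made-up confusion matrix: rows = actual class, columns = predicted class
cm = matrix(c(50, 10,
               5, 35),
            nrow = 2, byrow = TRUE,
            dimnames = list(actual = c("0", "1"), predicted = c("FALSE", "TRUE")))

TN = cm["0", "FALSE"]; FP = cm["0", "TRUE"]
FN = cm["1", "FALSE"]; TP = cm["1", "TRUE"]
N  = sum(cm)

accuracy = (TN + TP) / N
accuracy  # (50 + 35) / 100 = 0.85

# Equivalent: the diagonal holds the correct predictions
sum(diag(cm)) / sum(cm)
```

The balanced-class assumption matters: with 95 negatives and 5 positives, a model that always predicts the negative class still reaches an accuracy of 0.95.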

**Exercise 4**

A quick way to gauge the error rate is the formula (FP+FN)/N. What is the overall error rate?
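The overall error rate counts both kinds of mistakes, (FP + FN)/N, and is therefore the complement of accuracy. A small R sketch with made-up counts (not the exercise's numbers):

```r
# Made-up counts from a confusion matrix
TN = 50; FP = 10; FN = 5; TP = 35
N  = TN + FP + FN + TP

error_rate = (FP + FN) / N
error_rate  # (10 + 5) / 100 = 0.15

# The error rate is 1 - accuracy
accuracy = (TN + TP) / N
all.equal(error_rate, 1 - accuracy)  # TRUE
```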

**Exercise 5**

Sensitivity is defined as TP/(TP+FN) and specificity as TN/(TN+FP). Using that information, answer the following questions:

a. What is the sensitivity of the model?

b. What is the specificity of the model?
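Sensitivity looks only at the actual positives (the "1" row) and specificity only at the actual negatives (the "0" row). A minimal R sketch using made-up counts:

```r
# Made-up counts
TN = 50; FP = 10; FN = 5; TP = 35

sensitivity = TP / (TP + FN)  # share of actual positives predicted positive
specificity = TN / (TN + FP)  # share of actual negatives predicted negative

round(c(sensitivity = sensitivity, specificity = specificity), 3)
# sensitivity = 35/40 = 0.875, specificity = 50/60 (about 0.833)
```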

**Exercise 6**

In binary classification there are two types of errors: false positives and false negatives. The false negative error rate is defined as FN/(TP+FN), and the false positive error rate as FP/(TN+FP). Now answer the following questions:

a. What is the false negative error rate?

b. What is the false positive error rate?
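Each error rate shares its denominator with one of the rates from Exercise 5, so the false negative rate equals 1 - sensitivity and the false positive rate equals 1 - specificity. The false positive rate is also the x-axis of the ROC curve discussed in Fawcett's paper. A short R sketch with made-up counts:

```r
# Made-up counts
TN = 50; FP = 10; FN = 5; TP = 35

fnr = FN / (TP + FN)  # false negative rate: actual positives that were missed
fpr = FP / (TN + FP)  # false positive rate: actual negatives flagged as positive

# Each error rate is the complement of sensitivity or specificity
all.equal(fnr, 1 - TP / (TP + FN))  # TRUE
all.equal(fpr, 1 - TN / (TN + FP))  # TRUE
```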

**Exercise 7**

Now suppose the confusion matrix from Exercise 2 comes from a model like the one in Exercise 1, used to identify iPhones among Android phones in a video.

a. What is the precision?

b. What is the recall? Hint: Recall is the same as sensitivity.

**Exercise 8**

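Go ahead and run the code below; you may run it line by line to see what is happening. In brief, we load the housing data from the MASS package, convert the Cont variable to a binary class, split the data into Train and Test sets, build a model (a logistic regression is assumed here), predict on the Test set, and compare the actual Cont values in the Test set with the predicted values in pred.

library(MASS) # provides the housing data

library(caTools) # provides sample.split()

housing$Cont = ifelse(housing$Cont == "High", 1, 0) # High = 1, Low = 0

spl = sample.split(housing$Cont, SplitRatio = 0.7) # about 70% of rows go to Train

Train = housing[spl == TRUE, ]

Test = housing[spl == FALSE, ]

model = glm(Cont ~ ., data = Train, family = binomial) # assumed: logistic regression on all other columns

pred = predict(model, newdata = Test, type = "response") # predicted probabilities

table(Test$Cont, pred > 0.5) # rows = actual, columns = predicted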

**Exercise 9**

Using the confusion matrix output by the code in Exercise 8, answer the following questions:

a) What is the precision?

b) What is the recall?

**Exercise 10**

Using the same confusion matrix, answer the following questions:

c) What is the accuracy of the model?

d) What is the overall error rate?
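If you want to check your hand calculations for Exercises 9 and 10, the formulas from the earlier exercises can be collected into one function that takes the table printed in Exercise 8. This is a sketch: the name `cm_metrics` is hypothetical, and it assumes the rows are labelled "0"/"1" (actual) and the columns "FALSE"/"TRUE" (predicted), as `table(Test$Cont, pred > 0.5)` produces.

```r
# Hypothetical helper: metrics from a 2x2 table(actual, predicted > threshold)
cm_metrics = function(cm) {
  # Index by name; a missing row or column (e.g. no TRUE predictions)
  # raises a "subscript out of bounds" error instead of giving wrong numbers
  TP = cm["1", "TRUE"];  FN = cm["1", "FALSE"]
  TN = cm["0", "FALSE"]; FP = cm["0", "TRUE"]
  N  = TP + FN + TN + FP
  c(precision  = TP / (TP + FP),
    recall     = TP / (TP + FN),
    accuracy   = (TP + TN) / N,
    error_rate = (FP + FN) / N)
}

# Made-up example table
demo = as.table(matrix(c(50, 10, 5, 35), nrow = 2, byrow = TRUE,
                       dimnames = list(c("0", "1"), c("FALSE", "TRUE"))))
round(cm_metrics(demo), 3)

# With the Exercise 8 objects in your session:
# cm_metrics(table(Test$Cont, pred > 0.5))
```

Because the Train/Test split is random, your table (and therefore these metrics) will differ from run to run unless you fix the seed with `set.seed()` before calling `sample.split()`.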

