Model Evaluation Exercises 1

December 2, 2016

(This article was first published on R-exercises, and kindly contributed to R-bloggers)

We are committed to bringing you 100% authentic exercise sets. We try to use as many different datasets as possible to give you a feel for different kinds of problems. No more classifying the Titanic dataset: R ships with plenty of datasets, and we encourage you to try as many of them as you can. In this set we will cover the basics of gauging the accuracy of a model.

It will be helpful to go over Tom Fawcett’s paper ‘An introduction to ROC analysis’.

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Exercise 1

In pattern recognition and classification, we use precision (the fraction of retrieved instances that are relevant) and recall (the fraction of relevant instances that are retrieved). In other words, precision tells us how many of the items the model flagged as positive really are positive, that is, how well the model avoids confusing other items with the target. Recall, also called sensitivity, tells us how many of all the actual positives the model managed to identify.
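
As a minimal sketch of these two definitions in R, consider the counts below. They are hypothetical and unrelated to the exercise that follows: imagine a spam filter that flags 10 messages, 8 of which really are spam, in a mailbox holding 12 spam messages in total.

```r
# Hypothetical counts, chosen only for illustration
flagged        <- 10  # items the model labelled positive
correct_flags  <- 8   # flagged items that really are positive (true positives)
total_positive <- 12  # all truly positive items in the data

precision <- correct_flags / flagged         # share of flagged items that are right
recall    <- correct_flags / total_positive  # share of real positives that were found

precision  # 0.8
recall     # 0.6666667
```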

Suppose we design a model to identify iPhones in a video that also contains Android phones. The program identifies 5 iPhones in a scene containing 7 iPhones and some Android phones. Of those 5 identifications, 3 are correct and 2 are actually Android phones.
a. What is the precision of the model?
b. What is the recall of the model?

Exercise 2

Suppose we have built the model and want to assess how well it predicts. We will compare the actual values from the Test set with the predicted values derived from the model.

(rows: actual class; columns: predicted class)

     FALSE  TRUE
0       94    23
1       24   100

Using the confusion matrix above, answer the following questions:
a. Number of true positives?
b. Number of false negatives?
c. Number of true negatives?
d. Number of false positives?
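
If the confusion matrix is held in R, each of the four cells can be picked out by name. The sketch below uses a hypothetical matrix (not the one from this exercise) and assumes rows are the actual class and columns are the predicted class, which is the layout that `table(actual, predicted)` produces.

```r
# Hypothetical confusion matrix: rows = actual class, columns = predicted class
cm <- matrix(c(50, 5,
               10, 35),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual = c("0", "1"),
                             predicted = c("FALSE", "TRUE")))

TN <- cm["0", "FALSE"]  # actual negative, predicted negative
FP <- cm["0", "TRUE"]   # actual negative, predicted positive
FN <- cm["1", "FALSE"]  # actual positive, predicted negative
TP <- cm["1", "TRUE"]   # actual positive, predicted positive

c(TN = TN, FP = FP, FN = FN, TP = TP)
```

Indexing by name rather than by position avoids mixing up the cells when the row or column order changes.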

Exercise 3
A quick way to gauge the accuracy of the model, assuming the target classes are balanced, is to use the formula (TN+TP)/N, where N is the total number of observations. What is the accuracy of the model?
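
The balance assumption matters. As a rough illustration with hypothetical data, a "model" that always predicts the majority class can score a high accuracy while finding none of the positives:

```r
# Hypothetical imbalanced data: 95 negatives and 5 positives
actual    <- c(rep(0, 95), rep(1, 5))
predicted <- rep(0, 100)  # always predicts the negative class

# Accuracy: (TN + TP) / N
accuracy <- mean(predicted == actual)

# Recall: TP / (TP + FN)
recall <- sum(predicted == 1 & actual == 1) / sum(actual == 1)

accuracy  # 0.95
recall    # 0
```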

Exercise 4
A quick way to gauge the error rate is to use the formula (FP+FN)/N. What is the overall error rate?
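
Since every observation is either classified correctly or misclassified, the error rate, taken as misclassified cases over all cases, (FP+FN)/N, is the complement of accuracy. A small check with hypothetical cell counts:

```r
# Hypothetical cell counts, chosen only for illustration
TN <- 50; FP <- 5; FN <- 10; TP <- 35
N  <- TN + FP + FN + TP

accuracy   <- (TN + TP) / N  # correctly classified cases over all cases
error_rate <- (FP + FN) / N  # misclassified cases over all cases

isTRUE(all.equal(accuracy + error_rate, 1))  # TRUE
```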

Exercise 5
Sensitivity is defined as TP/(TP+FN). Specificity is defined as TN/(TN+FP). Using that information, answer the following questions:
a. What is the sensitivity of the model?
b. What is the specificity of the model?
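
The two definitions translate directly into small helper functions. The function names and the counts passed to them are illustrative assumptions, not part of the original exercise:

```r
# Sensitivity: share of actual positives that the model caught
sensitivity <- function(TP, FN) TP / (TP + FN)

# Specificity: share of actual negatives that the model correctly rejected
specificity <- function(TN, FP) TN / (TN + FP)

# Hypothetical counts
sensitivity(TP = 35, FN = 10)  # 0.7777778
specificity(TN = 50, FP = 5)   # 0.9090909
```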

Exercise 6

There are usually two types of error: false positives and false negatives. The false negative error rate is defined as FN/(TP+FN). The false positive error rate is defined as FP/(TN+FP). Now answer the following questions:
a. What is the false negative error rate?
b. What is the false positive error rate?
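
These two rates share their denominators with sensitivity, TP/(TP+FN), and specificity, TN/(TN+FP), so each is simply one minus its counterpart. A quick check with hypothetical counts:

```r
# Hypothetical cell counts, chosen only for illustration
TN <- 50; FP <- 5; FN <- 10; TP <- 35

fnr <- FN / (TP + FN)  # false negative error rate
fpr <- FP / (TN + FP)  # false positive error rate

isTRUE(all.equal(fnr, 1 - TP / (TP + FN)))  # TRUE: FNR = 1 - sensitivity
isTRUE(all.equal(fpr, 1 - TN / (TN + FP)))  # TRUE: FPR = 1 - specificity
```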

Exercise 7

Now suppose this is a model like the one that was used to identify iPhones from Android phones in a video (Exercise 1).
a. What is the precision?
b. What is the recall? Hint: Recall is the same as sensitivity.

Exercise 8
Go ahead and run the code below. You may run it line by line to see what is happening. In brief, we load the housing data, convert the Cont variable to a binary class, split the data into Train and Test sets, build a model, predict on the Test set, and compare the actual Cont values in the Test set with the predicted values in pred.

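library(MASS)     # provides the housing data
library(caTools)  # provides sample.split()
housing$Cont <- ifelse(housing$Cont == "High", 1, 0)
spl <- sample.split(housing$Cont, SplitRatio = 0.7)
Train <- housing[spl == TRUE, ]
Test <- housing[spl == FALSE, ]
# The model and prediction lines were garbled in the source;
# a logistic regression is assumed here as a reconstruction.
model <- glm(Cont ~ ., data = Train, family = binomial)
pred <- predict(model, newdata = Test, type = "response")
table(Test$Cont, pred > 0.5)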
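
One thing to watch when tabulating actual values against `pred > 0.5`: if no prediction crosses the threshold (or all of them do), `table()` returns a single column and the usual cell lookups fail. The sketch below uses hypothetical vectors standing in for the Test values and pred, and fixes the factor levels so the table is always 2 by 2.

```r
# Hypothetical values standing in for the actual classes and predicted probabilities
actual <- c(0, 1, 0, 1, 0)
pred   <- c(0.10, 0.40, 0.20, 0.30, 0.45)  # every probability is below 0.5

ncol(table(actual, pred > 0.5))  # 1: only a FALSE column appears

# Fixing the factor levels keeps both columns, with zero counts where needed
cm <- table(actual    = factor(actual, levels = c(0, 1)),
            predicted = factor(pred > 0.5, levels = c(FALSE, TRUE)))

dim(cm)  # 2 2
```

Because the split in the exercise is random, the confusion matrix you obtain will differ from run to run unless you call `set.seed()` before splitting.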

Exercise 9
Using the confusion matrix output from the code in Exercise 8, answer the following questions.
a) What is the precision?
b) What is the recall?

Exercise 10
Using the confusion matrix, answer the following questions.
c) What is the accuracy of the model?
d) What is the overall error rate?
