# Shootout 2012: Test & Val Sets projections

[This article was first published on **NIR-Quimiometria**, and kindly contributed to R-bloggers].

After seeing the spectra of the calibration set, it is clear that we have at least three clusters, and that these can be related to the concentration of the active ingredient in the tablets. The same three clusters appear in the PC1-PC2 score map.

I imported the test set into R and projected it onto the PC1-PC2 score map (developed with the calibration samples), and I found another cluster.
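The projection step can be sketched in base R as follows. This is only an illustration under stated assumptions: the spectra are synthetic stand-ins (the Shootout files are not loaded here), the names `calib`, `test` and `make_spectra` are hypothetical, and the original post does not say which PCA function or preprocessing was used. The key point is that the PCA is fitted on the calibration samples only, and the test samples are then centred and rotated with that same model.

```r
set.seed(1)

# Synthetic stand-ins for the spectra (rows = samples, columns = wavelengths).
# 'level' mimics a concentration-driven change in the spectrum; this is an
# assumption for illustration, not the real Shootout data.
n_wl <- 50
make_spectra <- function(n, level) {
  noise <- matrix(rnorm(n * n_wl, sd = 0.01), nrow = n)
  shape <- matrix(seq(0.5, 1.5, length.out = n_wl), nrow = n, ncol = n_wl,
                  byrow = TRUE)
  noise + level * shape
}

calib <- rbind(make_spectra(20, 1), make_spectra(20, 2), make_spectra(20, 3))
test  <- make_spectra(10, 4)   # a level not present in the calibration set

# Fit the PCA on the calibration samples only
pca <- prcomp(calib, center = TRUE, scale. = FALSE)
calib_scores <- pca$x[, 1:2]

# Project the test samples with the calibration centring and loadings
test_scores <- predict(pca, newdata = test)[, 1:2]
```

With this synthetic data the test samples land beyond the calibration range along PC1, which is the kind of "extra cluster" described above.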

If we read the Chemometrics Shootout rules, we see:

*“This year’s challenge will consist in developing the best model for the active ingredient using the calibration data. However, the most important task will be to build a model that will be robust to production scale differences. In addition, the quality of the presentation and the reasoning behind the approach taken will be used to determine the winner”.*

So predicting this test set as accurately as possible is an important part of approaching the challenge.

And what about the validation set? We don't know the reference values, but we can again project the samples onto the PC1-PC2 score map (developed with the calibration samples) to see whether more clusters appear, or whether the samples are represented in the training set.

As we can see, some test and validation samples do not overlap with any samples of the calibration set, so we have to take this into account when developing the model.
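The overlap was judged visually here, but it could also be checked numerically. The sketch below is one possible approach and not the method used in the original post: it flags a projected sample as outside the calibration space when its distance to the nearest calibration sample in the PC1-PC2 plane exceeds the largest nearest-neighbour distance found within the calibration set. The scores are made up, and `nn_dist` and the threshold rule are assumptions for illustration.

```r
set.seed(2)

# Hypothetical PC1-PC2 scores for three calibration clusters
calib_scores <- rbind(
  cbind(rnorm(20, -5, 0.3), rnorm(20, 0, 0.3)),
  cbind(rnorm(20,  0, 0.3), rnorm(20, 0, 0.3)),
  cbind(rnorm(20,  5, 0.3), rnorm(20, 0, 0.3))
)

# Two hypothetical projected samples: one coinciding with a calibration
# sample, one far away from every calibration cluster
new_scores <- rbind(calib_scores[25, ], c(10, 4))

# Euclidean distance from each new sample to its nearest calibration sample
nn_dist <- function(new, ref) {
  apply(new, 1, function(p) min(sqrt(rowSums(sweep(ref, 2, p)^2))))
}

# Assumed threshold: the largest nearest-neighbour distance observed
# among the calibration samples themselves
d_cal <- as.matrix(dist(calib_scores))
diag(d_cal) <- Inf
threshold <- max(apply(d_cal, 1, min))

outside <- nn_dist(new_scores, calib_scores) > threshold
```

Samples flagged in `outside` are the ones a model built only on the calibration set would have to extrapolate to.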

R is really wonderful for making these plots:

**Black circles: Calibration Samples**

**Red triangles: Test Samples**

**Green crosses: Validation samples**
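A plot of this kind, with the three sample sets overlaid on the calibration score map, could be produced with base graphics roughly as follows. The data are synthetic, the names `sim`, `calib`, `test` and `val` are hypothetical, and the plotting symbols are an assumption chosen to match the colours and markers described in the post (the original plotting code is not shown there).

```r
set.seed(3)

# Synthetic spectra (rows = samples, columns = wavelengths); illustration only
n_wl <- 40
sim <- function(n, level) {
  matrix(rnorm(n * n_wl, sd = 0.01), nrow = n) +
    level * matrix(seq(0.5, 1.5, length.out = n_wl), n, n_wl, byrow = TRUE)
}
calib <- rbind(sim(20, 1), sim(20, 2), sim(20, 3))
test  <- sim(10, 4)
val   <- rbind(sim(5, 2), sim(5, 5))

# PCA on the calibration set, then projection of the other two sets
pca     <- prcomp(calib, center = TRUE)
sc_cal  <- pca$x[, 1:2]
sc_test <- predict(pca, newdata = test)[, 1:2]
sc_val  <- predict(pca, newdata = val)[, 1:2]

# Axis limits must cover all three sets, not just the calibration scores
all_sc <- rbind(sc_cal, sc_test, sc_val)
plot(sc_cal, col = "black", pch = 1,
     xlim = range(all_sc[, 1]), ylim = range(all_sc[, 2]),
     xlab = "PC1", ylab = "PC2")
points(sc_test, col = "red",   pch = 2)   # triangles
points(sc_val,  col = "green", pch = 4)   # crosses
legend("topright", legend = c("Calibration", "Test", "Validation"),
       col = c("black", "red", "green"), pch = c(1, 2, 4))
```

Setting `xlim` and `ylim` from the combined scores matters here: projected samples that fall outside the calibration range would otherwise be clipped from the plot.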