Shootout 2012: Test & Val Sets Projections
[This article was first published on NIR-Quimiometria, and kindly contributed to R-bloggers].
After looking at the spectra of the calibration set, it is clear that we have at least three clusters, and that these may be related to the concentration of the active ingredient in the tablets. The PC1-PC2 score map shows the same three clusters.
I imported the test set into R and projected it onto the PC1-PC2 score map (developed with the calibration samples), and I found another cluster.
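As a rough illustration of this projection step, here is a minimal sketch in base R. It assumes the PCA is fitted with `prcomp()` on mean-centered calibration spectra, which may differ from what was actually used for the post; the matrices `cal` and `test` are simulated stand-ins, not the Shootout data, and `make_spectra()` is a hypothetical helper.

```r
# Simulated stand-ins for the spectra (NOT the real Shootout 2012 data).
set.seed(1)
n_wl <- 50  # number of wavelengths, an arbitrary assumption

# Hypothetical helper: n noisy spectra whose intensity scales with `level`.
make_spectra <- function(n, level) {
  base <- sin(seq(0, pi, length.out = n_wl))
  t(replicate(n, level * base + rnorm(n_wl, sd = 0.02)))
}

# Calibration set with three concentration levels, test set with a fourth.
cal  <- rbind(make_spectra(20, 1), make_spectra(20, 2), make_spectra(20, 3))
test <- make_spectra(15, 4)

# Fit the PCA on the calibration samples only.
pca_cal    <- prcomp(cal, center = TRUE, scale. = FALSE)
cal_scores <- pca_cal$x[, 1:2]

# Project the test samples: predict() reuses the calibration centering
# and loadings, so the test set does not influence the model.
test_scores <- predict(pca_cal, newdata = test)[, 1:2]
```

The important design choice is that the test set is only projected, never used to fit the PCA, so the score map stays the one developed with the calibration samples.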
If we read the Chemometrics Shootout rules, we see:
“This year’s challenge will consist in developing the best model for the active
ingredient using the calibration data. However, the most important task will be to build a
model that will be robust to production scale differences. In addition, the quality of the
presentation and the reasoning behind the approach taken will be used to determine the
winner”.
So predicting this test set as accurately as possible is an important part of the challenge.
And what about the validation set? We don't know the reference values, but we can again project the samples onto the PC1-PC2 score map (developed with the calibration samples) to see whether more clusters appear, or whether the samples are represented in the training set.
As we can see, some test and validation samples do not overlap with any samples of the calibration set, so we have to take this into account when developing the model.
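The overlap was judged by eye on the score map, but a simple numeric check is also possible. The sketch below is one possible approach, not the method used in this post: it flags a projected sample as "not covered" when its distance to the nearest calibration sample in the PC1-PC2 plane is larger than the largest nearest-neighbor distance found inside the calibration set. The score matrices are simulated placeholders, and `nearest_dist()` is a hypothetical helper.

```r
# Simulated PC1-PC2 scores (placeholders, NOT the real Shootout scores).
set.seed(3)
cal_scores <- rbind(cbind(rnorm(20, -3, 0.3), rnorm(20, 0, 0.3)),
                    cbind(rnorm(20,  0, 0.3), rnorm(20, 0, 0.3)),
                    cbind(rnorm(20,  3, 0.3), rnorm(20, 0, 0.3)))
new_scores <- rbind(cbind(rnorm(5, 0, 0.3), rnorm(5, 0, 0.3)),  # near a calibration cluster
                    cbind(rnorm(5, 0, 0.3), rnorm(5, 4, 0.3)))  # far from every cluster

# Hypothetical helper: Euclidean distance from each row of `points`
# to its nearest row in `reference`.
nearest_dist <- function(points, reference) {
  apply(points, 1, function(p) min(sqrt(colSums((t(reference) - p)^2))))
}

# Threshold: the largest nearest-neighbor distance within the calibration set
# (the diagonal is set to Inf so a sample is not its own neighbor).
d_cal <- as.matrix(dist(cal_scores))
diag(d_cal) <- Inf
threshold <- max(apply(d_cal, 1, min))

# TRUE for projected samples that lie farther from the calibration set
# than any calibration sample lies from its own nearest neighbor.
not_covered <- nearest_dist(new_scores, cal_scores) > threshold
```

The threshold rule is an arbitrary assumption; a Mahalanobis distance or a per-cluster limit would be equally reasonable choices.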
R is really wonderful for making plots like this one:
Black circles: Calibration samples
Red triangles: Test samples
Green crosses: Validation samples
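A plot with this legend can be sketched with base R graphics as follows. The three score matrices are random placeholders rather than the real projections, and the symbol codes (`pch`) and colors are my own choice to match the legend above.

```r
# Placeholder PC1-PC2 scores (random numbers, NOT the real projections).
set.seed(2)
cal_scores  <- cbind(PC1 = rnorm(60, 0, 2), PC2 = rnorm(60, 0, 1))
test_scores <- cbind(PC1 = rnorm(15, 5, 1), PC2 = rnorm(15, 1, 1))
val_scores  <- cbind(PC1 = rnorm(15, 2, 2), PC2 = rnorm(15, -2, 1))

# Axis limits must cover all three sets, otherwise projected samples
# that fall outside the calibration range would be cut off.
all_scores <- rbind(cal_scores, test_scores, val_scores)
xlim <- range(all_scores[, 1])
ylim <- range(all_scores[, 2])

# Calibration samples first, then the projected sets on top.
plot(cal_scores, pch = 1, col = "black", xlim = xlim, ylim = ylim,
     xlab = "PC1", ylab = "PC2", main = "PC1-PC2 score map")
points(test_scores, pch = 2, col = "red")
points(val_scores,  pch = 4, col = "green")
legend("topleft",
       legend = c("Calibration samples", "Test samples", "Validation samples"),
       pch = c(1, 2, 4), col = c("black", "red", "green"))
```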