Shootout 2012: Test & Val Sets projections

November 7, 2012

(This article was first published on NIR-Quimiometria, and kindly contributed to R-bloggers)

After looking at the spectra of the calibration set, it is obvious that there are at least three clusters, and that they can be related to the concentration of the active ingredient in the tablets. The same three clusters appear in the PC1-PC2 score map.
I imported the test set into R and projected it onto the PC1-PC2 score map (built from the calibration samples), and I found another cluster.
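The key point of the projection is that the PCA model (mean and loadings) comes from the calibration samples only, and the test spectra are centred with the calibration mean before being multiplied by the calibration loadings. The author did this in R; the following is only a minimal NumPy sketch of the same idea, not the author's code. It assumes spectra are stored as rows (samples) by columns (wavelengths), applies mean-centring only (no SNV, derivatives or other pretreatment), and uses hypothetical synthetic spectra in place of the Shootout data.

```python
import numpy as np

def fit_pca(X_cal, n_components=2):
    """Fit a PCA model on calibration spectra (rows = samples).

    Only mean-centring is applied; any spectral pretreatment is
    deliberately left out of this sketch.
    """
    mean = X_cal.mean(axis=0)
    Xc = X_cal - mean
    # SVD of the centred matrix: the rows of Vt are the loading vectors
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T      # shape (n_wavelengths, n_components)
    scores_cal = Xc @ loadings          # same as U[:, :k] * s[:k]
    return mean, loadings, scores_cal

def project(X_new, mean, loadings):
    """Project new spectra onto the calibration PC space.

    The calibration mean is subtracted (not the new set's own mean),
    so the new scores share the calibration score map.
    """
    return (X_new - mean) @ loadings

# Hypothetical synthetic "spectra": three calibration clusters that differ
# in intensity, plus a test group outside the calibration range.
rng = np.random.default_rng(0)
n_wl = 100
base = rng.normal(size=n_wl)
X_cal = np.vstack([base * c + rng.normal(scale=0.05, size=(20, n_wl))
                   for c in (0.8, 1.0, 1.2)])
X_test = base * 1.6 + rng.normal(scale=0.05, size=(10, n_wl))

mean, P, T_cal = fit_pca(X_cal)
T_test = project(X_test, mean, P)
```

Plotting `T_cal` and `T_test` on the same PC1-PC2 axes gives the kind of map described here; in this synthetic example the test group lands outside the three calibration clusters along PC1. The same `project` call would be used for the validation set.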
If we read the Chemometrics Shootout rules, we see:
“This year’s challenge will consist in developing the best model for the active ingredient using the calibration data. However, the most important task will be to build a model that will be robust to production scale differences. In addition, the quality of the presentation and the reasoning behind the approach taken will be used to determine the […]”
So predicting this test set as accurately as possible is an important part of the challenge.
And what about the validation set? We don't know its reference values, but we can again project its samples onto the PC1-PC2 score map (built from the calibration samples) to look for more clusters, or to check whether the samples are represented in the training set.
As we can see, some test and validation samples do not overlap with any calibration samples, so we have to take this into account when developing the model.
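Visual inspection of the score map can be complemented with a number. One common choice, swapped in here as an illustration and not taken from the post, is Hotelling's T² in the score space: each projected sample's scores are divided by the variance of the calibration scores for that component and summed. The minimal sketch below assumes the scores have already been projected onto the calibration PCs, and it uses a deliberately simple, hypothetical cutoff (the largest T² seen in calibration) instead of the usual F-distribution limit.

```python
import numpy as np

def hotelling_t2(scores, score_var):
    """Hotelling T^2 of each sample in a k-dimensional score space.

    Assumes the scores are centred on the calibration mean, so the
    calibration scores have zero mean.
    """
    return np.sum(scores ** 2 / score_var, axis=1)

def flag_outside(scores_cal, scores_new):
    """Flag new samples whose T^2 exceeds the largest calibration T^2.

    The max-of-calibration cutoff is a hypothetical rule for this sketch;
    a formal limit based on the F-distribution is more common in practice.
    """
    var = scores_cal.var(axis=0, ddof=1)   # per-component calibration variance
    t2_cal = hotelling_t2(scores_cal, var)
    t2_new = hotelling_t2(scores_new, var)
    return t2_new > t2_cal.max(), t2_new

# Hypothetical scores: 60 centred calibration samples and two new samples,
# one near the calibration centre and one far away along PC1.
rng = np.random.default_rng(1)
scores_cal = rng.normal(size=(60, 2))
scores_cal -= scores_cal.mean(axis=0)
scores_new = np.array([[0.1, -0.2], [8.0, 0.0]])

flags, t2 = flag_outside(scores_cal, scores_new)
```

Samples flagged this way are the ones the calibration set does not represent; T² only measures distance inside the PC1-PC2 plane, so a residual-based check would still be needed for spectra that differ in directions the first two PCs do not capture.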
R is really good at making these plots:
Black circles: calibration samples
Red triangles: test samples
Green crosses: validation samples
