I´ve been working these days with PLS2 calibrations with a chemometric software called “Unscrambler” with a data set called “jam”. I said “can I develop PLS2 models with R?”.I look in the book “Introduction to Multivariate Statistical Analysis in Chemometrics”, and I got the response “Yes, we can”.
I have other posts for PLS regressions, but it is PLS1, where we have an X matrix (spectra) and we make a regression for one constituent of the Y matrix at a time. What about to make the regression for all the constituents at the same time using the whole Y matrix?. That is the purpose of PLS2.
PLS2 is recommended when there is a high correlation between the constituents.
This data is part of a set used by Varmuza et al. 2008, for other papers.
You can get a description for this data in the R help page:
For 15 cereals an X and Y data set, measured on the same objects, is available. The X data are 145 infrared spectra, and the Y data are 6 chemical/technical properties (Heating value, C, H, N, Starch, Ash). Also the scaled Y data are included (mean 0, variance 1 for each column). The cereals come from 5 groups B=Barley, M=Maize, R=Rye, T=Triticale, W=Wheat.
Once loaded, take a look to the data
We can have a look to the spectra, (it is already treated with SG and first derivative).
Now let´s run PLS2, using “mvsr”, with “LOO” (leave one out) cross validation.
We can see a summary of the results:
Now we have to take an important decision, “How many terms to choose?”.
Plots can help us with it:
plot(RMSEP(cerpls2), legendpos = "topright")
We have to select an average, and looking to the plots we can say that 7 is fine, anyway for "starch" less terms would be fine, but for the rest 6 or 7 is the correct number.