PLS2 with "R"

September 22, 2012
By

(This article was first published on NIR-Quimiometría, and kindly contributed to R-bloggers)

I´ve been working these days with PLS2 calibrations with a chemometric software called “Unscrambler” with a data set called “jam”. I said “can I develop PLS2 models with R?”.I look in the book “Introduction to Multivariate Statistical Analysis in Chemometrics”, and I got the response “Yes, we can”.
I have other posts for PLS regressions, but it is PLS1, where we have an X matrix (spectra) and we make a regression for one constituent of the Y matrix at a time. What about to make the regression for all the constituents at the same time using the whole Y matrix?. That is the purpose of PLS2.
PLS2 is recommended when there is a high correlation between the constituents.
library(chemometrics)
data(cereal)
This data is part of a set used by Varmuza et al. 2008, for other papers.
You can get a description for this data in the R help page:
Description
For 15 cereals an X and Y data set, measured on the same objects, is available. The X data are 145 infrared spectra, and the Y data are 6 chemical/technical properties (Heating value, C, H, N, Starch, Ash). Also the scaled Y data are included (mean 0, variance 1 for each column). The cereals come from 5 groups B=Barley, M=Maize, R=Rye, T=Triticale, W=Wheat.

Once loaded, take a look to the data
dim(cereal$X)
dim(cereal$Ysc)
We can have a look to the spectra, (it is already treated with SG and first derivative).
wavelengths<-seq(1126,2278,by=8)
matplot(wavelengths,t(cereal$X),lty=1,xlab="wavelengths(nm)",ylab="log(1/R)")
Now let´s run PLS2, using “mvsr”, with “LOO” (leave one out) cross validation.
cerpls2<-mvr(Ysc~X,data=cereal,method="simpls",validation="LOO")
We can see a summary of the results:
summary(cerpls2)
Now we have to take an important decision, “How many terms to choose?”.
Plots can help us with it:
plot(RMSEP(cerpls2), legendpos = "topright")

We have to select an average, and looking to the plots we can say that 7 is fine, anyway for "starch" less terms would be fine, but for the rest 6 or 7 is the correct number.

To leave a comment for the author, please follow the link and comment on his blog: NIR-Quimiometría.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags:

Comments are closed.