# PLS2 with "R"

[This article was first published on

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

**NIR-Quimiometría**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I´ve been working these days with PLS2 calibrations with a chemometric software called “Unscrambler” with a data set called “jam”. I said “can I develop PLS2 models with R?”.I look in the book “Introduction to Multivariate Statistical Analysis in Chemometrics”, and I got the response “Yes, we can”.

I have other posts for PLS regressions, but it is PLS1, where we have an X matrix (spectra) and we make a regression for one constituent of the Y matrix at a time. What about to make the regression for all the constituents at the same time using the whole Y matrix?. That is the purpose of PLS2.

PLS2 is recommended when there is a high correlation between the constituents.

**library(chemometrics)**

**data(cereal)**

This data is part of a set used by Varmuza et al. 2008, for other papers.

You can get a description for this data in the R help page:

__Description__*For 15 cereals an X and Y data set, measured on the same objects, is available. The X data are 145 infrared spectra, and the Y data are 6 chemical/technical properties (Heating value, C, H, N, Starch, Ash). Also the scaled Y data are included (mean 0, variance 1 for each column). The cereals come from 5 groups B=Barley, M=Maize, R=Rye, T=Triticale, W=Wheat.*

Once loaded, take a look to the data

**dim(cereal$X)**

**dim(cereal$Ysc)**

We can have a look to the spectra, (it is already treated with SG and first derivative).

**wavelengths<-seq(1126,2278,by=8)**

**matplot(wavelengths,t(cereal$X),lty=1,xlab=”wavelengths(nm)”,ylab=”log(1/R)”)**

Now let´s run PLS2, using “mvsr”, with “LOO”

*(leave one out)*cross validation.**cerpls2<-mvr(Ysc~X,data=cereal,method="simpls",validation="LOO")**

We can see a summary of the results:

**summary(cerpls2)**

Now we have to take an important decision,

__“How many terms to choose?”.__Plots can help us with it:

**plot(RMSEP(cerpls2), legendpos = “topright”)**

We have to select an average, and looking to the plots we can say that 7 is fine, anyway for “starch” less terms would be fine, but for the rest 6 or 7 is the correct number.

To

**leave a comment**for the author, please follow the link and comment on their blog:**NIR-Quimiometría**.R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.