# Shootout 2012: first PLS regressions

November 23, 2012

(This article was first published on NIR-Quimiometria, and kindly contributed to R-bloggers)

It's time to start developing some regressions in order to find the best math treatment, the best number of terms, the best spectral regions, the best regression method, and so on.
This time I'm working with the pls package in R and, to become more familiar with it, I use PLS regression over the full spectral range with two math treatments: MSC and SG filters (with first and second derivatives). In another post I will try selecting spectral regions, or even other regression methods.
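
As a rough illustration of the two math treatments, here is a minimal sketch. It is not the code used for the shootout files: the `gasoline` spectra shipped with pls stand in for the shootout data, the SG settings (`p = 2`, `n = 11`) are arbitrary assumptions, and `sg_deriv` is a hypothetical helper built on `sgolayfilt()` from the signal package.

```r
# Sketch only: 'gasoline' stands in for the shootout spectra, and the
# SG window/polynomial settings are assumptions, not the post's values.
library(pls)     # msc()
library(signal)  # sgolayfilt()

data(gasoline)
nir_raw <- unclass(gasoline$NIR)   # 60 samples x 401 wavelengths

# Multiplicative scatter correction against the mean spectrum
nir_msc <- msc(nir_raw)

# For a separate test set, reuse the reference spectrum of the training set
msc_train    <- msc(nir_raw[1:50, ])
nir_msc_test <- predict(msc_train, nir_raw[51:60, ])

# Savitzky-Golay derivative, applied row by row (one spectrum per row).
# p = polynomial order, n = window length (odd), m = derivative order.
sg_deriv <- function(X, m, p = 2, n = 11) {
  t(apply(X, 1, function(s) sgolayfilt(s, p = p, n = n, m = m)))
}
nir_sg1 <- sg_deriv(nir_raw, m = 1)  # first derivative
nir_sg2 <- sg_deriv(nir_raw, m = 2)  # second derivative
```

Each treated matrix keeps the dimensions of the raw spectra, so it can be stored as a matrix column of the data frame passed to `plsr()`.
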
Rather than looking at the cross-validation statistics, I will look at the prediction statistics for the test set. We have seen that the samples in this set are not fully represented by the training set, so if we predict them well, it is a sign that the equation is robust. Don't forget that the idea is to predict, as well as possible, a validation set whose values we in theory don't know (we actually already know them, and in the future I will compare my results with those of the winner and of other participants).
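
To make the test-set statistic concrete, the sketch below computes RMSEP on held-out samples in two ways: with `RMSEP(..., newdata = )` and by hand as the square root of the mean squared prediction error. The `gasoline` data from pls and the 50/10 split are stand-ins chosen for illustration, not the shootout sets.

```r
# Sketch only: 'gasoline' and the 50/10 split are illustrative stand-ins.
library(pls)

data(gasoline)
train <- gasoline[1:50, ]
test  <- gasoline[51:60, ]

# PLS with MSC applied inside the formula, leave-one-out cross-validation
fit <- plsr(octane ~ msc(NIR), ncomp = 5, data = train, validation = "LOO")

# Test-set RMSEP for the intercept-only model and 1..5 components
rmsep_test <- drop(RMSEP(fit, newdata = test)$val)
print(rmsep_test)

# The same statistic by hand for the 3-component model
pred3        <- drop(predict(fit, newdata = test, ncomp = 3))
rmsep_manual <- sqrt(mean((test$octane - pred3)^2))
print(rmsep_manual)
```

The hand-computed value should match the "3 comps" entry returned by `RMSEP()`; the "(Intercept)" entry is the error obtained when every test sample is predicted with the training-set mean.
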

I develop a first regression (1) with MSC and look at the prediction statistics for the test set:
> Active_reg1 <- plsr(Active ~ NIT.msc, ncomp = 5, data = shootcalmsc.2012, validation = "LOO")
> RMSEP(Active_reg1, newdata = shoottestmsc.2012)

(Intercept)      1 comps      2 comps      3 comps      4 comps      5 comps
1.1637       0.6944       0.5028       0.4586       0.4913       0.5355

Now regression (2), with an SG filter (first derivative):
> Active_reg2 <- plsr(Active ~ NITsg, ncomp = 5, data = shootcalsg.2012, validation = "LOO")
> RMSEP(Active_reg2, newdata = shoottestsg.2012)
(Intercept)      1 comps      2 comps      3 comps      4 comps      5 comps
1.1637       1.0414       0.4172       0.4313       0.4531       0.4556

With the SG filter set to the second derivative, the RMSEP statistics are:
(Intercept)      1 comps      2 comps      3 comps      4 comps      5 comps
1.1637       0.5506       0.4269       0.4227       0.4134       0.4009
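
The three sets of statistics above can be put side by side to pick, for each math treatment, the number of components with the lowest test-set RMSEP. This is only a small bookkeeping sketch in base R: the numbers are copied from the outputs above, and the row labels `MSC`, `SG1` and `SG2` are names introduced here for illustration.

```r
# RMSEP values copied from the three outputs above
comps <- c("(Intercept)", paste(1:5, "comps"))
rmsep <- rbind(
  MSC = c(1.1637, 0.6944, 0.5028, 0.4586, 0.4913, 0.5355),
  SG1 = c(1.1637, 1.0414, 0.4172, 0.4313, 0.4531, 0.4556),
  SG2 = c(1.1637, 0.5506, 0.4269, 0.4227, 0.4134, 0.4009)
)
colnames(rmsep) <- comps

# For each treatment: which model has the smallest RMSEP, and its value
best <- data.frame(
  treatment = rownames(rmsep),
  ncomp     = comps[apply(rmsep, 1, which.min)],
  rmsep     = apply(rmsep, 1, min)
)
print(best)
```

On these numbers the minimum is at 3 components for MSC (0.4586), 2 components for the SG first derivative (0.4172) and 5 components for the SG second derivative (0.4009).
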

We can have a look at the predicted vs. lab plots:
> predplot(Active_reg1, ncomp = 3, newdata = shoottestmsc.2012, asp = 1, line = TRUE, main = "MSC math-treatment")
> predplot(Active_reg2, ncomp = 2, newdata = shoottestsg.2012, asp = 1, line = TRUE, main = "SG second der")

Well, the plots are not really nice. It is clear that we can separate the two groups, but the results are not very accurate. I have to continue working on it to see whether I can improve this plot, keeping an eye on the RMSEP.
We can play with the parameters of the SG filter, but I think it is better to select spectral regions. I will let you know in another post.
If you are interested in this post, you may also find some of the previous ones interesting:
“Sample Sets” plots (Shootout-2012)
Shootout 2012: Test & Val Sets proyections
Working with Shootout – 2012 in R (001)
Shootout 2012 files
