It's time to start developing some regressions in order to find the best math treatment, the best number of terms, the best spectral regions, the best regression method, and so on.

This time I'm working with the pls package in R, and to become more familiar with it I use PLS regression over the full range with two math treatments: MSC and SG filters (with first and second derivatives). In another post I will try selecting spectral regions, or even other regression methods.

Instead of looking at the cross-validation statistics, I will look at the prediction statistics for the test set. We have seen that the samples in this set are not fully represented by the training set, so if we predict them well it is a sign that the equation is robust. Don't forget that the idea is to predict as well as possible a validation set whose values, in theory, we don't know. (We already know them, and in the future I will compare my results with the winner and the other participants.)
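To make clear what statistic is being compared, here is a minimal sketch of the RMSEP computed by hand in base R. The function name `rmsep_by_hand` and the numbers are hypothetical, only for illustration; they are not values from the shootout data.

```r
# RMSEP: square root of the mean squared difference between
# the lab (reference) values and the predicted values.
rmsep_by_hand <- function(lab, predicted) {
  sqrt(mean((lab - predicted)^2))
}

# Hypothetical lab values and predictions for three test samples
lab_values <- c(1, 2, 3)
predictions <- c(1, 2, 5)

rmsep_by_hand(lab_values, predictions)
```

In the tables printed by `RMSEP()`, the "(Intercept)" column is the same statistic for a model with zero components, that is, predicting every test sample with the mean of the training set.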

I develop a regression (1) with MSC and look at the prediction statistics for the test set:

**>**Active_reg1 <- plsr(Active~NIT.msc, ncomp=5, data=shootcalmsc.2012, validation = "LOO")

**>**RMSEP(Active_reg1,newdata=shoottestmsc.2012)

(Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps

1.1637 0.6944 0.5028 0.4586 0.4913 0.5355
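For readers who do not have the shootout data, the same steps (MSC, PLS with leave-one-out cross-validation, RMSEP on a separate test set) can be sketched with the `gasoline` data that ships with the pls package. This is only an illustration under assumptions: the 50/10 split and the use of `msc()` inside the formula are my choices here, not necessarily how the MSC spectra in this post were prepared.

```r
library(pls)

# Example NIR data included in the pls package (octane number of gasoline)
data(gasoline)

# Assumed split for illustration: first 50 samples to train, last 10 to test
gas_train <- gasoline[1:50, ]
gas_test <- gasoline[51:60, ]

# PLS regression on MSC-treated spectra; msc() in the formula means the
# correction learned on the training set is also applied to new data
fit <- plsr(octane ~ msc(NIR), ncomp = 5, data = gas_train, validation = "LOO")

# Prediction error on the test set for 0 to 5 components
test_rmsep <- RMSEP(fit, newdata = gas_test)
print(test_rmsep)
```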

Now the regression (2) with an SG filter (first derivative):

**>**Active_reg2 <- plsr(Active~NITsg, ncomp=5, data=shootcalsg.2012, validation = "LOO")

**>**RMSEP(Active_reg2,newdata=shoottestsg.2012)

(Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps

1.1637 1.0414 0.4172 0.4313 0.4531 0.4556
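As a reminder of what the SG (Savitzky-Golay) derivative does, here is a small base R sketch: it fits a local polynomial in a moving window and takes the derivative of that polynomial at the center point. The function names `sg_weights` and `sg_apply`, the window of 5 points and the polynomial order 2 are assumptions for the example; they are not the settings or the function used for the spectra in this post.

```r
# Weights of a Savitzky-Golay filter for a window of 2 * half_width + 1 points
sg_weights <- function(half_width, poly_order, deriv) {
  z <- -half_width:half_width
  A <- outer(z, 0:poly_order, `^`)        # local polynomial design matrix
  coef_rows <- solve(crossprod(A), t(A))  # least-squares coefficient rows
  factorial(deriv) * coef_rows[deriv + 1, ]
}

# Apply the filter to one spectrum; the edge points that cannot be
# centered in a full window are dropped
sg_apply <- function(x, half_width = 2, poly_order = 2, deriv = 1) {
  w <- sg_weights(half_width, poly_order, deriv)
  centers <- (half_width + 1):(length(x) - half_width)
  vapply(centers,
         function(i) sum(w * x[(i - half_width):(i + half_width)]),
         numeric(1))
}

# Check on a curve with known derivatives: x = z^2
x <- (1:20)^2
first_der <- sg_apply(x, deriv = 1)   # should follow 2 * z
second_der <- sg_apply(x, deriv = 2)  # should be constant
```

For a matrix of spectra with one sample per row, something like `t(apply(X, 1, sg_apply))` would treat every spectrum in the same way.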

With the SG filter set to the second derivative, the RMSEP statistics are:

(Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps

1.1637 0.5506 0.4269 0.4227 0.4134 0.4009
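To compare the three treatments side by side, the RMSEP values printed above can be collected in a small table, picking the number of components with the lowest test-set error for each one. This is just a sketch; the row labels are mine, and keep in mind that choosing the number of terms on the test set itself gives an optimistic view.

```r
# Test-set RMSEP values copied from the outputs above
rmsep_test <- rbind(
  MSC             = c(1.1637, 0.6944, 0.5028, 0.4586, 0.4913, 0.5355),
  SG_first_deriv  = c(1.1637, 1.0414, 0.4172, 0.4313, 0.4531, 0.4556),
  SG_second_deriv = c(1.1637, 0.5506, 0.4269, 0.4227, 0.4134, 0.4009)
)
colnames(rmsep_test) <- c("(Intercept)", paste(1:5, "comps"))

# Drop the intercept-only column, then find the best number of components
best_ncomp <- apply(rmsep_test[, -1], 1, which.min)
best_rmsep <- apply(rmsep_test[, -1], 1, min)

data.frame(best_ncomp, best_rmsep)
```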

We can have a look at the Predicted vs. Lab plots:

**>**predplot(Active_reg1,ncomp=3,newdata=shoottestmsc.2012,asp=1,line=TRUE,main="MSC math-treatment")

**>**predplot(Active_reg2,ncomp=2,newdata=shoottestsg.2012,asp=1,line=TRUE,main="SG second der")

Well, the plots are not really nice. It is clear that we can separate the two groups, but the results are not very accurate. I have to keep working on it to see whether I can improve this plot, looking at the RMSEP.

We can play with the parameters of the SG filter, but I think it is better to select spectral regions. I will let you know in another post.

To **leave a comment** for the author, please follow the link and comment on their blog: **NIR-Quimiometria**.
