NIT: Fatty acids study in R – Part 006

[This article was first published on NIR-Quimiometría, and kindly contributed to R-bloggers].

In one of the columns, for constituent C16_0, one sample (57) has a value of “zero” (we could see this in the histogram). The reason is that the laboratory did not supply this value. The PLS regression will treat the lab value as zero, so we will get a plot like this:
I also observed that sample 219 has a high residual in the regressions for all the constituents, so I decided to remove these two samples from the sample set before continuing to develop the models.
I am just starting with R, so I would appreciate any comments on how to do this task in a simpler way.
I created two subsets that leave out these two samples (57 and 219):
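One possible way to handle the missing lab value without relying on row positions is to mark the zero as NA, so that it can no longer be mistaken for a real measurement. The sketch below uses a small made-up data frame (`lab`, with hypothetical columns `id` and `C16_0`); it is not the real fatty acid file.

```r
# Toy stand-in for the lab data; the values are invented for illustration
lab <- data.frame(id    = 55:59,
                  C16_0 = c(21.4, 22.0, 0, 20.7, 23.1))

# Treat a reported zero as "value not supplied" instead of a measurement
lab$C16_0[lab$C16_0 == 0] <- NA

# Keep only the rows that have a usable reference value
lab_ok <- lab[!is.na(lab$C16_0), ]
lab_ok
```

This only makes sense when a true zero is impossible for the constituent, which seems to be the case here.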
> fattyac1<-fattyac_msc[1:56,]
> fattyac2<-fattyac_msc[58:218,]
and I combined these two sets again:
> fattyac_msc1<-rbind(fattyac1,fattyac2)
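A simpler way to get the same result may be a negative index, which drops the listed rows in a single step. The sketch below assumes the data set has 219 rows and that samples 57 and 219 sit in rows 57 and 219; the `fattyac_msc` built here is only a made-up stand-in, and `fattyac_short` is a hypothetical name.

```r
# Hypothetical stand-in for fattyac_msc: 219 rows of invented values
fattyac_msc <- data.frame(id    = 1:219,
                          C16_0 = seq(18, 25, length.out = 219))

# Approach from the post: keep rows 1-56 and 58-218, then stack them
fattyac_msc1 <- rbind(fattyac_msc[1:56, ], fattyac_msc[58:218, ])

# Shorter alternative: drop rows 57 and 219 with a negative index
fattyac_short <- fattyac_msc[-c(57, 219), ]

# Both versions keep the same samples
all(fattyac_short$id == fattyac_msc1$id)
nrow(fattyac_short)
```

The negative index also avoids having to work out the row ranges by hand when more samples need to be removed.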
Well, I can develop my regression now:

Now we have to decide how many terms to choose. Let's see the validation plot with 7 and 12 components (terms).
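Besides looking at the plots, the cross-validated error for both choices could be compared as numbers. The following is only a sketch under assumptions: it uses the `pls` package (this part of the series does not say which package was used), simulated data in place of the real spectra, and hypothetical names (`toy`, `NIT`, `spectra`).

```r
library(pls)  # assumption: the pls package is installed

set.seed(1)
n <- 60   # number of simulated samples
p <- 20   # number of simulated wavelengths

# Simulated "spectra" and a reference value that depends on three of them
spectra <- matrix(rnorm(n * p), nrow = n, ncol = p)
c16 <- as.numeric(spectra[, 1:3] %*% c(1, -0.5, 0.25)) + rnorm(n, sd = 0.1)
toy <- data.frame(C16_0 = c16, NIT = I(spectra))

# PLS model with up to 12 components and cross-validation
model <- plsr(C16_0 ~ NIT, ncomp = 12, data = toy, validation = "CV")

# Cross-validated RMSEP; the first entry is the intercept-only model,
# so the value for k components is at position k + 1
rmsep_cv <- drop(RMSEP(model, estimate = "CV")$val)
rmsep_cv[c(7, 12) + 1]

# validationplot(model, val.type = "RMSEP") would draw the full curve
```

If the error with 12 components is not clearly lower than with 7, the smaller model is usually the safer choice.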

Choosing one model or the other will clearly have a great influence on the predictions. We need a validation set to make a better decision, but I think it will work better with 12 terms.
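A validation set could be set aside by holding out a random part of the samples before fitting the model. The sketch below is a minimal example with invented data; the 25% share, the seed, and the names `calibration` and `validation` are assumptions, not something taken from the original study.

```r
set.seed(42)  # arbitrary seed so the split can be reproduced

# Hypothetical stand-in for the cleaned data set (217 rows, invented values)
fattyac_msc1 <- data.frame(id    = 1:217,
                           C16_0 = runif(217, min = 18, max = 25))

n <- nrow(fattyac_msc1)

# Randomly pick about 25% of the rows for validation
val_idx <- sample(n, size = round(0.25 * n))

calibration <- fattyac_msc1[-val_idx, ]  # used to develop the model
validation  <- fattyac_msc1[val_idx, ]   # kept aside to test 7 vs 12 terms

nrow(calibration)
nrow(validation)
```

The models with 7 and 12 terms would then be developed on `calibration` only and compared on the predictions for `validation`.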
If possible, it will be important to find samples with C16:0 values below 18 and add them to our database in order to develop a better model.
Another option would be to leave this extreme sample out until we find more like it, but we can also decide to keep it in order to extrapolate better in this zone.
It is important not to have unique samples in the model, and in this case we have one, so we have to take this into account.
If you want to follow this tutorial, please send me an e-mail and I'll send you the “txt” file as an attachment.
