(This article was first published on **NIR-Quimiometría**, and kindly contributed to R-bloggers)

It is clear that MSC does not remove the entire scatter in the raw spectra, so some of the information is hidden by the scatter. Improvement of the sample presentation will help to remove the scatter.

We know that the first loading is closely related to the main source of variance (in this case the scatter). In the next figure, I overplot the standard deviation spectrum (multiplied by 10, so that the two curves are easy to compare) with the first loading.

**> sd10<-sdfattyac_msc_c*10**

**> l1sd10<-cbind(loading1,sd10)**

**> plot(t(l1sd10))**

**> matplot(wavelengths,l1sd10,lty=1,pch=21)**
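The same overlay can be sketched from scratch on synthetic data. Everything here is an assumption made for illustration: the spectra are simulated, the wavelength grid (850 to 1050 nm in 2 nm steps) is a guess, and `prcomp` stands in for whatever PCA routine produced `loading1` earlier in the series.

```r
# Synthetic stand-in for the MSC-treated spectra (NOT the data of this post)
set.seed(1)
wavelengths <- seq(850, 1050, by = 2)             # assumed grid, 101 points
n <- 30
band <- dnorm(wavelengths, mean = 930, sd = 15)   # one broad band
X <- outer(rnorm(n), band * 20) +
  matrix(rnorm(n * length(wavelengths), sd = 0.01), nrow = n)

# Mean-centre the spectra, then run the PCA on the centred matrix
X_c <- scale(X, center = TRUE, scale = FALSE)
pca <- prcomp(X_c, center = FALSE)

loading1 <- pca$rotation[, 1]        # first loading
sd10 <- apply(X_c, 2, sd) * 10       # standard deviation spectrum, times 10

# Overplot both curves as lines against the wavelength axis
l1sd10 <- cbind(loading1, sd10)
matplot(wavelengths, l1sd10, type = "l", lty = 1, col = c("black", "red"),
        xlab = "Wavelength (nm)", ylab = "Loading 1 / SD x 10")
legend("topright", legend = c("Loading 1", "SD x 10"),
       col = c("black", "red"), lty = 1)
```

With a single dominant source of variance, the two curves should have nearly the same shape (up to the arbitrary sign of the loading).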

The second loading will give us more details about the bands positions.

I'm going to use the function “findPeaks”, from the package “quantmod”.

**> findPeaks(loading2)**

**X878 X932 X972**

**15 42 62**

The band at 932 nm (data point 42) is probably due to the C-H third overtone of fat. The band at 972 nm is related to the CH2 vibration and to water. The band at 878 nm also seems to be related to fat.
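To make the link between data points and wavelengths explicit, here is a minimal sketch in base R. The helper `local_maxima` is hypothetical (it is not the quantmod function), the loading is synthetic, and the grid starting at 850 nm in 2 nm steps is an assumption, although it is consistent with the output above (point 15 at 878 nm, point 42 at 932 nm).

```r
# Hypothetical helper: indices of strict local maxima of a numeric vector
local_maxima <- function(x) {
  which(diff(sign(diff(x))) == -2) + 1
}

wavelengths <- seq(850, 1050, by = 2)   # assumed wavelength grid

# Synthetic "loading" with three bumps (NOT the loading2 of this post)
loading_demo <- dnorm(wavelengths, 878, 6) +
  dnorm(wavelengths, 932, 8) +
  dnorm(wavelengths, 972, 6)
names(loading_demo) <- paste0("X", wavelengths)

# Data-point indices of the peaks and the wavelengths they correspond to
peaks <- local_maxima(loading_demo)
data.frame(point = unname(peaks), wavelength_nm = wavelengths[peaks])
```

Translating the indices back to nanometres like this makes it easier to compare the peaks with published band assignments.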

We can also interpret the other loadings, where possible.

We saw that one of the samples (66) has a Mahalanobis distance (MD) of 11.6. Let's see the values of the six constituents for this sample:

**> fattyac_msc[66,1:6]**

**C16_0 C16_1 C18_0 C18_1 C18_2 C18_3**

**66 15.8 2 6 62.3 10.2 0.6**
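As a reminder of where an MD value can come from, here is one common way to compute it on PCA scores. This is only a sketch under assumptions: the data are synthetic, the number of components `k` is arbitrary, and NIR software packages often scale or square the distance differently, so the numbers will not match the 11.6 quoted above.

```r
set.seed(2)
n <- 60; p <- 20
X <- matrix(rnorm(n * p), n, p)   # synthetic "spectra" (NOT the data of this post)
X[60, ] <- X[60, ] + 5            # make the last sample atypical on purpose

pca <- prcomp(X, center = TRUE, scale. = FALSE)
k <- 3                            # assumed number of components kept
scores <- pca$x[, 1:k]

# mahalanobis() returns the SQUARED distance, so take the square root
md2 <- mahalanobis(scores, center = colMeans(scores), cov = cov(scores))
md <- sqrt(md2)

# Samples with the largest distances, the candidates for a closer look
head(order(md, decreasing = TRUE), 3)
```

A large MD only says that the sample is far from the centre of the spectral population. It does not say why, which is the reason for looking at its reference values next.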

Let's compare these values with the summary:

**> summary(fattyac_msc)**

**C16_0 C16_1**

**Min. : 0.00 Min. :1.500**

**1st Qu.:20.10 1st Qu.:2.000**

**Median :21.00 Median :2.200**

**Mean :21.34 Mean :2.267**

**3rd Qu.:22.90 3rd Qu.:2.500**

**Max. :26.00 Max. :3.500**

**C18_0 C18_1**

**Min. : 5.800 Min. :43.80**

**1st Qu.: 8.600 1st Qu.:51.95**

**Median : 9.400 Median :54.50**

**Mean : 9.711 Mean :53.93**

**3rd Qu.:10.500 3rd Qu.:56.15**

**Max. :14.000 Max. :62.30**

**C18_2 C18_3**

**Min. : 5.500 Min. :0.3000**

**1st Qu.: 7.600 1st Qu.:0.5000**

**Median : 8.500 Median :0.6000**

**Mean : 8.503 Mean :0.6032**

**3rd Qu.: 9.100 3rd Qu.:0.7000**

**Max. :14.700 Max. :1.3000**

Sample 66 has the highest value for C18:1 (oleic acid), but it is not isolated in the histogram. For some reason this sample differs from the others, especially from 100 to 1050 nm. We will wait before making a decision about this sample.

Until now we have been working with the X matrix.

Now we start to study the Y matrix. The first thing to do is to look at the summary and, of course, at the histograms.
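The comparison of one sample against the summary can also be expressed as a percentile for each constituent. This is a sketch on synthetic reference values: the data frame `y`, its 90 rows, and the forced maximum are assumptions made only for the demonstration.

```r
set.seed(3)
# Synthetic reference values (NOT the data of this post)
y <- data.frame(C18_1 = round(rnorm(90, 54, 3), 1),
                C18_2 = round(rnorm(90, 8.5, 1.2), 1))
i <- 66
y$C18_1[i] <- max(y$C18_1) + 1   # force sample i to be the maximum for the demo

# Fraction of samples with a value less than or equal to that of sample i
pct <- sapply(y, function(col) ecdf(col)(col[i]))
round(100 * pct, 1)
```

A percentile of 100 flags the sample as the extreme value for that constituent, which is the same reading as comparing it with the "Max." row of the summary.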

**If you want to follow this tutorial, please send me an e-mail. I'll send you the “txt” file as an attachment.**


**> hist(C16_0,col="red")**

**> hist(C16_1,col="blue")**

**> hist(C18_0,col="green")**

**> hist(C18_1,col="brown")**

**> hist(C18_2,col="violet")**

**> hist(C18_3,col="orange")**
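The six calls can also be written as a loop that puts all the histograms on one page. This is a sketch on synthetic values: the data frame `y` is a stand-in for the real Y matrix, and only the column names and colours are taken from the calls above.

```r
set.seed(4)
fa <- c("C16_0", "C16_1", "C18_0", "C18_1", "C18_2", "C18_3")
cols <- c("red", "blue", "green", "brown", "violet", "orange")

# Synthetic stand-in for the six reference columns (NOT the data of this post)
y <- as.data.frame(matrix(rnorm(6 * 90, mean = 10, sd = 2), ncol = 6,
                          dimnames = list(NULL, fa)))

op <- par(mfrow = c(2, 3))   # 2 x 3 grid of plots on one page
h <- Map(function(v, nm, cl) hist(v, col = cl, main = nm, xlab = nm),
         y, fa, cols)
par(op)                      # restore the previous graphics settings
```

Passing the columns of a data frame this way also avoids relying on `attach()` to make names such as `C16_0` visible.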
