I have downloaded the ASCII files of the Shootout 2012 from the IDRC (see: Shootout 2012 files), so I can work with the data to develop a model and predict a Validation Set.
For that task I have a “Calibration Set” and a “Test Set”.
We can read the details of this task on the IDRC web page: “instructions”.
The spectra were acquired on an FTIR instrument, and the spacing between wavelengths (X axis) is non-linear, so I replaced the wavelength values with the indices 1.0, 2.0, …, 372.0.
I had to arrange the data to import it into R and organize the data frame before starting to look at the spectra and the distribution.
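As a rough illustration of that arrangement, here is a minimal base R sketch. The random matrix is only a stand-in for the imported spectra, and the names `spectra`, `index_axis` and `cal` are hypothetical; only the 372 data points and the 1, 2, …, 372 axis come from the description above.

```r
# Toy stand-in for the imported calibration spectra: 5 samples x 372 points.
# The real values would come from the IDRC ASCII files (not reproduced here).
set.seed(1)
n_samples <- 5
n_points  <- 372
spectra <- matrix(runif(n_samples * n_points), nrow = n_samples)

# Replace the non-linear wavelength axis with the indices 1, 2, ..., 372.
index_axis <- seq_len(n_points)
colnames(spectra) <- index_axis

# Keep the spectra as one matrix column next to the sample identifiers.
# I() stops data.frame() from splitting the matrix into 372 separate columns.
cal <- data.frame(sample = seq_len(n_samples))
cal$spectra <- I(spectra)

dim(cal$spectra)
```

Storing the spectra as a single matrix column keeps the data frame compact and makes it easy to pass the whole block of spectra to plotting or pretreatment functions.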
As in other posts, I am going to use the “Chemometrics with R” package.
If we plot the calibration samples without any treatment, we see what look like two sets of samples. Since we are working in transmittance, this probably indicates differences in pathlength:
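A plot of this kind can be drawn in base R with `matplot()`. The sketch below does not use the Shootout data: it simulates transmittance spectra at two hypothetical pathlengths (all values are made up for illustration) to show how a pathlength difference alone splits the raw spectra into two groups.

```r
# Simulated transmittance spectra (NOT the Shootout data): one absorbance
# band measured at two hypothetical pathlengths.
index_axis <- 1:372
band <- exp(-((index_axis - 220) / 15)^2)            # shape of the band
pathlength <- rep(c(1.0, 1.5), each = 10)            # two groups of 10 samples
set.seed(42)
conc <- runif(20, 0.8, 1.2)                          # small chemical variation

# Beer-Lambert: absorbance is proportional to concentration x pathlength.
absorbance <- outer(pathlength * conc, band + 0.3)   # 20 x 372 matrix
transmittance <- 10^(-absorbance)

# matplot() draws one line per column, so the matrix is transposed
# to get one line per sample.
matplot(index_axis, t(transmittance), type = "l", lty = 1,
        col = ifelse(pathlength == 1.0, "black", "red"),
        xlab = "Data point index", ylab = "Transmittance")
```

In the simulated plot the two colours separate into two bundles of lines, which is the kind of pattern described above for the raw calibration spectra.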
Now we can apply MSC (Multiplicative Scatter Correction) to reduce these physical effects and enhance the chemical differences:
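To make the idea concrete, here is a minimal base R sketch of MSC, assuming the usual formulation: each spectrum is regressed on a reference spectrum (here the mean spectrum), and the fitted offset and slope are then removed. The function name `msc_sketch` and the toy data are hypothetical; this is not necessarily the implementation used for the plots in this post (packaged versions exist, for example `msc()` in the pls package).

```r
# Minimal MSC sketch: fit x_i = a_i + b_i * reference for each spectrum,
# then return (x_i - a_i) / b_i.
msc_sketch <- function(X, reference = colMeans(X)) {
  corrected <- t(apply(X, 1, function(x) {
    fit <- lm.fit(cbind(1, reference), x)   # intercept + slope regression
    a <- fit$coefficients[1]                # additive effect (offset)
    b <- fit$coefficients[2]                # multiplicative effect (slope)
    (x - a) / b
  }))
  corrected
}

# Toy check: four spectra that differ only by an offset and a slope.
base_shape <- sin(seq(0, 3, length.out = 372)) + 1
offsets <- c(0.0, 0.2, -0.1, 0.3)
slopes  <- c(1.0, 1.4, 0.8, 1.2)
X <- t(sapply(1:4, function(i) offsets[i] + slopes[i] * base_shape))

X_msc <- msc_sketch(X)

# Largest deviation of the corrected spectra from the mean spectrum.
max(abs(sweep(X_msc, 2, colMeans(X))))
```

Because the toy spectra differ only by offset and slope, all four collapse onto the mean spectrum after correction. With real spectra, what remains after MSC is the variation that the offset and slope cannot explain, which is where the chemical information is expected to show.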
MSC works really well here, and we can see that most of the variability is in the region from approximately 200 to 240.
Now we can see at least 3 clusters.
Let's now have a look at the histogram:
We can start drawing some conclusions in order to continue.