I have fixed the link to the video “Removing Y outliers from the validation set”, and it's time to see what the next step for the function could be. As we know, the RMSEP combines the explained error (Bias) and the unexplained error (SEP): it is their squares that add up (RMSEP² ≈ Bias² + SEP²), not the values themselves.
We also get the SEP, so we know the unexplained error and can compare both to see whether the Bias is significant. If it is, a Bias adjustment is necessary, but only temporarily, until we find the source (if possible) of this explained error. If it turns out to be a new source of variance that should be included in the calibration, we will develop the model again.
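To make the split between Bias and SEP concrete, here is a minimal Python sketch (not the function from this post). It assumes residuals are taken as predicted minus reference and that the SEP uses N-1 in the denominator; the name `validation_errors` and the sample values are made up for illustration.

```python
import math

def validation_errors(reference, predicted):
    """Return (bias, sep, rmsep) for a validation set.

    Assumptions: residual = predicted - reference, and the SEP is the
    standard deviation of the residuals with N-1 in the denominator.
    """
    residuals = [p - r for r, p in zip(reference, predicted)]
    n = len(residuals)
    bias = sum(residuals) / n                                # explained error
    rmsep = math.sqrt(sum(e * e for e in residuals) / n)     # total error
    sep = math.sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))  # unexplained error
    return bias, sep, rmsep

# Illustrative values only, not real data.
reference = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [10.6, 12.3, 14.7, 16.4, 18.5]

bias, sep, rmsep = validation_errors(reference, predicted)
n = len(reference)

# Exact relation: RMSEP^2 = Bias^2 + (N-1)/N * SEP^2
assert math.isclose(rmsep ** 2, bias ** 2 + (n - 1) / n * sep ** 2)
print(f"Bias={bias:.3f}  SEP={sep:.3f}  RMSEP={rmsep:.3f}")
```

With many validation samples the factor (N-1)/N is close to 1, which is why the relation is usually quoted simply as RMSEP² ≈ Bias² + SEP².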
Now I'm working on another function that tells us whether the unexplained error (SEP) is out of limits, that is, beyond what are called the unexplained error confidence limits (UECLs).
This is an F-test (a ratio of two variances): the variance of the validation set over the variance of the calibration set.
For the calibration set we have to decide whether to use the SEC (too optimistic) or the SECV (the cross-validation error), which is more realistic.
Of course, the degrees of freedom will be different for each:
Nv-1 for the validation set
Nc-P-1 for the calibration set
(N = number of samples, P = number of terms).
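Written out, the test could look like this. It is a sketch that assumes a one-sided test at significance level α (my notation, not fixed above), and the SECV can take the place of the SEC:

```latex
F = \frac{\mathrm{SEP}^2}{\mathrm{SEC}^2}
\quad\text{compared with}\quad
F_{\alpha;\; N_v - 1,\; N_c - P - 1}

\mathrm{UECL} = \mathrm{SEC}\,\sqrt{F_{\alpha;\; N_v - 1,\; N_c - P - 1}}
```

The SEP is within limits when SEP ≤ UECL, which is the same as saying that the ratio F does not exceed the critical value.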
I will come back to this soon, once I find out whether such a function already exists in R (please add your comments) or whether I have to develop it.
The function will tell us whether a given SEP, obtained with a given number of validation samples, can be accepted, or whether it exceeds the SEC or SECV by more than the allowed factor.
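As a rough idea of what such a function might do, here is a minimal Python sketch. It is not the final function: the name `sep_within_limit` and its arguments are hypothetical, and the critical F value is passed in by hand instead of being computed (in R it could be obtained with `qf(0.95, Nv-1, Nc-P-1)`).

```python
import math

def sep_within_limit(sep, se_cal, n_val, n_cal, n_terms, f_crit):
    """Check a SEP against the unexplained error confidence limit (UECL).

    se_cal is the SEC or the SECV. f_crit is the critical F value for
    (n_val - 1, n_cal - n_terms - 1) degrees of freedom, supplied by the
    caller (for example from an F table).
    """
    df_val = n_val - 1
    df_cal = n_cal - n_terms - 1
    if df_val < 1 or df_cal < 1:
        raise ValueError("not enough samples for the degrees of freedom")
    f_ratio = (sep / se_cal) ** 2        # ratio of the two variances
    limit = se_cal * math.sqrt(f_crit)   # UECL
    return {
        "df_val": df_val,
        "df_cal": df_cal,
        "f_ratio": f_ratio,
        "uecl": limit,
        "accepted": sep <= limit,
    }

# Illustrative numbers only; f_crit=1.40 is a placeholder, not a looked-up value.
result = sep_within_limit(sep=0.30, se_cal=0.25, n_val=50,
                          n_cal=200, n_terms=8, f_crit=1.40)
print(result)
```

With these made-up numbers the limit is about 0.296, so a SEP of 0.30 would be flagged as out of limits. Expressed the other way round, the SEP may exceed the SEC or SECV by a factor of at most sqrt(f_crit).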