# "R": Predicting a Test Set (Gasoline)

February 9, 2012

(This article was first published on NIR-Quimiometría, and kindly contributed to R-bloggers)

> library(pls)
> data(gasoline)
> #60 spectra of gasoline (octane is the constituent)
> #We divide the whole Set into a Train Set and a Test Set.

> gasTrain<-gasoline[1:50,]
> gasTest<-gasoline[51:60,]

> #Let's develop the PLSR model with the Train Set and leave-one-out (LOO) cross-validation
> gas1<-plsr(octane~NIR,ncomp=10,data=gasTrain,validation="LOO")
> summary(gas1)
Data:   X dimension: 50 401
        Y dimension: 50 1
Fit method: kernelpls
Number of components considered: 10

VALIDATION: RMSEP
Cross-validated using 50 leave-one-out segments.
       (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
CV           1.545    1.357   0.2966   0.2524   0.2476   0.2398   0.2319
adjCV        1.545    1.356   0.2947   0.2521   0.2478   0.2388   0.2313
       7 comps  8 comps  9 comps  10 comps
CV      0.2386   0.2316   0.2449    0.2673
adjCV   0.2377   0.2308   0.2438    0.2657

TRAINING: % variance explained
        1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
X         78.17    85.58    93.41    96.06    96.94    97.89    98.38    98.85
octane    29.39    96.85    97.89    98.26    98.86    98.96    99.09    99.16
        9 comps  10 comps
X         99.02     99.19
octane    99.28     99.39

> #For this exercise we choose 3 components: the CV error drops sharply up to 2 components and only slightly after 3.
> #Let's predict our Test Set with this 3-component model.
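The choice of 3 components can be checked by looking at how much the cross-validated RMSEP improves with each added component. The following is only a sketch: the names `cv` and `gain` are introduced here for illustration and are not part of the original session.

```r
library(pls)
data(gasoline)

# Same split and model as in the session above
gasTrain <- gasoline[1:50, ]
gas1 <- plsr(octane ~ NIR, ncomp = 10, data = gasTrain, validation = "LOO")

# Cross-validated RMSEP for 1..10 components (intercept-only model left out)
cv <- drop(RMSEP(gas1, estimate = "CV", intercept = FALSE)$val)

# Percentage reduction in CV RMSEP obtained by adding each further component
# (a negative value means the error got worse)
gain <- 100 * -diff(cv) / head(cv, -1)

print(round(cv, 4))
print(round(gain, 1))
```

Large gains for the first components followed by gains of a few percent or less suggest that the extra components add little, which is one reasonable argument for stopping at 2 or 3.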

> predict(gas1,ncomp=3,newdata=gasTest)
, , 3 comps

     octane
51 87.94907
52 87.30484
53 88.21420
54 84.86945
55 85.24244
56 84.57502
57 87.37650
58 86.78971
59 89.10282
60 86.97223
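The predictions are easier to judge next to the reference octane values of the Test Set. A minimal sketch, assuming the same split and model as above; the data frame `cmp` and its column names are illustrative choices, not part of the original session.

```r
library(pls)
data(gasoline)

# Same split and model as in the session above
gasTrain <- gasoline[1:50, ]
gasTest  <- gasoline[51:60, ]
gas1 <- plsr(octane ~ NIR, ncomp = 10, data = gasTrain, validation = "LOO")

# predict() returns a 3-way array (samples x responses x models);
# drop() reduces it to a plain named vector for the 3-component model
pred3 <- drop(predict(gas1, ncomp = 3, newdata = gasTest))

# Reference values, predictions and residuals side by side
cmp <- data.frame(observed  = gasTest$octane,
                  predicted = pred3)
cmp$residual <- cmp$observed - cmp$predicted
print(round(cmp, 3))
```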

> #To plot the predicted versus the measured values for the Test Set:
> predplot(gas1,ncomp=3,newdata=gasTest,asp=1,line=TRUE)

> #Let's look at the RMSEP statistic. It is a very nice tool for deciding whether 3 components is fine or whether we should choose more or fewer.
> RMSEP(gas1,newdata=gasTest)
(Intercept)      1 comps      2 comps      3 comps      4 comps      5 comps
     1.5369       1.1696       0.2445       0.2341       0.3287       0.2780
    6 comps      7 comps      8 comps      9 comps     10 comps
     0.2703       0.3301       0.3571       0.4090       0.6116
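For a test set, the RMSEP is the square root of the mean squared difference between reference and predicted values, RMSEP = sqrt(mean((y - yhat)^2)). The sketch below recomputes it by hand for the 3-component model and compares it with the value returned by `RMSEP()`; the names `rmsep_manual` and `rmsep_pls` are illustrative.

```r
library(pls)
data(gasoline)

# Same split and model as in the session above
gasTrain <- gasoline[1:50, ]
gasTest  <- gasoline[51:60, ]
gas1 <- plsr(octane ~ NIR, ncomp = 10, data = gasTrain, validation = "LOO")

# Test-set predictions of the 3-component model as a plain vector
pred3 <- drop(predict(gas1, ncomp = 3, newdata = gasTest))

# RMSEP by hand: root of the mean squared prediction error
rmsep_manual <- sqrt(mean((gasTest$octane - pred3)^2))

# The same statistic from the pls package, for 3 components only
rmsep_pls <- as.numeric(
  RMSEP(gas1, newdata = gasTest, ncomp = 3, intercept = FALSE)$val
)

print(c(manual = rmsep_manual, pls = rmsep_pls))
```

Both numbers should agree, which confirms what the statistic printed by `RMSEP()` measures.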

> #Three components look fine, although we could also consider using only two. The test-set RMSEP with 3 components is 0.234.
> #The cross-validation RMSEP (CV) for the model with 3 components was 0.252.
> #R really is a wonderful tool for developing regressions and for understanding better what lies behind the algorithms.
> #There is plenty of literature on the internet to help you start working with R.
> #Thanks to Bjørn-Helge Mevik & Ron Wehrens for their good tutorials on the pls package; they helped me understand this program better and keep learning (I have ordered some books).
