# Dividing the Sample Set in two (Validation & Training)

March 29, 2012

(This article was first published on NIR-Quimiometría, and kindly contributed to R-bloggers)

The Demo sample set contains 66 samples. In this post we'll see one way to divide the set into two parts: one for validation and another for training (calibration).
The selection will be random, and we are going to use the function “sample”. I decided to select 10 samples for validation and keep the remaining 56 for training.
demo_raw_val<-demo_raw[sample(66,10),]
If you run this statement several times, you will get a different set each time, because the selection is random.
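Because each call to “sample” draws a new random set, fixing the random seed first is one way to make the split repeatable. The lines below are a minimal sketch, not part of the original workflow: the seed value 2012 is arbitrary, and the data frame built here is only a hypothetical stand-in for demo_raw (its column names are assumptions).

```r
# Hypothetical stand-in for demo_raw: 66 rows with made-up columns
demo_raw <- data.frame(id = 1:66, moisture = rnorm(66, mean = 10, sd = 1))

# Fix the seed, then draw 10 of the 66 row indices without replacement
set.seed(2012)                 # arbitrary seed, chosen only for illustration
val_idx <- sample(66, 10)

# Resetting the same seed reproduces exactly the same draw
set.seed(2012)
val_idx_again <- sample(66, 10)
identical(val_idx, val_idx_again)   # TRUE

# Use the stored indices to build the validation set
demo_raw_val <- demo_raw[val_idx, ]
```

Storing the drawn indices in a variable such as val_idx also means they do not need to be copied by hand afterwards.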
In my case the samples selected are:
Samples: 25, 50, 8, 49, 39, 12, 16, 63, 35 and 41
These numbers are row indices, so we create the training set by removing those rows:
demo_raw_train<-demo_raw[c(-25,-50,-8,-49,-39,-12,-16,-63,-35,-41),]
We create the same sample sets for the other data frames, which hold the spectra after the math treatments (MSC and SNV):
demo_msc_train<-demo_msc[c(-25,-50,-8,-49,-39,-12,-16,-63,-35,-41),]
demo_snv_train<-demo_snv[c(-25,-50,-8,-49,-39,-12,-16,-63,-35,-41),]
demo_msc_val<-demo_msc[c(25,50,8,49,39,12,16,63,35,41),]
demo_snv_val<-demo_snv[c(25,50,8,49,39,12,16,63,35,41),]
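Typing the same ten indices on every line makes it easy to slip, so one alternative is to keep them in a single vector and reuse it. The following is only a sketch under assumptions: split_set is a hypothetical helper, not a function from any package, and the data frame is a stand-in with invented columns.

```r
# Row indices selected for validation (taken from the post)
val_idx <- c(25, 50, 8, 49, 39, 12, 16, 63, 35, 41)

# Hypothetical helper: returns the training and validation parts of a data frame
split_set <- function(df, idx) {
  list(train = df[-idx, , drop = FALSE],   # every row except the selected ones
       val   = df[idx, , drop = FALSE])    # only the selected rows
}

# Stand-in for one of the 66-row data frames (column names are assumptions)
demo_raw <- data.frame(id = 1:66, x = (1:66) / 10)

raw_split <- split_set(demo_raw, val_idx)
nrow(raw_split$train)   # 56
nrow(raw_split$val)     # 10
```

The same call, with the same val_idx, could then be applied to demo_msc and demo_snv so that all three treatments share identical training and validation rows.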
It is important to look at the summary of each sample set, to check and compare the statistics of the different constituents between training and validation.
It also helps to look at the distribution plots, as in this case for moisture:
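One possible way to make this comparison is with “summary” and “hist” from base R. This is a sketch with simulated values: the column name moisture and the numbers generated here are assumptions standing in for the real constituent data.

```r
# Simulated stand-in data: 66 samples with an invented moisture column
set.seed(1)   # arbitrary seed so the simulated values are repeatable
demo_raw <- data.frame(id = 1:66, moisture = rnorm(66, mean = 10, sd = 1))

val_idx <- c(25, 50, 8, 49, 39, 12, 16, 63, 35, 41)
demo_raw_train <- demo_raw[-val_idx, ]
demo_raw_val   <- demo_raw[val_idx, ]

# Compare the basic statistics of both sets
summary(demo_raw_train$moisture)
summary(demo_raw_val$moisture)

# Side-by-side histograms on a common x-axis range
rng <- range(demo_raw$moisture)
op <- par(mfrow = c(1, 2))
hist(demo_raw_train$moisture, xlim = rng, main = "Training", xlab = "Moisture")
hist(demo_raw_val$moisture, xlim = rng, main = "Validation", xlab = "Moisture")
par(op)   # restore the previous plot layout
```

If the validation set covers a much narrower range than the training set, or falls outside it, drawing a new random split may be worth considering.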
