Dividing the Sample Set in two (Validation & Training)

[This article was first published on NIR-Quimiometría, and kindly contributed to R-bloggers.]

The Demo sample set contains 66 samples. In this post we'll see one way to divide the set into two parts: one for validation and another for training (calibration).
The selection will be random, and we are going to use the “sample” command. I decided to select 10 samples for validation and keep the remaining 56 for training.
If you run this command several times, you will get a different set every time.
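A minimal sketch of this step in R. It only assumes that the set has 66 rows. The call to set.seed (with an arbitrary seed of 123) is an addition not in the original post: it makes the "random" selection repeatable, so you get the same validation samples on every run.

```r
# Number of samples (rows) in the Demo set.
n_samples <- 66

# Optional: fix the random seed so the same split is reproduced every time.
set.seed(123)

# Draw 10 distinct row indices for validation (sample() draws without
# replacement by default, so no index is repeated).
val_idx <- sample(1:n_samples, 10)
val_idx
```

Without set.seed, each call to sample returns a different set of 10 indices, which is exactly the behaviour described above.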
In my case, the selected samples are: 25, 50, 8, 49, 39, 12, 16, 63, 35 and 41.
These numbers are row indices, so we create the training set by removing those rows:
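A sketch of the split using the indices listed above. Because the Demo data is not reproduced here, the data frame below is simulated and purely hypothetical: the column Moisture stands in for the moisture constituent, and Protein is an invented second constituent.

```r
# Hypothetical stand-in for the Demo set: 66 rows, two simulated constituents.
set.seed(1)
demo <- data.frame(
  Moisture = round(rnorm(66, mean = 10, sd = 1.5), 2),
  Protein  = round(rnorm(66, mean = 12, sd = 2), 2)
)

# Row indices selected for validation in the post.
val_idx <- c(25, 50, 8, 49, 39, 12, 16, 63, 35, 41)

# Validation set: keep only those rows.
demo_val <- demo[val_idx, ]

# Training set: negative indices drop those rows and keep all the others.
demo_train <- demo[-val_idx, ]

nrow(demo_val)    # 10
nrow(demo_train)  # 56
```

Negative indexing keeps the two sets disjoint: every sample ends up in exactly one of them.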
We will create the same validation and training sets for the other data frame, the one with the math treatments:
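The key point is to reuse the same row indices, so that both data frames contain the same samples in each set. In this hypothetical sketch the raw spectra are simulated, the "math treatment" is assumed to be a simple first derivative (computed with diff along each spectrum), and the treated data is kept as a matrix for simplicity; the original post's actual treatment and object names are not shown.

```r
# Hypothetical raw spectra: 66 samples (rows) x 100 wavelengths (columns).
set.seed(1)
spectra <- matrix(runif(66 * 100), nrow = 66)

# Hypothetical math treatment: first derivative of each spectrum.
# apply() returns one column per sample, so t() puts samples back in rows.
spectra_d1 <- t(apply(spectra, 1, diff))

# Same validation indices as before.
val_idx <- c(25, 50, 8, 49, 39, 12, 16, 63, 35, 41)

# Same split applied to the treated data.
d1_val   <- spectra_d1[val_idx, ]
d1_train <- spectra_d1[-val_idx, ]

dim(d1_val)    # 10 rows, 99 derivative points
dim(d1_train)  # 56 rows, 99 derivative points
```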
It is important to look at the summary of each sample set to check and compare the statistics of the different constituents.
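One way to do this comparison in base R, again on a hypothetical simulated data frame (Moisture and the invented Protein column). summary() gives the minimum, quartiles, mean and maximum of each column; the extra table places the means and standard deviations of both sets side by side.

```r
# Hypothetical data and split (same construction as in the previous sketch).
set.seed(1)
demo <- data.frame(
  Moisture = round(rnorm(66, 10, 1.5), 2),
  Protein  = round(rnorm(66, 12, 2), 2)
)
val_idx    <- c(25, 50, 8, 49, 39, 12, 16, 63, 35, 41)
demo_val   <- demo[val_idx, ]
demo_train <- demo[-val_idx, ]

# Per-constituent statistics for each set.
summary(demo_train)
summary(demo_val)

# Mean and standard deviation of each constituent, training vs validation.
stats <- rbind(
  train_mean = sapply(demo_train, mean),
  val_mean   = sapply(demo_val, mean),
  train_sd   = sapply(demo_train, sd),
  val_sd     = sapply(demo_val, sd)
)
round(stats, 2)
```

If the validation statistics fall well outside the training ones, a new random draw (or a different selection method) may be worth considering.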
You can also look at the distribution plots, as in this case for moisture:
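A possible way to draw such a plot in base R, overlaying kernel density estimates of moisture for the training and validation sets. The data are simulated and hypothetical, and the post does not show which plotting function was actually used; note that a density estimate from only 10 validation samples is rough.

```r
# Hypothetical moisture data and split.
set.seed(1)
demo <- data.frame(Moisture = round(rnorm(66, 10, 1.5), 2))
val_idx    <- c(25, 50, 8, 49, 39, 12, 16, 63, 35, 41)
demo_val   <- demo[val_idx, , drop = FALSE]
demo_train <- demo[-val_idx, , drop = FALSE]

# Kernel density estimates for each set.
dens_train <- density(demo_train$Moisture)
dens_val   <- density(demo_val$Moisture)

# Plot both curves on a common y-axis so neither is clipped.
plot(dens_train, main = "Moisture: training vs validation",
     xlab = "Moisture", ylim = range(0, dens_train$y, dens_val$y))
lines(dens_val, lty = 2, col = "red")
legend("topright", legend = c("Training", "Validation"),
       lty = c(1, 2), col = c("black", "red"))
```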
