Sampling Exercise Part 1

November 13, 2016
By

(This article was first published on R-exercises, and kindly contributed to R-bloggers)

In this exercise, we will quickly go through some basic sampling methods. Follow along with this series to use these methods later in our decision tree modelling exercise. We will sample using the packages caTools and caret. This is a beginner-level exercise. Please refer to the help pages for the set.seed(), sample.split(), createDataPartition(), and createFolds() functions. You may also find it helpful to review the subset() function.

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Exercise 1
Load the iris data and the package “caTools”. If the package is not installed, install it with the install.packages() command.
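One possible way to do this, offered as a sketch rather than the official solution: install caTools only when it is missing, then load it. The CRAN mirror URL is an assumption; any mirror works. iris ships with base R, so data(iris) simply makes the dataset explicit.

```r
# Install caTools only if it is not already available, then load it.
# The repos URL is an assumed CRAN mirror; substitute your own if needed.
if (!requireNamespace("caTools", quietly = TRUE)) {
  install.packages("caTools", repos = "https://cloud.r-project.org")
}
library(caTools)

# iris comes with base R (package 'datasets'); data() makes the load explicit.
data(iris)
str(iris)  # 150 obs. of 5 variables; Species is a factor with 3 levels
```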

Exercise 2
Set the seed to 100.
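Why set the seed at all? A minimal illustration, using a throwaway draw of 5 row numbers that is not part of the exercise: resetting the same seed reproduces exactly the same "random" result, which is what makes the splits below repeatable.

```r
# set.seed() fixes the state of R's random number generator.
set.seed(100)
first_draw <- sample(1:150, 5)

# Resetting the same seed replays the same sequence of random numbers.
set.seed(100)
second_draw <- sample(1:150, 5)

identical(first_draw, second_draw)  # TRUE: same seed, same draw
```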

Exercise 3
Use the function sample.split() with SplitRatio = 0.7 to split the dataset into two parts, stratified by the Species class. Store the result in the variable split.
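A sketch of this step, assuming caTools is already installed (Exercise 1). Note that sample.split() takes the class labels, here iris$Species, rather than the whole data frame, and returns a logical vector with one entry per row. The cross-tabulation at the end is an extra check, not part of the exercise.

```r
library(caTools)

set.seed(100)
# TRUE marks rows assigned to the 70% part; the ratio is kept within each species.
split <- sample.split(iris$Species, SplitRatio = 0.7)

# Stratification check: each species should show 15 FALSE and 35 TRUE.
table(iris$Species, split)
```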

Exercise 4
Use the subset() function to select the rows of the data frame where split is TRUE. Store this result in a variable called Train.

Exercise 5
Store the remaining 30 percent of the data in the variable Test, using the same subset() approach.

Exercise 6
Print the number of rows in Train and Test. You should see 70 percent of the data in Train and 30 percent in Test.
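Exercises 4 to 6 fit together in a few lines; this is a sketch of one way to do them, and it repeats the split from Exercise 3 so it runs on its own. Inside subset(), the name split is not a column of iris, so R finds the logical vector in the calling environment.

```r
library(caTools)

set.seed(100)
split <- sample.split(iris$Species, SplitRatio = 0.7)

# subset() keeps the rows for which the condition evaluates to TRUE.
Train <- subset(iris, split == TRUE)   # the 70% sample
Test  <- subset(iris, split == FALSE)  # the remaining 30%

nrow(Train)               # 105
nrow(Test)                # 45
nrow(Train) / nrow(iris)  # 0.7
```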

Exercise 7
Install and load the package “caret”.

Exercise 8
Set the seed to 500 and use createDataPartition() to make the same kind of two-way split as in Exercise 3 (stratified by Species), but with an 80:20 ratio and list = FALSE.
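A sketch of this exercise, assuming caret is installed (Exercise 7). Note that the argument is lowercase list; R is case-sensitive, so List = FALSE would be rejected as an unused argument. The index name train_idx is illustrative only.

```r
library(caret)

set.seed(500)
# createDataPartition() samples within each level of the factor y, so the
# 80:20 ratio holds for every species. list = FALSE returns a one-column
# matrix of row indices instead of a list.
train_idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)

Train <- iris[train_idx, ]   # rows picked for training
Test  <- iris[-train_idx, ]  # all remaining rows

nrow(Train)  # 120
nrow(Test)   # 30
```

Unlike sample.split(), which returns a logical flag per row, createDataPartition() returns row numbers, so the data frame is indexed directly instead of going through subset().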

Exercise 9
Use the createDataPartition function to create 5 different samples of the training data.
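A sketch using the times argument, under two assumptions: "training data" means 80% samples drawn from the full iris data as in Exercise 8, and the seed of 500 is carried over. If you instead want the samples drawn from an existing Train data frame, pass Train$Species and index Train rather than iris.

```r
library(caret)

set.seed(500)  # assumed: reuses the seed from Exercise 8
# times = 5 draws five independent stratified 80% samples. With the default
# list = TRUE, the result is a named list of row-index vectors.
samples <- createDataPartition(iris$Species, p = 0.8, times = 5)

names(samples)    # one name per resample
lengths(samples)  # 120 row indices in each sample

# Turn the first resample into a data frame, for example.
first_train <- iris[samples[[1]], ]
```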

Exercise 10
We now know how to make a two-way split and how to draw 5 different samples. But what about 5 equal splits? Use the createFolds() function to make 5 equal partitions of the iris dataset. Make sure that each partition represents the species classes as equally as possible.
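A sketch of one approach. When y is a factor, createFolds() already stratifies by class, so passing iris$Species takes care of the balance requirement. The exercise does not specify a seed; the value 500 below is an assumption for reproducibility.

```r
library(caret)

set.seed(500)  # assumed seed; the exercise does not give one
# k = 5 partitions; by default each list element holds the row indices
# of one held-out fold, and every row lands in exactly one fold.
folds <- createFolds(iris$Species, k = 5)

lengths(folds)  # 30 rows per fold

# Species counts per fold: 10 of each species in every fold.
sapply(folds, function(i) table(iris$Species[i]))
```

With 50 rows per species and k = 5, the classes divide evenly; with uneven class sizes the folds would differ by at most a row or so per class.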
