Creating Sample Datasets – Exercises

[This article was first published on R-exercises, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Creating sample data is a common task performed in many different scenarios.

R has several base functions that make the sampling process quite easy and fast.

Below is an explanation of the main functions used in the current set of exercises:

1. set.seed() – R draws samples with a pseudo-random number generator. Calling set.seed() with a fixed value before a random-related function makes that function return the exact same sample every time the code is run.

2. sample() – Sampling function. The arguments of the function are:
x – a vector of values to sample from,
size – the sample size,
replace – whether a value may be drawn more than once (TRUE) or not (FALSE, the default),
prob – a vector of probability weights, one for each value in the input vector.

3. seq()/seq.Date() – Create a sequence of values/dates, ranging from a ‘start’ to an ‘end’ value (the from and to arguments), in steps set by the by argument.

4. rep() – Repeat a value/vector n times.

5. rev() – Reverse the order of the values within a vector.
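
As a rough illustration of set.seed() and sample(), here is a minimal sketch. The seed, the vectors and the variable names (first_draw, colours, weights and so on) are made up for this example and are not taken from the exercises.

```r
# Hypothetical example values; none of them come from the exercises.
set.seed(42)                                   # fix the random generator state
first_draw <- sample(x = 1:6, size = 5, replace = TRUE)

set.seed(42)                                   # same seed again
second_draw <- sample(x = 1:6, size = 5, replace = TRUE)

identical(first_draw, second_draw)             # TRUE: same seed, same sample

# Weighted sampling: 'prob' holds one weight per element of 'x'
colours <- c("red", "green", "blue")
weights <- c(0.7, 0.2, 0.1)
weighted_draw <- sample(x = colours, size = 10, replace = TRUE, prob = weights)
table(weighted_draw)                           # counts per colour

# Without replacement, 'size' cannot exceed length(x);
# here every value is returned exactly once, in random order
shuffled <- sample(x = 1:6, size = 6, replace = FALSE)
```

Resetting the seed right before each call is what makes the two draws identical; without the second set.seed() the draws would generally differ.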

You can get additional explanations for those functions by typing a ‘?’ before the function’s name in the R console, for example ?sample.
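
To give a feel for the sequence helpers, the sketch below shows seq(), rep() and rev() on made-up values. The dates and the names first_day, last_day, monthly and monthly_desc are assumptions chosen for illustration only.

```r
# Hypothetical example values; none of them come from the exercises.
seq(from = 2, to = 10, by = 2)         # 2 4 6 8 10
seq(from = 10, to = 2, by = -2)        # 10 8 6 4 2

# For dates, 'by' also accepts strings such as "day", "week", "month", "year"
first_day <- as.Date("2020-01-01")
last_day  <- as.Date("2020-04-01")
monthly <- seq(from = first_day, to = last_day, by = "month")
# "2020-01-01" "2020-02-01" "2020-03-01" "2020-04-01"

# A negative step in the string gives a descending date sequence
monthly_desc <- seq(from = last_day, to = first_day, by = "-1 month")

rep(c(1, 2), times = 3)                # 1 2 1 2 1 2
rep(c(1, 2), each = 3)                 # 1 1 1 2 2 2

rev(monthly)                           # same dates, April back to January
```

Note that times repeats the whole vector, while each repeats every element in turn.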

Answers to the exercises are available here.
If you have different solutions, feel free to post them.

Exercise 1
1. Set seed with value 1235
2. Create a Bernoulli sample of 100 ‘fair coin’ flips.
Populate a variable called fair_coin with the sample results.

Exercise 2
1. Set seed with value 2312
2. Create a sample of 10 integers, based on a vector ranging from 8 through 19.
Allow the sample to have repeated values.
Populate a variable called hourselect1 with the sample results.

Exercise 3
1. Create a vector variable called probs with the following probabilities:
‘0.05,0.08,0.16,0.17,0.18,0.14,0.08,0.06,0.03,0.03,0.01,0.01’
2. Make sure the sum of the vector equals 1.
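
A note on checking such a sum: decimal fractions are stored as floating-point numbers, so an exact comparison with == can occasionally fail even when the values add up to 1 on paper. One possible way to do the check, sketched here on a made-up vector called example_probs (not the vector from this exercise), is all.equal(), which compares within a small numerical tolerance.

```r
# Hypothetical weights, not the ones from the exercise
example_probs <- c(0.1, 0.2, 0.3, 0.4)

sum(example_probs)                              # prints 1

# all.equal() allows for tiny rounding differences;
# isTRUE() turns its result into a plain TRUE/FALSE
sums_to_one <- isTRUE(all.equal(sum(example_probs), 1))
sums_to_one                                     # TRUE
```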

Exercise 4
1. Set seed with value 1976
2. Create a sample of 10 integers, based on a vector ranging from 8 through 19.
Allow the sample to have repeated values and use the probabilities defined in Exercise 3.
Populate a variable called hourselect2 with the sample results.

Exercise 5
Let’s prepare the variables for a biased coin:
1. Populate a variable called coin with 5 zeros followed by 5 ones.
2. Populate a variable called probs with the value ‘0.08’ repeated 5 times, followed by the value ‘0.12’ repeated 5 times.
3. Make sure the probabilities in the probs variable sum to 1.

Exercise 6
1. Set seed with value 345124
2. Create a biased sample of length 100, using the coin vector as input and the probs vector as the probabilities.
Populate a variable called biased_coin with the sample results.

Exercise 7
Compare the sum of the values in fair_coin with the sum of the values in biased_coin.

Exercise 8
1. Create a ‘Date’ variable called startDate with the value 9th of February 2010 and a second ‘Date’ variable called endDate with the value 9th of February 2005.
2. Create a descending sequence of dates containing the 9th of every month between those two dates. Populate a variable called seqDates with the sequence of dates.

Exercise 9
Reverse the sequence of dates created in the previous exercise so that they are in ascending order, and place them in a variable called RevSeqDates.

Exercise 10
1. Set seed with value 10
2. Create a sample of 20 unique values from the RevSeqDates vector.
