Creating Sample Datasets – Exercises

October 7, 2016
By

(This article was first published on R-exercises, and kindly contributed to R-bloggers)

ancient-coins
Creating sample data is a common task performed in many different scenarios.

R has several base functions that make the sampling process quite easy and fast.

Below is an explanation of the main functions used in the current set of exercices:

1. set.seed() – Although R executes a random mechanism of sample creation, set.seed() function allows us to reproduce the exact sample each time we execute a random-related function.

2. sample() – Sampling function. The arguments of the function are:
x – a vector of values,
size – sample size
replace – Either use a chosen value more than once or not
prob – the probabilities of each value in the input vector.

3. seq()/seq.Date() – Create a sequence of values/dates, ranging from a ‘start’ to an ‘end’ value.

4. rep() – Repeat a value/vector n times.

5. rev() – Revert the values within a vector.

You can get additional explanations for those functions by adding a ‘?’ prior to each function’s name.

Answers to the exercises are available here.
If you have different solutions, feel free to post them.

Exercise 1
1. Set seed with value 1235
2. Create a Bernoulli sample of 100 ‘fair coin’ flippings.
Populate a variable called fair_coin with the sample results.

Exercise 2
1. Set seed with value 2312
2. Create a sample of 10 integers, based on a vector ranging from 8 thru 19.
Allow the sample to have repeated values.
Populate a variable called hourselect1 with the sample results

Exercise 3
1. Create a vector variable called probs with the following probabilities:
‘0.05,0.08,0.16,0.17,0.18,0.14,0.08,0.06,0.03,0.03,0.01,0.01’
2. Make sure the sum of the vector equals 1.

Exercise 4
1. Set seed with value 1976
2. Create a sample of 10 integers, based on a vector ranging from 8 thru 19.
Allow the sample to have repeated values and use the probabilities defined in the previous question.
Populate a variable called hourselect2 with the sample results

Exercise 5
Let’s prepare the variables for a biased coin:
1. Populate a variable called coin with 5 zeros in a row and 5 ones in a row
2. Populate a variable called probs having 5 times value ‘0.08’ in a row and 5 times value ‘0.12’ in a row.
3. Make sure the sum of probabilities on probs variable equals 1.

Exercise 6
1. Set seed with value 345124
2. Create a biased sample of length 100, having as input the coin vector, and as probabilities probs vector of probabilities.
Populate a variable called biased_coin with the sample results.

Exercise 7
Compare the sum of values in fair_coin and biased_coin

Exercise 8
1. Create a ‘Date’ variable called startDate with value 9th of February 2010 and a second ‘Date’ variable called endDate with value 9th of February 2005
2. Create a descending sequence of dates having all 9th’s of the month between those two dates. Populate a variable called seqDates with the sequence of dates.

Exercise 9
Revert the sequence of dates created in the previous question, so they are in ascending order and place them in a variable called RevSeqDates

Exercise 10
1. Set seed with value 10
2. Create a sample of 20 unique values from the RevSeqDates vector.

To leave a comment for the author, please follow the link and comment on their blog: R-exercises.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)