Simulation in R For AP Statistics

November 28, 2016
(This article was first published on R – Saturn Science, and kindly contributed to R-bloggers)

Make a simulation and function using R.

Used With Permission
My Class at Work With R

I continue to teach some R programming in my AP Stats class because this is an essential skill for them to have.

We are now in chapter five of our textbook, Probability and Simulation. I had my students review the parking problem on page 290. The problem asks us to simulate picking two students from a total of 95 in the high school and to find the probability that both students come from the same class of only 28 students.

The textbook solves the simulation problem by using the random number table in the back of the book. Reading across the table, you take two digits at a time to get a student number, then check whether both numbers in a pair are less than or equal to 28.

To use the random digits chart, you need to number the students from 01 to 95. Since there are 28 students in the class that won the lottery, we want to simulate the probability of picking two students from this class of 28 students from the total population of 95.

For this lesson, I wanted my class to explore the concept of simulation in more depth, and I thought R would teach this lesson better than our TI Nspire calculators. The book’s example gives the simulated probability at about 10 percent.

To begin, I demonstrated the sample function to them in class on the Smartboard with RStudio. I said, “Let's take a sample 100 times and see what it looks like.” I demonstrated and displayed the basic outline of the sample function as follows:

sample (from this number : to this number, how many, replace = T or F)

sample(1:95,2, replace = FALSE)

## [1] 56 45

Here is the result of just using sample. If both numbers are less than 29, it means both students were picked from that one class. But what if you wanted to repeat this sampling technique? You would have to keep running this line of code, and it seems that there should be a better way. The better way is to use the replicate function.
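The "both numbers less than 29" check can itself be written in R. This is a minimal sketch for a single draw, assuming the winning class holds student numbers 1 through 28; the names draw and both_in_class and the seed are only for illustration.

```r
set.seed(1)  # arbitrary seed so the draw can be repeated
draw <- sample(1:95, 2, replace = FALSE)  # two distinct student numbers
both_in_class <- all(draw <= 28)          # TRUE only if both numbers are 1 to 28
both_in_class
```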

How to use Replicate

Class Lesson and Rmd Code on Smartboard

Here we will use replicate to take a sample 100 times and check to see how many times we get a pair of numbers from 1 to 28.

To use replicate, we specify two arguments: n tells R how many times we want to repeat something and expr is the R command we want to repeat.

For example:

replicate(n=100, expr = sample(1:95,2, replace = FALSE)) # take a sample 100 times.

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## [1,]   13   20   23   69   16   13   20   42   44    93    75    68    55
## [2,]   73   67   67   46   62   17   92   46   73    65     1    93    61
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## [1,]    88    28    62    49    39    46    36     6    46    76    67
## [2,]    70    68    69    74    74    89    55    65    60    26    94
##      [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35]
## [1,]    49    55    14    77    23    24    92    60    83    80    42
## [2,]    34    60    88    93    61     1    28    41    85    63    44
##      [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46]
## [1,]    30    37    47    61    61    89    80    87    28    12     3
## [2,]    38    19    58    74     1    66     2    67    22    73     1
##      [,47] [,48] [,49] [,50] [,51] [,52] [,53] [,54] [,55] [,56] [,57]
## [1,]    42    79    64     3     5    37    85    42    63    37     1
## [2,]    47     2    60    14     8    19    14    28    74     6    65
##      [,58] [,59] [,60] [,61] [,62] [,63] [,64] [,65] [,66] [,67] [,68]
## [1,]    32    78    95    30    69    69    56    43    77     7    35
## [2,]     4    14    88    84    46    95    21    44    30    73     7
##      [,69] [,70] [,71] [,72] [,73] [,74] [,75] [,76] [,77] [,78] [,79]
## [1,]    42     8    30    37    55     3    75    33    16    88    86
## [2,]    58    53    10    22    69    45    22    24    22    55    54
##      [,80] [,81] [,82] [,83] [,84] [,85] [,86] [,87] [,88] [,89] [,90]
## [1,]    83    70    58    59    87    75    52    51    22    38    91
## [2,]    44    14    62    11    64    30    29    91    25     6    51
##      [,91] [,92] [,93] [,94] [,95] [,96] [,97] [,98] [,99] [,100]
## [1,]    61    13     2     6    86    11    28     6     6      5
## [2,]    70    94    75    34    63    64    54    45    88     56

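We can also shorten this by dropping the argument names, since R matches the arguments by position. The draws are random, so the numbers will differ from run to run, but the result has the same form.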

replicate(100, sample(1:95,2, replace = FALSE))
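One way to convince yourself that the long and short calls are equivalent is to fix the random seed before each one. This sketch assumes an arbitrary seed of 2016 and uses the illustrative names long_form and short_form.

```r
set.seed(2016)
long_form <- replicate(n = 100, expr = sample(1:95, 2, replace = FALSE))

set.seed(2016)
short_form <- replicate(100, sample(1:95, 2, replace = FALSE))

identical(long_form, short_form)  # TRUE: same seed, same draws
dim(long_form)                    # 2 rows (the pair), 100 columns (the repetitions)
```

Storing the result in an object, rather than only printing it, also makes it available for later steps.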

Writing a function:

Another way to do this simulation is to write a short function and call that function in replicate. Here is a simple function that takes a sample of 2 from 95; replicate then runs it a specified number of times.

Use a function and replicate

R Markdown makes it easier to publish

Here I created a function called park where it samples without replacement 2 numbers from a list of 95. Then we call the function park in replicate.

The word function must be followed by parentheses. It tells R that what comes next is a function. The curly braces, {}, are the beginning and ending of your function. Everything between them is part of your function.

The return() statement ends the function and hands back a value. What you put between the parentheses is returned from inside the function to your workspace. Here I use the assignment operator to put the function in an object called park.

park <- function(){
  win <- sample(1:95,2,replace = FALSE)
  return(win)
}

replicate(n=100, park())

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## [1,]    5   14   34   85   28   36   94   46   45    47    93    46    26
## [2,]   47   50   38   62   34   42    9   93   85    89    54    28    70
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## [1,]    67    52    56    44    94    43    86    69    26    34    81
## [2,]    73     7    57    58     8     8    20    15     3    92    13
##      [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35]
## [1,]    38    43    59    19    71    81    67    22    45    27    70
## [2,]    72    73    65    59    32    34    93    42    17     3    89
##      [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46]
## [1,]    70    21     2    18     8    89    89    12    15     7    18
## [2,]    59     7    34    45    74    18    91    83    67     2    77
##      [,47] [,48] [,49] [,50] [,51] [,52] [,53] [,54] [,55] [,56] [,57]
## [1,]    69    20    69    18    85    92    83    44    13    74    19
## [2,]    35    43     9    68    70    88    95    14    52    21    46
##      [,58] [,59] [,60] [,61] [,62] [,63] [,64] [,65] [,66] [,67] [,68]
## [1,]    21     4    42    87    68    56    78     8    26    29    14
## [2,]    17    26    41    25    67     5    59     6    21    26    50
##      [,69] [,70] [,71] [,72] [,73] [,74] [,75] [,76] [,77] [,78] [,79]
## [1,]    92    88    48    59    10    21    10    67     5    25     9
## [2,]    69    30    81    22    72    95    40    73    95    34    16
##      [,80] [,81] [,82] [,83] [,84] [,85] [,86] [,87] [,88] [,89] [,90]
## [1,]    61    43    95    90    72    12    61    27    63    17    14
## [2,]    39    88    18    18    86    54    94    14    84    89    84
##      [,91] [,92] [,93] [,94] [,95] [,96] [,97] [,98] [,99] [,100]
## [1,]     3    94    46    13     9    12    88    43     5     78
## [2,]     7    73    52    20    61    28     5    34    65     21

Extension

One student asked whether it was possible to do this with a for loop, and I said yes. I also said that for loops can run a little slowly on large data sets, so for our purposes it might be better to just use replicate or make a function.
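For comparison, here is a sketch of the same simulation written as a for loop. The names n_reps and draws are illustrative, and the matrix is created at full size before the loop so that R does not have to grow it on every pass.

```r
n_reps <- 100
draws <- matrix(NA_integer_, nrow = 2, ncol = n_reps)  # preallocate the result

for (i in seq_len(n_reps)) {
  draws[, i] <- sample(1:95, 2, replace = FALSE)  # one pair per column
}

dim(draws)  # same 2 x 100 layout that replicate returns
```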

Their assignment is due in two days. They are to knit to Word and upload to our class Moodle site. They can also submit their Rmd file to Moodle.

For extra credit, I will have them try to figure out how to tally the percentage of simulated pairs in which both students come from the class of 28. I suspect we will be using some type of loop or apply function.
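One possible approach to the tally, offered as a sketch and not as the assigned solution: store the matrix from replicate, then use apply to test each column. The names sims and both_in_class, the seed, and the choice of 10,000 repetitions are all assumptions for illustration.

```r
set.seed(95)  # arbitrary seed
sims <- replicate(10000, sample(1:95, 2, replace = FALSE))

# For each column (one simulated pair), are both numbers 28 or less?
both_in_class <- apply(sims, 2, function(pair) all(pair <= 28))

sum(both_in_class)   # how many pairs came entirely from the class
mean(both_in_class)  # the same tally as a proportion
```

Because TRUE counts as 1 and FALSE as 0, mean() of a logical vector gives the proportion of TRUE values directly.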

We will learn in the next section the multiplication rule for probabilities. The probability for this problem is (28/95)(27/94) = 0.0847, or about 8.5%.
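The exact value can be checked in R in more than one way. This sketch computes it with the multiplication rule, by counting pairs with choose, and with the hypergeometric density dhyper; the variable names are illustrative.

```r
p_multiplication <- (28 / 95) * (27 / 94)    # multiplication rule
p_counting <- choose(28, 2) / choose(95, 2)  # pairs from the class / all possible pairs
p_hyper <- dhyper(2, m = 28, n = 67, k = 2)  # both of 2 draws come from the 28

round(c(p_multiplication, p_counting, p_hyper), 4)
## [1] 0.0847 0.0847 0.0847
```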
