R: Interval Estimation of the Population Mean
[This article was first published on Analysis with Programming, and kindly contributed to R-bloggers].
Interval estimates of the population mean can be computed with functions from the following R packages:
- stats – contains the t.test;
- TeachingDemos – contains the z.test; and,
- BSDA – contains the zsum.test and tsum.test.
t.test of the stats package is a Student's t-test, and is used when the raw dataset is given. The same holds for z.test, but this function is specifically for the z-test with a known population standard deviation. When the dataset is not given and only the summary statistics (mean and standard deviation) are presented, the appropriate functions are zsum.test or tsum.test. Note that t.test and tsum.test implement the same statistical test, as do z.test and zsum.test. Consider the examples below.
Example 1. The 2012-2013 SASE scores of 33 randomly selected students from the College of Science and Mathematics (CSM) of MSU-IIT were recorded: 84, 93, 101, 86, 82, 86, 88, 94, 89, 94, 93, 83, 95, 86, 94, 87, 91, 96, 89, 79, 99, 98, 81, 80, 88, 100, 90, 100, 81, 98, 87, 95, and 94. The population of these scores is believed to be normally distributed with a standard deviation of 6.8. Determine and interpret the 95% and 99% confidence intervals of the population mean.
From the data, we obtain the following information: (i) the sample size is more than 30, and (ii) the population standard deviation is known. Therefore, the appropriate test is the z-test, and the function to use is z.test.
Interpretation: The true mean of all SASE scores in the school year 2013-2014 from CSM is likely between 88.01327 and 92.65340 (95% CI). And the true mean of all SASE scores for the said college and school year is likely between 87.28425 and 93.38241 (99% CI).
Aside from the confidence interval, the function also returns the computed z-statistic with its p-value, as well as the point estimate of the mean. To show only the confidence interval, append the suffix $conf.int to the function call.
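As a minimal sketch of Example 1, the interval can also be computed by hand in base R from the formula mean ± z × σ/√n. The helper name z_interval is hypothetical and is not part of any package. If TeachingDemos happens to be installed, the guarded call at the end should return the same 95% interval through z.test and the $conf.int suffix.

```r
# SASE scores from Example 1
scores <- c(84, 93, 101, 86, 82, 86, 88, 94, 89, 94, 93, 83, 95, 86, 94, 87,
            91, 96, 89, 79, 99, 98, 81, 80, 88, 100, 90, 100, 81, 98, 87, 95, 94)
sigma <- 6.8  # known population standard deviation

# Hypothetical helper (not from any package): z-based interval with known sigma
z_interval <- function(x, sigma, conf.level = 0.95) {
  se <- sigma / sqrt(length(x))           # standard error of the mean
  z  <- qnorm(1 - (1 - conf.level) / 2)   # two-sided critical value
  mean(x) + c(-1, 1) * z * se
}

ci95 <- z_interval(scores, sigma, conf.level = 0.95)
ci99 <- z_interval(scores, sigma, conf.level = 0.99)
print(ci95)
print(ci99)

# Optional cross-check, run only if the TeachingDemos package is available
if (requireNamespace("TeachingDemos", quietly = TRUE)) {
  print(TeachingDemos::z.test(scores, stdev = sigma, conf.level = 0.95)$conf.int)
}
```

Changing conf.level is the only adjustment needed to move between the 95% and 99% intervals.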
Example 2. The following data (341, 345, 338, 339, 340, 343, 341, 343, 341, 328, 343, 347, 337, 348, and 339) are a random sample from a normally distributed population. Compute and interpret the 90% confidence interval of the population mean.
The appropriate test for this is the t-test, since the sample size is small (n < 30) and the population variance is unknown, and the function to use is t.test.
Interpretation: The true mean of the population of the given data above is likely between approximately 338.71 and 343.03 (90% CI).
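A minimal sketch of the Example 2 computation, assuming only base R: t.test from the stats package accepts the raw data and a conf.level argument, and the $conf.int suffix extracts the interval. The second part recomputes the interval from the formula mean ± t × s/√n as a cross-check; the variable names are illustrative.

```r
# Data from Example 2
x <- c(341, 345, 338, 339, 340, 343, 341, 343, 341, 328, 343, 347, 337, 348, 339)

# 90% confidence interval from the one-sample t-test
ci90 <- t.test(x, conf.level = 0.90)$conf.int
print(ci90)

# Manual cross-check with the t critical value on n - 1 degrees of freedom
n      <- length(x)
t_crit <- qt(0.95, df = n - 1)   # 0.95 = 1 - (1 - 0.90) / 2
manual <- mean(x) + c(-1, 1) * t_crit * sd(x) / sqrt(n)
print(manual)
```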
Often in textbooks, however, we are presented only with summary statistics of the data, as in the next example from Simplified Biostatistics by Abubakar S. Asaad.
Example 3. A biostatistician took a random sample of 49 patients from a list of all patients ever admitted to the hospital within a three-month period, and the number of drugs prescribed per admission was determined for each. The average number of drugs per case was found to be 7.5 with a standard deviation of 2.5. Calculate and interpret the 95% confidence interval for the true mean of all the patients ever admitted to the hospital.
In this example, no dataset is given, but we have its computed mean = 7.5, standard deviation = 2.5, and sample size = 49. Thus, to compute the interval estimate of the population mean in R, we use the zsum.test function.
Interpretation: The true mean number of drugs prescribed per admission for all the patients ever admitted to the hospital is likely between 6.800013 and 8.199987 (95% CI).
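As a sketch of Example 3, the interval can be computed directly from the summary statistics in base R. The helper zsum_interval is hypothetical; it only mirrors the argument names mean.x, sigma.x, and n.x used by zsum.test. If BSDA is installed, the guarded call at the end should give the same interval.

```r
# Summary statistics from Example 3
xbar <- 7.5   # sample mean
s    <- 2.5   # standard deviation
n    <- 49    # sample size

# Hypothetical helper (not from any package): z-based interval from summary statistics
zsum_interval <- function(mean.x, sigma.x, n.x, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)   # two-sided critical value
  mean.x + c(-1, 1) * z * sigma.x / sqrt(n.x)
}

ci95_summary <- zsum_interval(xbar, s, n)
print(ci95_summary)

# Optional cross-check, run only if the BSDA package is available
if (requireNamespace("BSDA", quietly = TRUE)) {
  print(BSDA::zsum.test(mean.x = xbar, sigma.x = s, n.x = n,
                        conf.level = 0.95)$conf.int)
}
```

For a small sample with unknown population variance, the same sketch would use qt(1 - (1 - conf.level) / 2, df = n.x - 1) in place of the qnorm call.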
The tsum.test function is used in situations like Example 3, but where the population variance is unknown and the sample size is less than 30.
Reference
Asaad, Abubakar S. (2011). Simplified Biostatistics. Manila: Rex Book Store, Inc.