Calculate Confidence Intervals in R

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers].

A confidence interval is a range of values that is likely, at a stated confidence level, to contain a population parameter.

Confidence intervals can be found all over statistics. They provide an interval likely to include the true population parameter we’re trying to estimate, allowing us to express estimated values from sample data with some confidence.

Depending on the situation, there are numerous methods for calculating them.

In general, a confidence interval is computed with the following formula:

Confidence Interval = (point estimate)+/-(critical value)*(standard error)

This formula produces an interval with a lower and upper bound that is likely to contain a population parameter with a specified level of confidence.

Confidence Interval  = [lower bound, upper bound]
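
As a minimal sketch of the general formula, the helper below turns a point estimate, a critical value, and a standard error into the two bounds. The function name ci_bounds and the input values are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: builds [lower, upper] from the general formula
# point estimate +/- critical value * standard error.
ci_bounds <- function(estimate, critical, se) {
  margin <- critical * se
  c(lower = estimate - margin, upper = estimate + margin)
}

# Illustrative inputs only: estimate 10, z-critical value for 95%, standard error 2
ci_bounds(estimate = 10, critical = qnorm(0.975), se = 2)
```

Each approach below is this same calculation with a different point estimate, critical value, and standard error.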

Calculate Confidence Intervals in R

This article will show you how to construct the confidence intervals in R:

Approach 1. Confidence Interval for a Mean

Approach 2. Confidence Interval for a Difference in Means

Approach 3. Confidence Interval for a Proportion

Approach 4. Confidence Interval for a Difference in Proportions

Approach 1: Confidence Interval for a Mean

To compute a confidence interval for a mean, we use the following formula:

Confidence Interval = x +/- t(n-1, 1-α/2) * (s/√n)

where:

x: sample mean

t: the t-critical value with n-1 degrees of freedom at the 1-α/2 quantile (0.975 for a 95% interval)

s: sample standard deviation

n: sample size

Let’s look at an example. Suppose we took a random sample and recorded the following:

Sample size n = 30

Sample mean weight x = 200

Sample standard deviation s = 12

The code below demonstrates how to compute a 95% confidence interval for the true population mean weight of the above data.

n <- 30
xbar <- 200
s <- 12

Let’s calculate the margin of error

margin <- qt(0.975,df=n-1)*s/sqrt(n)

We can now determine the lower and upper confidence interval boundaries.

lowerinterval <- xbar - margin
lowerinterval 
[1] 195.5191
upperinterval <- xbar + margin
upperinterval 
[1] 204.4809

The 95% confidence interval for the true population mean weight is [195.5191, 204.4809].
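
The same steps can be wrapped in a small function so the confidence level is easy to change. This is only a sketch: the name mean_ci and its conf argument are our own, not part of base R.

```r
# Hypothetical helper: t-based confidence interval for a mean
# computed from summary statistics.
mean_ci <- function(xbar, s, n, conf = 0.95) {
  alpha <- 1 - conf
  margin <- qt(1 - alpha / 2, df = n - 1) * s / sqrt(n)
  c(lower = xbar - margin, upper = xbar + margin)
}

mean_ci(xbar = 200, s = 12, n = 30)               # same inputs as the example above
mean_ci(xbar = 200, s = 12, n = 30, conf = 0.99)  # a 99% interval is wider
```

If the raw observations are available in a numeric vector x, t.test(x)$conf.int returns this kind of t-based interval directly.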

Approach 2: Confidence Interval for a Difference in Means

To compute a confidence interval for a difference in population means, we use the following formula:

Confidence interval = (x1 - x2) +/- t*√((sp^2/n1) + (sp^2/n2))

where:

x1, x2: sample 1 mean, sample 2 mean

t: the t-critical value based on the confidence level and (n1+n2-2) degrees of freedom

sp^2: pooled variance, calculated as ((n1-1)*s1^2 + (n2-1)*s2^2) / (n1+n2-2)

s1, s2: sample 1 standard deviation, sample 2 standard deviation

n1, n2: sample 1 size, sample 2 size

Let’s say we want to estimate the difference in mean weight between two species, so we draw a random sample of 20 individuals from each population.

Group 1 available data

x1 = 250

s1 = 13

n1 = 20

Group 2 available data

x2 = 280

s2 = 11.9

n2 = 20

The code below demonstrates how to compute a 95% confidence interval for the true difference in population means.

n1 <- 20
xbar1 <- 250
s1 <- 13
n2 <- 20
xbar2 <- 280
s2 <- 11.9

Now we need to calculate the pooled variance of the above data.

sp = ((n1-1)*s1^2+(n2-1)*s2^2)/(n1+n2-2)
sp
155.305

Now we are ready to calculate the margin of error. The t-critical value uses n1+n2-2 degrees of freedom, matching the formula above.

margin <- qt(0.975,df=n1+n2-2)*sqrt(sp/n1 + sp/n2)
round(margin, 4)
[1] 7.9779

Finally, calculate the lower and upper bounds of the confidence interval.

lowerinterval <- (xbar1-xbar2) - margin
round(lowerinterval, 4)
[1] -37.9779
upperinterval <- (xbar1-xbar2) + margin
round(upperinterval, 4)
[1] -22.0221

The 95% confidence interval for the true difference in population means is [-37.9779, -22.0221].
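
When raw observations are available, t.test() with var.equal = TRUE computes a pooled-variance interval with n1+n2-2 degrees of freedom. The sketch below checks the manual formula against it using R's built-in sleep dataset. Treating the two sleep groups as independent samples is an assumption made purely for illustration.

```r
# Built-in dataset: extra hours of sleep for two groups of 10
x1 <- sleep$extra[sleep$group == 1]
x2 <- sleep$extra[sleep$group == 2]
n1 <- length(x1)
n2 <- length(x2)

# Pooled variance and margin of error, with n1 + n2 - 2 degrees of freedom
sp2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
margin <- qt(0.975, df = n1 + n2 - 2) * sqrt(sp2 / n1 + sp2 / n2)
manual <- (mean(x1) - mean(x2)) + c(-1, 1) * margin

# Interval reported by t.test() with equal variances assumed
builtin <- as.numeric(t.test(x1, x2, var.equal = TRUE)$conf.int)

manual
builtin
```

The two vectors should agree. Without var.equal = TRUE, t.test() uses the Welch approximation instead, which does not pool the variances.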

Approach 3: Confidence Interval for a Proportion

To compute a confidence interval for a proportion, we use the following formula.

Confidence Interval = p +/- z*√(p(1-p)/n)

where:

p: sample proportion

z: the chosen z-value

n: sample size

Let’s use an example: imagine we wish to estimate the percentage of citizens in a county who support a particular bill. We pick 500 residents at random and ask them about their opinions on the policy.

The following are the outcomes:

Sample size n = 500

Proportion in support of bill p = 0.62

The following code demonstrates how to construct a 95% confidence interval for the true proportion of county residents who support this bill.

n <- 500
p <- 0.62

First, calculate the margin of error

margin <- qnorm(0.975)*sqrt(p*(1-p)/n)
margin
0.04254522

We now calculate the lower and upper confidence interval boundaries.

lowerinterval <- p - margin
lowerinterval
[1] 0.5774548
upperinterval <- p + margin
upperinterval
[1] 0.6625452

[0.5774548, 0.6625452] is the 95 percent confidence interval for the true proportion of residents in the entire county who support the bill.

Alternatively, we can use the glue package to format the interval as a single string.

library(glue)
n <- 500
p <- 0.62
SE <- sqrt(p * (1 - p) / n)
z_star <- qnorm(1 - (1 - 0.95) / 2)
ME <- z_star * SE
glue("({p - ME}, {p + ME})")
(0.577454784096081, 0.662545215903919)
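
Base R can also report an interval for a proportion, although with different methods than the formula in this section: prop.test() gives a Wilson score interval and binom.test() gives an exact (Clopper-Pearson) interval. The sketch below places the three side by side. The count of 310 supporters is an assumption derived from 0.62 * 500.

```r
n <- 500
x <- 310  # assumed count of supporters, i.e. 0.62 * 500

# Wald interval (the formula used in this section)
p <- x / n
wald <- p + c(-1, 1) * qnorm(0.975) * sqrt(p * (1 - p) / n)

# Wilson score interval from prop.test(), without continuity correction
wilson <- as.numeric(prop.test(x, n, correct = FALSE)$conf.int)

# Exact (Clopper-Pearson) interval from binom.test()
exact <- as.numeric(binom.test(x, n)$conf.int)

rbind(wald, wilson, exact)
```

With a sample this large the three intervals should be close to one another. They tend to diverge for small samples or proportions near 0 or 1.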

Approach 4: Confidence Interval for a Difference in Proportions

To construct a confidence interval for a difference in proportions, we use the following formula:

Confidence interval = (p1–p2)  +/-  z*√(p1(1-p1)/n1 + p2(1-p2)/n2)

where:

p1, p2: sample 1 proportion, sample 2 proportion

z: the z-critical value based on the confidence level

n1, n2: sample 1 size, sample 2 size

Let’s say we want to compare the proportion of citizens in county A who support a given bill to the proportion in county B who support the same bill. The following is a summary of the data for each sample:

Group 1 data:

n1 = 500

p1 = 0.62 #i.e. 310 out of 500 residents support the bill

Group 2 data:

n2 = 500

p2 = 0.38 #i.e. 190 out of 500 residents support the bill

The following code demonstrates how to construct a 95% confidence interval for the true difference in support for the bill between the counties:

n1 <- 500
p1 <- .62
n2 <- 500
p2 <- .38

Now we can calculate the margin of error

margin <- qnorm(0.975)*sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)
margin
[1] 0.06016802

It’s now time to determine the lower and upper confidence interval boundaries.

lowerinterval <- (p1-p2) - margin
lowerinterval
[1] 0.179832
upperinterval <- (p1-p2) + margin
upperinterval
[1] 0.300168

[0.179832, 0.300168] is the 95 percent confidence interval for the true difference in the proportion of residents who support the bill between the counties.
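
As a cross-check, prop.test() accepts two counts and two sample sizes. To our understanding, with correct = FALSE its interval for the difference uses the same unpooled standard error as the formula in this section. The counts 310 and 190 are assumptions derived from 0.62 * 500 and 0.38 * 500.

```r
x <- c(310, 190)  # assumed supporter counts: 0.62 * 500 and 0.38 * 500
n <- c(500, 500)
p <- x / n

# Manual interval for p1 - p2 using the formula from this section
margin <- qnorm(0.975) * sqrt(sum(p * (1 - p) / n))
manual <- (p[1] - p[2]) + c(-1, 1) * margin

# Interval from prop.test() without continuity correction
builtin <- as.numeric(prop.test(x, n, correct = FALSE)$conf.int)

manual
builtin
```

Leaving the default correct = TRUE applies a continuity correction, which makes the interval slightly wider.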

Conclusion

Now we know how to calculate confidence intervals in R for means, proportions, and their differences. Keep in mind the trade-off: a higher confidence level produces a wider interval, which is more likely to contain the true parameter but pins it down less precisely.
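
To make the link between confidence level and interval width concrete, here is a short sketch that reuses the summary statistics from Approach 1 (n = 30, s = 12). The variable names are our own.

```r
n <- 30
s <- 12
conf_levels <- c(0.90, 0.95, 0.99)

# Margin of error at each confidence level, t-based as in Approach 1
margins <- qt(1 - (1 - conf_levels) / 2, df = n - 1) * s / sqrt(n)

# Interval width is twice the margin of error
data.frame(level = conf_levels, width = 2 * margins)
```

The width column grows as the confidence level rises.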
