Easier confidence interval estimation with matrices and similar arrays in R

[This article was first published on R in the Antipodes, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

When dealing with survey data in particular, social scientists are often wanting to produce proportions from the data, and associated confidence intervals. The prop.test command in R can be used to generate the desired results. When dealing with small tables, such as 2×2, prop.test is easily applied to the output of an xtabs command, e.g.
test <- xtabs(~low+smoke, data=birthwt)
#get 95% CI estimate of the proportion of low birth weight babies who had mothers who smoked during pregnancy
prop.test(test[4], test[2]+test[4], 0.05)

We can see that the xtab output can be referenced in the prop.test command by putting in the cell reference information for the data. The cell references are column-by-column, so go down each row in the first column, before moving onto the second column. We can also see that this type of prop.test command is quite short when the xtabs output is relatively small.

Issues arise when there are multiple rows or columns in the xtabs output. For example, if we wanted to examine number of physician’s visits in the first trimester by mother’s age:
test2 <- xtabs(~age+ftv, data=birthwt) 

Now if we want to know the proportion and associated CIs for each age of the mother, for each number of physician visits, there are 24 cells to sum for each physician visit count. This is way too long to enter into our prop.test; while prop.test can handle summing all these numbers, the R syntax will be hard to understand.

This is where colSums is a handy command. The column sums for test2 are simply calculated using:

 This command can be easily used in prop.test. For example, to get the proportion of mothers who saw no physical, who were 19 years of age, the prop.test command is:
prop.test(test2[6], colSums(test2)[1], 0.05)

Because the count we want is cell number 6 in test2, and the colSum we want is for the first column. We can instruct R to conduct the colSums test within prop.test, or alternatively we could assign the colSums result to an object, and then reference the cell in the object.

To leave a comment for the author, please follow the link and comment on their blog: R in the Antipodes.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)