Easier confidence interval estimation with matrices and similar arrays in R

May 15, 2013
By

(This article was first published on R in the Antipodes, and kindly contributed to R-bloggers)

When dealing with survey data in particular, social scientists are often wanting to produce proportions from the data, and associated confidence intervals. The prop.test command in R can be used to generate the desired results. When dealing with small tables, such as 2×2, prop.test is easily applied to the output of an xtabs command, e.g.
load(MASS)
test <- xtabs(~low+smoke, data=birthwt)
test
#get 95% CI estimate of the proportion of low birth weight babies who had mothers who smoked during pregnancy
prop.test(test[4], test[2]+test[4], 0.05)

We can see that the xtab output can be referenced in the prop.test command by putting in the cell reference information for the data. The cell references are column-by-column, so go down each row in the first column, before moving onto the second column. We can also see that this type of prop.test command is quite short when the xtabs output is relatively small.

Issues arise when there are multiple rows or columns in the xtabs output. For example, if we wanted to examine number of physician’s visits in the first trimester by mother’s age:
test2 <- xtabs(~age+ftv, data=birthwt) 
test2 

Now if we want to know the proportion and associated CIs for each age of the mother, for each number of physician visits, there are 24 cells to sum for each physician visit count. This is way too long to enter into our prop.test; while prop.test can handle summing all these numbers, the R syntax will be hard to understand.

This is where colSums is a handy command. The column sums for test2 are simply calculated using:
colSums(test2)

 This command can be easily used in prop.test. For example, to get the proportion of mothers who saw no physical, who were 19 years of age, the prop.test command is:
prop.test(test2[6], colSums(test2)[1], 0.05)

Because the count we want is cell number 6 in test2, and the colSum we want is for the first column. We can instruct R to conduct the colSums test within prop.test, or alternatively we could assign the colSums result to an object, and then reference the cell in the object.
 
 

To leave a comment for the author, please follow the link and comment on their blog: R in the Antipodes.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)