Finding Correlations in Data with Uncertainty: Classical Solution
[This article was first published on Exegetic Analytics » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Following up on my previous post as a result of an excellent suggestion from Andrej Spiess. The data are indeed very heteroscedastic! Andrej suggested that an alternative way to attack this problem would be to use weighted correlation with weights being the inverse of the measurement variance.
Let’s look at the synthetic data first.
> library(weights) > > wtd.cor(synthetic$mu.x, synthetic$mu.y, weight = 1 / synthetic$sigma.y**2) correlation std.err t.value p.value V1.V1 0.1945633 0.09908485 1.963603 0.05240988
This is in excellent agreement with the bootstrap results. Now the original experimental data.
> wtd.cor(original$mu.x, original$mu.y, weight = 1 / original$sigma.y**2) correlation std.err t.value p.value V1.V1 0.2407686 0.04606181 5.227076 2.656016e-07
Here the agreement with the bootstrap result is not as good. I’m not quite sure why, but suspect that it might have something to do with the fact that the original data are quite severely skewed so that assumptions about normality would probably be voilated.
To leave a comment for the author, please follow the link and comment on their blog: Exegetic Analytics » R.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.