# Making Friends with Multicollinearity

[This article was first published on **Engaging Market Research**, and kindly contributed to R-bloggers.]


Over a hundred years ago, Charles Spearman noted that performance scores from different cognitive tasks were highly correlated. Wikipedia provides a comprehensive review and a number of good examples of such correlation matrices. Looking at the recurring pattern of positive correlations among almost all cognitive tasks, Spearman saw evidence of a single latent ability dimension, which he called “g.” Spearman was not interested in running regression analyses with cognitive tasks as separate predictors. He was not concerned with the unique contribution of each cognitive task after controlling for all the others. He saw multicollinearity not as a problem but as an indication that each predictor was a manifestation of the same underlying latent trait. Spearman was inventing factor analysis and cared more about the latent trait than about the manifest variables. Multicollinearity was a friend because it allowed Spearman to “see” behind the observed variables.
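To make Spearman's reading concrete, here is a minimal Python/NumPy sketch, assuming a one-factor model in which each standardized task score is a weighted copy of a single latent ability plus independent task-specific noise. The loadings, the sample size, and the function names (`simulate_one_factor`, `variance_inflation_factors`) are hypothetical choices for illustration only. Under this model every inter-task correlation equals the product of the two loadings, so all correlations come out positive, and the variance inflation factors (VIFs) a regression analyst would flag as multicollinearity are the same shared factor seen from the predictor side.

```python
import numpy as np

# Hypothetical loadings of five cognitive tasks on one latent ability ("g").
LOADINGS = np.array([0.8, 0.7, 0.6, 0.7, 0.5])

def simulate_one_factor(n_people=5000, loadings=LOADINGS, seed=0):
    """Standardized scores x_j = loading_j * g + sqrt(1 - loading_j**2) * e_j."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(n_people)                        # one latent ability per person
    unique = rng.standard_normal((n_people, loadings.size))  # independent task-specific parts
    return g[:, None] * loadings + unique * np.sqrt(1.0 - loadings**2)

def variance_inflation_factors(x):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(x, rowvar=False)
    return np.diag(np.linalg.inv(corr))

scores = simulate_one_factor()
corr = np.corrcoef(scores, rowvar=False)
off_diag = corr[~np.eye(LOADINGS.size, dtype=bool)]

# The model implies corr(x_j, x_k) = loading_j * loading_k, so every correlation is positive.
print("implied correlations:\n", np.outer(LOADINGS, LOADINGS).round(2))
print("observed correlations:\n", corr.round(2))
print("VIFs:", variance_inflation_factors(scores).round(2))
```

Reading the VIFs off the diagonal of the inverse correlation matrix is a standard identity; it gives the same numbers as regressing each predictor on all the others, without fitting one auxiliary regression per predictor.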

Item response theory follows Spearman’s lead. Individual items replace test scores on cognitive tasks, but the focus remains on the latent trait responsible for the item responses. In fact, items that do not measure the latent trait in the same way across groups of respondents, a condition known as differential item functioning (DIF), are typically removed. In an earlier post, I attempted an intuitive introduction to item response theory, and I plan to return to this topic in future posts. The positive manifold, a pattern of uniformly positive correlations like the one Spearman observed, is a common structure underlying rating data (e.g., halo effects). My goal is to examine in some depth the cognitive and affective processes used when answering rating items and to show how the positive manifold results from those processes.
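The post does not commit to a particular item response model, so the sketch below assumes the common two-parameter logistic (2PL) model, in which the probability of endorsing an item depends only on the respondent's latent trait θ, an item discrimination `a`, and an item difficulty `b`. All parameter values and the helper name `irt_2pl` are hypothetical. Because every item draws on the same θ, the simulated item responses are positively correlated with one another. The last lines illustrate one simple form of differential item functioning, a difficulty shift between two groups, under which respondents with the same θ endorse the item with different probabilities.

```python
import numpy as np

def irt_2pl(theta, a, b):
    """2PL item response function: P(endorse | theta) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

rng = np.random.default_rng(1)
n_people = 4000
theta = rng.standard_normal(n_people)   # one latent trait value per respondent
a = np.array([1.5, 1.0, 2.0, 1.2])      # hypothetical discriminations
b = np.array([-0.5, 0.0, 0.5, 1.0])     # hypothetical difficulties

# Each response is a Bernoulli draw whose probability depends only on theta.
p = irt_2pl(theta[:, None], a, b)
responses = (rng.random(p.shape) < p).astype(int)

item_corr = np.corrcoef(responses, rowvar=False)
print("inter-item correlations:\n", item_corr.round(2))

# DIF sketch (hypothetical): the same item is "harder" for group 1 (b shifted by +0.8),
# so two respondents with identical theta have different endorsement probabilities.
theta_fixed = 0.0
p_group0 = irt_2pl(theta_fixed, a=1.0, b=0.0)
p_group1 = irt_2pl(theta_fixed, a=1.0, b=0.8)
print("group 0:", round(p_group0, 3), "group 1:", round(p_group1, 3))
```

A difficulty shift is only the simplest kind of DIF; an item can also discriminate differently across groups, which would show up as a change in `a` rather than `b`.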

__Footnote__: Many readers may find most discussions of the positive manifold just slightly out of reach. However, Cosma Shalizi has published a post on his blog (called “g, a Statistical Myth”) that is both comprehensive and not unnecessarily complicated. If you followed the link to Wikipedia, you will know that there are three theories of g: mental energy, sampling theory, and mutualism. Shalizi summarizes all three with diagrams and many references to other work. Because I believe the links to Borsboom’s work are so important, I will offer two of my own: one to Borsboom’s papers and the other to the PsychoSystems Project. All of these readings move us from the statistical model, where latent variables are “convenient fictions,” to the substantive world, where latent variables can be theoretically tied to real outcomes that can be seen and felt.
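Sampling theory, usually credited to Godfrey Thomson, shows that a positive manifold does not require a single underlying ability. The sketch below assumes a toy version of that idea: a pool of independent elementary "bonds," with each test summing a random subset of them. The pool size, subset size, and the function name `thomson_sampling_scores` are hypothetical. Tests correlate only because their subsets overlap (the expected correlation is the number of shared bonds divided by the subset size), yet the correlations are all positive and the leading eigenvalue of their correlation matrix dominates, much as a single g would predict.

```python
import numpy as np

# Hypothetical toy settings for a Thomson-style sampling model.
N_PEOPLE, N_BONDS, N_TESTS, BONDS_PER_TEST = 3000, 200, 6, 60

def thomson_sampling_scores(seed=2):
    """Each test score is the sum of a random subset of independent 'bonds'.

    No general factor is simulated: the bonds are mutually independent, and tests
    are related only through the bonds their subsets happen to share.
    """
    rng = np.random.default_rng(seed)
    bonds = rng.standard_normal((N_PEOPLE, N_BONDS))  # independent elementary abilities
    subsets = [rng.choice(N_BONDS, size=BONDS_PER_TEST, replace=False)
               for _ in range(N_TESTS)]
    scores = np.column_stack([bonds[:, s].sum(axis=1) for s in subsets])
    return scores, subsets

scores, subsets = thomson_sampling_scores()
corr = np.corrcoef(scores, rowvar=False)

# With unit-variance bonds, cov(test_i, test_j) = number of shared bonds and
# var(test_i) = BONDS_PER_TEST, so the expected correlation is shared / BONDS_PER_TEST.
shared = np.array([[len(np.intersect1d(si, sj)) for sj in subsets] for si in subsets])
expected = shared / BONDS_PER_TEST

eigenvalues = np.linalg.eigvalsh(corr)  # returned in ascending order
print("observed correlations:\n", corr.round(2))
print("expected from overlap:\n", expected.round(2))
print("share of variance on the leading eigenvalue:", (eigenvalues[-1] / N_TESTS).round(2))
```

A factor analysis of these scores would find a strong first factor, even though nothing resembling g was built into the simulation; that is the point sampling theory makes against reading the positive manifold as proof of a single latent ability.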
