The psych R package includes a data set called “bfi” with self-report ratings on 25 personality items along a 6-point agreement scale. All the details are provided in the documentation accompanying the package. My focus is how to represent the correlations among these ratings: factor analysis or network graphics?
Let’s start with the correlation network map produced by the R package qgraph. As always, all the R code can be found at the end of this post.
First, we need to discover the underlying pattern, so we will begin by looking for the nodes with the highest correlations, which are connected by the thickest lines. Red lines indicate negative correlations (e.g., those who claim that they are “indifferent to others” are unlikely to tell us that they “inquire about others” or “comfort others”). Positive correlations are shown in green (e.g., several nodes toward the bottom of the network suggest that those who report “mood swings” and “panic easily” also said that they are easy to anger and irritate). The node “reflect on things” seems to be misplaced, but it is not. The thin red and green lines suggest that it has uniformly low correlations with all the other items, which explains why it is positioned at the periphery but closest to the four items with which it is most correlated.
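The same reading of the map (thickest lines first, sign by color) can be approximated numerically by ranking item pairs on the absolute value of their correlation. The sketch below is only an illustration: the helper name `strongest_pairs`, the item names, and the toy correlation values are all invented for the example and are not estimates from the bfi data.

```r
# Toy correlation matrix for four hypothetical items (invented values,
# not taken from the bfi data).
items <- c("item_a", "item_b", "item_c", "item_d")
r <- matrix(c( 1.00,  0.60,  0.05, -0.02,
               0.60,  1.00,  0.03,  0.04,
               0.05,  0.03,  1.00, -0.35,
              -0.02,  0.04, -0.35,  1.00),
            nrow = 4, byrow = TRUE, dimnames = list(items, items))

# Hypothetical helper: return the n item pairs with the largest
# absolute correlations, i.e., the thickest lines in the network.
strongest_pairs <- function(r, n = 5) {
  idx <- which(upper.tri(r), arr.ind = TRUE)   # visit each pair once
  pairs <- data.frame(item1 = rownames(r)[idx[, "row"]],
                      item2 = colnames(r)[idx[, "col"]],
                      r     = r[idx],
                      stringsAsFactors = FALSE)
  pairs <- pairs[order(-abs(pairs$r)), ]       # strongest first, sign kept
  rownames(pairs) <- NULL
  head(pairs, n)
}

top <- strongest_pairs(r, n = 2)
print(top)
```

With the real data, the same helper could presumably be applied to `cor(ratings, use="pairwise")`; a negative `r` would correspond to a red line and a positive `r` to a green one.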
Using this approach, we can identify several regions that are placed near each other because of their interconnections. For instance, the personal problems mentioned previously and located toward the bottom of the graph are separated from but linked to the measures of introversion (“difficult approach others” and “don’t talk”), which in turn have strong negative correlations with extroversion (“makes friends”). As we continue up the graph on the left side, we find an active openness to others that shades into taking charge and conscientiousness. If we continue back down the right side, respondents note what might be called work-related problems. Now we have our story, and we can see the two-dimensional structure defining the correlation network: internal vs. external and in-control vs. out-of-control.
Next, we can compare this network representation with the more traditional factor model. Why do we observe correlations among observed variables? In the factor model, correlations are the result of latent variables. We see this in the factor model diagram created using the same data. For example, individuals possess some degree of neuroticism (labeled RC2); therefore, the five personal problem items are intercorrelated. The path coefficient associated with each arrow indicates the correlation between the factor and the observed variable, and the product of the path coefficients for any two observed variables loading on the same factor is our estimate of the correlation between those two observed variables.
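To make the product rule concrete, here is a minimal sketch with made-up numbers. The loadings are hypothetical values chosen for illustration, not the coefficients from the diagram, and the calculation assumes each item loads on a single factor (with cross-loadings on uncorrelated factors, the products would be summed across factors).

```r
# Hypothetical loadings of five items on one factor
# (invented for illustration, not estimated from the bfi data).
loadings <- c(N1 = 0.8, N2 = 0.8, N3 = 0.7, N4 = 0.6, N5 = 0.5)

# Model-implied correlation for every pair of items:
# the product of the two path coefficients.
implied <- outer(loadings, loadings)
diag(implied) <- 1   # an item correlates perfectly with itself

print(round(implied, 2))

# Example: N1 and N3 are expected to correlate 0.8 * 0.7 = 0.56.
implied_n1_n3 <- implied["N1", "N3"]
print(round(implied_n1_n3, 2))
```

Comparing a matrix like `implied` with the observed correlations is one way to judge how well a set of loadings accounts for the data.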
One should recognize that the two diagrams seek to account for the same correlation matrix. The factor model does so by postulating the presence of unseen forces or latent variables. However, we never observe neuroticism; all we have is a pattern of higher correlations among those five self-reports. Without compelling evidence for the independent existence of such a latent variable, we might try to avoid committing the reification fallacy and look for a different explanation.
The network model provides an alternative account. Perhaps the best overview of this approach can be found at the PsychoSystems Project. From a network perspective, correlations are observed because the nodes mutually interact. This is not a directed graph attempting to separate cause and effect. It is not a causal model. Perhaps in the beginning, there was a causal connection with one node occurring first and impacting the other nodes. But over time, these nodes have come to mutually support one another so that the unique effects of the self-report ratings can no longer be untangled.
Which of these two representations is better? If the observed variables are true reflections of an underlying trait that can be independently established, then the factor model offers a convenient hierarchical model. We think that we are observing five different things, but in fact, we are measuring five different manifestations of one underlying construct. On the other hand, a network of mutually supportive observations cannot be represented using a factor model. There are no factors, and asserting that there are ends the discussion prematurely. What are the relationships among the separate nodes? How can one intervene to break the cycle? Are there multiple leverage points? In previous posts, I showed how much can be gained using a network visualization of a key driver analysis and how much can be lost relying solely on an input-output regression model. Besides, why would you not generate the map when, as shown below, R makes it so easy to do?
R code to create the two plots:
"Indifferent of others",
"Inquire about others",
"Make people at ease",
"Exacting in my work",
"Do by plan",
"Difficult approach others",
"Know how to captivate people",
"Full of ideas",
"Avoid difficult reading",
"Carry conversation higher",
"Reflect on things",
"Not probe deeply"
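# Five-component solution drawn as a path diagram (psych package)
fa.diagram(principal(ratings, nfactors=5), main="")
# Correlation network with a force-directed ("spring") layout (qgraph package)
qgraph(cor(ratings, use="pairwise"), layout="spring")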