Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
EFA explores unknown factor structures, while CFA tests a predefined model. Compare both methods side by side with R code using psych and lavaan, and know which to use in your dissertation.
Key Points
- EFA and CFA are both factor analysis methods, but they serve opposite purposes: EFA discovers factor structure; CFA tests a pre-specified structure.
- The core difference lies in factor loadings — EFA lets all items load freely on all factors; CFA constrains items to load only on their pre-assigned factor.
- In R, EFA uses the psych package (fa()); CFA uses the lavaan package (cfa()).
- CFA requires goodness-of-fit evaluation: CFI, TLI, RMSEA, and SRMR, all with established thresholds.
- Many dissertations use both in sequence — EFA on a pilot sample, CFA on an independent main sample.
EFA vs CFA: Quick Comparison
| Criterion | EFA (Exploratory) | CFA (Confirmatory) |
|---|---|---|
| Purpose | Discover unknown factor structure | Test a pre-specified factor structure |
| Theory required? | No — data-driven | Yes — theory-driven |
| Number of factors | Determined from data (parallel analysis) | Specified by researcher in advance |
| Factor loadings | All items load freely on all factors | Items constrained to pre-assigned factors |
| Factor rotation | Required (oblimin / varimax) | Not applicable |
| Model fit indices | Not evaluated | CFI, TLI, RMSEA, SRMR, χ² |
| R package | psych: fa() | lavaan: cfa() |
| Research stage | Early / scale development | Later / scale validation, SEM |
| When to use in dissertation | New or adapted questionnaire, weak prior theory | Established scale, strong prior theory, SEM prep |
EFA vs CFA: What Is the Difference Between Exploratory and Confirmatory Factor Analysis?
EFA and CFA are both forms of factor analysis — statistical methods that model the relationships between observed variables (e.g., survey items) and unobserved latent variables called factors. They are not competing methods; they serve different phases of measurement research.
EFA (Exploratory Factor Analysis) lets the data reveal its own factor structure when you have no strong prior theory.
CFA (Confirmatory Factor Analysis) tests whether your observed data fit a structure you have already specified based on theory or prior EFA results.
Dissertation quick-pick rule:
Using a new or adapted questionnaire with no established factor structure? → Start with EFA.
Replicating an established scale (Big Five, JSS, UTAUT) in a new sample? → Use CFA directly.
Both in the same study? → EFA on the pilot sample; CFA on the main independent sample.
What Is Exploratory Factor Analysis (EFA)?
Exploratory Factor Analysis (EFA) is a data-driven method that identifies the number and nature of latent factors underlying a set of observed variables, without imposing any prior constraints on which items load on which factors. EFA is theory-generating — it reveals patterns in your data that can later be formalised into a testable model for CFA.
Example: You design a 20-item questionnaire to measure academic motivation. You have no prior theory about how many sub-dimensions exist. EFA will cluster those 20 items into 3–5 factors (e.g., intrinsic motivation, extrinsic motivation, self-regulation) based purely on their intercorrelations — and the factor structure emerges from the data, not from your assumptions.
When to Use EFA
• You are developing a new scale or questionnaire from scratch.
• You are adapting an existing scale to a new language, culture, or context.
• The literature shows limited, mixed, or no prior evidence about the factor structure.
• You want to reduce many variables into a smaller set of interpretable dimensions.
• You are in the early, exploratory phase of your measurement validation workflow.
EFA Assumptions and Sample Size Requirements
Before running EFA, your data must meet several requirements. Items should be measured on interval or ordinal scales; Likert scales are appropriate. Check data factorability using the Kaiser-Meyer-Olkin (KMO) test (value > 0.60 required; > 0.80 is good) and Bartlett’s test of sphericity (p < 0.05 required). For sample size, a commonly cited absolute minimum is 100 participants, and most methodologists also advise at least 5–10 cases per item. A 20-item scale therefore needs 100–200 respondents, and aiming for the upper end (200) gives more stable EFA results.
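To make the KMO threshold less of a black box, here is a minimal base-R sketch of how the overall KMO value can be computed from a correlation matrix, together with the cases-per-item arithmetic. The simulated data and the helper name kmo_overall are assumptions made for illustration; in practice psych's KMO() reports this quantity along with per-item values.

```r
# Illustrative data only: 8 items driven by two correlated factors
set.seed(123)
n  <- 500
f1 <- rnorm(n)
f2 <- 0.5 * f1 + sqrt(1 - 0.5^2) * rnorm(n)
items <- data.frame(
  sapply(1:4, function(i) 0.7 * f1 + rnorm(n, sd = sqrt(1 - 0.7^2))),
  sapply(1:4, function(i) 0.7 * f2 + rnorm(n, sd = sqrt(1 - 0.7^2)))
)
names(items) <- paste0("x", 1:8)

# Hypothetical helper: overall KMO = squared correlations relative to
# squared correlations plus squared partial correlations (off-diagonal only)
kmo_overall <- function(data) {
  R     <- cor(data, use = "pairwise.complete.obs")
  R_inv <- solve(R)
  # Partial correlation of each item pair, controlling for all other items
  P   <- -R_inv / sqrt(outer(diag(R_inv), diag(R_inv)))
  off <- row(R) != col(R)
  sum(R[off]^2) / (sum(R[off]^2) + sum(P[off]^2))
}

kmo_value <- kmo_overall(items)
kmo_value > 0.60            # TRUE means the 0.60 factorability rule is met

# Cases-per-item rule of thumb for a 20-item scale (5 to 10 cases per item)
cases_needed <- 20 * c(5, 10)
cases_needed
```

A KMO close to 1 means the partial correlations are small relative to the raw correlations, which is what you expect when a few common factors drive the items.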
How to Run EFA in R Using the psych Package
The psych package provides the most complete EFA workflow in R. The example below uses the built-in bfi dataset (25 Big Five personality items from the psych package itself):
# Step 1: Install and load the psych package
# (GPArotation supplies the oblimin rotation used in Step 5)
install.packages(c("psych", "GPArotation"))
library(psych)
# Step 2: Load data: bfi = Big Five Inventory (25 personality items)
# If bfi is not found in your psych version, try data(bfi, package = "psychTools")
data(bfi)
bfi_items <- bfi[, 1:25] # Select only the 25 personality items
# Step 3: Test factorability before running EFA
KMO(bfi_items) # KMO > 0.60 required
cortest.bartlett(bfi_items, n = nrow(bfi_items)) # p < 0.05 required
# Step 4: Determine the number of factors via parallel analysis
fa.parallel(bfi_items, fm = "ml", fa = "fa")
# Step 5: Run EFA with 5 factors and oblimin (oblique) rotation
efa_model <- fa(bfi_items,
                nfactors = 5,
                rotate = "oblimin", # oblique: factors are allowed to correlate
                fm = "ml")          # maximum likelihood estimation
# Step 6: Inspect factor loadings (show only loadings > 0.30)
print(efa_model$loadings, cutoff = 0.3)
# Step 7: View factor structure diagram
fa.diagram(efa_model)
Rotation choice:
Use rotate = "oblimin" (oblique) as your default — psychological and social science constructs are almost always correlated. Only use rotate = "varimax" (orthogonal) if you have a strong theoretical reason to assume completely independent factors.
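One way to check whether an oblique rotation is warranted is to fit the oblimin solution and inspect the estimated factor correlations. The sketch below uses simulated data with a true factor correlation of 0.5 (an assumption chosen for illustration) and reads the correlations from the Phi element that psych's fa() returns for oblique rotations. It assumes the GPArotation package is installed.

```r
library(psych)  # oblimin rotation also needs GPArotation installed

# Illustrative data only: 8 items, two factors correlated at 0.5
set.seed(42)
n  <- 500
f1 <- rnorm(n)
f2 <- 0.5 * f1 + sqrt(1 - 0.5^2) * rnorm(n)
items <- data.frame(
  sapply(1:4, function(i) 0.7 * f1 + rnorm(n, sd = sqrt(0.51))),
  sapply(1:4, function(i) 0.7 * f2 + rnorm(n, sd = sqrt(0.51)))
)
names(items) <- paste0("x", 1:8)

# Oblique solution: factors are allowed to correlate
efa_oblique <- fa(items, nfactors = 2, rotate = "oblimin", fm = "ml")

# Estimated factor correlation matrix
round(efa_oblique$Phi, 2)

# Size of the correlation between the two factors (sign is arbitrary)
factor_r <- abs(efa_oblique$Phi[1, 2])
factor_r
```

If the off-diagonal value is clearly non-zero, as it should be here, varimax would force independence on factors that the data suggest are related.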
What Is Confirmatory Factor Analysis (CFA)?
Confirmatory Factor Analysis (CFA) is a theory-testing method. You specify a measurement model in advance — exactly how many factors exist, which items load on which factors, and whether factors are correlated — then evaluate how well your observed data fit that model using goodness-of-fit indices. CFA is part of the Structural Equation Modelling (SEM) framework and is the standard method for establishing construct validity in dissertation research.
Example: Prior research and your literature review both support a 2-factor model of academic motivation: intrinsic and extrinsic motivation. You specify this 2-factor CFA model with 10 items assigned in advance, fit it to your data, and evaluate whether CFI > 0.95, RMSEA < 0.06, and SRMR < 0.08. If fit is acceptable, you have confirmed the structure and can proceed to SEM.
When to Use CFA
• You are validating or replicating an established measurement scale in a new sample.
• You have strong theoretical or prior empirical support for a specific factor structure.
• You need to assess construct validity (convergent and discriminant validity).
• You are preparing data for Structural Equation Modelling (SEM) — CFA is a mandatory prerequisite.
• You want to compare competing theoretical models (e.g., one-factor vs two-factor structure).
• You are confirming the factor structure found in an earlier EFA.
How to Run CFA in R Using the lavaan Package
The lavaan package is the standard CFA and SEM tool in R. Here is a full working example using two factors from the bfi dataset:
# Step 1: Install and load lavaan
install.packages("lavaan")
library(lavaan)
# Step 2: Define your measurement model using lavaan syntax
# Each line: factor_name =~ item1 + item2 + item3 ...
cfa_model <- '
  agreeableness     =~ A1 + A2 + A3 + A4 + A5
  conscientiousness =~ C1 + C2 + C3 + C4 + C5
'
# Step 3: Fit the model to your data
fit <- cfa(cfa_model,
           data = bfi,
           std.lv = TRUE) # fix latent variances to 1 so every loading is estimated
# Step 4: Full model summary with standardised loadings and fit indices
summary(fit, fit.measures = TRUE, standardized = TRUE)
# Step 5: Extract specific fit indices for reporting
fitMeasures(fit, c("cfi", "tli", "rmsea", "rmsea.ci.lower",
                   "rmsea.ci.upper", "srmr", "chisq", "df", "pvalue"))
# Step 6: Inspect modification indices if fit is poor
modindices(fit, sort. = TRUE, maximum.number = 10)
EFA vs CFA: Full Head-to-Head Comparison
| Feature | EFA (Exploratory Factor Analysis) | CFA (Confirmatory Factor Analysis) |
|---|---|---|
| Goal | Discover and generate factor structure | Test and confirm pre-specified structure |
| Type | Theory-generating | Theory-testing |
| Prior theory required | No | Yes |
| Number of factors | Data-driven (parallel analysis, scree plot) | Specified by researcher before analysis |
| Factor loadings | All items load freely on all factors | Pre-specified; cross-loadings fixed to zero |
| Factor correlations | Depends on rotation method chosen | Specified by researcher (correlated or orthogonal) |
| Estimation method | ML, PAF (principal axis factoring), ULS | ML, WLS, WLSMV (for ordinal/Likert data) |
| Rotation | Required — oblimin (oblique) or varimax (orthogonal) | Not applicable |
| Model fit evaluation | Not applicable | CFI, TLI, RMSEA, SRMR, χ²/df |
| R package | psych: fa() | lavaan: cfa() |
| SPSS equivalent | Analyze → Dimension Reduction → Factor | AMOS (or lavaan in R) |
| Research application | Scale development, instrument design, pilot studies | Scale validation, SEM, multi-group analysis |
| Typical research stage | Early-stage / exploratory | Later-stage / confirmatory / validation |
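The table lists WLSMV as the estimator for ordinal or Likert data. As a hedged illustration, the sketch below fits a two-factor CFA to simulated 5-point items by declaring them ordered and requesting WLSMV in lavaan. The data, item names, and cut points are assumptions made for the example.

```r
library(lavaan)

# Illustrative data only: 8 five-point items from two correlated factors
set.seed(7)
n  <- 600
f1 <- rnorm(n)
f2 <- 0.5 * f1 + sqrt(0.75) * rnorm(n)
make_item <- function(f) {
  y <- 0.7 * f + rnorm(n, sd = sqrt(0.51))
  # Cut the continuous response into 5 ordered categories (1 to 5)
  as.integer(cut(y, breaks = c(-Inf, -1.2, -0.4, 0.4, 1.2, Inf)))
}
likert <- data.frame(
  a1 = make_item(f1), a2 = make_item(f1), a3 = make_item(f1), a4 = make_item(f1),
  b1 = make_item(f2), b2 = make_item(f2), b3 = make_item(f2), b4 = make_item(f2)
)

model <- '
  F1 =~ a1 + a2 + a3 + a4
  F2 =~ b1 + b2 + b3 + b4
'

# Treat every item as ordered categorical and estimate with WLSMV
fit_ord <- cfa(model, data = likert,
               ordered = names(likert), estimator = "WLSMV")

# With WLSMV, report the scaled versions of the fit indices
fit_idx <- fitMeasures(fit_ord, c("cfi.scaled", "tli.scaled",
                                  "rmsea.scaled", "srmr"))
round(fit_idx, 3)
```

With real questionnaire data you would pass your own data frame and item names; the only changes from the ML example are the ordered and estimator arguments.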
CFA Model Fit Indices: RMSEA, CFI, TLI, and SRMR Explained
When you run CFA, evaluating model fit is not optional — it is the core output. A CFA result without fit indices is unpublishable. The table below shows every index you need to report, what it measures, and the widely accepted thresholds. Report at least three; never rely on χ² alone (it is highly sensitive to sample size).
| Fit Index | What it measures | Acceptable threshold | Good fit |
|---|---|---|---|
| CFI — Comparative Fit Index | How much better your model fits than a null (no-factor) model | > 0.90 | > 0.95 |
| TLI — Tucker-Lewis Index | Like CFI but penalises model complexity | > 0.90 | > 0.95 |
| RMSEA (Root Mean Square Error of Approximation) | Average error per degree of freedom; lower is better | < 0.08 | < 0.06 |
| SRMR — Standardised Root Mean Square Residual | Average difference between observed and predicted correlations | < 0.08 | < 0.05 |
| χ² / df ratio | Overall model misfit (avoid as sole criterion — n-sensitive) | < 3.0 | < 2.0 |
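To show where these indices come from, the base-R sketch below computes RMSEA, CFI, TLI, and χ²/df from the model and baseline chi-square values using their standard textbook formulas. The helper name fit_from_chisq and all input numbers (sample size, baseline chi-square) are hypothetical. Some programs use N rather than N − 1 in the RMSEA denominator, so hand-computed values may differ slightly from software output.

```r
# Hypothetical helper: fit indices from model and baseline chi-square
fit_from_chisq <- function(chisq, df, chisq_base, df_base, n) {
  # RMSEA: misfit per degree of freedom, scaled by sample size
  rmsea <- sqrt(max(chisq - df, 0) / (df * (n - 1)))
  # CFI: improvement in non-centrality relative to the baseline (null) model
  cfi <- 1 - max(chisq - df, 0) /
             max(chisq - df, chisq_base - df_base, 0)
  # TLI: like CFI but built on chi-square/df ratios, so it penalises complexity
  tli <- ((chisq_base / df_base) - (chisq / df)) /
         ((chisq_base / df_base) - 1)
  c(rmsea = rmsea, cfi = cfi, tli = tli, chisq_df = chisq / df)
}

# Illustrative inputs only (not results from a real study)
res <- fit_from_chisq(chisq = 67.2, df = 34,
                      chisq_base = 875, df_base = 45, n = 443)
round(res, 3)
```

Because χ² grows with n while df does not, the same amount of misfit yields a larger χ² in a bigger sample, which is why the indices above are reported alongside it.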
APA reporting template for dissertations:
“The two-factor CFA model demonstrated acceptable fit: χ²(34) = 67.2, p < .001, CFI = .96, TLI = .95, RMSEA = .047 [90% CI: .029–.064], SRMR = .051. All standardised factor loadings were statistically significant and exceeded .50 (range: .53–.78).”
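If you report several models, building this sentence by hand gets error-prone. The sketch below is one possible way to assemble the fit portion of the template from a named vector shaped like the output of lavaan's fitMeasures(). The helper names apa_num and format_cfa_fit are hypothetical, and the input values simply mirror the template above (the p-value is an illustrative placeholder).

```r
# Hypothetical helper: format a number without the leading zero (APA style)
apa_num <- function(x, digits) {
  sub("^0\\.", ".", sprintf(paste0("%.", digits, "f"), x))
}

# Hypothetical helper: build the fit-statistics part of the APA sentence
format_cfa_fit <- function(fm) {
  p_txt <- if (fm[["pvalue"]] < .001) {
    "p < .001"
  } else {
    paste0("p = ", apa_num(fm[["pvalue"]], 3))
  }
  sprintf(
    "\u03c7\u00b2(%d) = %.1f, %s, CFI = %s, TLI = %s, RMSEA = %s [90%% CI: %s, %s], SRMR = %s",
    as.integer(fm[["df"]]), fm[["chisq"]], p_txt,
    apa_num(fm[["cfi"]], 2), apa_num(fm[["tli"]], 2),
    apa_num(fm[["rmsea"]], 3),
    apa_num(fm[["rmsea.ci.lower"]], 3), apa_num(fm[["rmsea.ci.upper"]], 3),
    apa_num(fm[["srmr"]], 3)
  )
}

# Illustrative values taken from the template; with a fitted model you could
# pass fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea",
#                         "rmsea.ci.lower", "rmsea.ci.upper", "srmr")) instead
fm <- c(chisq = 67.2, df = 34, pvalue = 0.0005, cfi = 0.96, tli = 0.95,
        rmsea = 0.047, rmsea.ci.lower = 0.029, rmsea.ci.upper = 0.064,
        srmr = 0.051)
apa_line <- format_cfa_fit(fm)
cat(apa_line, "\n")
```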
Can You Use Both EFA and CFA in the Same Dissertation?
Yes, and in many quantitative dissertations, using both is the most rigorous approach. The critical rule: you must use independent datasets. Using the same data for EFA and then CFA is a methodological error that peer reviewers and examiners will flag, because a CFA model derived from EFA results has already been tuned to the idiosyncrasies of that sample. Good fit on the same data is therefore circular, not validation.
The correct sequential approach, step by step:
- Collect two independent datasets. Option A: run a pilot study (n ≥ 100–150) for EFA, then collect a main sample (n ≥ 200–300) for CFA. Option B: collect one large dataset and split it randomly 50/50.
- Run EFA on Sample 1 using the psych package in R. Report KMO, Bartlett’s test, parallel analysis output, factor loadings, communalities, and percentage variance explained.
- Specify the CFA model based on the EFA factor structure. Assign each item to the factor it loaded most strongly on. Drop items with cross-loadings above 0.30 on two or more factors.
- Run CFA on Sample 2 using lavaan. Fit the model, evaluate fit indices, and inspect modification indices if fit is inadequate.
- Report both analyses in your methodology chapter, clearly stating which sample was used for which analysis and why the sequential approach was chosen.
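The steps above can be sketched end to end for Option B (one dataset, random 50/50 split). This is a minimal illustration on simulated data, not a definitive workflow: the data, the 0.30 cut-off logic, and the way the lavaan syntax is assembled from the EFA loadings are all assumptions made for the example.

```r
library(psych)   # EFA (oblimin rotation also needs GPArotation installed)
library(lavaan)  # CFA

# Illustrative data only: 8 items, two correlated factors
set.seed(2024)
n  <- 800
f1 <- rnorm(n)
f2 <- 0.5 * f1 + sqrt(0.75) * rnorm(n)
items <- data.frame(
  sapply(1:4, function(i) 0.7 * f1 + rnorm(n, sd = sqrt(0.51))),
  sapply(1:4, function(i) 0.7 * f2 + rnorm(n, sd = sqrt(0.51)))
)
names(items) <- paste0("x", 1:8)

# Step 1: random 50/50 split into two non-overlapping halves
idx      <- sample(nrow(items))
half     <- floor(nrow(items) / 2)
sample_1 <- items[idx[1:half], ]
sample_2 <- items[idx[(half + 1):nrow(items)], ]

# Step 2: EFA on Sample 1
efa <- fa(sample_1, nfactors = 2, rotate = "oblimin", fm = "ml")
L   <- unclass(efa$loadings)

# Step 3: keep items with exactly one loading above 0.30, then assign each
# kept item to the factor on which it loads most strongly
keep    <- rowSums(abs(L) > 0.30) == 1
primary <- apply(abs(L), 1, which.max)
by_fac  <- split(rownames(L)[keep], primary[keep])
model   <- paste(sprintf("F%s =~ %s", names(by_fac),
                         sapply(by_fac, paste, collapse = " + ")),
                 collapse = "\n")
cat(model, "\n")

# Step 4: CFA on Sample 2, which played no part in the EFA
fit <- cfa(model, data = sample_2)
fitMeasures(fit, c("cfi", "tli", "rmsea", "srmr"))
```

Setting a seed before the split makes the partition reproducible, which is worth stating in the methodology chapter.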
How to Choose Between EFA and CFA: Decision Rules
The choice between EFA and CFA depends on your research question, the state of the literature, and the purpose of your factor analysis. These rules cover the most common dissertation scenarios:
- No prior theory about factor structure → EFA
- Established, well-cited factor structure from prior studies → CFA
- Adapting a scale to a new language, culture, or population → EFA first, then CFA
- Building toward Structural Equation Modelling → CFA mandatory
- Developing a new psychometric scale from scratch → EFA first, CFA to validate
- Comparing two competing theoretical models → CFA (use model comparison with likelihood ratio test)
- Mixed or contradictory literature on factor structure → EFA
- Testing construct validity (convergent + discriminant) → CFA
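For the competing-models rule, a likelihood ratio test in lavaan might look like the sketch below. It compares a one-factor and a two-factor model on simulated data (an assumption for illustration; the data were generated from two factors, so the two-factor model should win).

```r
library(lavaan)

# Illustrative data only: 8 items, two correlated factors
set.seed(99)
n  <- 500
f1 <- rnorm(n)
f2 <- 0.5 * f1 + sqrt(0.75) * rnorm(n)
items <- data.frame(
  sapply(1:4, function(i) 0.7 * f1 + rnorm(n, sd = sqrt(0.51))),
  sapply(1:4, function(i) 0.7 * f2 + rnorm(n, sd = sqrt(0.51)))
)
names(items) <- paste0("x", 1:8)

one_factor <- ' G =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 '
two_factor <- '
  F1 =~ x1 + x2 + x3 + x4
  F2 =~ x5 + x6 + x7 + x8
'

fit_one <- cfa(one_factor, data = items)
fit_two <- cfa(two_factor, data = items)

# Chi-square difference (likelihood ratio) test between the nested models
lrt <- lavTestLRT(fit_one, fit_two)
lrt

# p-value of the difference test (second row = more constrained model)
p_value <- lrt[2, "Pr(>Chisq)"]
p_value < 0.05   # TRUE: the one-factor model fits significantly worse
```

A significant difference favours the less constrained model, here the two-factor structure; comparing AIC and BIC from the same table is a useful cross-check.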
Examples of EFA and CFA in Research
Example 1: EFA of Personality Traits — The Big Five
The most influential application of EFA is the development of the Big Five personality model. Researchers applied EFA to hundreds of personality adjectives across multiple independent samples. Without any prior constraint on which traits should cluster together, EFA consistently revealed five factors: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. This is EFA at its best — no prior theory constrained the analysis, and the five-factor structure replicated across cultures and languages.
To replicate this in R, run EFA with 5 factors and oblimin rotation on the bfi dataset in the psych package (code shown above). The five resulting factors should correspond closely to the Big Five dimensions, with most items loading at roughly 0.40 or higher on their intended factor.
Example 2: CFA of Job Satisfaction — The JSS
Spector’s (1985) Job Satisfaction Survey (JSS) proposes a 9-factor model covering pay, promotion, supervision, fringe benefits, contingent rewards, operating procedures, co-workers, nature of work, and communication. A researcher validating the JSS in a healthcare sample would use CFA: specify all 36 items loading on their designated factors, fit the model with lavaan, and evaluate whether CFI > 0.95, RMSEA < 0.06, and SRMR < 0.08.
If fit is poor (e.g., RMSEA > 0.08), consult modification indices and consider whether any theoretically justifiable correlated residuals between items within the same facet would improve fit. Only modify parameters where there is both statistical and substantive justification.
Common EFA and CFA Mistakes to Avoid
- Using EFA and CFA on the same sample — the most common dissertation error. Always use independent samples.
- Choosing the number of EFA factors by the eigenvalue > 1 rule alone: this rule tends to over-extract factors. Use parallel analysis (fa.parallel()) instead.
- Using orthogonal rotation (varimax) by default: most psychological and social science constructs are correlated. Use oblimin rotation unless theory dictates independence.
- Reporting only χ² for CFA: χ² is often significant in large samples (roughly n > 200) even when misfit is trivial. Always report CFI, TLI, RMSEA, and SRMR alongside it.
- Keeping cross-loading items in the CFA model — items that loaded > 0.30 on two or more factors in EFA should be dropped before CFA specification.
- Fewer than three items per factor: a two-item factor is under-identified on its own and can only be estimated by borrowing information from other factors in the model. Aim for at least 3 indicators per factor.
- Confusing EFA with PCA — Principal Component Analysis (PCA) is a data reduction method, not a factor analysis technique. See our guide on Factor Analysis vs PCA for the full distinction.
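To illustrate the logic behind parallel analysis mentioned in the list above, here is a minimal base-R sketch: observed eigenvalues are retained only while they exceed what random data of the same size would produce. This is the principal-components variant of the idea written from scratch; the helper name parallel_n_factors is hypothetical, and psych's fa.parallel() with fa = "fa" works on factor eigenvalues, so its answer can differ.

```r
# Illustrative data only: 8 items, two correlated factors
set.seed(11)
n  <- 500
f1 <- rnorm(n)
f2 <- 0.5 * f1 + sqrt(0.75) * rnorm(n)
items <- data.frame(
  sapply(1:4, function(i) 0.7 * f1 + rnorm(n, sd = sqrt(0.51))),
  sapply(1:4, function(i) 0.7 * f2 + rnorm(n, sd = sqrt(0.51)))
)
names(items) <- paste0("x", 1:8)

# Hypothetical helper: count leading eigenvalues that beat random data
parallel_n_factors <- function(data, n_sim = 200, quantile_cut = 0.95) {
  n <- nrow(data)
  p <- ncol(data)
  observed <- eigen(cor(data, use = "pairwise.complete.obs"),
                    only.values = TRUE)$values
  # Eigenvalues from uncorrelated normal data with the same n and p
  random <- replicate(n_sim, eigen(cor(matrix(rnorm(n * p), n, p)),
                                   only.values = TRUE)$values)
  threshold <- apply(random, 1, quantile, probs = quantile_cut)
  # Stop counting at the first eigenvalue that fails to exceed its threshold
  sum(cumprod(observed > threshold))
}

n_parallel <- parallel_n_factors(items)
n_kaiser   <- sum(eigen(cor(items), only.values = TRUE)$values > 1)
c(parallel = n_parallel, kaiser = n_kaiser)
```

On clean simulated data both rules tend to agree; with real questionnaire data, where many eigenvalues sit just above 1, the eigenvalue > 1 count is the one that usually comes out higher.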
EFA and CFA in Thesis and Dissertation: Reporting Requirements
Both analyses appear in the methodology chapter under “Measurement Validation” or “Scale Development.” Here is exactly what your committee and journal reviewers expect:
- For EFA: Report KMO value, Bartlett’s test (χ², df, p), number of factors extracted, method for determining factor number (parallel analysis recommended), extraction method (ML recommended), rotation method, eigenvalues for retained factors, percentage variance explained by each factor and total, and a complete factor loading matrix with items bolded if > 0.30.
- For CFA: Report the measurement model diagram (path diagram), sample size, estimation method (ML for continuous/normal data; WLSMV for ordinal/Likert), and fit indices: χ²(df), p-value, CFI, TLI, RMSEA [90% CI], SRMR. Report all standardised factor loadings and their significance. State whether modification indices were consulted and, if the model was modified, provide both theoretical and statistical justification.
- For both in the same study: Clearly label Sample 1 (EFA) and Sample 2 (CFA) in your methods section. Justify the sequential strategy. Provide descriptive statistics for both samples.
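Most of the EFA quantities listed above can be pulled from the fitted psych object rather than copied from printed output. The sketch below shows one way to do that on simulated data; the data are an assumption for illustration, and the element names used (loadings, communality, Vaccounted) are those I understand fa() to return, so verify them with names() on your own object.

```r
library(psych)  # oblimin rotation also needs GPArotation installed

# Illustrative data only: 8 items, two correlated factors
set.seed(5)
n  <- 500
f1 <- rnorm(n)
f2 <- 0.5 * f1 + sqrt(0.75) * rnorm(n)
items <- data.frame(
  sapply(1:4, function(i) 0.7 * f1 + rnorm(n, sd = sqrt(0.51))),
  sapply(1:4, function(i) 0.7 * f2 + rnorm(n, sd = sqrt(0.51)))
)
names(items) <- paste0("x", 1:8)

efa <- fa(items, nfactors = 2, rotate = "oblimin", fm = "ml")

# Factor loading matrix for the results table
loadings_tbl <- round(unclass(efa$loadings), 2)

# Communalities: share of each item's variance explained by the factors
communalities <- round(efa$communality, 2)

# Variance accounted for by each factor (proportion and cumulative)
variance_tbl <- round(efa$Vaccounted, 2)

loadings_tbl
communalities
variance_tbl
```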
For related analyses your dissertation may also require, see our guides on EFA in R with psych, PCA in R, Factor Analysis vs PCA, and normality testing with Shapiro-Wilk before running your analyses.
Conclusion
The difference between EFA and CFA comes down to one question: do you know the factor structure, or are you trying to find it? EFA discovers structure from data when theory is absent or weak. CFA confirms a structure you have already specified when theory is strong or prior EFA results exist. In R, the psych package handles EFA and the lavaan package handles CFA — both are free, well-documented, and the current standard in academic research.
For PhD and Master’s dissertation researchers: the most defensible methodology for a new measurement instrument is EFA on a pilot sample followed by CFA on an independent main sample. This sequential approach demonstrates both exploratory rigour and confirmatory validity to examiners and reviewers.
If you need expert help with your EFA or CFA analysis, including running the analysis in R or SPSS, interpreting fit indices, writing up results in APA format, or preparing your methodology chapter, contact Dr. Zubair Goraya on WhatsApp or book a consulting session.
Frequently Asked Questions
What is the difference between EFA and CFA?
EFA (Exploratory Factor Analysis) is used when you have no prior theory about the factor structure of your data; it discovers the structure from the data itself. CFA (Confirmatory Factor Analysis) is used when you already have a hypothesised structure and want to test whether your data fit it. EFA is theory-generating; CFA is theory-testing. In R, EFA uses the psych package (fa() function) and CFA uses the lavaan package (cfa() function).
What is Exploratory Factor Analysis (EFA)?
EFA (Exploratory Factor Analysis) is a statistical method used to identify the underlying latent factor structure of a set of observed variables without imposing any prior constraints. EFA determines how many factors exist in the data and which items cluster onto which factors. It is widely used in scale development, psychometrics, and any research context where the structure of a construct has not yet been established. In R, EFA is performed using the fa() function in the psych package.
What is Confirmatory Factor Analysis (CFA)?
Confirmatory Factor Analysis (CFA) is a theory-testing method where the researcher specifies in advance how many factors exist, which items load on which factors, and whether factors are correlated. CFA evaluates how well this pre-specified model fits the observed data using goodness-of-fit indices: CFI (> 0.95), TLI (> 0.95), RMSEA (< 0.06), and SRMR (< 0.08). In R, CFA is performed using the cfa() function in the lavaan package.
When should you use EFA and when should you use CFA?
Use EFA when you are developing a new measurement instrument, adapting an existing scale to a new population, or when the literature provides limited or contradictory evidence about factor structure. Use CFA when you are validating an established scale, testing a theoretically supported structure, or preparing data for Structural Equation Modelling (SEM). Many dissertations use both: EFA on a pilot sample to identify the structure, then CFA on an independent main sample to confirm it.
Can you use both EFA and CFA in the same study?
Yes, but you must use independent datasets for each analysis. The standard approach is to collect a pilot sample (n ≥ 100) for EFA and a separate main sample (n ≥ 200) for CFA. Alternatively, randomly split one large dataset 50/50. Running EFA and CFA on the same data is a methodological error: the CFA model has already been tuned to that sample, so good fit there is circular, not validation.
Which R packages are used for EFA and CFA?
For EFA in R, the psych package is the standard tool: use the fa() function with the nfactors, rotate ("oblimin" or "varimax"), and fm ("ml" for maximum likelihood) arguments. For CFA in R, the lavaan package is the standard tool: define your measurement model using lavaan syntax, fit it with cfa(), and evaluate fit using fitMeasures(). Install both via install.packages("psych") and install.packages("lavaan") from CRAN.
What are acceptable CFA fit index thresholds?
Widely accepted CFA fit thresholds are: CFI > 0.95 (acceptable: > 0.90), TLI > 0.95 (acceptable: > 0.90), RMSEA < 0.06 (acceptable: < 0.08), and SRMR < 0.05 (acceptable: < 0.08). Never use chi-square as the sole criterion: it is often statistically significant in samples above roughly 200 even when fit is acceptable. Always report at least three fit indices in your dissertation, including RMSEA with its 90% confidence interval.
