EFA vs CFA: Key Differences Between Exploratory & Confirmatory Factor Analysis (R)

[This article was first published on RStudioDataLab, and kindly contributed to R-bloggers].

EFA explores unknown factor structures while CFA tests a predefined model. Compare both methods side by side with R code using psych and lavaan — know which to use in your dissertation.

Key Points

EFA and CFA are both factor analysis methods, but they serve opposite purposes: EFA discovers factor structure; CFA tests a pre-specified structure.
The core difference lies in factor loadings — EFA lets all items load freely on all factors; CFA constrains items to load only on their pre-assigned factor.
In R, EFA uses the psych package (fa()); CFA uses the lavaan package (cfa()).
CFA requires goodness-of-fit evaluation: CFI, TLI, RMSEA, and SRMR — all with established thresholds.
Many dissertations use both in sequence — EFA on a pilot sample, CFA on an independent main sample.

EFA vs CFA: Quick Comparison

Criterion	EFA (Exploratory)	CFA (Confirmatory)
Purpose	Discover unknown factor structure	Test a pre-specified factor structure
Theory required?	No — data-driven	Yes — theory-driven
Number of factors	Determined from data (parallel analysis)	Specified by researcher in advance
Factor loadings	All items load freely on all factors	Items constrained to pre-assigned factors
Factor rotation	Required (oblimin / varimax)	Not applicable
Model fit indices	Not evaluated	CFI, TLI, RMSEA, SRMR, χ²
R package	psych — `fa()`	lavaan — `cfa()`
Research stage	Early / scale development	Later / scale validation, SEM
When to use in dissertation	New or adapted questionnaire, weak prior theory	Established scale, strong prior theory, SEM prep

EFA vs CFA: What Is the Difference Between Exploratory and Confirmatory Factor Analysis?

EFA and CFA are both forms of factor analysis — statistical methods that model the relationships between observed variables (e.g., survey items) and unobserved latent variables called factors. They are not competing methods; they serve different phases of measurement research.

EFA (Exploratory Factor Analysis) lets the data reveal its own factor structure when you have no strong prior theory.

CFA (Confirmatory Factor Analysis) tests whether your observed data fit a structure you have already specified based on theory or prior EFA results.

Dissertation quick-pick rule:
Using a new or adapted questionnaire with no established factor structure? → Start with EFA.
Replicating an established scale (Big Five, JSS, UTAUT) in a new sample? → Use CFA directly.
Both in the same study? → EFA on the pilot sample; CFA on the main independent sample.

What Is Exploratory Factor Analysis (EFA)?

Exploratory Factor Analysis (EFA) is a data-driven method that identifies the number and nature of latent factors underlying a set of observed variables, without imposing any prior constraints on which items load on which factors. EFA is theory-generating — it reveals patterns in your data that can later be formalised into a testable model for CFA.

Example: You design a 20-item questionnaire to measure academic motivation. You have no prior theory about how many sub-dimensions exist. EFA might group those 20 items into, say, three to five factors (e.g., intrinsic motivation, extrinsic motivation, self-regulation) based purely on their intercorrelations, so the factor structure emerges from the data, not from your assumptions.

When to Use EFA

Use EFA when:
• You are developing a new scale or questionnaire from scratch.
• You are adapting an existing scale to a new language, culture, or context.
• The literature shows limited, mixed, or no prior evidence about the factor structure.
• You want to reduce many variables into a smaller set of interpretable dimensions.
• You are in the early, exploratory phase of your measurement validation workflow.

EFA Assumptions and Sample Size Requirements

Before running EFA, your data must meet several requirements. Items should be measured on interval or ordinal scales, so Likert-type items are appropriate. Check factorability with the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (overall value > 0.60 required; > 0.80 is good) and Bartlett’s test of sphericity (p < 0.05 required). For sample size, treat 100 participants as an absolute minimum and aim for at least 5–10 cases per item. A 20-item scale therefore needs roughly 100–200 respondents, and the upper end of that range gives more stable EFA results.
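As a quick sanity check on the cases-per-item rule, the arithmetic can be sketched in base R. The helper name efa_sample_range() is hypothetical and not part of psych; it simply multiplies the item count by the 5 and 10 cases-per-item ratios and applies the 100-participant floor mentioned above.

```r
# Hypothetical helper (not from any package): planned EFA sample size range
# from the 5-10 cases-per-item rule, with an absolute floor of 100.
efa_sample_range <- function(n_items, per_item = c(5, 10), floor_n = 100) {
  pmax(n_items * per_item, floor_n)
}

efa_sample_range(20)  # lower and upper planning targets for a 20-item scale
efa_sample_range(12)  # short scales are lifted to the floor of 100
```

Treat the result as a planning range, not a guarantee of stable loadings; weak communalities or few items per factor can call for more data.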

How to Run EFA in R Using the psych Package

The psych package provides the most complete EFA workflow in R. The example below uses the built-in bfi dataset (25 Big Five personality items from the psych package itself):

# Step 1: Install (once) and load the psych package
# rotate = "oblimin" in fa() also requires the GPArotation package
install.packages(c("psych", "GPArotation"))
library(psych)

# Step 2: Load data — bfi = Big Five Inventory (25 personality items)
data(bfi)
bfi_items <- bfi[, 1:25]          # Select only the 25 personality items

# Step 3: Test factorability before running EFA
KMO(bfi_items)                     # KMO > 0.60 required
cortest.bartlett(bfi_items, n = nrow(bfi_items))  # p < 0.05 required

# Step 4: Determine the number of factors via parallel analysis
fa.parallel(bfi_items, fm = "ml", fa = "fa")

# Step 5: Run EFA with 5 factors and oblimin (oblique) rotation
efa_model <- fa(bfi_items,
                nfactors = 5,
                rotate    = "oblimin",  # oblique — factors are allowed to correlate
                fm        = "ml")       # maximum likelihood estimation

# Step 6: Inspect factor loadings (show only loadings > 0.30)
print(efa_model$loadings, cutoff = 0.3)

# Step 7: View factor structure diagram
fa.diagram(efa_model)

Rotation choice:
Use rotate = "oblimin" (oblique) as your default — psychological and social science constructs are almost always correlated. Only use rotate = "varimax" (orthogonal) if you have a strong theoretical reason to assume completely independent factors.
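One way to check the rotation choice empirically is to look at the factor correlations that an oblique solution produces. The sketch below refits the five-factor model so it runs on its own, then reads the Phi, communality, and Vaccounted elements of the fa() result. It assumes psych and GPArotation are installed, and the 0.30 cut-off is only an illustrative rule of thumb, not a psych default.

```r
library(psych)   # fa() with rotate = "oblimin" also needs GPArotation installed

data(bfi, package = "psych")

# Refit the five-factor model on the 25 bfi personality items
efa_model <- fa(bfi[, 1:25], nfactors = 5, rotate = "oblimin", fm = "ml")

# Factor correlation matrix (only returned for oblique rotations)
phi <- efa_model$Phi
round(phi, 2)

# Largest absolute correlation between two different factors
max_r <- max(abs(phi[upper.tri(phi)]))
max_r > 0.30   # illustrative threshold: TRUE suggests keeping the oblique rotation

# Communalities per item and variance accounted for by each factor
round(efa_model$communality, 2)
round(efa_model$Vaccounted, 2)
```

If every off-diagonal value in Phi is close to zero, an orthogonal rotation would give a very similar solution, so little is lost by starting with oblimin.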

What Is Confirmatory Factor Analysis (CFA)?

Confirmatory Factor Analysis (CFA) is a theory-testing method. You specify a measurement model in advance — exactly how many factors exist, which items load on which factors, and whether factors are correlated — then evaluate how well your observed data fit that model using goodness-of-fit indices. CFA is part of the Structural Equation Modelling (SEM) framework and is the standard method for establishing construct validity in dissertation research.

Example: Prior research and your literature review both support a 2-factor model of academic motivation: intrinsic and extrinsic motivation. You specify this 2-factor CFA model with 10 items assigned in advance, fit it to your data, and evaluate whether CFI > 0.95, RMSEA < 0.06, and SRMR < 0.08. If fit is acceptable, you have confirmed the structure and can proceed to SEM.
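To see how constrained such a model is, the degrees of freedom can be counted by hand. The sketch below assumes simple structure (no cross-loadings or correlated residuals), factor variances fixed to 1, and free covariances between the factors. The helper name cfa_df() is hypothetical.

```r
# Hypothetical helper: degrees of freedom for a simple-structure CFA
# with factor variances fixed to 1 and all factor covariances free.
cfa_df <- function(n_items, n_factors) {
  known <- n_items * (n_items + 1) / 2        # unique variances and covariances
  free  <- n_items +                          # one loading per item
           n_items +                          # one residual variance per item
           n_factors * (n_factors - 1) / 2    # covariances between factors
  known - free
}

cfa_df(n_items = 10, n_factors = 2)   # 55 known values minus 21 free parameters
```

For ten items and two factors this gives 34 degrees of freedom, so the model makes 34 testable restrictions on the covariance matrix.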

When to Use CFA

Use CFA when:
• You are validating or replicating an established measurement scale in a new sample.
• You have strong theoretical or prior empirical support for a specific factor structure.
• You need to assess construct validity (convergent and discriminant validity).
• You are preparing data for Structural Equation Modelling (SEM) — CFA is a mandatory prerequisite.
• You want to compare competing theoretical models (e.g., one-factor vs two-factor structure).
• You are confirming the factor structure found in an earlier EFA.

How to Run CFA in R Using the lavaan Package

The lavaan package is the standard CFA and SEM tool in R. Here is a full working example using two factors from the bfi dataset:

# Step 1: Install and load lavaan
install.packages("lavaan")
library(lavaan)

# Step 2: Define your measurement model using lavaan syntax
# Each line: factor_name =~ item1 + item2 + item3 ...
cfa_model <- '
  agreeableness     =~ A1 + A2 + A3 + A4 + A5
  conscientiousness =~ C1 + C2 + C3 + C4 + C5
'

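# Step 3: Fit the model to your data
# psych::bfi is used so the block also runs when psych has not been loaded
fit <- cfa(cfa_model,
            data   = psych::bfi,
            std.lv = TRUE)   # fix factor variances to 1 so every loading is estimated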

# Step 4: Full model summary with standardised loadings and fit indices
summary(fit, fit.measures = TRUE, standardized = TRUE)

# Step 5: Extract specific fit indices for reporting
fitMeasures(fit, c("cfi", "tli", "rmsea", "rmsea.ci.lower",
                   "rmsea.ci.upper", "srmr", "chisq", "df", "pvalue"))

# Step 6: Inspect modification indices if fit is poor
modindices(fit, sort. = TRUE, maximum.number = 10)
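For reporting, the standardised loadings can be pulled into a compact table. This is a minimal sketch that refits the same two-factor model so it runs on its own. It assumes lavaan and psych are installed, and the 0.50 check mirrors a common reporting convention rather than a lavaan default.

```r
library(lavaan)

cfa_model <- '
  agreeableness     =~ A1 + A2 + A3 + A4 + A5
  conscientiousness =~ C1 + C2 + C3 + C4 + C5
'
fit <- cfa(cfa_model, data = psych::bfi, std.lv = TRUE)

# Keep only the loading rows (operator "=~") from the standardised solution
std   <- standardizedSolution(fit)
loads <- std[std$op == "=~", c("lhs", "rhs", "est.std", "pvalue")]
loads

# Range of absolute standardised loadings, and items below the 0.50 convention
range(abs(loads$est.std))
loads$rhs[abs(loads$est.std) < 0.50]
```

Absolute values are used because reverse-keyed items load negatively on their factor unless they are recoded first.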

EFA vs CFA: Full Head-to-Head Comparison

Feature	EFA (Exploratory Factor Analysis)	CFA (Confirmatory Factor Analysis)
Goal	Discover and generate factor structure	Test and confirm pre-specified structure
Type	Theory-generating	Theory-testing
Prior theory required	No	Yes
Number of factors	Data-driven (parallel analysis, scree plot)	Specified by researcher before analysis
Factor loadings	All items load freely on all factors	Pre-specified; cross-loadings fixed to zero
Factor correlations	Depends on rotation method chosen	Specified by researcher (correlated or orthogonal)
Estimation method	ML, PAF (principal axis factoring), ULS	ML, WLS, WLSMV (for ordinal/Likert data)
Rotation	Required — oblimin (oblique) or varimax (orthogonal)	Not applicable
Model fit evaluation	Not applicable	CFI, TLI, RMSEA, SRMR, χ²/df
R package	psych — `fa()`	lavaan — `cfa()`
SPSS equivalent	Analyze → Dimension Reduction → Factor	AMOS (or lavaan in R)
Research application	Scale development, instrument design, pilot studies	Scale validation, SEM, multi-group analysis
Typical research stage	Early-stage / exploratory	Later-stage / confirmatory / validation

CFA Model Fit Indices: RMSEA, CFI, TLI, and SRMR Explained

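When you run CFA, evaluating model fit is not optional: it is the core output, and a CFA result without fit indices is unlikely to be publishable. The table below lists the indices to report, what each measures, and widely used thresholds. These cut-offs are conventions rather than strict rules; the 0.95 / 0.06 / 0.08 values quoted elsewhere in this article follow Hu and Bentler (1999), while the “Good fit” column below uses the slightly stricter 0.05 convention for RMSEA and SRMR. Report at least three indices, and never rely on χ² alone (it is highly sensitive to sample size).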

Fit Index	What it measures	Acceptable threshold	Good fit
CFI — Comparative Fit Index	How much better your model fits than a null (no-factor) model	> 0.90	> 0.95
TLI — Tucker-Lewis Index	Like CFI but penalises model complexity	> 0.90	> 0.95
RMSEA — Root Mean Square Error of Approximation	Average error per degree of freedom — lower is better	< 0.08	< 0.05
SRMR — Standardised Root Mean Square Residual	Average difference between observed and predicted correlations	< 0.08	< 0.05
χ² / df ratio	Overall model misfit (avoid as sole criterion — n-sensitive)	< 3.0	< 2.0
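The thresholds in the table can be turned into a small lookup so that each index is labelled consistently. This is a sketch in base R; classify_fit() is a hypothetical helper, and the input values are illustrative rather than output from a fitted model.

```r
# Hypothetical helper: label each index using the thresholds in the table above.
classify_fit <- function(cfi, tli, rmsea, srmr) {
  higher_better <- function(x) {
    if (x > 0.95) "good" else if (x > 0.90) "acceptable" else "poor"
  }
  lower_better <- function(x, good, ok) {
    if (x < good) "good" else if (x < ok) "acceptable" else "poor"
  }
  c(CFI   = higher_better(cfi),
    TLI   = higher_better(tli),
    RMSEA = lower_better(rmsea, good = 0.05, ok = 0.08),
    SRMR  = lower_better(srmr,  good = 0.05, ok = 0.08))
}

# Illustrative values only, not results from real data
res <- classify_fit(cfi = 0.96, tli = 0.93, rmsea = 0.047, srmr = 0.051)
res
```

With a fitted lavaan model, the four arguments can be taken from fitMeasures(fit, c("cfi", "tli", "rmsea", "srmr")). The labels are a reporting aid and do not replace judgement about the model as a whole.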

APA reporting template for dissertations:
“The two-factor CFA model demonstrated acceptable fit: χ²(34) = 67.2, p < .001, CFI = .96, TLI = .95, RMSEA = .047 [90% CI: .029–.064], SRMR = .051. All standardised factor loadings were statistically significant and exceeded .50 (range: .53–.78).”

Can You Use Both EFA and CFA in the Same Dissertation?

Yes, and in many quantitative dissertations using both is the most rigorous approach. The critical rule: you must use independent datasets. Using the same data for EFA and then CFA is a methodological error that peer reviewers and examiners will flag, because a CFA model derived from EFA results is almost guaranteed to fit the data it came from. That is circularity, not validation.

Critical error to avoid: Never run EFA and CFA on the same dataset. A CFA model capitalises on chance patterns in the data it was built from, so good fit there proves nothing. Always use separate, independent samples for each phase.

The correct sequential approach, step by step:

1. Collect two independent datasets. Option A: run a pilot study (n ≥ 100–150) for EFA, then collect a main sample (n ≥ 200–300) for CFA. Option B: collect one large dataset and split it randomly 50/50.
2. Run EFA on Sample 1 using the psych package in R. Report KMO, Bartlett’s test, parallel analysis output, factor loadings, communalities, and percentage variance explained.
3. Specify the CFA model based on the EFA factor structure. Assign each item to the factor it loaded most strongly on. Drop items with cross-loadings above 0.30 on two or more factors.
4. Run CFA on Sample 2 using lavaan. Fit the model, evaluate fit indices, and inspect modification indices if fit is inadequate.
5. Report both analyses in your methodology chapter, clearly stating which sample was used for which analysis and why the sequential approach was chosen.
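The random 50/50 split (Option B) can be sketched in base R as follows. The data frame survey is a made-up stand-in with an id column and six random Likert items; substitute your own data, and note that the seed value is arbitrary.

```r
set.seed(2024)   # arbitrary seed so the split is reproducible

# Stand-in data frame; replace with your own survey data
survey <- data.frame(id = 1:400,
                     matrix(sample(1:5, 400 * 6, replace = TRUE), ncol = 6))

# Randomly assign half of the rows to the EFA sample
efa_rows   <- sample(nrow(survey), size = floor(nrow(survey) / 2))
efa_sample <- survey[efa_rows, ]    # Sample 1: exploratory analysis
cfa_sample <- survey[-efa_rows, ]   # Sample 2: held out for confirmation

nrow(efa_sample)
nrow(cfa_sample)
length(intersect(efa_sample$id, cfa_sample$id))   # respondents present in both halves
```

Do the split once, before any factor analysis, and keep the seed in your script so examiners can reproduce which cases went into which sample.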

How to Choose Between EFA and CFA: Decision Rules

The choice between EFA and CFA depends on your research question, the state of the literature, and the purpose of your factor analysis. These rules cover the most common dissertation scenarios:

No prior theory about factor structure → EFA
Established, well-cited factor structure from prior studies → CFA
Adapting a scale to a new language, culture, or population → EFA first, then CFA
Building toward Structural Equation Modelling → CFA mandatory
Developing a new psychometric scale from scratch → EFA first, CFA to validate
Comparing two competing theoretical models → CFA (use model comparison with likelihood ratio test)
Mixed or contradictory literature on factor structure → EFA
Testing construct validity (convergent + discriminant) → CFA
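The model-comparison rule can be illustrated with a chi-square difference (likelihood ratio) test in lavaan. The sketch below uses lavaan's built-in HolzingerSwineford1939 data rather than the bfi items, and the one-factor versus three-factor models are chosen purely for illustration.

```r
library(lavaan)

# Two competing models for the nine ability tests in lavaan's built-in data
one_factor <- 'g =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9'
three_factor <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

fit_1f <- cfa(one_factor,   data = HolzingerSwineford1939)
fit_3f <- cfa(three_factor, data = HolzingerSwineford1939)

# Chi-square difference (likelihood ratio) test for the nested models
lrt <- anova(fit_1f, fit_3f)
lrt

# Side-by-side fit indices for reporting
idx <- c("cfi", "tli", "rmsea", "srmr")
rbind(one_factor   = fitMeasures(fit_1f, idx),
      three_factor = fitMeasures(fit_3f, idx))
```

A significant difference favours the less restricted model, here the three-factor one. The test only applies to nested models; for non-nested alternatives, compare information criteria such as AIC and BIC instead.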

Examples of EFA and CFA in Research

Example 1: EFA of Personality Traits — The Big Five

One of the most influential applications of EFA is the development of the Big Five personality model. Researchers applied EFA to hundreds of personality adjectives across multiple independent samples. Without any prior constraint on which traits should cluster together, EFA repeatedly revealed five factors: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. This is EFA at its best: the analysis was not constrained by a prior model, and the five-factor structure has been replicated across many cultures and languages.

To replicate this in R, run EFA with 5 factors and oblimin rotation on the bfi dataset in the psych package (code shown above). The five resulting factors should map broadly onto the Big Five dimensions, but do not assume every primary loading exceeds 0.40; inspect the loading matrix and note any weak or cross-loading items.

Example 2: CFA of Job Satisfaction — The JSS

Spector’s (1985) Job Satisfaction Survey (JSS) proposes a 9-factor model covering pay, promotion, supervision, fringe benefits, contingent rewards, operating procedures, co-workers, nature of work, and communication. A researcher validating the JSS in a healthcare sample would use CFA: specify all 36 items loading on their designated factors, fit the model with lavaan, and evaluate whether CFI > 0.95, RMSEA < 0.06, and SRMR < 0.08.

If fit is poor (e.g., RMSEA > 0.08), consult modification indices and consider whether any theoretically justifiable correlated residuals between items within the same facet would improve fit. Only modify parameters where there is both statistical and substantive justification.

Common EFA and CFA Mistakes to Avoid

Using EFA and CFA on the same sample: the most common dissertation error. Always use independent samples.
Choosing the number of EFA factors by the eigenvalue > 1 rule alone: this rule tends to over-extract factors. Use parallel analysis (fa.parallel()) instead.
Using orthogonal rotation (varimax) by default: most psychological and social science constructs are correlated. Use oblimin rotation unless theory dictates independence.
Reporting only χ² for CFA: χ² is sensitive to sample size and is often significant in large samples even when misfit is trivial. Always report CFI, TLI, RMSEA, and SRMR alongside it.
Keeping cross-loading items in the CFA model: items that loaded > 0.30 on two or more factors in EFA should be dropped before CFA specification.
Fewer than three items per factor: a factor with only two items is under-identified on its own and is identified only by borrowing information from correlated factors, which often makes estimates unstable. Aim for at least 3 indicators per factor.
Confusing EFA with PCA: Principal Component Analysis (PCA) is a data reduction method, not a factor analysis technique. See our guide on Factor Analysis vs PCA for the full distinction.
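The cross-loading rule from the list above can be applied mechanically to a loading matrix. The sketch below uses a small hypothetical pattern matrix with made-up values; the object names and the 0.30 cut-off are taken from the rule as stated in this article, not from any package default.

```r
# Hypothetical pattern matrix (rows = items, columns = factors); values are
# made up for illustration, not estimated from data
loadings <- matrix(c(0.72, 0.05,
                     0.65, 0.12,
                     0.41, 0.38,   # loads on both factors
                     0.08, 0.70,
                     0.02, 0.66),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(paste0("item", 1:5), c("F1", "F2")))

cutoff <- 0.30

# Count how many factors each item loads on above the cut-off
n_salient <- rowSums(abs(loadings) > cutoff)

cross_loading <- names(n_salient)[n_salient >= 2]   # candidates to drop
no_loading    <- names(n_salient)[n_salient == 0]   # items that load nowhere
primary       <- colnames(loadings)[apply(abs(loadings), 1, which.max)]

cross_loading
data.frame(item = rownames(loadings), primary_factor = primary)
```

With a psych result, unclass(efa_model$loadings) returns a plain matrix that can be passed through the same steps. Treat the flagged items as candidates for review; whether to drop one is also a content decision.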

EFA and CFA in Thesis and Dissertation: Reporting Requirements

Both analyses appear in the methodology chapter under “Measurement Validation” or “Scale Development.” Here is exactly what your committee and journal reviewers expect:

For EFA: Report KMO value, Bartlett’s test (χ², df, p), number of factors extracted, method for determining factor number (parallel analysis recommended), extraction method (ML recommended), rotation method, eigenvalues for retained factors, percentage variance explained by each factor and total, and a complete factor loading matrix with items bolded if > 0.30.
For CFA: Report the measurement model diagram (path diagram), sample size, estimation method (ML for continuous/normal data; WLSMV for ordinal/Likert), and fit indices: χ²(df), p-value, CFI, TLI, RMSEA [90% CI], SRMR. Report all standardised factor loadings and their significance. State whether modification indices were consulted and, if the model was modified, provide both theoretical and statistical justification.
For both in the same study: Clearly label Sample 1 (EFA) and Sample 2 (CFA) in your methods section. Justify the sequential strategy. Provide descriptive statistics for both samples.
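For the ordinal/Likert case mentioned above, a WLSMV fit in lavaan might look like the sketch below. It reuses the two-factor bfi model from earlier in the article and assumes lavaan and psych are installed; treating the six-point bfi items as ordered is a modelling choice made here for illustration.

```r
library(lavaan)

items <- c(paste0("A", 1:5), paste0("C", 1:5))   # ten 6-point Likert items

cfa_model <- '
  agreeableness     =~ A1 + A2 + A3 + A4 + A5
  conscientiousness =~ C1 + C2 + C3 + C4 + C5
'

# Declare the items as ordered so lavaan works from polychoric correlations
fit_ord <- cfa(cfa_model,
               data      = psych::bfi,
               ordered   = items,
               estimator = "WLSMV",
               std.lv    = TRUE)

# Scaled versions of the fit indices, which are commonly reported with WLSMV
fitMeasures(fit_ord, c("chisq.scaled", "df.scaled", "pvalue.scaled",
                       "cfi.scaled", "tli.scaled", "rmsea.scaled", "srmr"))
```

State the estimator in your write-up, because fit indices from WLSMV and ML are not directly comparable.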

For related analyses your dissertation may also require, see our guides on EFA in R with psych, PCA in R, Factor Analysis vs PCA, and normality testing with Shapiro-Wilk before running your analyses.

Conclusion

The difference between EFA and CFA comes down to one question: do you know the factor structure, or are you trying to find it? EFA discovers structure from data when theory is absent or weak. CFA confirms a structure you have already specified when theory is strong or prior EFA results exist. In R, the psych package handles EFA and the lavaan package handles CFA — both are free, well-documented, and the current standard in academic research.

For PhD and Master’s dissertation researchers: the most defensible methodology for a new measurement instrument is EFA on a pilot sample followed by CFA on an independent main sample. This sequential approach demonstrates both exploratory rigour and confirmatory validity to examiners and reviewers.

If you need expert help with your EFA or CFA analysis, including running the analysis in R or SPSS, interpreting fit indices, writing up results in APA format, or preparing your methodology chapter, contact Dr. Zubair Goraya on WhatsApp or book a consulting session.

Frequently Asked Questions

Q: What is the difference between EFA and CFA?

EFA (Exploratory Factor Analysis) is used when you have no prior theory about the factor structure of your data — it discovers the structure from the data itself. CFA (Confirmatory Factor Analysis) is used when you already have a hypothesised structure and want to test whether your data fit it. EFA is theory-generating; CFA is theory-testing. In R, EFA uses the psych package (fa() function) and CFA uses the lavaan package (cfa() function).

Q: What is EFA in research?

EFA (Exploratory Factor Analysis) is a statistical method used to identify the underlying latent factor structure of a set of observed variables without imposing any prior constraints. EFA determines how many factors exist in the data and which items cluster onto which factors. It is widely used in scale development, psychometrics, and any research context where the structure of a construct has not yet been established. In R, EFA is performed using the fa() function in the psych package.

Q: What is CFA (confirmatory factor analysis)?

Confirmatory Factor Analysis (CFA) is a theory-testing method where the researcher specifies in advance how many factors exist, which items load on which factors, and whether factors are correlated. CFA evaluates how well this pre-specified model fits the observed data using goodness-of-fit indices: CFI (> 0.95), TLI (> 0.95), RMSEA (< 0.06), and SRMR (< 0.08). In R, CFA is performed using the cfa() function in the lavaan package.

Q: When should I use EFA vs CFA in my dissertation?

Use EFA when you are developing a new measurement instrument, adapting an existing scale to a new population, or when the literature provides limited or contradictory evidence about factor structure. Use CFA when you are validating an established scale, testing a theoretically supported structure, or preparing data for Structural Equation Modelling (SEM). Many dissertations use both: EFA on a pilot sample to identify the structure, then CFA on an independent main sample to confirm it.

Q: Can I use both EFA and CFA in the same study?

Yes, but you must use independent datasets for each analysis. The standard approach is to collect a pilot sample (n ≥ 100) for EFA and a separate main sample (n ≥ 200) for CFA. Alternatively, randomly split one large dataset 50/50. Running EFA and CFA on the same data is a methodological error: the CFA is almost guaranteed to fit data its model was derived from, which is circularity, not validation.

Q: What R packages are used for EFA and CFA?

For EFA in R, the psych package is the standard tool: use the fa() function with the nfactors, rotate ("oblimin" or "varimax"), and fm ("ml" for maximum likelihood) arguments. For CFA in R, the lavaan package is the most widely used option: define your measurement model using lavaan syntax, fit it with cfa(), and evaluate fit using fitMeasures(). Install both from CRAN via install.packages("psych") and install.packages("lavaan").

Q: What are acceptable model fit index values for CFA?

Widely used CFA fit thresholds are: CFI > 0.95 (acceptable: > 0.90), TLI > 0.95 (acceptable: > 0.90), RMSEA < 0.06 (acceptable: < 0.08), and SRMR < 0.05 (acceptable: < 0.08). Never use chi-square as the sole criterion: it is sensitive to sample size and is often statistically significant in large samples even when fit is acceptable. Always report at least three fit indices in your dissertation, including RMSEA with its 90% confidence interval.
