Site icon R-bloggers

How I Used One-Way ANOVA in R to Analyze Crop Yield Data for a PhD Student (Real Case Study)

[This article was first published on RStudioDataLab, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

My client’s supervisor had rejected Chapter 4 twice. Not because the data was bad — the field trial was clean, the yield measurements precise. The problem was the statistics. And until I looked at the file, the student had no idea what was actually wrong.

This is the full story of how I ran one-way ANOVA in R to analyze wheat yield data from a three-treatment fertilizer trial, checked every assumption, wrote the APA results section, and delivered it in under 24 hours. I have helped over 500 researchers through this exact kind of problem. Here is what the process actually looks like.



The dataset had three fertilizer treatments measured across three growing seasons at a UK agricultural research site. The student needed to know whether treatment type significantly affected crop yield — and which specific treatments differed from each other. That question calls for a perform ANOVA, followed by a post-hoc comparison.

Info!Sound familiar? I can run this exact analysis on your data and deliver the full APA results section in 24 hours — WhatsApp me now →
< details class="sp toc" open="" style="-family: inherit;">< summary data-hide="Hide all" data-show="Show all" style="text-align: justify;">Table of Contents

The Client’s Problem (Chapter 4 Rejected Twice)

The student — a third-year PhD candidate at a leading UK university — was studying the effect of fertilizer type on wheat yield. She had collected three years of field trial data, cleaned it carefully, and handed it to her supervisor with a results section she had written herself.

First rejection. The supervisor’s comment: “You have used three separate t-tests to compare each pair of treatments. It inflates your Type I error rate. A single ANOVA is the correct method.”

She corrected it and resubmitted. Second rejection. This time: “You have run the ANOVA but not reported any assumption checks. I need to see normality and homogeneity of variance tests before I can accept these results.”

That is when she contacted me. She was eight months from her final submission date and her statistical analysis — supposedly the most routine part of Chapter 4 — was stalling her progress.

The core problem was not unusual. Running three separate t-tests instead of a one-way ANOVA is one of the most common mistakes I see in thesis data analysis. Each individual t-test sets α = .05, but when you run three, the family-wise error rate climbs to roughly 14%. The supervisor was right to reject it. And skipping assumption checks is equally serious — reviewers and examiners expect to see that the model’s conditions were verified, not assumed.

I asked her to send the raw data file. Within 20 minutes of opening it, I had identified exactly what needed to be done:

Before We start Make sure you Have:

The Dataset (Crop Yield, 3 Treatments × 3 Years)

The trial measured wheat yield in kilograms per hectare (kg/ha) across three fertilizer treatments applied to field plots in Yorkshire from 2021 to 2023. Each treatment had three replicated plots per season, giving nine observations per group and 27 total.

Treatments:

Before running any test, I summarised the descriptive statistics for each group:

Treatment

n

Mean (kg/ha)

SD (kg/ha)

Min

Max

Control

9

2900.00

321.10

2393

3339

Organic

9

3400.00

276.97

3036

3933

Chemical

9

3900.00

353.52

3191

4474

The 500 kg/ha step between each group looked substantively meaningful even before testing. The real question was whether that separation exceeded natural within-plot variability.

Step 1: Checking Normality (Shapiro-Wilk + Q-Q Plot)

One-way ANOVA assumes that the residuals follow a normal distribution. I always run this check on the model residuals — not on raw group values — because that is what the assumption actually refers to.
I used the shapiro.test function from base R alongside a Q-Q plot.

Step 1: Load packages

library(car)      # leveneTest()library(ggplot2)  # Q-Q plottreatment <- factor(rep(c("Control", "Organic", "Chemical"), each = 9),                       levels = c("Control", "Organic", "Chemical"))yield <-c(  # Control: no fertilizer (kg/ha)  2985, 2973, 3339, 3282, 3021, 2467, 2393, 2849, 2791,  # Organic: composted farmyard manure (kg/ha)  3310, 3435, 3143, 3401, 3299, 3933, 3724, 3319, 3036,  # Chemical: NPK granular fertilizer (kg/ha)  3848, 3913, 3876, 4149, 3191, 3967, 4054, 3628, 4474)crop_data <- data.frame(treatment, yield)

# Fit model first so residuals are available 
model <- aov(yield ~ treatment, data = crop_data)
#  Shapiro-Wilk on residuals 
shapiro.test(residuals(model))

#  Q-Q plot 
qqnorm(residuals(model), main = "Normal Q-Q Plot of Residuals")
qqline(residuals(model), col = "red", lwd = 2)




SPSS equivalent:

* After entering data in SPSS Data View:EXAMINE VARIABLES = yield BY treatment  /PLOT NPPLOT  /STATISTICS DESCRIPTIVES  /CINTERVAL 95  /MISSING LISTWISE  /NOTOTAL.

The Output Shapiro-Wilk normality test data: residuals(model) W = 0.97692, p-value = 0.787 The output shows W = 0.977 and p = .787. The Q-Q plot showed points tracking closely along the reference line with no systematic departures. What It Means A Shapiro-Wilk p-value above .05 means we retain the null hypothesis that the residuals are normally distributed. With p = .787, there is no evidence of non-normality. The normality assumption is satisfied, and we can proceed to Levene’s test. APA-formatted result: W(27) = 0.977, p = .787

What It Means

A Shapiro-Wilk p-value above .05 means we retain the null hypothesis that the residuals are normally distributed. With p = .787, there is no evidence of non-normality. The normality assumption is satisfied, and we can proceed to Levene’s test.

APA-formatted result: W(27) = 0.977, p = .787

Levene’s Test (Homogeneity of Variance)

ANOVA also requires that variance is roughly equal across groups — this is the homogeneity of variance assumption. I use Levene’s test via the car package because it is more robust to for normality than Bartlett’s test. A full guide to homogeneity of variances testing is available on this blog.

The Code

# Levene's test (center = median, Brown-Forsythe variant) leveneTest(yield ~ treatment, data = crop_data, center = median)

SPSS equivalent:

* Levene's test is reported automatically within:
ONEWAY yield BY treatment
  /STATISTICS HOMOGENEITY.

The output shows F(2, 24) = 0.12, p = .886. The group variances — Control SD = 321, Organic SD = 277, Chemical SD = 354 — are clearly within an acceptable range of each other.

What It Means

With p = .886, we retain the null hypothesis of equal variances across the three fertilizer groups. The homogeneity assumption holds. Both assumption checks are cleared; the ANOVA result is defensible.

APA-formatted result: F(2, 24) = 0.12, p = .886

One-Way ANOVA in R (aov() Full Code + Output)

With both assumptions confirmed, I ran the one way ANOVA in R using aov(). This is the function that fits the ANOVA model and partitions total variance into treatment variance (between groups) and residual variance (within groups). The F statistic is the ratio of those two quantities.

# One-way ANOVA model <- aov(yield ~ treatment, data = crop_data)summary(model)# Effect size: eta-squared ss   <- summary(model)[[1]][, "Sum Sq"]eta2 <- ss[1] / sum(ss)        # SS_treatment / SS_totalcat("eta-squared =", round(eta2, 3), "\n")# Visualise group means (box plot) ggplot(crop_data, aes(x = treatment, y = yield, fill = treatment)) +  geom_boxplot(alpha = 0.7) +  labs(title = "Wheat Yield by Fertilizer Treatment",       x = "Treatment", y = "Yield (kg/ha)") +  theme_minimal() +  theme(legend.position = "none")

SPSS equivalent:

ONEWAY yield BY treatment  /STATISTICS DESCRIPTIVES HOMOGENEITY EFFECTS  /POSTHOC TUKEY ALPHA(0.05)  /PLOT MEANS.

The Output

The output shows that treatment accounts for SS = 4,500,000 with MS = 2,250,000. The residual MS (within-group variance) is 101,598. Dividing those gives F = 22.15.

What It Means

The p-value (.000004) falls far below any conventional significance threshold. There is a statistically significant difference in mean wheat yield across the three fertilizer treatments. The η² = .65 means that fertilizer treatment alone explains 65% of all variance in yield — a large effect by any standard. This was the number the supervisor had been waiting to see justified properly.

APA-formatted result: F(2, 24) = 22.15, p = .000004, η² = .65, 95% CI [.35, .76]

Need these results written in APA format for your thesis? That is exactly what I deliver with every project. Visit my thesis data analysis service to see what is included.

Tukey HSD Post-Hoc (Which Treatments Differed)

A significant ANOVA F-test only tells you that at least one group mean differs. It does not say which ones. For that, I ran a Tukey Honestly Significant Difference (HSD) post-hoc test — the appropriate choice when comparing all three pairs simultaneously, as it controls the family-wise error rate. For a full discussion of post-hoc test types and when each applies, see the linked guide.

The Code

tukey_result <- TukeyHSD(model, conf.level = 0.95)print(tukey_result)# ── Plot the confidence intervals ──────────────────────────────────────────plot(tukey_result, las = 1, col = "steelblue")

SPSS equivalent:

* Already requested in the ONEWAY syntax above via /POSTHOC TUKEY.* Results appear in the "Multiple Comparisons" output table.

The Output


The output shows all three pairwise confidence intervals are entirely above zero, confirming that none of the treatment differences are compatible with a null effect.

What It Means

Every pairwise comparison is statistically significant. Organic fertilizer produced 500 kg/ha more yield than the control (p = .008). Chemical fertilizer produced 1,000 kg/ha more than the control (p = .000002). And chemical outperformed organic by another 500 kg/ha (p = .008). Each successive step in the treatment sequence produced a meaningful, detectable gain.

APA-formatted results:

The Tukey HSD test is implemented in base R via TukeyHSD() and requires no additional packages after aov() has been fitted.

ANOVA Results Interpretation: How I Wrote the APA Results Section

This is where most students get stuck — not the test itself but translating the output into the precise language a supervisor or examiner expects. I wrote the following paragraph for her Chapter 4 and she submitted it verbatim after minor phrasing adjustments.

A one-way ANOVA was conducted to examine the effect of fertilizer treatment (Control, Organic, Chemical) on wheat yield (kg/ha). Prior to analysis, residuals were assessed for normality using the Shapiro-Wilk test, W(27) = 0.977, p = .787, and homogeneity of variance was confirmed via Levene’s test, F(2, 24) = 0.12, p = .886. Both assumptions were satisfied. The ANOVA revealed a statistically significant effect of treatment on yield, F(2, 24) = 22.15, p = .000004, η² = .65, 95% CI [.35, .76], indicating a large effect. Post-hoc comparisons using the Tukey HSD procedure showed that Chemical fertilizer produced significantly greater yield than Control (diff = 1000.00 kg/ha, 95% CI [624.76, 1375.24], p = .000002) and than Organic (diff = 500.00 kg/ha, 95% CI [124.76, 875.24], p = .008). Organic treatment also significantly outperformed Control (diff = 500.00 kg/ha, 95% CI [124.76, 875.24], p = .008). Mean yields were M = 2900.00 (SD = 321.10), M = 3400.00 (SD = 276.97), and M = 3900.00 (SD = 353.52) kg/ha for Control, Organic, and Chemical treatments, respectively.

Notice the structure

  • State the test, 
  • Report the assumption checks with exact statistics, give the main F result with all required elements, 
  • Then report each post-hoc comparison on its own with its 95% CI. 
Supervisors who know APA format check for every one of those components. Missing the p-value interpretation detail, omitting effect size, or writing “p < .05” instead of the exact value — any of these will draw a comment.

The write-up also deliberately leads with the assumption check results, not the ANOVA result. That ordering signals to the examiner that you know what must be verified before the main test can be trusted.

Complete Analysis!
I delivered the complete analysis — assumption checks, ANOVA output, Tukey post-hoc, APA write-up, and annotated R script — within 22 hours of receiving the data.

The Outcome (Supervisor’s Response)

The student submitted the revised Chapter 4 three days later. Her supervisor approved the statistical section without further comment and cleared her to move forward with Chapter 5. She messaged me the following week to say the thesis had been submitted to the internal examiner on schedule.

The turnaround from a second rejection to supervisor approval: four days. The statistical work itself: 22 hours from data receipt to delivery.

  • What changed was not the data — none of it had been re-collected or altered. 
  • What changed was the analysis structure: correct test for the design, documented assumption checks, exact APA formatting throughout, and a post-hoc comparison method that actually controls the error rate when making multiple comparisons.

Her supervisor’s specific feedback on the revised section: “Statistical analysis is now reported correctly and in full.” That is the language that closes a Chapter 4.

Need the Same Done for Your Data?

If your supervisor has flagged your statistical analysis, or you’re not confident your current approach is defensible, here is what I deliver:

  1. Complete analysis in R, SPSS, or Minitab — your choice of software
  2. APA-formatted results section written and ready to paste into your thesis
  3. Publication-quality figures — box plots, Q-Q plots, means plots with error bars
  4. Unlimited revisions until your supervisor approves

Turnaround times:

  1. Thesis chapter (like the one above): 24–48 hours
  2. Individual assignment or coursework: 6–12 hours
  3. Pricing from: $25 (assignments) · $150 (full thesis chapter)
  4. Primary action: WhatsApp me now — free consultation →
  5. Also available on: Fiverr · Upwork

Or visit the service page: thesis data analysis service

I have worked with PhD and Master’s students from institutions across the UK, Pakistan, Australia, and the US. My rating is 4.9/5 across 500+ completed projects. If your Chapter 4 has been rejected — or if you want to submit it knowing it will not be — send me your data and let me show you what the analysis should look like.

Frequently Asked Question

How do I run a one-way ANOVA in R?

Use the aov() function from base R: model <- aov(yield ~ treatment, data = your_data), then inspect the result with summary(model). Before trusting the output, check normality of residuals with shapiro.test(residuals(model)) and homogeneity of variance with leveneTest() from the car package.

How do I report one-way ANOVA results in APA format?

APA format requires: F(df_between, df_within) = value, p = exact_value, eta-squared = value, 95% CI [lower, upper]. For example: F(2, 24) = 22.15, p = .000004, eta-squared = .65, 95% CI [.35, .76]. Always report the exact p-value — never write p < .05. Also report Tukey post-hoc comparisons with pairwise differences, confidence intervals, and adjusted p-values.

What post-hoc test should I use after a significant one-way ANOVA in R?

Use Tukey HSD (Honestly Significant Difference) when comparing all possible group pairs, as it controls the family-wise error rate. In R, run TukeyHSD(model, conf.level = 0.95) after fitting the aov() model. Tukey HSD is the most widely accepted post-hoc test for balanced ANOVA designs and is the method most PhD supervisors and journal reviewers expect.

Should I check normality before running ANOVA in R?

Yes. Run the Shapiro-Wilk test on the model residuals using shapiro.test(residuals(model)) — not on the raw data values. A p-value above .05 indicates normality is satisfied. Supplement with a Q-Q plot using qqnorm() and qqline(). Also run Levene’s test for homogeneity of variance using leveneTest() from the car package.

What is eta-squared and how do I calculate it in R?

Eta-squared (eta²) measures effect size for ANOVA — the proportion of total variance explained by the treatment factor. Calculate it in R as: ss <- summary(model)[[1]][, ‘Sum Sq’]; eta2 <- ss[1] / sum(ss). Values of .01, .06, and .14 are typically interpreted as small, medium, and large effects. Always report eta² with its 95%.

Why was my Chapter 4 ANOVA rejected by my supervisor?

The two most common reasons are: (1) using multiple t-tests instead of a single ANOVA, which inflates the Type I error rate; and (2) running the ANOVA without first reporting assumption checks (Shapiro-Wilk normality test and Levene’s homogeneity test). A third frequent issue is reporting ‘p < .05’ instead of the exact p-value, which does not meet APA standards. Each of these will typically prompt a supervisor rejection.


Transform your raw data into actionable insights. Let my expertise in R and advanced data analysis techniques unlock the power of your information. Get a personalized consultation and see how I can streamline your projects, saving you time and driving better decision-making. Contact me today at contact@rstudiodatalab.com or visit to schedule your discovery call.

Join Our Community < svg class="line" style="margin-right: 12px; stroke: rgb(255, 255, 255);" viewbox="0 0 24 24">< g transform="translate(2.000000, 2.500000)">< path d="M0.7501,0.7499 L2.8301,1.1099 L3.7931,12.5829 C3.8701,13.5199 4.6531,14.2389 5.5931,14.2359094 L16.5021,14.2359094 C17.3991,14.2379 18.1601,13.5779 18.2871,12.6899 L19.2361,6.1319 C19.3421,5.3989 18.8331,4.7189 18.1011,4.6129 C18.0371,4.6039 3.1641,4.5989 3.1641,4.5989">
  • < path d="M5.1544,17.7025 C5.4554,17.7025 5.6984,17.9465 5.6984,18.2465 C5.6984,18.5475 5.4554,18.7915 5.1544,18.7915 C4.8534,18.7915 4.6104,18.5475 4.6104,18.2465 C4.6104,17.9465 4.8534,17.7025 5.1544,17.7025 Z">< path d="M16.4347,17.7025 C16.7357,17.7025 16.9797,17.9465 16.9797,18.2465 C16.9797,18.5475 16.7357,18.7915 16.4347,18.7915 C16.1337,18.7915 15.8907,18.5475 15.8907,18.2465 C15.8907,17.9465 16.1337,17.7025 16.4347,17.7025 Z"> Book a free call
  • To leave a comment for the author, please follow the link and comment on their blog: RStudioDataLab.

    R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
    Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
    Exit mobile version