
One way ANOVA Example in R-Quick Guide


In this one-way ANOVA example in R, we cover the one-way analysis of variance (ANOVA), also known as one-factor ANOVA, which is an extension of the independent two-sample t-test for comparing means when more than two groups are present.

In a one-way ANOVA, the data are divided into groups by a single grouping variable (also called a factor variable).

This tutorial describes the basic principle of the one-way ANOVA test and provides practical ANOVA test examples in R.

Hypotheses for ANOVA tests:

Null hypothesis: the means of the different groups are the same.

Alternative hypothesis: At least one sample mean differs from the rest.

You can use the t-test if you just have two groups. The F-test and the t-test are equivalent in this scenario.
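To see this equivalence in practice, here is a minimal sketch comparing a pooled-variance two-sample t-test with a one-way ANOVA on two groups of the built-in PlantGrowth data (introduced later in this post); the object names are chosen purely for illustration:

# Keep only two of the three groups
two_groups <- subset(PlantGrowth, group %in% c("ctrl", "trt1"))
two_groups$group <- droplevels(two_groups$group)

# Pooled-variance t-test and one-way ANOVA on the same two groups
t_res <- t.test(weight ~ group, data = two_groups, var.equal = TRUE)
f_res <- summary(aov(weight ~ group, data = two_groups))

t_res$p.value             # t-test p-value
f_res[[1]][["Pr(>F)"]][1] # ANOVA p-value: identical, and F equals t squared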

Assumptions of the ANOVA test:

The ANOVA test can be applied only when the observations are obtained independently and randomly from the population defined by the factor levels.

The data for each factor level are normally distributed.

These normal populations have a common variance. (This can be checked using Levene’s test.)

How does the one-way ANOVA test work?

Assume we have three groups to compare (A, B, and C):

Calculate the common variance, also known as the residual variance or the variance within samples (S2within).

Calculate the variance between the sample means as follows:

Calculate the mean of each group.

Calculate the variance between the sample means (S2between).

Calculate the F-statistic as the ratio S2between / S2within.

Note that a lower ratio (ratio < 1) suggests that the means of the samples being compared are not significantly different.

A greater ratio, on the other hand, indicates that the differences in group means are significant.
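As a rough sketch of these steps (using the PlantGrowth data introduced below; the variable names are purely illustrative), the by-hand computation looks like this:

y <- PlantGrowth$weight
g <- PlantGrowth$group
k <- nlevels(g)            # number of groups
n <- length(y)             # total number of observations

grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)
group_n     <- tapply(y, g, length)

# Between-group and within-group (residual) mean squares
s2_between <- sum(group_n * (group_means - grand_mean)^2) / (k - 1)
s2_within  <- sum((y - ave(y, g, FUN = mean))^2) / (n - k)

F_stat <- s2_between / s2_within  # matches the F value reported by aov()

Under the null hypothesis, this statistic follows an F distribution with k - 1 and n - k degrees of freedom.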

Visualize your data and compute one-way ANOVA in R

We’ll use the PlantGrowth data set that comes with R. It provides the weight of plants produced under two distinct treatment conditions and a control condition.

data <- PlantGrowth

We use the function sample_n() [in the dplyr package] to get a sense of how the data look.

The sample_n() function prints out a random selection of observations from the data frame:

Display a random sample of 10 observations:

set.seed(123)
dplyr::sample_n(data, 10)
   weight group
1    5.87  trt1
2    4.32  trt1
3    3.59  trt1
4    5.18  ctrl
5    5.14  ctrl
6    4.89  trt1
7    5.12  trt2
8    4.81  trt1
9    4.50  ctrl
10   4.69  trt1

The column “group” is known as a factor in R, and the different categories (“ctrl”, “trt1”, “trt2”) are known as factor levels. The levels are listed in alphabetical order.

levels(data$group)
[1] "ctrl" "trt1" "trt2"

If the levels are not in the correct order automatically, reorder them as follows:

data$group <- ordered(data$group,
                         levels = c("ctrl", "trt1", "trt2"))
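Note that ordered() creates an ordered factor, which changes the default contrasts reported in model summaries (the overall ANOVA F-test itself is unaffected). If you prefer a plain unordered factor with the same level order, a minimal alternative is:

data$group <- factor(data$group, levels = c("ctrl", "trt1", "trt2"))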

The dplyr package can be used to compute summary statistics (mean and sd) by groups.

Calculate summary statistics by groups – count, mean, and standard deviation:

library(dplyr)
group_by(data, group) %>%
  summarise(
    count = n(),
    mean = mean(weight, na.rm = TRUE),
    sd = sd(weight, na.rm = TRUE)
  )
group count  mean    sd
  <ord> <int> <dbl> <dbl>
1 ctrl     10  5.03 0.583
2 trt1     10  4.66 0.794
3 trt2     10  5.53 0.443

Visualize your data

If you prefer R base graphs, the corresponding scripts are given further below. For easy ggplot2-based data visualization, we’ll use the ggpubr R package.


install.packages("ggpubr")

Visualize your data with ggpubr:

library("ggpubr")
ggboxplot(data, x = "group", y = "weight",
          color = "group", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          order = c("ctrl", "trt1", "trt2"),
          ylab = "Weight", xlab = "Treatment")


Add error bars: mean_se

library("ggpubr")
ggline(data, x = "group", y = "weight",
       add = c("mean_se", "jitter"),
       order = c("ctrl", "trt1", "trt2"),
       ylab = "Weight", xlab = "Treatment")


If you still want to use R base graphs, type the following scripts:

boxplot(weight ~ group, data = data,
        xlab = "Treatment", ylab = "Weight",
        frame = FALSE, col = c("#00AFBB", "#E7B800", "#FC4E07"))

To plot means:

library("gplots")
plotmeans(weight ~ group, data = data, frame = FALSE,
          xlab = "Treatment", ylab = "Weight",
          main="Mean Plot with 95% CI")

Compute the one-way ANOVA test.

We want to know whether the average weights of the plants differ significantly across the three experimental conditions.

This question can be answered with the R function aov(). The function summary.aov() is then used to summarise the analysis of variance model.

Perform an analysis of variance.

res.aov <- aov(weight ~ group, data = data)

Summary of the analysis

summary(res.aov)
            Df Sum Sq Mean Sq F value Pr(>F)
group        2  3.766  1.8832   4.846 0.0159 *
Residuals   27 10.492  0.3886                
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In the output, the column Pr(>F) (next to the F value) is the p-value of the test.
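If you need the p-value programmatically (for example, to report it elsewhere), it can be extracted from the summary object; a small sketch:

# summary(res.aov) returns a list whose first element is the ANOVA table
anova_table <- summary(res.aov)[[1]]
anova_table[["Pr(>F)"]][1]  # p-value for the group effect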

Interpreting the results of the one-way ANOVA test

We can conclude that there are significant differences between the groups highlighted with “*” in the model summary because the p-value is less than the significance level of 0.05.

Multiple pairwise comparisons between group means

A significant p-value implies that some of the group means are different in a one-way ANOVA test, but we don’t know which pairs of groups are different.

Multiple pairwise comparisons can be performed to see if the mean differences between certain pairs of groups are statistically significant.

Multiple Tukey pairwise comparisons

We can compute Tukey HSD (Tukey Honest Significant Differences, R function: TukeyHSD()) for doing numerous pairwise comparisons between the means of groups because the ANOVA test is significant.

The fitted ANOVA model is passed to TukeyHSD() as input.

TukeyHSD(res.aov)

Tukey multiple comparisons of means

    95% family-wise confidence level
Fit: aov(formula = weight ~ group, data = data)
$group
            diff        lwr       upr     p adj
trt1-ctrl -0.371 -1.0622161 0.3202161 0.3908711
trt2-ctrl  0.494 -0.1972161 1.1852161 0.1979960
trt2-trt1  0.865  0.1737839 1.5562161 0.0120064

lwr, upr: the lower and upper end points of the confidence interval at 95% (the default).

p adj: p-value after adjustment for the multiple comparisons.

Only the difference between trt2 and trt1 is significant, as shown by the output, with an adjusted p-value of 0.012.
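The Tukey results can also be inspected graphically with the plot() method for TukeyHSD objects (an optional extra step, not part of the output above); each interval that does not cross zero corresponds to a significant pairwise difference:

# 95% family-wise confidence intervals for the pairwise differences
plot(TukeyHSD(res.aov), las = 1)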

Multiple comparisons using the multcomp package

The function glht() [in the multcomp package] can be used to perform multiple comparison procedures for an ANOVA; glht stands for general linear hypothesis tests. A simplified format is:

library(multcomp)

glht(model, linfct)

model: a fitted model, for example an object returned by aov().

linfct: specifies the linear hypotheses to be tested. Multiple comparisons in ANOVA models are specified by objects returned by the function mcp().

For a one-way ANOVA, use glht() to make numerous pairwise comparisons:

library(multcomp)
summary(glht(res.aov, linfct = mcp(group = "Tukey")))
     Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: aov(formula = weight ~ group, data = data)
Linear Hypotheses:
                 Estimate Std. Error t value Pr(>|t|) 
trt1 - ctrl == 0  -0.3710     0.2788  -1.331    0.391 
trt2 - ctrl == 0   0.4940     0.2788   1.772    0.198 
trt2 - trt1 == 0   0.8650     0.2788   3.103    0.012 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)

Pairwise t-tests

The function pairwise.t.test() can also be used to calculate pairwise comparisons between group levels, with various corrections for multiple testing.

pairwise.t.test(data$weight, data$group,
                p.adjust.method = "BH")

Pairwise comparisons using t tests with pooled SD

data:  data$weight and data$group
     ctrl  trt1
trt1 0.194 -   
trt2 0.132 0.013
P value adjustment method: BH

The output is a table of pairwise comparison p-values. Here, the p-values have been adjusted with the Benjamini-Hochberg method.
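For reference, the same Benjamini-Hochberg adjustment can be reproduced by hand from the unadjusted p-values with p.adjust(); a small sketch:

# Unadjusted pairwise p-values, then the BH adjustment applied manually
raw <- pairwise.t.test(data$weight, data$group,
                       p.adjust.method = "none")$p.value
p.adjust(na.omit(as.vector(raw)), method = "BH")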

Check the validity of the ANOVA assumptions.

The ANOVA test assumes that the data are normally distributed and that group variance is uniform. With certain diagnostic plots, we can verify this.

Examine the assumption of homogeneity of variance.

The residuals versus fits plot can be used to assess variance homogeneity.

There are no clear correlations between residuals and fitted values (the mean of each group) in the plot below, which is good. As a result, we can assume that the variances are homogeneous.

1: Variance homogeneity

plot(res.aov, 1)

In the residuals vs. fits plot, points 17, 15, and 4 are flagged as outliers, which can severely affect normality and homogeneity of variance.

To meet the test assumptions, it can be beneficial to remove outliers.
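As an illustration only (whether to drop observations is a judgement call that should be justified), one way to refit the model without the flagged rows is:

# Row numbers taken from the diagnostic plot above
res.aov.no_out <- aov(weight ~ group, data = data[-c(4, 15, 17), ])
summary(res.aov.no_out)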

The homogeneity of variances can also be checked using Bartlett’s or Levene’s tests.
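Bartlett’s test is available in base R; a minimal sketch for comparison:

bartlett.test(weight ~ group, data = data)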

Levene’s test is recommended because it is less sensitive to departures from the normal distribution. We will use the leveneTest() function [in the car package]:

library(car)
leveneTest(weight ~ group, data = data)

Levene’s Test for Homogeneity of Variance (center = median)

      Df F value Pr(>F)
group  2  1.1192 0.3412
      27              

As shown in the output above, the p-value is not less than the significance level of 0.05. This means there is no evidence that the variance across groups is significantly different.

As a result, we can infer that the variations in the different treatment groups are homogeneous.

Relaxing the homogeneity of variance assumption

The traditional one-way ANOVA test assumes that all groups have similar variances. The homogeneity of variance assumption was fine in our case: the Levene test was not significant.

In a circumstance where the homogeneity of variance assumption is violated, how do we save our ANOVA test?

An alternative procedure, the Welch one-way test, does not require this assumption; it is implemented in the function oneway.test().

ANOVA test with no equal variance assumption

oneway.test(weight ~ group, data = data)

one-way analysis of means (not assuming equal variances)

data:  weight and group
F = 5.181, num df = 2.000, denom df = 17.128, p-value = 0.01739

Pairwise t-tests with no assumption of equal variances

pairwise.t.test(data$weight, data$group,
                 p.adjust.method = "BH", pool.sd = FALSE)

Pairwise comparisons using t-tests with non-pooled SD

data:  data$weight and data$group
     ctrl  trt1
trt1 0.250 -   
trt2 0.072 0.028
P value adjustment method: BH

Check the normality assumption.

Normality plot of the residuals. In the plot below, the quantiles of the residuals are plotted against the quantiles of the normal distribution; a 45-degree reference line is also plotted.

The normal probability plot of the residuals is used to verify the assumption that the residuals are normally distributed; the points should fall approximately along a straight line.

2: Normality

plot(res.aov, 2)


We can infer normality because all of the points lie roughly along this reference line.

Extract the residuals

aov_residuals <- residuals(object = res.aov)

Run the Shapiro-Wilk test:

shapiro.test(x = aov_residuals)

	Shapiro-Wilk normality test

data:  aov_residuals
W = 0.96607, p-value = 0.4379

The Shapiro-Wilk test on the ANOVA residuals (W = 0.96, p = 0.43), which finds no evidence of normality violation, supports the previous conclusion.

Non-parametric alternative to the one-way ANOVA test

The Kruskal-Wallis rank-sum test is a non-parametric alternative to one-way ANOVA that can be employed when the ANOVA assumptions are not met.

kruskal.test(weight ~ group, data = data)

Kruskal-Wallis rank-sum test

data:  weight by group
Kruskal-Wallis chi-squared = 7.9882, df = 2, p-value = 0.01842
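If pairwise comparisons are then needed in this non-parametric setting, one option (a sketch, analogous to pairwise.t.test() above) is pairwise.wilcox.test():

pairwise.wilcox.test(data$weight, data$group,
                     p.adjust.method = "BH")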

