How to perform the Kruskal-Wallis test in R?

[This article was first published on datasciencetut.com, and kindly contributed to R-bloggers].

When there are more than two groups, the Kruskal-Wallis test by rank is a non-parametric alternative to the one-way ANOVA test.

It extends the two-sample Wilcoxon test to more than two groups and is recommended when the assumptions of the one-way ANOVA test are not met.
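One way to judge whether the ANOVA assumptions hold is to test the residuals for normality and the groups for equal variances. The sketch below is only an illustration, using R's built-in PlantGrowth data set; the choice of shapiro.test() and bartlett.test() is an assumption of this example, and other diagnostics (for instance residual plots) would serve equally well.

```r
# Fit the one-way ANOVA model whose assumptions we want to check.
fit <- aov(weight ~ group, data = PlantGrowth)

# Normality of residuals (Shapiro-Wilk test).
normality <- shapiro.test(residuals(fit))

# Homogeneity of variances across groups (Bartlett test).
homogeneity <- bartlett.test(weight ~ group, data = PlantGrowth)

# Small p-values suggest the corresponding assumption is violated,
# in which case the Kruskal-Wallis test is a reasonable alternative.
normality$p.value
homogeneity$p.value
```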

This article will show you how to use R to compute the Kruskal-Wallis test.

How to perform the Kruskal-Wallis test in R

We’ll use the PlantGrowth data set that comes with R. It provides the weight of plants produced under two distinct treatment conditions and a control condition.

data <- PlantGrowth

Let’s print the first few rows of the data set:

head(data)
  weight group
1   4.17  ctrl
2   5.58  ctrl
3   5.18  ctrl
4   6.11  ctrl
5   4.50  ctrl
6   4.61  ctrl

The column “group” is a factor in R, and its categories (“ctrl”, “trt1”, “trt2”) are the factor levels. By default, the levels are listed in alphabetical order.

Display group levels

levels(data$group)
[1] "ctrl" "trt1" "trt2"
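A few more base R calls can confirm how the grouping column is stored. This is a small sketch; it assumes the same PlantGrowth copy named data that is used throughout this post.

```r
data <- PlantGrowth

# Confirm that the grouping column is a factor.
is.factor(data$group)

# Number of distinct levels.
nlevels(data$group)

# Number of observations in each level.
table(data$group)
```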

If the levels are not already in the order you want, reorder them as follows (note that ordered() returns an ordered factor):

data$group <- ordered(data$group,
                         levels = c("ctrl", "trt1", "trt2"))

Summary statistics can be calculated for each group. You can use the dplyr package.

Type this to install the dplyr package:

install.packages("dplyr")

Compute summary statistics by groups:

library(dplyr)
group_by(data, group) %>%
  summarise(
    count = n(),
    mean = mean(weight, na.rm = TRUE),
    sd = sd(weight, na.rm = TRUE),
    median = median(weight, na.rm = TRUE),
    IQR = IQR(weight, na.rm = TRUE)
  )

Source: local data frame [3 x 6]

   group count  mean        sd median    IQR
  (fctr) (int) (dbl)     (dbl)  (dbl)  (dbl)
1   ctrl    10 5.032 0.5830914  5.155 0.7425
2   trt1    10 4.661 0.7936757  4.550 0.6625
3   trt2    10 5.526 0.4425733  5.435 0.4675
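If you prefer not to install dplyr, the same per-group summary can be sketched in base R. The helper name summarise_weight below is hypothetical and introduced only for this example.

```r
# Hypothetical helper: the same five statistics as in the dplyr call.
summarise_weight <- function(w) {
  c(count  = length(w),
    mean   = mean(w, na.rm = TRUE),
    sd     = sd(w, na.rm = TRUE),
    median = median(w, na.rm = TRUE),
    IQR    = IQR(w, na.rm = TRUE))
}

# Split the weights by group, summarise each piece, and stack the rows.
by_group <- split(PlantGrowth$weight, PlantGrowth$group)
stats_by_group <- do.call(rbind, lapply(by_group, summarise_weight))
stats_by_group
```

The result is a numeric matrix with one row per group and one column per statistic.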

Use box plots to visualize the data.

R’s base graphics can also draw box plots. For easy ggplot2-based data visualization, we’ll use the ggpubr R package.

Download and install the most recent version of ggpubr.

install.packages("ggpubr")

Let’s plot weight by group and color by group

library("ggpubr")
ggboxplot(data, x = "group", y = "weight",
          color = "group", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          order = c("ctrl", "trt1", "trt2"),
          ylab = "Weight", xlab = "Treatment")
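For comparison, here is a rough base R equivalent that needs no extra packages. It is a sketch rather than a replacement for the ggpubr plot; the colors are simply reused from the call above.

```r
# Base R box plot of weight by group.
bp <- boxplot(weight ~ group, data = PlantGrowth,
              col = c("#00AFBB", "#E7B800", "#FC4E07"),
              xlab = "Treatment", ylab = "Weight")

# boxplot() invisibly returns the statistics it drew;
# bp$stats has one column per group, and row 3 holds the medians.
bp$stats
```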

Plot the group means with error bars (mean_se) and the individual points:

library("ggpubr")
ggline(data, x = "group", y = "weight",
       add = c("mean_se", "jitter"),
       order = c("ctrl", "trt1", "trt2"),
       ylab = "Weight", xlab = "Treatment")

Compute Kruskal-Wallis test

We want to know whether plant weight differs significantly among the three experimental conditions.

The test can be run using the kruskal.test() function as follows.

kruskal.test(weight ~ group, data = data)

    Kruskal-Wallis rank sum test

data:  weight by group
Kruskal-Wallis chi-squared = 7.9882, df = 2, p-value = 0.01842
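To see where the chi-squared value comes from, the statistic can be recomputed by hand from the ranks. The sketch below uses the standard formula H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), divided by a correction factor for ties; the variable names are chosen for this example only.

```r
x <- PlantGrowth$weight
g <- PlantGrowth$group
N <- length(x)

# Rank all observations together; tied values get the average rank.
r <- rank(x)

# Sum of ranks and sample size per group.
rank_sums <- tapply(r, g, sum)
n_per_group <- tapply(r, g, length)

# Uncorrected Kruskal-Wallis H statistic.
H <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_per_group) - 3 * (N + 1)

# Correction for ties: t is the number of observations sharing each value.
ties <- table(x)
H_adj <- H / (1 - sum(ties^3 - ties) / (N^3 - N))

# p-value from the chi-squared distribution with (groups - 1) df.
p_value <- pchisq(H_adj, df = nlevels(g) - 1, lower.tail = FALSE)

c(H = H_adj, p = p_value)
```

The result should agree with the statistic and p-value reported by kruskal.test().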

Inference

Because the p-value is less than the significance level of 0.05, we can conclude that there are significant differences between the treatment groups.
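The same decision can be made programmatically, because kruskal.test() returns an object whose components can be read directly. This is a minimal sketch; the variable alpha and the 0.05 threshold are just the convention used in this post.

```r
res <- kruskal.test(weight ~ group, data = PlantGrowth)

# Components of the returned "htest" object.
res$statistic   # Kruskal-Wallis chi-squared
res$parameter   # degrees of freedom
res$p.value     # p-value

# Compare the p-value with the chosen significance level.
alpha <- 0.05
significant <- res$p.value < alpha
significant
```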

Multiple pairwise comparisons between groups

The Kruskal-Wallis test tells us that there is a significant difference between groups, but not which pairs of groups differ.

The function pairwise.wilcox.test() can be used to calculate pairwise comparisons between group levels, with a correction for multiple testing.

pairwise.wilcox.test(PlantGrowth$weight, PlantGrowth$group,
                 p.adjust.method = "BH")

    Pairwise comparisons using Wilcoxon rank sum test

data:  PlantGrowth$weight and PlantGrowth$group
     ctrl  trt1
trt1 0.199 -   
trt2 0.095 0.027

p-value adjustment method: BH
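To make the correction step explicit, the sketch below first requests unadjusted p-values (p.adjust.method = "none") and then applies p.adjust() by hand. Swapping "BH" for another method such as "holm" or "bonferroni" changes the correction. R may warn that exact p-values cannot be computed when the data contain ties; that warning is expected with this data set.

```r
# Unadjusted pairwise Wilcoxon p-values.
raw <- pairwise.wilcox.test(PlantGrowth$weight, PlantGrowth$group,
                            p.adjust.method = "none")

# Keep the three actual comparisons (the unused matrix cell is NA).
raw_p <- raw$p.value[!is.na(raw$p.value)]

# Apply the Benjamini-Hochberg correction manually.
bh_manual <- p.adjust(raw_p, method = "BH")

# Other corrections work the same way, for example Holm.
holm_manual <- p.adjust(raw_p, method = "holm")

bh_manual
holm_manual
```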

Conclusion

In the pairwise comparison, only trt1 and trt2 differ significantly (p < 0.05).
