
The post How to Label Outliers in Boxplots in ggplot2? appeared first on Data Science Tutorials

This article shows how to label outliers in ggplot2 boxplots, either with their values or with another variable such as a player name.

## Step 1: Create the Data Frame

First, create a data frame with the points scored by 60 basketball players, 20 on each of three teams.


Set a seed so the example is reproducible:

`set.seed(123)`

Now create the data frame:

```
df <- data.frame(team=rep(c('A', 'B', 'C'), each=20),
                 player=rep(LETTERS[1:20], times=3),
                 points=round(rnorm(n=60, mean=30, sd=10), 2))
```
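As an optional sanity check, you can confirm that the data frame has 60 rows, 20 per team. This sketch rebuilds the same data frame so that it runs on its own.

```r
set.seed(123)

# Rebuild the example data frame: 3 teams x 20 players
df <- data.frame(team=rep(c('A', 'B', 'C'), each=20),
                 player=rep(LETTERS[1:20], times=3),
                 points=round(rnorm(n=60, mean=30, sd=10), 2))

# Total number of rows, then the number of rows per team
nrow(df)
table(df$team)
```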

Let's view the first six rows of the data frame:

```
head(df)
  team player points
1    A      A  37.84
2    A      B  42.60
3    A      C  40.96
4    A      D   5.78
5    A      E  37.65
6    A      F  24.98
```

## Step 2: Define a Function to Identify Outliers

In a ggplot2 boxplot, an observation is treated as an outlier if it meets either of the following two criteria:

- It lies more than 1.5 times the interquartile range (IQR) below the first quartile (Q1).
- It lies more than 1.5 times the interquartile range above the third quartile (Q3).
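In symbols, with $Q_1$ and $Q_3$ the first and third quartiles and $\mathrm{IQR} = Q_3 - Q_1$, an observation $x$ is flagged when

$$
x < Q_1 - 1.5\,\mathrm{IQR} \quad \text{or} \quad x > Q_3 + 1.5\,\mathrm{IQR}.
$$

For instance, with hypothetical quartiles $Q_1 = 20$ and $Q_3 = 40$, we get $\mathrm{IQR} = 20$ and $1.5\,\mathrm{IQR} = 30$, so the fences are $20 - 30 = -10$ and $40 + 30 = 70$: any value below $-10$ or above $70$ counts as an outlier.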

The following R function returns TRUE for each observation that meets either criterion and FALSE otherwise:


```
# TRUE for values more than 1.5 * IQR below Q1 or above Q3
findoutlier <- function(x) {
  return(x < quantile(x, .25) - 1.5*IQR(x) | x > quantile(x, .75) + 1.5*IQR(x))
}
```
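To see the function at work, you can call it on a small hand-made vector (the values below are made up for illustration). Note that `quantile()` stops with an error when `x` contains `NA`; `findoutlier_na()` is a hypothetical NA-tolerant variant, not part of the original post, that computes the quartiles with `na.rm = TRUE` and leaves missing inputs as `NA`.

```r
findoutlier <- function(x) {
  return(x < quantile(x, .25) - 1.5*IQR(x) | x > quantile(x, .75) + 1.5*IQR(x))
}

# Hypothetical NA-tolerant variant: quartiles ignore NA, NA inputs stay NA
findoutlier_na <- function(x) {
  lower <- quantile(x, .25, na.rm = TRUE) - 1.5 * IQR(x, na.rm = TRUE)
  upper <- quantile(x, .75, na.rm = TRUE) + 1.5 * IQR(x, na.rm = TRUE)
  x < lower | x > upper
}

x <- c(1, 2, 3, 4, 5, 100)

# Q1 = 2.25, Q3 = 4.75, IQR = 2.5, so the fences are -1.5 and 8.5;
# only 100 lies outside them
findoutlier(x)

# Same result, plus NA for the missing value
findoutlier_na(c(x, NA))
```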

## Step 3: Label Outliers in ggplot2 Boxplots

Next, load the ggplot2 and dplyr packages:

```
library(ggplot2)
library(dplyr)
```

Then add a new column, `outlier`, that holds an observation's points value if it is an outlier within its team and `NA` otherwise:

```
# Within each team, keep the points value for outliers and NA otherwise
df <- df %>%
  group_by(team) %>%
  mutate(outlier = ifelse(findoutlier(points), points, NA))
```
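Before plotting, it can help to check which rows were flagged. The sketch below uses a small hypothetical data frame, `toy`, instead of the simulated data so the result is easy to verify by hand: in team A the value 50 lies above the upper fence of 17.5, while team B has no outliers.

```r
suppressPackageStartupMessages(library(dplyr))

findoutlier <- function(x) {
  return(x < quantile(x, .25) - 1.5*IQR(x) | x > quantile(x, .75) + 1.5*IQR(x))
}

# Hypothetical toy data: two teams, one obvious outlier in team A
toy <- data.frame(team   = rep(c('A', 'B'), each = 6),
                  points = c(10, 11, 12, 13, 14, 50,
                             20, 21, 22, 23, 24, 25))

toy <- toy %>%
  group_by(team) %>%
  mutate(outlier = ifelse(findoutlier(points), points, NA))

# Keep only the rows flagged as outliers within their own team
flagged <- toy %>% filter(!is.na(outlier))
flagged
```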

Now create a boxplot of points by team and label the outliers:

```
# na.rm=TRUE silently drops the NA labels; hjust=-.5 places each
# label to the right of its point
ggplot(df, aes(x=team, y=points)) +
  geom_boxplot() +
  geom_text(aes(label=outlier), na.rm=TRUE, hjust=-.5)
```

Note that we can also label the outliers using a different variable.
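To double-check that `findoutlier()` agrees with the points `geom_boxplot()` draws, you can inspect the statistics ggplot2 computed with `layer_data()`, whose `outliers` column lists the outlying values for each box. ggplot2's boxplot statistic (`stat_boxplot()`, `coef = 1.5` by default) applies the same 1.5 × IQR rule, so the two should match. This is a sketch on a small hypothetical data frame, `toy`.

```r
library(ggplot2)

findoutlier <- function(x) {
  return(x < quantile(x, .25) - 1.5*IQR(x) | x > quantile(x, .75) + 1.5*IQR(x))
}

# Hypothetical toy data: two teams, one obvious outlier in team A
toy <- data.frame(team   = rep(c('A', 'B'), each = 6),
                  points = c(10, 11, 12, 13, 14, 50,
                             20, 21, 22, 23, 24, 25))

p <- ggplot(toy, aes(x = team, y = points)) +
  geom_boxplot()

# Outliers as computed by ggplot2 (one list element per box)
gg_outliers <- unlist(layer_data(p)$outliers)

# Outliers as flagged by findoutlier(), applied within each team
own_outliers <- unlist(lapply(split(toy$points, toy$team),
                              function(v) v[findoutlier(v)]))

gg_outliers
own_outliers
```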

To label each outlier with the player's name instead, replace `points` with `player` as the second argument of `ifelse()` inside `mutate()`:

```
library(ggplot2)
library(dplyr)

# Within each team, keep the player name for outliers and NA otherwise
df <- df %>%
  group_by(team) %>%
  mutate(outlier = ifelse(findoutlier(points), player, NA))
```
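One thing to watch with this approach: if `player` is stored as a factor (the `data.frame()` default before R 4.0.0, via `stringsAsFactors = TRUE`), `ifelse()` returns the factor's integer codes rather than the names, so the plot would show numbers. Converting with `as.character()` keeps the labels as text. The base R sketch below shows the difference on made-up values.

```r
# Made-up player names stored as a factor (levels sort to D, N, Q)
player <- factor(c("N", "D", "Q"))
is_outlier <- c(TRUE, FALSE, TRUE)

# ifelse() on a factor yields its underlying integer codes
res_factor <- ifelse(is_outlier, player, NA)
res_factor

# Converting to character first keeps the names
res_char <- ifelse(is_outlier, as.character(player), NA)
res_char
```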

Then build a boxplot of points by team and label the outliers:


```
ggplot(df, aes(x=team, y=points)) +
  geom_boxplot() +
  geom_text(aes(label=outlier), na.rm=TRUE, hjust=-.5)
```

The outlier on team A is now labeled N and the outlier on team B is labeled D, the names of the players whose point totals are outliers.
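As an alternative that leaves the data frame untouched, ggplot2 layers also accept a function as their `data` argument: it receives the plot's data and returns the rows that layer should use. This is a different technique from the `mutate()` approach above, shown here for comparison; the sketch passes only the flagged rows of a small hypothetical data frame, `toy`, to `geom_text()` and labels them with `player`.

```r
library(ggplot2)
suppressPackageStartupMessages(library(dplyr))

findoutlier <- function(x) {
  return(x < quantile(x, .25) - 1.5*IQR(x) | x > quantile(x, .75) + 1.5*IQR(x))
}

# Hypothetical toy data: the player "F" on team A scores an outlying 50
toy <- data.frame(team   = rep(c('A', 'B'), each = 6),
                  player = rep(LETTERS[1:6], times = 2),
                  points = c(10, 11, 12, 13, 14, 50,
                             20, 21, 22, 23, 24, 25))

p <- ggplot(toy, aes(x = team, y = points)) +
  geom_boxplot() +
  # This layer only receives the rows flagged within each team
  geom_text(data = function(d) d %>%
              group_by(team) %>%
              filter(findoutlier(points)) %>%
              ungroup(),
            aes(label = player), hjust = -.5)

p
```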
