How to Remove Duplicates in R with Example

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers.]

Removing duplicate rows is one of the most common tasks when working with data frames in R.

This tutorial describes how to remove duplicated rows from a data frame using the distinct function from dplyr and the base R functions duplicated and unique.

Remove Duplicates in R

Let’s load the dplyr library and create a data frame.

library(dplyr)
data <- data.frame(
  Column1 = c('P1', 'P1', 'P2', 'P3', 'P1', 'P1', 'P3', 'P4', 'P2', 'P4'),
  Column2 = c(5, 5, 3, 5, 2, 3, 4, 7, 10, 14)
)
data
   Column1 Column2
1       P1       5
2       P1       5
3       P2       3
4       P3       5
5       P1       2
6       P1       3
7       P3       4
8       P4       7
9       P2      10
10      P4      14

Approach 1: Remove duplicated rows

Let’s use the distinct function from the dplyr package. Called with only the data frame, it keeps one copy of each row that is identical across all columns.

distinct(data)
   Column1 Column2
1      P1       5
2      P2       3
3      P3       5
4      P1       2
5      P1       3
6      P3       4
7      P4       7
8      P2      10
9      P4      14
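
A quick way to confirm what distinct dropped is to compare row counts before and after. The sketch below rebuilds the example data frame so that it runs on its own; the name deduped is introduced here only for illustration.

```r
library(dplyr)

# Rebuild the example data frame from above
data <- data.frame(
  Column1 = c('P1', 'P1', 'P2', 'P3', 'P1', 'P1', 'P3', 'P4', 'P2', 'P4'),
  Column2 = c(5, 5, 3, 5, 2, 3, 4, 7, 10, 14)
)

# Keep one copy of every fully identical row
# (deduped is an illustrative name, not from the original post)
deduped <- distinct(data)

# Number of rows before and after
nrow(data)
nrow(deduped)

# Number of rows that were dropped
nrow(data) - nrow(deduped)
```

Only the second row (P1, 5) repeats an earlier row exactly, so a single row is removed.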

Approach 2: Remove Duplicates in Column

To keep only the unique values of a particular column, pass the column name to the distinct function.

Let’s remove duplicate values from Column2.

distinct(data, Column2)
   Column2
1       5
2       3
3       2
4       4
5       7
6      10
7      14

Suppose you want to remove duplicate values from Column2 but retain the corresponding values in Column1. Setting .keep_all = TRUE keeps the other columns, taking the first row for each distinct value.

distinct(data, Column2, .keep_all = TRUE)
   Column1 Column2
1      P1       5
2      P2       3
3      P1       2
4      P3       4
5      P4       7
6      P2      10
7      P4      14
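
distinct also accepts more than one column, in which case a row counts as a duplicate only when it matches an earlier row on every listed column. As a rough sketch on the same data (the object names by_column1 and by_both are illustrative, not from the original post):

```r
library(dplyr)

# Rebuild the example data frame so the snippet runs on its own
data <- data.frame(
  Column1 = c('P1', 'P1', 'P2', 'P3', 'P1', 'P1', 'P3', 'P4', 'P2', 'P4'),
  Column2 = c(5, 5, 3, 5, 2, 3, 4, 7, 10, 14)
)

# First row for each value of Column1, keeping Column2 alongside it
by_column1 <- distinct(data, Column1, .keep_all = TRUE)
by_column1

# Listing both columns gives the same rows as distinct(data) here,
# because the data frame has no other columns
by_both <- distinct(data, Column1, Column2)
by_both
```

Which row survives depends on the row order, so it can help to sort the data frame first if a particular row (for example the largest Column2 value) should be the one that is kept.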

Approach 3: Duplicated Function

The duplicated function is also handy for removing repeated rows from a data frame. It returns a logical vector that is TRUE for every row that repeats an earlier row.

duplicated(data)
 [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Let’s remove the duplicated rows by negating that vector and using it as a row index.

data[!duplicated(data), ]
    Column1 Column2
1       P1       5
3       P2       3
4       P3       5
5       P1       2
6       P1       3
7       P3       4
8       P4       7
9       P2      10
10      P4      14
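
Because duplicated marks every occurrence after the first, the filter above keeps the first copy of each row. Base R's duplicated also has a fromLast argument that reverses the scan, and it can be applied to a single column instead of the whole data frame. A minimal sketch, with illustrative object names (keep_last and by_column2 are not from the original post):

```r
# Rebuild the example data frame so the snippet runs on its own
data <- data.frame(
  Column1 = c('P1', 'P1', 'P2', 'P3', 'P1', 'P1', 'P3', 'P4', 'P2', 'P4'),
  Column2 = c(5, 5, 3, 5, 2, 3, 4, 7, 10, 14)
)

# How many rows are exact repeats of an earlier row
sum(duplicated(data))

# Keep the last occurrence of each row instead of the first
keep_last <- data[!duplicated(data, fromLast = TRUE), ]
keep_last

# Deduplicate on Column2 only, keeping the first row for each value
by_column2 <- data[!duplicated(data$Column2), ]
by_column2
```

The by_column2 result contains the same rows as distinct(data, Column2, .keep_all = TRUE), except that the original row names are preserved.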

Approach 4: Unique Function

unique(data)
    Column1 Column2
1       P1       5
3       P2       3
4       P3       5
5       P1       2
6       P1       3
7       P3       4
8       P4       7
9       P2      10
10      P4      14
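
For data frames, unique returns the same rows as the duplicated filter from Approach 3, and both keep the original row names, which is why the output jumps from 1 to 3. The sketch below checks that equivalence and then resets the row names. It uses base R only, and the names deduped and same are illustrative.

```r
# Rebuild the example data frame so the snippet runs on its own
data <- data.frame(
  Column1 = c('P1', 'P1', 'P2', 'P3', 'P1', 'P1', 'P3', 'P4', 'P2', 'P4'),
  Column2 = c(5, 5, 3, 5, 2, 3, 4, 7, 10, 14)
)

# Drop rows that repeat an earlier row
deduped <- unique(data)

# Compare with the duplicated() filter from Approach 3
same <- identical(deduped, data[!duplicated(data), ])
same

# Renumber the rows 1..n after removing duplicates
rownames(deduped) <- NULL
deduped
```

Resetting the row names is optional. It is mainly useful when later code or printed output should not show gaps in the numbering.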

The post How to Remove Duplicates in R with Example appeared first on finnstats.
