Difference between unique() and duplicated()

October 20, 2016

(This article was first published on R – Francisco Requena, and kindly contributed to R-bloggers)

When we work with data, we often run into an obstacle: repeated values. Such values are not a serious problem as long as we can identify them. Once we have the list of repeated values, it is easy to discard them, remove them or simply extract them.

We are going to look at two functions in R that identify repeated values: unique() and duplicated(). As we will see below, both work with different types of data, such as vectors, matrices and data frames.

# Example with vector of numbers

vector_example <- c(1,2,3,4,1)

unique(vector_example)
[1] 1 2 3 4

duplicated(vector_example)
[1] FALSE FALSE FALSE FALSE  TRUE

# Example with vector of strings
vector_example2 <- c("A", "B", "C", "D", "E", "A")

unique(vector_example2)
[1] "A" "B" "C" "D" "E"

duplicated(vector_example2)
[1] FALSE FALSE FALSE FALSE FALSE  TRUE

  • As we can see, unique() returns the distinct values themselves, keeping the first occurrence of each.
  • duplicated(), in contrast, returns a logical vector of the same length as the input, where TRUE marks each element that repeats an earlier one.
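
The relationship between the two return values can be checked directly. The following is a minimal sketch that reuses the numeric vector from the first example; the variable names (x, dup, first_occurrences) are only illustrative.

```r
x <- c(1, 2, 3, 4, 1)

# duplicated() flags every element that has already appeared earlier
dup <- duplicated(x)

# Dropping the flagged elements keeps the first occurrence of each value
first_occurrences <- x[!dup]

# This is the same vector that unique() returns
identical(first_occurrences, unique(x))

# which() converts the logical flags into the positions of the repeats
which(dup)
```

In other words, unique() answers "which values are there?", while duplicated() answers "which positions hold a repeat?", which is why the latter is the one to use for subsetting.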

We can also apply these functions to a matrix, where by default they compare whole rows:

set.seed(123)
m <- matrix(sample(1:3, 20, TRUE), ncol = 2, nrow = 10)
m
      [,1] [,2]
 [1,]    1    3
 [2,]    3    2
 [3,]    2    3
 [4,]    3    2
 [5,]    3    1
 [6,]    1    3
 [7,]    2    1
 [8,]    3    1
 [9,]    2    1
[10,]    2    3

duplicated(m)
[1] FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE

unique(m)
     [,1] [,2]
[1,]    1    3
[2,]    3    2
[3,]    2    3
[4,]    3    1
[5,]    2    1
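
For matrices, the same link between the two functions holds row by row. The sketch below uses a small hand-written matrix (an assumption made here so the result does not depend on the random number generator) and shows that subsetting with !duplicated() reproduces unique(). The drop = FALSE argument is included so the result stays a matrix even if only one row survives.

```r
# A small matrix whose 2nd and 4th rows are identical
m_small <- rbind(c(1, 3),
                 c(3, 2),
                 c(2, 3),
                 c(3, 2))

# Keep only the rows that have not appeared before
kept <- m_small[!duplicated(m_small), , drop = FALSE]

# Same rows, in the same order, as unique() returns
identical(kept, unique(m_small))

# Number of distinct rows
nrow(kept)
```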

Now we will identify unique and duplicated rows in iris, a well-known data frame that ships with R. We will also select the rows that are not repeated:

nrow(iris)
[1] 150

nrow(unique(iris)) # Row 143 is dropped because it is identical to row 102.
[1] 149

nrow(iris[duplicated(iris), ]) # We select the repeated row (row 143) and count it.
[1] 1

nrow(iris[!duplicated(iris), ]) # We select all unique rows and count them (150 - 1 = 149).
[1] 149

Finally, we can see that unique(iris) and iris[!duplicated(iris), ] give the same result.
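
As a quick check of that equivalence, here is a minimal sketch comparing unique(iris) with the duplicated()-based subset. The names deduped_a and deduped_b are illustrative. The fromLast argument of duplicated() is used at the end to locate the earlier copy of the repeated row.

```r
# Two ways of dropping the repeated row from iris
deduped_a <- unique(iris)
deduped_b <- iris[!duplicated(iris), ]

# Both keep the same 149 rows, with the same row names
identical(deduped_a, deduped_b)
nrow(deduped_a)

# Position of the row flagged as a repeat
which(duplicated(iris))

# Scanning from the end flags the earlier copy instead
which(duplicated(iris, fromLast = TRUE))
```

Note that unique() takes the data itself rather than an index, so it is called on the data frame directly instead of being placed inside the square brackets.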
