# Set Operations in R and Python. Useful!

September 4, 2015

Set operations are super useful when cleaning data or testing scripts. They are a must-have in any analyst’s (data scientist’s/statistician’s/data wizard’s) toolbox. Here is a quick rundown in both R and Python.

Say we have two vectors x and y…

```
# vector x
x = c(1,2,3,4,5,6)

# vector y
y = c(4,5,6,7,8,9)
```

What if we ‘combined’ x and y, ignoring any duplicate elements? ($x \cup y$)

```
# x UNION y
union(x, y)

# [1] 1 2 3 4 5 6 7 8 9
```

What are the common elements in x and y? ($x \cap y$)

```
# x INTERSECTION y
intersect(x, y)

# [1] 4 5 6
```

What elements feature in x but not in y? ($x \setminus y$)

```
# x members not in y
setdiff(x, y)

# [1] 1 2 3
```

What elements feature in y but not in x? ($y \setminus x$)

```
# y members not in x
setdiff(y, x)

# [1] 7 8 9
```

How might we visualise all this?

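What about Python? The Python 2 standard library has a module called ‘sets’ that allows for the creation of a ‘Set’ object from a Python list. That module was deprecated in Python 2.6 and removed in Python 3, where the built-in ‘set’ type takes its place. Either way, the set object has methods that provide the same functionality as the R functions above.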
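As a rough sketch of how that might look, here are the same four operations using the built-in `set` type rather than the older ‘sets’ module. The variable names (`x_list`, `union_xy` and so on) are my own, chosen for illustration. Sets are unordered, so the results are passed through `sorted()` to print them in a predictable order.

```python
# Same values as the R vectors above, starting from plain Python lists
x_list = [1, 2, 3, 4, 5, 6]
y_list = [4, 5, 6, 7, 8, 9]

# Build set objects from the lists (duplicates would be dropped here)
x = set(x_list)
y = set(y_list)

# x UNION y
union_xy = x.union(y)          # operator form: x | y
print(sorted(union_xy))        # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# x INTERSECTION y
common_xy = x.intersection(y)  # operator form: x & y
print(sorted(common_xy))       # [4, 5, 6]

# x members not in y
only_x = x.difference(y)       # operator form: x - y
print(sorted(only_x))          # [1, 2, 3]

# y members not in x
only_y = y.difference(x)       # operator form: y - x
print(sorted(only_y))          # [7, 8, 9]
```

One difference from R worth keeping in mind: R's `union()` and friends return vectors in order of first appearance, whereas a Python set makes no promise about ordering, hence the `sorted()` calls.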