# Handling Missing Values in R using tidyr

This article was first published on **r-bloggers on Programming with R** and kindly contributed to R-bloggers.


In this post, we'll look at three functions from `tidyr` that are useful for handling missing values (`NA`s) in a dataset. Please note: this post isn't about missing value imputation.

### tidyr

According to the documentation of tidyr,

The goal of tidyr is to help you create tidy data. Tidy data is data where:

- Every column is a variable.
- Every row is an observation.
- Every cell is a single value.

Let's start by loading the `tidyr` library. `tidyr` is also one of the packages included in the `tidyverse`.

    library(tidyr)

### tidyr functions

The following three tidyr functions are handy for processing missing values:

- drop_na()
- fill()
- replace_na()

### Dataset with Missing Values

To get a dataset with missing values, let's take `mtcars` and introduce some missing values into it.

    df <- mtcars
    df$hp[2] <- NA
    df$cyl[5] <- NA
    df$gear[5] <- NA
    df$mpg[10] <- NA

    # counting number of missing values
    paste("Number of Missing Values", sum(is.na(df)))
    ## [1] "Number of Missing Values 4"

    # dimensions
    paste("Number of Rows", nrow(df))
    ## [1] "Number of Rows 32"
    paste("Number of Columns", ncol(df))
    ## [1] "Number of Columns 11"
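Before choosing how to handle the gaps, it can help to see where they are. The following is a minimal base-R sketch that rebuilds the same `df` and counts `NA`s per column and lists the affected rows. The names `na_per_column` and `rows_with_na` are illustrative and not part of the original post.

```r
# Rebuild the example dataset with four injected NAs
df <- mtcars
df$hp[2] <- NA
df$cyl[5] <- NA
df$gear[5] <- NA
df$mpg[10] <- NA

# Number of NAs in each column (is.na() returns a logical matrix)
na_per_column <- colSums(is.na(df))

# Show only the columns that actually contain NAs
print(na_per_column[na_per_column > 0])

# Positions of the rows that contain at least one NA
rows_with_na <- which(!complete.cases(df))
print(rows_with_na)
```

Row 5 holds two of the four `NA`s, which is why only three distinct rows are affected.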

We now have a dataset with missing values (`NA`s) in it.

### drop_na()

`drop_na()` removes the rows that contain missing values. Called without column arguments, it checks every column.

    library(dplyr) # in case we need some dplyr verbs
    ## Warning: package 'dplyr' was built under R version 3.5.2
    ##
    ## Attaching package: 'dplyr'
    ## The following objects are masked from 'package:stats':
    ##
    ##     filter, lag
    ## The following objects are masked from 'package:base':
    ##
    ##     intersect, setdiff, setequal, union

    df_no_na <- drop_na(df)

    # counting number of missing values
    paste("Number of Missing Values", sum(is.na(df_no_na)))
    ## [1] "Number of Missing Values 0"

    # dimensions
    paste("Number of Rows", nrow(df_no_na))
    ## [1] "Number of Rows 29"
    paste("Number of Columns", ncol(df_no_na))
    ## [1] "Number of Columns 11"
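`drop_na()` also accepts column names, in which case only those columns are checked for `NA`s. Here is a small sketch of that variant on the same data. The name `df_hp_complete` is illustrative and not from the original post.

```r
library(tidyr)

# Rebuild the example dataset with four injected NAs
df <- mtcars
df$hp[2] <- NA
df$cyl[5] <- NA
df$gear[5] <- NA
df$mpg[10] <- NA

# Drop only the rows where `hp` is missing; NAs in other columns are kept
df_hp_complete <- drop_na(df, hp)

nrow(df_hp_complete)        # one row fewer than df
sum(is.na(df_hp_complete))  # NAs in cyl, gear and mpg remain
```

This is useful when only a few columns matter for the analysis and dropping every incomplete row would discard too much data.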

### fill()

`fill()` fills the `NA`s (missing values) in the selected columns with the nearest non-missing value in the chosen direction. Columns can be selected with `dplyr::select()`-style helpers, such as `everything()` in the example below.

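It also lets us choose the `.direction` of the fill: `down` (the default), `up`, `updown`, or `downup`. With `down`, the last non-missing value is carried forward; with `up`, the next non-missing value is carried backward; `downup` fills down first and then up, and `updown` does the reverse.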

This is quite naive, but it can be handy in many situations, for example with time series data.

    df_na_filled <- df %>% fill(dplyr::everything())

    # counting number of missing values
    paste("Number of Missing Values", sum(is.na(df_na_filled)))
    ## [1] "Number of Missing Values 0"

    # dimensions
    paste("Number of Rows", nrow(df_na_filled))
    ## [1] "Number of Rows 32"
    paste("Number of Columns", ncol(df_na_filled))
    ## [1] "Number of Columns 11"
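To make the effect of `.direction` concrete, here is a small sketch on a toy series. The `readings` data frame and its values are made up for illustration, and the `downup` and `updown` options assume tidyr 1.0.0 or later.

```r
library(tidyr)

# Hypothetical toy series with a gap at the start and one in the middle
readings <- data.frame(
  time  = 1:5,
  value = c(NA, 2, NA, NA, 5)
)

fill_down   <- fill(readings, value)                         # default: "down"
fill_up     <- fill(readings, value, .direction = "up")
fill_downup <- fill(readings, value, .direction = "downup")
fill_updown <- fill(readings, value, .direction = "updown")

fill_down$value    # NA 2 2 2 5  (leading NA has nothing above it)
fill_up$value      #  2 2 5 5 5
fill_downup$value  #  2 2 2 2 5  (down first, then up for the leading NA)
fill_updown$value  #  2 2 5 5 5  (up first, nothing left to fill down)
```

Note that a plain `down` fill leaves a leading `NA` untouched. The `mtcars` example above ends with zero `NA`s only because none of the injected gaps is in the first row.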

### replace_na()

`replace_na()` is the function to use when you already know the value the `NA`s should be replaced with.

Below is an example in which we replace all `NA`s with zero (`0`):

    df_na_replaced <- df %>% mutate_all(replace_na, 0)

    # counting number of missing values
    paste("Number of Missing Values", sum(is.na(df_na_replaced)))
    ## [1] "Number of Missing Values 0"

    # dimensions
    paste("Number of Rows", nrow(df_na_replaced))
    ## [1] "Number of Rows 32"
    paste("Number of Columns", ncol(df_na_replaced))
    ## [1] "Number of Columns 11"

Alternatively, we could have restricted the replacement to the numeric (continuous) columns and replaced only their `NA`s with zero, like this:

    df_na_replaced <- df %>% mutate_if(is.numeric, replace_na, 0)
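`replace_na()` can also be applied to a whole data frame with a named list, which gives one replacement value per column. The sketch below shows that form, along with an `across()`-based equivalent of the `mutate_if()` call, assuming dplyr 1.0.0 or later. The names `df_listed` and `df_across` are illustrative and not from the original post.

```r
library(tidyr)
library(dplyr)

# Rebuild the example dataset with four injected NAs
df <- mtcars
df$hp[2] <- NA
df$cyl[5] <- NA
df$gear[5] <- NA
df$mpg[10] <- NA

# Named list: only the listed columns are touched,
# so the NAs in `gear` and `mpg` remain
df_listed <- replace_na(df, list(hp = 0, cyl = 0))
sum(is.na(df_listed))

# across() form of "replace NAs with 0 in every numeric column"
df_across <- df %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)))
sum(is.na(df_across))
```

The named-list form is handy when different columns need different replacement values, while the `across()` form applies one value to many columns at once.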

Hopefully, this post has shed some light on these three `tidyr` functions for handling missing values: `drop_na()`, `fill()`, and `replace_na()`.

