[This article was first published on Data Analysis in R » Quick Guide for Statistics & R » finnstats, and kindly contributed to R-bloggers.]

The post Imputing missing values in R appeared first on finnstats.

When an observation is absent from a column of a data frame, or holds a character value where a numeric value is expected, it is referred to in data science as a missing value.
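As a quick illustration of that definition, the sketch below uses a made-up character vector (`raw_price` is a hypothetical name, not part of the original example) in which one entry is text rather than a number. Converting it with `as.numeric()` turns the unparseable entry into `NA`, which `is.na()` can then detect.

```r
# Hypothetical raw column read in as text; "n/a" is not a valid number
raw_price <- c("612", "447", "n/a", "374", "831")

# as.numeric() returns NA (with a warning) for entries it cannot parse
price <- suppressWarnings(as.numeric(raw_price))

is.na(price)         # logical vector flagging the missing entry
sum(is.na(price))    # number of missing values: 1
which(is.na(price))  # position of the missing value: 3
```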

To draw correct conclusions from the data, missing values must be removed or replaced.
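This article concentrates on replacement, but for completeness here is a minimal sketch of the removal option in base R. It assumes a small data frame with one missing price (the same values used in the examples in this post); `df_complete` and `df_complete2` are illustrative names.

```r
# Small data frame with one missing price
df <- data.frame(Product = c('A', 'B', 'C', 'D', 'E'),
                 Price = c(612, 447, NA, 374, 831))

# Drop every row that contains at least one NA
df_complete <- na.omit(df)

# Equivalent base R form using complete.cases()
df_complete2 <- df[complete.cases(df), ]

nrow(df_complete)  # 4 rows remain; product C is dropped
```

Removing rows is the simplest option, but it discards the other values recorded in those rows, which is one reason to consider replacement instead.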

In this article, we look at several approaches to dealing with missing values.

In R, there are several ways to replace a missing value in a column, for example with zero, with the mean, or with the median.

1. In R, replace the column’s missing value with zero.

2. Replace the column’s missing value with the mean.

3. Replace the column’s missing value with the median.

## Imputing missing values in R

Let’s start by making the data frame.

```
df <- data.frame(Product = c('A','B','C','D','E'), Price = c(612,447,NA,374,831))
df
  Product Price
1       A   612
2       B   447
3       C    NA
4       D   374
5       E   831
```

The Price column has one missing value, in row 3. The sections below replace it in three different ways.

Replace the column’s missing value with zero (0):

In the Price column, replace the missing value with zero.

`df$Price[is.na(df$Price)] <- 0`

As a result, the final data frame will be:

```
df
  Product Price
1       A   612
2       B   447
3       C     0
4       D   374
5       E   831
```
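Zero is easy to apply, but any later calculation will treat it as a real price. A minimal sketch of that effect, using the same five values (the names `price` and `price_zero` are only for illustration):

```r
price <- c(612, 447, NA, 374, 831)

# Mean of the four observed prices
mean(price, na.rm = TRUE)  # 566

# Replace NA with zero, then recompute the mean over all five values
price_zero <- price
price_zero[is.na(price_zero)] <- 0
mean(price_zero)           # 452.8
```

Whether that shift is acceptable depends on what a zero means in the data at hand.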

Replace the column’s missing value with the mean:

Replace the missing value in the Price column with the average.

```
df <- data.frame(Product = c('A','B','C','D','E'), Price = c(612,447,NA,374,831))
df$Price[is.na(df$Price)] <- mean(df$Price, na.rm = TRUE)
df
```

The output data frame will be:

```
  Product Price
1       A   612
2       B   447
3       C   566
4       D   374
5       E   831
```
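The imputed value 566 is simply the average of the four observed prices, (612 + 447 + 374 + 831) / 4. The short check below, a sketch using the same numbers with illustrative variable names, confirms this and shows that mean imputation leaves the column mean unchanged.

```r
price <- c(612, 447, NA, 374, 831)

# Mean of the observed values only: (612 + 447 + 374 + 831) / 4
observed_mean <- mean(price, na.rm = TRUE)

# Fill the gap with that mean
price_filled <- price
price_filled[is.na(price_filled)] <- observed_mean

# The column mean is the same before and after imputation
all.equal(mean(price_filled), observed_mean)
```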

Replace the column’s missing value with the median:

In the Price column, replace the missing value with the median.

```
df <- data.frame(Product = c('A','B','C','D','E'), Price = c(612,447,NA,374,831))
df$Price[is.na(df$Price)] <- median(df$Price, na.rm = TRUE)
df
```

The output data frame will be:

```
  Product Price
1       A 612.0
2       B 447.0
3       C 529.5
4       D 374.0
5       E 831.0
```
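With four observed prices, the median is the average of the two middle values, (447 + 612) / 2 = 529.5. The same pattern can be wrapped in a small function so that it applies to every numeric column at once. The helper below, `impute_numeric`, is a hypothetical name introduced here for illustration and is not part of base R or of the original post; it is a sketch that assumes the summary function accepts an `na.rm` argument, as `mean()` and `median()` do.

```r
# Hypothetical helper: fill NAs in every numeric column using `fun`
impute_numeric <- function(data, fun = median) {
  for (col in names(data)) {
    if (is.numeric(data[[col]])) {
      missing <- is.na(data[[col]])
      # Compute the summary on the observed values and fill the gaps
      data[[col]][missing] <- fun(data[[col]], na.rm = TRUE)
    }
  }
  data
}

df <- data.frame(Product = c('A', 'B', 'C', 'D', 'E'),
                 Price = c(612, 447, NA, 374, 831))

impute_numeric(df)              # median by default: row 3 becomes 529.5
impute_numeric(df, fun = mean)  # mean instead: row 3 becomes 566
```

Non-numeric columns such as Product are left untouched, and the input data frame itself is not modified because R passes a copy to the function.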
