Imputing missing values in R

This article was first published on Data Analysis in R » Quick Guide for Statistics & R » finnstats, and kindly contributed to R-bloggers.

In data science, a missing value is an observation that is absent from a column of a data frame. R represents it as NA. A character entry in a column that should be numeric is usually treated the same way, because it becomes NA when the column is converted with as.numeric().
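
As a minimal sketch of how such values show up in practice, the snippet below converts a text column to numbers and checks for the resulting NA. The vector raw_price and its "n/a" entry are made up for illustration and are not part of the original example.

```r
# Hypothetical raw input: prices read in as text, with one non-numeric entry.
raw_price <- c("612", "447", "n/a", "374", "831")

# as.numeric() turns entries it cannot parse into NA. It also emits a
# warning, which is silenced here so the example runs quietly.
price <- suppressWarnings(as.numeric(raw_price))

is.na(price)       # TRUE only for the third element
sum(is.na(price))  # number of missing values in the vector
```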

To draw correct conclusions from the data, missing values must be removed or replaced.
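
The rest of this article focuses on replacement, so here is a short sketch of the other option, removal, using base R. The data frame orders and its values are hypothetical and only serve to illustrate the idea.

```r
# Hypothetical data frame with one missing Amount.
orders <- data.frame(
  Id     = c(1, 2, 3, 4),
  Amount = c(20.5, NA, 13.0, 8.25)
)

# complete.cases() returns TRUE for rows that contain no NA,
# so this keeps only the fully observed rows.
orders_complete <- orders[complete.cases(orders), ]

nrow(orders)           # rows before removal
nrow(orders_complete)  # rows after removal
```

Removing rows is simple but discards the other values stored in those rows, which is one reason replacement is often preferred for small data sets.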

We will learn how to deal with missing values using several approaches in this article.

In R, a column's missing values can be replaced in several ways, for example with zero, the column mean, or the column median.

This article covers three of them:

1. In R, replace the column’s missing value with zero.

2. Replace the column’s missing value with the mean.

3. Replace the column’s missing value with the median.

Imputing missing values in R

Let’s start by making the data frame.

df <- data.frame(Product = c('A', 'B', 'C', 'D', 'E'), Price = c(612, 447, NA, 374, 831))
df
  Product Price
1       A   612
2       B   447
3       C    NA
4       D   374
5       E   831
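
Before replacing anything, it can help to confirm where the missing values are. The following sketch rebuilds the same data frame and counts the NA entries per column using base R only.

```r
# Same example data frame as above, with one missing Price.
df <- data.frame(Product = c('A', 'B', 'C', 'D', 'E'),
                 Price   = c(612, 447, NA, 374, 831))

# Number of missing values in each column.
na_counts <- colSums(is.na(df))
na_counts

# Row positions of the missing values in the Price column.
na_rows <- which(is.na(df$Price))
na_rows
```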

The Price column has one missing value, in row 3. The sections below replace it in three different ways.

Replace the column’s missing value with zero (0):

In the Price column, replace the missing value with zero.

df$Price[is.na(df$Price)] <- 0

As a result, the final data frame will be:

df
  Product Price
1       A   612
2       B   447
3       C     0
4       D   374
5       E   831
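
One thing worth checking before settling on zero is how it affects summary statistics, since the zero is then counted as a real observation. The sketch below compares the mean of the observed prices with the mean after zero replacement, using the same Price values as a plain vector.

```r
# Price values from the example, as a plain numeric vector.
price <- c(612, 447, NA, 374, 831)

# replace() returns a copy with the NA positions set to 0.
price_zero <- replace(price, is.na(price), 0)

mean_observed <- mean(price, na.rm = TRUE)  # 566, based on the four observed values
mean_zero     <- mean(price_zero)           # 452.8, pulled down by the added zero
```

Zero replacement is therefore most appropriate when zero is a meaningful value for the column, which may not be the case for a price.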

Replace the column’s missing value with the mean:

Replace the missing value in the Price column with the average.

df <- data.frame(Product = c('A', 'B', 'C', 'D', 'E'), Price = c(612, 447, NA, 374, 831))
df$Price[is.na(df$Price)] <- mean(df$Price, na.rm = TRUE)
df

The output data frame will be:

  Product Price
1       A   612
2       B   447
3       C   566
4       D   374
5       E   831
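
When a data frame has more than one numeric column, the same replacement can be applied to each of them in one pass. The following is a sketch only: the helper impute_mean() and the Quantity column are hypothetical names introduced for this illustration, not part of the original example.

```r
# Hypothetical helper: replace NA with the column mean in every numeric column.
impute_mean <- function(data) {
  numeric_cols <- vapply(data, is.numeric, logical(1))
  data[numeric_cols] <- lapply(data[numeric_cols], function(x) {
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    x
  })
  data
}

# Example data with an extra, made-up Quantity column.
sales <- data.frame(Product  = c('A', 'B', 'C', 'D', 'E'),
                    Price    = c(612, 447, NA, 374, 831),
                    Quantity = c(10, NA, 30, 40, 20))

sales_imputed <- impute_mean(sales)
sales_imputed
```

Non-numeric columns such as Product are left unchanged, because the helper only touches columns for which is.numeric() is TRUE.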

Replace the column’s missing value with the median:

In the Price column, replace the missing value with the median.

df <- data.frame(Product = c('A', 'B', 'C', 'D', 'E'), Price = c(612, 447, NA, 374, 831))
df$Price[is.na(df$Price)] <- median(df$Price, na.rm = TRUE)
df

The output data frame will be:

  Product Price
1       A 612.0
2       B 447.0
3       C 529.5
4       D 374.0
5       E 831.0
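
The mean and the median give similar replacements here (566 and 529.5), but they can differ a lot when a column contains an extreme value. The sketch below uses a made-up variant of the Price vector, with the last value changed to 8310, to illustrate the difference.

```r
# Hypothetical variant of the Price column with one extreme value.
price_outlier <- c(612, 447, NA, 374, 8310)

fill_mean   <- mean(price_outlier, na.rm = TRUE)    # 2435.75, dominated by 8310
fill_median <- median(price_outlier, na.rm = TRUE)  # 529.5, same as before

# Replacement using the median, which is less sensitive to the extreme value.
price_filled <- replace(price_outlier, is.na(price_outlier), fill_median)
price_filled
```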
