Dealing with missing values

March 8, 2009

(This article was first published on One R Tip A Day, and kindly contributed to R-bloggers)

Two new quick tips from ‘almost regular’ contributor Jason:

Handling missing values in R can be tricky. Let’s say you have a table
with missing values you’d like to read from disk. Reading in the table

read.table( fileName )

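might fail, typically with an error such as “line 2 did not have 3
elements”. By default read.table splits fields on any run of whitespace,
so an empty field simply disappears. If your table is properly formatted,
you can tell R where the empty fields are by using the “sep” option in
read.table: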

read.table( fileName, sep="\t" )

This tells R that all the columns are separated by TABS regardless of
whether there’s data there or not. An empty field in a numeric column is
then read as NA. So, make sure that your file on disk really is fully TAB
separated: if there is a missing data point you must still have a TAB to
tell R that this datum is missing and to move to the next field for
processing.
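
As a minimal sketch of the difference, the snippet below reads the same
small table twice from an in-memory string instead of a file. The table
contents and the names tsv, default_attempt and d are made up for
illustration; only read.table and its “sep”, “header” and “text”
arguments come from R itself.

```r
# Hypothetical tab-separated table; the second data row has an empty
# "weight" field (two TABs in a row).
tsv <- "id\tweight\theight\n1\t70.5\t180\n2\t\t165\n3\t82.1\t175\n"

# Default whitespace splitting collapses the two TABs, so that row appears
# to have only two fields and the read stops with an error.
default_attempt <- try(read.table(text = tsv, header = TRUE), silent = TRUE)
inherits(default_attempt, "try-error")

# With sep = "\t" every TAB ends a field, so the empty field is kept and
# becomes NA in the numeric "weight" column.
d <- read.table(text = tsv, sep = "\t", header = TRUE)
print(d)
is.na(d$weight)
```

Note that an empty field in a character column is read as an empty
string rather than NA; passing na.strings = c("NA", "") is one way to
turn those into NA as well.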

Lastly, don’t forget the “header=TRUE” option (or its shorthand
“header=T”) if you have a header line in your file.
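
To see what the header option changes, here is a short sketch using a
made-up table held in a string (the names tsv, no_header and with_header
are only for this example). Without the option, the header line is
treated as a row of data and R invents column names.

```r
# Hypothetical table with a header line and one empty "weight" field.
tsv <- "id\tweight\theight\n1\t70.5\t180\n2\t\t165\n"

# Without header = TRUE the first line becomes a data row, the columns are
# named V1, V2, V3 and none of them is numeric any more.
no_header <- read.table(text = tsv, sep = "\t")
names(no_header)
nrow(no_header)

# With header = TRUE the first line supplies the column names and the
# remaining rows are parsed as numbers.
with_header <- read.table(text = tsv, sep = "\t", header = TRUE)
names(with_header)
str(with_header)
```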

Here’s the 2nd tip:

Some algorithms in R don’t support missing (NA) values. If you have a
data.frame with missing values and quickly want the ROWS with any
missing data to be removed then try:

myData[rowSums(is.na(myData)) == 0, ]

To find NA values in your data you have to use the “is.na” function; a
comparison such as x == NA does not work, because it returns NA rather
than TRUE or FALSE.
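
A small sketch of the row-removal idea on a made-up data.frame follows.
The data and the names na_per_row, cleaned and same_rows are
illustrative; complete.cases is a base R alternative shown only for
comparison, not something the tip itself relies on.

```r
# Hypothetical data.frame with missing values in rows 2 and 3.
myData <- data.frame(
  id     = 1:4,
  weight = c(70.5, NA, 82.1, 64.0),
  height = c(180, 165, NA, 172)
)

# Comparing with == does not detect NA: every element of the result is NA.
myData$weight == NA

# is.na() returns a logical matrix; rowSums() counts the NAs in each row.
na_per_row <- rowSums(is.na(myData))

# Keep only the rows whose NA count is zero.
cleaned <- myData[na_per_row == 0, ]
print(cleaned)

# complete.cases() flags rows without any NA and selects the same rows.
same_rows <- myData[complete.cases(myData), ]
```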
