Reading data into R: handling column types and values that should be treated as NA
The code snippets below introduce a few arguments of the read.csv function in R: stringsAsFactors, colClasses, and na.strings.
# Create sample data
strVals <- do.call("c",lapply(1:1000,function(x)paste(sample(letters,sample(5:20,1)),collapse="")))
miscVals <- sample(c("","999","—-","MISS"),100,replace=T)
numVals <- rnorm(1000)
# Scenario 1 : Pure numeric and strings
dataTemp<-data.frame(numericVals = numVals, stringVals = strVals)
write.csv(dataTemp,file="inputData.csv",quote=F,row.names=F)
inData <- read.csv("inputData.csv",header=T)
sapply(inData,class)
# Col: stringVals is type factor (in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE)
# Using the function argument stringsAsFactors = FALSE prevents character columns
# from being turned into factor type
inData <- read.csv("inputData.csv",header=T,stringsAsFactors=FALSE)
sapply(inData,class)
# Using function argument colClasses
# predefine the column types in the input file
inData <- read.csv("inputData.csv",header=T,colClasses = c("numeric","character"))
sapply(inData,class)
# If you have data values that need to be considered as NA
# Add values from miscVals ( "","999","—-","MISS" ) to numVals and strVals
numMiscVals <- sample(c(numVals,miscVals),1000)
strMiscVals <- sample(c(strVals,miscVals),1000)
dataTemp<-data.frame(numericVals = numMiscVals, stringVals = strMiscVals)
write.csv(dataTemp,file="inputData.csv",quote=F,row.names=F)
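# Read the file without na.strings: the placeholder values remain in the data
inData <- read.csv("inputData.csv",header=T,stringsAsFactors=FALSE)
sum(c("","999","—-","MISS") %in% inData$numericVals) > 0
# Use na.strings argument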
inData <- read.csv("inputData.csv",header=T,stringsAsFactors=FALSE,na.strings = c("","999","—-","MISS"))
sapply(inData,class)
# The columns now have the right types: numericVals is numeric and stringVals is character
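To see how many values each column lost to na.strings, one option is to count the NA entries per column. The following is a minimal sketch rather than part of the original workflow: csvText and demo are hypothetical names, and the inline data is a made-up four-row stand-in for inputData.csv that uses only three of the placeholder values.

```r
# Hypothetical inline CSV standing in for inputData.csv
csvText <- "numericVals,stringVals
1.5,abc
999,MISS
,def
-0.2,"

# Treat "", "999" and "MISS" as missing while reading
demo <- read.csv(text = csvText, header = TRUE, stringsAsFactors = FALSE,
                 na.strings = c("", "999", "MISS"))

# Column types: numericVals is numeric, stringVals is character
sapply(demo, class)

# Number of NA values per column (2 in each column for this sample)
colSums(is.na(demo))
```

The same colSums(is.na(...)) call could be applied to inData after it has been read with na.strings.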
sum(c("","999","—-","MISS") %in% inData$numericVals)
# should return 0
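colClasses and na.strings can also be used together. When a column is declared numeric through colClasses, a placeholder that is not listed in na.strings should make read.csv stop with an error instead of silently changing the column type. A small sketch of that behaviour, assuming a made-up inline CSV (csvText, attempt and inData2 are hypothetical names):

```r
# Hypothetical inline CSV with one placeholder value in the numeric column
csvText <- "numericVals,stringVals
0.3,abc
MISS,def"

# Without na.strings, "MISS" cannot be parsed as numeric, so read.csv fails;
# tryCatch captures the error message instead of stopping the script
attempt <- tryCatch(
  read.csv(text = csvText, header = TRUE,
           colClasses = c("numeric", "character")),
  error = function(e) conditionMessage(e)
)
is.character(attempt)  # TRUE when an error message was captured

# Listing the placeholder in na.strings lets the same colClasses succeed
inData2 <- read.csv(text = csvText, header = TRUE,
                    colClasses = c("numeric", "character"),
                    na.strings = "MISS")
sapply(inData2, class)
is.na(inData2$numericVals)
```

Declaring the types up front this way surfaces unexpected placeholder values early, at the cost of having to list every one of them in na.strings.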