Handling .Z files

[This article was first published on Steven Mosher's Blog, and kindly contributed to R-bloggers.]

A while back Steve McIntyre was looking for a way to handle .Z files in R.

Ron Broberg over at the Whiteboard had an approach that Steve adopted both for untarring and for uncompressing .Z files. While the approach is slick, it's somewhat of a hack. Nothing wrong with that, but I wanted something a bit more elegant.

Long ago a reader, Nicholas, created an R package called “uncompress” to handle the .Z file issue, but Steve was not able to get it to work and neither was I. Luckily Nicholas made his contact info available, and I was able to send him a bug report with a file (ghcnv2.Z) and the code I used to download and unzip it. The error was relatively minor and related to end-of-file padding. Nicholas fixed the “bug”, and today I had success downloading and unzipping .Z files. So now in Moshtemp, when you download the ghcnv2.Z file, I will automagically unzip it for you.
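Before handing a file to an uncompressor it can help to confirm that it really is in compress (.Z) format. Such files begin with the two bytes 0x1f 0x9d. The following is a minimal sketch of that check in base R; is_dotZ is a hypothetical helper name, not something provided by the uncompress package.

```r
# Hypothetical helper: TRUE if the file starts with the compress(1) magic bytes.
is_dotZ <- function(path) {
  if (!file.exists(path)) return(FALSE)
  magic <- readBin(path, what = "raw", n = 2L)
  length(magic) == 2L && identical(magic, as.raw(c(0x1f, 0x9d)))
}

# Demonstration on two throwaway files (contents are made up)
z_like <- tempfile(fileext = ".Z")
writeBin(as.raw(c(0x1f, 0x9d, 0x90, 0x00)), z_like)
plain <- tempfile(fileext = ".txt")
writeLines("not compressed", plain)

is_dotZ(z_like)
is_dotZ(plain)
```

A check like this only looks at the header, so it says nothing about whether the rest of the file is intact.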

Next I decided to look at the untar problem. Steve Mc had “untarred” files by copying a version of untar down to his system and then feeding that exe a command from inside R. That’s unnecessary, as R has an “untar” command. So, below, we can see how to download “tar” files from NOAA, untar them, and then uncompress them. Any questions on “uncompress”, just write. It’s on CRAN.

library(uncompress)  # provides uncompress()

ftp   <- "ftp://ftp.ncdc.noaa.gov/pub/data/icoads"
Imma  <- "IMMA"
start <- 1914
end   <- 1917  # test with a small subset
years <- start:end

Tar_Dir    <- "IcoadsTar"
Zfile_Dir  <- "IcoadsZ"
Icoads_Dir <- "IcoadsData"

dir.create(Tar_Dir)
dir.create(Zfile_Dir)
dir.create(Icoads_Dir)

# one tar file per year, e.g. IMMA.1914.tar
fnames <- paste(Imma, ".", years, ".", "tar", sep = "")

# fnames could ALSO be fetched with RCurl.. when I learn it

# Download each yearly tar file and unpack its .Z members into zDir
getIcoadsTar <- function(site = ftp, files = fnames, tDir = Tar_Dir, zDir = Zfile_Dir) {
  for (i in seq_along(files)) {
    fullname        <- file.path(site, files[i])
    destinationfile <- file.path(tDir, files[i])
    # mode = "wb" keeps the binary tar file intact on Windows
    download.file(fullname, destfile = destinationfile, mode = "wb")
    untar(destinationfile, exdir = file.path(getwd(), zDir))
  }
}

# Uncompress every .Z file in zDir and write the result to dataDir as .dat
unZipIcoads <- function(zDir = Zfile_Dir, dataDir = Icoads_Dir) {
  # "\\.Z$" matches a literal ".Z" at the end of the file name
  files      <- list.files(path = file.path(getwd(), zDir), full.names = TRUE, pattern = "\\.Z$")
  localnames <- basename(files)
  destnames  <- sub("\\.Z$", ".dat", localnames)
  for (i in seq_along(files)) {
    handle <- file(files[i], "rb")
    data   <- readBin(handle, "raw", 99999999)  # reads at most 99999999 bytes
    close(handle)
    uncomp_data <- uncompress(data)
    handle <- file(file.path(dataDir, destnames[i]), "wb")
    writeBin(uncomp_data, handle)
    close(handle)
  }
}

The first function, getIcoadsTar, downloads and untars the files. When that completes, call unZipIcoads to unzip them all.
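Both list.files(pattern = ) and gsub() take regular expressions, so an unescaped dot matches any character rather than a literal dot. Here is a small sketch of how a loose pattern and an anchored, escaped pattern behave; the file names are made up for illustration and are not actual ICOADS names.

```r
# Made-up names for illustration only
nms <- c("IMMA.1914.01.Z", "IMMA.1914.02.Z", "notes_Zulu.txt", "aZb.Z.bak")

# Loose pattern: "." matches any character, so "_Z" and "aZ" also count as hits
loose <- grepl(".Z", nms)

# Strict pattern: a literal dot followed by Z at the very end of the name
is_z <- grepl("\\.Z$", nms)

# Swap only the trailing ".Z" for ".dat"
dat_names <- sub("\\.Z$", ".dat", nms[is_z])

print(loose)
print(is_z)
print(dat_names)
```

With a directory that holds nothing but .Z files the two patterns pick up the same files, so the difference only shows up when other files share the folder.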

Have a nice weekend

UPDATE: a cleaner version that deletes the .Z files as it goes:

library(uncompress)  # provides uncompress()

ftp   <- "ftp://ftp.ncdc.noaa.gov/pub/data/icoads"
Imma  <- "IMMA"
start <- 1914
end   <- 1915  # test with a small subset
years <- start:end

Tar_Dir    <- "IcoadsTar"
Zfile_Dir  <- "IcoadsZ"
Icoads_Dir <- "IcoadsData"

dir.create(Tar_Dir)
dir.create(Zfile_Dir)
dir.create(Icoads_Dir)

fnames <- paste(Imma, ".", years, ".", "tar", sep = "")

download.unTarUnzipIcoads <- function(site = ftp, tars = fnames, tDir = Tar_Dir,
                                      zDir = Zfile_Dir, dDir = Icoads_Dir) {
  for (i in seq_along(tars)) {
    fullname        <- file.path(site, tars[i])
    destinationfile <- file.path(tDir, tars[i])
    # mode = "wb" keeps the binary tar file intact on Windows
    download.file(fullname, destfile = destinationfile, mode = "wb")
    untar(destinationfile, exdir = file.path(getwd(), zDir))
    # "\\.Z$" matches a literal ".Z" at the end of the file name
    files     <- list.files(path = file.path(getwd(), zDir), full.names = TRUE, pattern = "\\.Z$")
    destnames <- sub("\\.Z$", ".dat", basename(files))
    for (j in seq_along(files)) {
      handle <- file(files[j], "rb")
      data   <- readBin(handle, "raw", 99999999)  # reads at most 99999999 bytes
      close(handle)
      uncomp_data <- uncompress(data)
      handle <- file(file.path(dDir, destnames[j]), "wb")
      writeBin(uncomp_data, handle)
      close(handle)
    }
    # remove this tar file's .Z members before fetching the next one
    unlink(files)
  }
}
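To try the untar step without touching the NOAA server, one can build a small tar archive locally and extract it with exdir, the same argument used in the loop. The sketch below uses only base R's tar() and untar(); the file and directory names are invented for the demonstration, and the text files merely stand in for real .Z members.

```r
# Work in a scratch directory so nothing in the project is touched
scratch <- tempfile("tardemo")
dir.create(scratch)
old_wd <- setwd(scratch)

# Two small files standing in for the .Z members of a tar file
dir.create("members")
writeLines("january", file.path("members", "demo.01.Z"))
writeLines("february", file.path("members", "demo.02.Z"))

# Bundle the directory, then list the archive contents without extracting
tar("demo.tar", files = "members")
members <- untar("demo.tar", list = TRUE)

# Extract into a separate directory, as the download loop does with exdir
untar("demo.tar", exdir = "extracted")
extracted <- list.files("extracted", recursive = TRUE)

# Restore the original working directory
setwd(old_wd)
```

Listing with list = TRUE first is a cheap way to see what an archive holds before committing disk space to it.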

And if you just want a standalone version to unzip .Z files:

unZipdotZ <- function(Zfile, destfile, remove = TRUE) {
  # this function is called for the side effect of uncompressing a .Z file
  # requires the uncompress package: library(uncompress)
  # Zfile is the path to the .Z file
  # destfile is the uncompressed file to be written
  # no protection against overwriting destfile
  # remove = TRUE deletes the .Z file afterwards
  # stop() builds the message itself; cat() would return NULL and lose it
  if (!file.exists(Zfile)) stop(Zfile, " does not exist")
  handle <- file(Zfile, "rb")
  data   <- readBin(handle, "raw", 99999999)  # reads at most 99999999 bytes
  close(handle)
  uncomp_data <- uncompress(data)
  handle <- file(destfile, "wb")
  writeBin(uncomp_data, handle)
  close(handle)
  if (remove) unlink(Zfile)
}
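readBin() has to be told how many bytes to read. A fixed large count works until a file exceeds it, at which point the read is silently cut short. One alternative, sketched here under the hypothetical helper name read_all_bytes, is to ask the file system for the size first.

```r
# Hypothetical helper: read an entire file as a raw vector, whatever its size.
read_all_bytes <- function(path) {
  if (!file.exists(path)) stop(path, " does not exist")
  readBin(path, what = "raw", n = file.size(path))
}

# Round trip on a temporary file with random contents
tmp <- tempfile()
payload <- as.raw(sample(0:255, 1000, replace = TRUE))
writeBin(payload, tmp)
bytes <- read_all_bytes(tmp)

# TRUE when every byte came back unchanged
identical(bytes, payload)
```

The raw vector returned this way could be passed to uncompress() in place of the fixed-size read.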

