Handling .Z files

September 13, 2010

(This article was first published on Steven Mosher's Blog, and kindly contributed to R-bloggers)

A while back Steve McIntyre was looking for a way to handle .Z files in R.

Ron Broberg over at The Whiteboard had an approach that Steve adopted both for untarring and for uncompressing .Z files. While the approach is slick, it’s somewhat of a hack. Nothing wrong with that, but I wanted something a bit more elegant.

Long ago a reader, Nicholas, created an R package called “uncompress” to handle the .Z file issue, but Steve was not able to get it to work and neither was I. Luckily Nicholas made his contact info available, and I was able to get him a bug report with a file (ghcnv2.Z) and the code I used to download and unzip it. The error was relatively minor and related to end-of-file padding. Nicholas fixed the “bug” and today I had success downloading and unzipping .Z files. So now in Moshtemp, when you download the ghcnv2.Z file, I will automagically unzip it for you.

Next I decided to look at the untar problem. Steve Mc had “untarred” files by copying a version of untar down to his system and then feeding that exe a command from inside R. That’s unnecessary, as R has an “untar” command. So, below, we can see how to download “tar” files from NOAA, untar them, and then uncompress them. Any questions on “uncompress”, just write. It’s on CRAN.

library(uncompress)  # provides uncompress(); install it from CRAN first

ftp   <- "ftp://ftp.ncdc.noaa.gov/pub/data/icoads"
Imma  <- "IMMA"
start <- 1914
end   <- 1917  # test with small subset
years <- start:end

Tar_Dir    <- "IcoadsTar"
Zfile_Dir  <- "IcoadsZ"
Icoads_Dir <- "IcoadsData"

dir.create(Tar_Dir)
dir.create(Zfile_Dir)
dir.create(Icoads_Dir)

# Archive names look like IMMA.1914.tar
fnames <- paste(Imma, ".", years, ".", "tar", sep = "")

# fnames is ALSO fetchable with RCurl.. when I learn it

getIcoadsTar <- function(site = ftp, files = fnames, tDir = Tar_Dir, zDir = Zfile_Dir) {
  for (i in seq_along(files)) {
    # Remote URL and local destination for this archive
    fullname <- file.path(site, files[i])
    destinationfile <- file.path(tDir, files[i])
    # mode = "wb" keeps the binary tar file intact on Windows
    download.file(fullname, destfile = destinationfile, mode = "wb")
    # Extract the .Z members into zDir
    untar(destinationfile, exdir = file.path(getwd(), zDir))
  }
}

unZipIcoads <- function(zDir = Zfile_Dir, dataDir = Icoads_Dir) {
  # "\\.Z$" matches a literal ".Z" at the end of the name only
  files <- list.files(path = file.path(getwd(), zDir), full.names = TRUE, pattern = "\\.Z$")
  destnames <- sub("\\.Z$", ".dat", basename(files))
  for (i in seq_along(files)) {
    # Read the compressed bytes
    handle <- file(files[i], "rb")
    data <- readBin(handle, "raw", 99999999)
    close(handle)
    uncomp_data <- uncompress(data)
    # Write the uncompressed bytes next to the other data files
    handle <- file(file.path(dataDir, destnames[i]), "wb")
    writeBin(uncomp_data, handle)
    close(handle)
  }
}

The first function downloads and untars the files. When that completes, run the second one to unzip them all.
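If you want to see what untar does without hitting the NOAA server, here is a minimal sketch that builds a tiny tar file locally and then lists and extracts it. The scratch directory and the file names are made up for the demo; only tar() and untar() from base R are used, and tar = "internal" is passed so nothing depends on a system tar.

```r
# Build a tiny tar archive in a temporary directory, then list and extract it.
work <- tempfile("untar_demo")   # hypothetical scratch directory
dir.create(work)
src <- file.path(work, "src")
dir.create(src)
writeLines("1914 sample record", file.path(src, "IMMA.1914.01.txt"))
writeLines("1914 second record", file.path(src, "IMMA.1914.02.txt"))

# tar() stores paths relative to the working directory, so switch into src.
tarfile <- file.path(work, "IMMA.1914.tar")
old <- setwd(src)
tar(tarfile, files = list.files(), compression = "none", tar = "internal")
setwd(old)

# list = TRUE returns the member names without extracting anything.
members <- untar(tarfile, list = TRUE, tar = "internal")

# exdir chooses where the members are written.
out <- file.path(work, "out")
untar(tarfile, exdir = out, tar = "internal")
extracted <- sort(list.files(out))
```

The exdir argument is the same one getIcoadsTar uses to drop the .Z members into their own directory.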

Have a nice weekend

UPDATE: a cleaner version that deletes the .Z files as it goes:

library(uncompress)  # provides uncompress(); install it from CRAN first

ftp   <- "ftp://ftp.ncdc.noaa.gov/pub/data/icoads"
Imma  <- "IMMA"
start <- 1914
end   <- 1915  # test with small subset
years <- start:end

Tar_Dir    <- "IcoadsTar"
Zfile_Dir  <- "IcoadsZ"
Icoads_Dir <- "IcoadsData"

dir.create(Tar_Dir)
dir.create(Zfile_Dir)
dir.create(Icoads_Dir)

# Archive names look like IMMA.1914.tar
fnames <- paste(Imma, ".", years, ".", "tar", sep = "")

download.unTarUnzipIcoads <- function(site = ftp, tars = fnames, tDir = Tar_Dir, zDir = Zfile_Dir, dDir = Icoads_Dir) {
  for (i in seq_along(tars)) {
    fullname <- file.path(site, tars[i])
    destinationfile <- file.path(tDir, tars[i])
    # mode = "wb" keeps the binary tar file intact on Windows
    download.file(fullname, destfile = destinationfile, mode = "wb")
    untar(destinationfile, exdir = file.path(getwd(), zDir))
    # "\\.Z$" matches a literal ".Z" at the end of the name only
    files <- list.files(path = file.path(getwd(), zDir), full.names = TRUE, pattern = "\\.Z$")
    destnames <- sub("\\.Z$", ".dat", basename(files))
    for (j in seq_along(files)) {
      handle <- file(files[j], "rb")
      data <- readBin(handle, "raw", 99999999)
      close(handle)
      uncomp_data <- uncompress(data)
      handle <- file(file.path(dDir, destnames[j]), "wb")
      writeBin(uncomp_data, handle)
      close(handle)
    }
    # Delete this archive's .Z files before fetching the next one
    unlink(files)
  }
}
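A note on picking out the .Z files by name: in a regular expression an unescaped dot matches any character, so a pattern like "(.Z)" also hits names that merely contain a Z. A small sketch of the difference, using made-up file names:

```r
# Hypothetical directory listing: one real .Z file plus two look-alikes.
candidates <- c("IMMA.1914.01.Z", "AZORES.dat", "notes.Z.bak")

# Unescaped: "." matches any character, so "(.Z)" hits "AZ" in AZORES.dat.
loose <- grepl("(.Z)", candidates)

# Escaped and anchored: a literal dot, then Z, at the very end of the name.
strict <- grepl("\\.Z$", candidates)

# sub() with the same pattern rewrites only the trailing extension.
renamed <- sub("\\.Z$", ".dat", candidates[strict])

# gsub() with the unescaped pattern rewrites every "any character + Z" pair.
mangled <- gsub(".Z", ".dat", "AZORES.Z")
```

With the ICOADS file names the loose pattern happens to work, but the anchored one is the safer habit.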

And if you just want a stand-alone version to unzip .Z files:

unZipdotZ <- function(Zfile, destfile, remove = TRUE) {
  # this function is called for the side effect of uncompressing a .Z file
  # requires library(uncompress)
  # Zfile is a path to the Zfile
  # destfile is the uncompressed file to be written
  # no protection against overwriting
  # remove = TRUE deletes the Z file afterwards
  if (!file.exists(Zfile)) stop(Zfile, " does not exist")
  handle <- file(Zfile, "rb")
  data <- readBin(handle, "raw", 99999999)
  close(handle)
  uncomp_data <- uncompress(data)
  handle <- file(destfile, "wb")
  writeBin(uncomp_data, handle)
  close(handle)
  if (remove) unlink(Zfile)
}
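The functions above read with a fixed upper bound of 99999999 bytes, which would silently truncate a larger file. Here is a rough sketch of two small helpers that could sit in front of uncompress(): one sizes the read from the file itself, the other checks for the two magic bytes (0x1f 0x9d) that Unix compress writes at the start of a .Z file. The names read_all_bytes and looks_like_dotZ are my own inventions, not part of the uncompress package, and the demo bytes are made up rather than a real compressed stream.

```r
# Read a whole file as raw bytes, sizing the read from the file itself
# instead of using a fixed upper bound.
read_all_bytes <- function(path) {
  if (!file.exists(path)) stop(path, " does not exist")
  readBin(path, what = "raw", n = file.info(path)$size)
}

# Unix compress (.Z) output starts with the magic bytes 0x1f 0x9d.
looks_like_dotZ <- function(bytes) {
  length(bytes) >= 2 && identical(bytes[1:2], as.raw(c(0x1f, 0x9d)))
}

# Demo on made-up bytes (only the two-byte header is meaningful here).
tmp <- tempfile(fileext = ".Z")
writeBin(as.raw(c(0x1f, 0x9d, 0x90, 0x41)), tmp)
bytes <- read_all_bytes(tmp)
header_ok <- looks_like_dotZ(bytes)
plain_ok <- looks_like_dotZ(charToRaw("plain text"))
unlink(tmp)
```

In unZipdotZ you could call read_all_bytes(Zfile) in place of the file/readBin/close trio and stop early when looks_like_dotZ is FALSE.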

