get UCSC images for a list of regions in batch

July 24, 2012
By

(This article was first published on One Tip Per Day, and kindly contributed to R-bloggers)

Here is my working R code for the task. It can be simplified as 3 lines.
# example of controling individual track
#theURL=”http://genome.ucsc.edu/cgi-bin/hgTracks?db=mm9&wgRna=hide&cpgIslandExt=pack&ensGene=hide&mrna=hide&intronEst=hide&mgcGenes=hide&hgt.psOutput=on&cons44way=hide&snp130=hide&snpArray=hide&wgEncodeReg=hide&pix=1000&refGene=pack&knownGene=hide&rmsk=hide
# example of controling via session
theURL=”http://genome-preview.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=sterding&hgS_otherUserSessionName=mm9_piRNA&hgt.psOutput=on&pix=1000
# read regions
toPlot=read.table(“../data/piRNA.clusters.coordinates.cpg.bed”, header=T)
## paralle version
library(multicore)
# mclapply(1:nrow(toPlot), function(i) screenshotUCSC(theURL, “”, as.character(toPlot$chr[i]), toPlot$start[i]-2000, toPlot$end[i]+1999, paste(“region_”, i, “_”, toPlot$name[i],”.pdf”, sep=””)), mc.cores=10)
# anti-robot version 
# UCSC Policy: Program-driven use of this software is limited to a maximum of one hit every 15 seconds and no more than 5,000 hits per day.
for(i in 1:nrow(toPlot)){
    screenshotUCSC(theURL, “”, as.character(toPlot$chr[i]), toPlot$start[i]-2000, toPlot$end[i]+1999, paste(“region_”, i, “_”, toPlot$name[i],”.pdf”, sep=””))
    Sys.sleep(5) 
}
# merge script
mergePDF(“piRNAs_ucsc_screeshot.pdf”, list.files(pattern=”region_.*.pdf”))
try(system(“rm region_.*.pdf”))
####### lib ############
# Here is an R script wrote by Aaron Statham which saves UCSC to pdfs –
# you can choose which genome and tracks to display by altering the ‘url’ parameter. ‘trackfile’ is the url of a file describing the custom tracks (beds/bigwigs) to display
mergePDF <- function(output=”merged.pdf”, sourcefiles=c(“source1.pdf”,”source2.pdf”,”source3.pdf”))
{
    # create the command string and call the command using system()
    # merging command from http://hints.macworld.com/article.php?story=2003083122212228
    command=paste(“gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite”,paste(“-sOutputFile”,output, sep=”=”), paste(sourcefiles, collapse=” “),sep=” “)
    try(system(command))
}
#  Reference: http://www.biostars.org/post/show/6132/batch-viewing-of-ucsc-browser-graphic/
screenshotUCSC <- function(url, trackfile, chr, start, end, filename) {
        oldpen <- options(“scipen”)
        options(scipen=100)
        temp <- readLines(paste(url, “&hgt.customText=”, trackfile, “&position=”,chr,”:”,start,”-“,end, sep=””))
        #cat(temp,”\n”)
        pdfurl <- paste(“http://genome-preview.ucsc.edu/trash“,gsub(“.*trash”,””,gsub(“.pdf.*”,””,temp[grep(“.pdf”, temp, fixed=TRUE)][1])), “.pdf”, sep=””)
        cat(pdfurl,”\n”);
        options(scipen=oldpen)
        download.file(pdfurl, filename, mode=”wb”, quiet=TRUE)
}

To leave a comment for the author, please follow the link and comment on their blog: One Tip Per Day.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)