Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

rtracklayer package allows you to access most of the UCSC wiggle tracks from R. However, there is another way which might more practical in situations where you need to summarize the wig track scores over a given set of genomic coordinates. Although you can get a similar information from rtracklayer, you will need to do the summary statistics for each genomic coordinate and you will need to loop over a RangedList object, which may not be practical if you have a large set of genomic coordinates.

The other way to do this is to use hgWiggle program from UCSC source code. You can download the program from here, and you will need to prepare a three line “.hg.conf” file in your home directory.
.hg.conf should contain the exact following lines:

db.host=genome-mysql.cse.ucsc.edu
db.user=genomep
db.password=password

You should download the related .wib file from UCSC.  For human hg18 assembly they are located at ftp://hgdownload.cse.ucsc.edu/gbdb/hg18/ . Then do “chmod 600 .hg.conf ” on the file. Now we are ready to use hgWiggle. See this wiki page for all the other details.

As an example, you can get the statistics for chr16 for phastCons scores as shown below.

hgWiggle -db=hg18 -chr=chr16 -doStats phastCons44wayPlacental

you can also give a bed file with genomic coordinates using -bedFile  option, then you will get phastCons summary for the regions in the bed file.

It is also pretty straight forward to call the hgWiggle from R using system() function, so you can include this functionality in your R code. Although I intended to do that here when I started the post, I thought that it is trivial to use the system() command and call any command line tool, so I won’t be doing that for now. I might add it later on when I have more time.