Using R: writing a table with odd lines (GFF track headers)

January 28, 2013

(This article was first published on There is grandeur in this view of life » R, and kindly contributed to R-bloggers)

The other day, I wanted to add track lines to a GFF file, so that I could view different features as separate custom tracks in a genome browser. The need to shuffle genome coordinates between different file formats seems to occur all the time when you deal with some kind of bioinformatic data. It’s usually just text files; one just has to keep track of whether the positions should start on 0 or 1 and whether the end should include the last base or not . . .

> head(gff)

  seqname         source        feature     start       end score strand
1       5 protein_coding           mRNA 169010747 169031776     .      +
2       5 protein_coding        protein 169015421 169021641     .      +
3       5 protein_coding five_prime_UTR 169010747 169010893     .      +
4       5 protein_coding five_prime_UTR 169015398 169015420     .      +
5       5 protein_coding            CDS 169015421 169015579     .      +
6       5 protein_coding            CDS 169018052 169018228     .      +
  frame                                                     group
1     . ID=ENST00000504258;Name=CCDC99-005;Parent=ENSG00000040275
2     . ID=ENSP00000421249;Name=CCDC99-005;Parent=ENST00000504258
3     .                                    Parent=ENST00000504258
4     .                                    Parent=ENST00000504258
5     0                    Name=CDS:CCDC99;Parent=ENST00000504258
6     0                    Name=CDS:CCDC99;Parent=ENST00000504258

The above example consists of a few lines from the Ensembl human database, not the actual tracks I was interested in. Anyway, this is what I did: instead of using write.table() directly, explicitly open a file for writing, first write some track line, then write the relevant subset, and repeat.

tracks <- unique(gff$feature)
connection <- file("separate_tracks.gff", "w")
for (k in 1:length(tracks)) {
  writeLines(paste("track name=", tracks[k], sep=""), connection)
  write.table(subset(gff, feature==tracks[k]),
              sep="\t", row.names=F, col.names=F,
              quote=F, file=connection)

Postat i:med mera Tagged: gff, R

To leave a comment for the author, please follow the link and comment on their blog: There is grandeur in this view of life » R. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.


Mango solutions

RStudio homepage

Zero Inflated Models and Generalized Linear Mixed Models with R

Quantide: statistical consulting and training


CRC R books series

Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)