# Using R: writing a table with odd lines (GFF track headers)

January 28, 2013

(This article was first published on There is grandeur in this view of life » R, and kindly contributed to R-bloggers)

The other day, I wanted to add track lines to a GFF file so that I could view different features as separate custom tracks in a genome browser. Shuffling genome coordinates between file formats seems to come up all the time when you deal with any kind of bioinformatic data. It's usually just text files; one just has to keep track of whether positions start at 0 or 1, and whether the end coordinate includes the last base or not...
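To make that bookkeeping concrete: GFF positions are 1-based with an inclusive end, while BED (another common browser format, not used in this post) is 0-based with an exclusive end. A minimal sketch of the conversion, where `gff_to_bed()` and `bed_to_gff()` are hypothetical helper names:

```r
# Minimal sketch: convert between 1-based, end-inclusive GFF coordinates
# and 0-based, end-exclusive BED coordinates. Helper names are hypothetical.
gff_to_bed <- function(start, end) {
  # BED starts are 0-based, so shift the start down by one; the BED end is
  # exclusive, which makes it numerically equal to the inclusive GFF end
  data.frame(start = start - 1, end = end)
}

bed_to_gff <- function(start, end) {
  # Reverse the shift on the start; the end stays the same
  data.frame(start = start + 1, end = end)
}

# The first five_prime_UTR row of the table: 169010747..169010893 in GFF
bed <- gff_to_bed(169010747, 169010893)
bed

# Both conventions describe the same feature length
bed$end - bed$start                 # BED length
169010893 - 169010747 + 1           # GFF length (inclusive end)
```

The only real change is the start; forgetting it is the classic off-by-one when moving features between formats.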

```
  seqname         source        feature     start       end score strand frame                                                     group
1       5 protein_coding           mRNA 169010747 169031776     .      +     . ID=ENST00000504258;Name=CCDC99-005;Parent=ENSG00000040275
2       5 protein_coding        protein 169015421 169021641     .      +     . ID=ENSP00000421249;Name=CCDC99-005;Parent=ENST00000504258
3       5 protein_coding five_prime_UTR 169010747 169010893     .      +     .                                    Parent=ENST00000504258
4       5 protein_coding five_prime_UTR 169015398 169015420     .      +     .                                    Parent=ENST00000504258
5       5 protein_coding            CDS 169015421 169015579     .      +     0                    Name=CDS:CCDC99;Parent=ENST00000504258
6       5 protein_coding            CDS 169018052 169018228     .      +     0                    Name=CDS:CCDC99;Parent=ENST00000504258
```
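The post doesn't show how the data frame was loaded. One way to get a data frame like this, assuming a headerless, tab-separated GFF file whose comment lines start with `#`, is `read.table()` with the column names from the printout. The file is written to a temporary path here only so the sketch runs on its own:

```r
# Minimal sketch, assuming a tab-separated GFF file with no header row.
# A two-line example file is created here so the code is self-contained.
gff_file <- tempfile(fileext = ".gff")
writeLines(c(
  "##gff-version 3",
  paste("5", "protein_coding", "mRNA", 169010747, 169031776, ".", "+", ".",
        "ID=ENST00000504258;Name=CCDC99-005;Parent=ENSG00000040275", sep = "\t"),
  paste("5", "protein_coding", "CDS", 169015421, 169015579, ".", "+", "0",
        "Name=CDS:CCDC99;Parent=ENST00000504258", sep = "\t")
), gff_file)

gff <- read.table(gff_file,
                  sep = "\t", header = FALSE,
                  quote = "",          # attribute strings are not quoted
                  comment.char = "#",  # skip ## pragma/comment lines
                  stringsAsFactors = FALSE,
                  col.names = c("seqname", "source", "feature", "start", "end",
                                "score", "strand", "frame", "group"))
str(gff)
```

Setting `comment.char = "#"` assumes no attribute value contains a `#`; if yours do, strip the comment lines some other way before reading.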

The example above shows a few lines from the Ensembl human database, not the actual tracks I was interested in. Anyway, this is what I did: instead of passing a file name to write.table(), explicitly open a file connection for writing, write a track line, then write the matching subset of rows, and repeat for each feature type.

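```r
# One track per feature type (mRNA, protein, CDS, ...)
tracks <- unique(gff$feature)
connection <- file("separate_tracks.gff", "w")
for (k in seq_along(tracks)) {
  # Track line first, so the browser starts a new custom track here
  writeLines(paste0("track name=", tracks[k]), connection)
  # Then the GFF lines for this feature, without header or row names
  write.table(subset(gff, feature == tracks[k]),
              file = connection, sep = "\t",
              row.names = FALSE, col.names = FALSE, quote = FALSE)
}
close(connection)
```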
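To see exactly what ends up in the file, the same idea can write to an in-memory text connection instead. This is a sketch wrapped in a hypothetical helper, `write_tracks()`, using `split()` rather than `subset()` in a loop; the factor levels are set from `unique()` so the tracks keep their order of first appearance instead of being sorted alphabetically:

```r
# Minimal sketch: the track-writing loop as a hypothetical helper function.
write_tracks <- function(gff, con) {
  # split() groups rows by feature; explicit levels keep first-appearance order
  groups <- split(gff, factor(gff$feature, levels = unique(gff$feature)))
  for (name in names(groups)) {
    writeLines(paste0("track name=", name), con)
    write.table(groups[[name]], file = con, sep = "\t", quote = FALSE,
                row.names = FALSE, col.names = FALSE)
  }
  invisible(con)
}

# A tiny data frame with the same nine columns as the printout
gff <- data.frame(
  seqname = 5L, source = "protein_coding",
  feature = c("mRNA", "five_prime_UTR", "CDS"),
  start = c(169010747L, 169010747L, 169015421L),
  end = c(169031776L, 169010893L, 169015579L),
  score = ".", strand = "+", frame = c(".", ".", "0"),
  group = c("ID=ENST00000504258;Name=CCDC99-005;Parent=ENSG00000040275",
            "Parent=ENST00000504258",
            "Name=CDS:CCDC99;Parent=ENST00000504258"),
  stringsAsFactors = FALSE
)

# Write to an anonymous text connection and collect the lines
con <- textConnection(NULL, "w")
write_tracks(gff, con)
out <- textConnectionValue(con)
close(con)
cat(out, sep = "\n")
```

Passing an open connection, rather than a file name, is what lets the track lines and table rows interleave; with a file name, each write.table() call would reopen and overwrite the file.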


Posted in: med mera. Tagged: gff, R