Create a GENCODE transcript database in R

February 25, 2014

(This article was first published on Chitka, and kindly contributed to R-bloggers)

The following gist helps researchers create a GENCODE transcript database using Bioconductor packages. I am assuming that the “GenomicRanges” and “GenomicFeatures” packages are already installed on the user’s computer. The script does the following:

  • loads the required Bioconductor packages
  • explains how to create the intermediate files needed to generate the database
  • briefly explains each step in the procedure
  • creates the transcript database, then saves and reloads it as needed
  • extracts each feature type (gene, CDS, transcript, exon, intron, intergenic region) as a ‘GRanges’ object, sorting when needed
  • saves all the extracted features into a combined object that can be loaded later
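
The steps listed above can be sketched roughly as follows. This is not the gist itself, only a minimal illustration under some assumptions: the helper name `build_gencode_features` and all file paths are hypothetical, the GENCODE GTF file is assumed to have been downloaded already, and the intergenic regions are approximated as the gaps between merged gene ranges. The function that builds the database has moved between releases (older GenomicFeatures versions call it `makeTranscriptDbFromGFF()`, newer Bioconductor releases ship `makeTxDbFromGFF()` in the txdbmaker package), so the sketch picks whichever is available.

```r
library(GenomicRanges)
library(GenomicFeatures)
library(AnnotationDbi)   # provides saveDb() and loadDb()

# Hypothetical helper, not taken from the original gist.
# gtf_file: path to a GENCODE GTF annotation file (assumed to exist locally)
# db_file:  path where the SQLite transcript database will be written
build_gencode_features <- function(gtf_file, db_file) {
  # Use txdbmaker if it is installed, otherwise fall back to GenomicFeatures.
  make_txdb <- if (requireNamespace("txdbmaker", quietly = TRUE)) {
    txdbmaker::makeTxDbFromGFF
  } else {
    GenomicFeatures::makeTxDbFromGFF
  }

  # Create the transcript database, save it, and reload it from disk.
  txdb <- make_txdb(gtf_file, format = "gtf")
  saveDb(txdb, file = db_file)
  txdb <- loadDb(db_file)

  # Extract each feature type as a sorted GRanges object.
  genes_gr <- sort(genes(txdb))
  introns_gr <- unlist(intronsByTranscript(txdb), use.names = FALSE)

  # Approximate intergenic regions as gaps between strand-merged genes.
  # Without chromosome lengths in the database, the stretch after the
  # last gene on each chromosome is not included.
  merged_genes <- reduce(genes_gr, ignore.strand = TRUE)
  intergenic_gr <- gaps(merged_genes)
  intergenic_gr <- intergenic_gr[strand(intergenic_gr) == "*"]

  list(
    genes       = genes_gr,
    transcripts = sort(transcripts(txdb)),
    exons       = sort(exons(txdb)),
    cds         = sort(cds(txdb)),
    introns     = sort(introns_gr),
    intergenic  = sort(intergenic_gr)
  )
}
```

Calling `features <- build_gencode_features("gencode.annotation.gtf", "gencode_txdb.sqlite")` and then `save(features, file = "gencode_features.RData")` would store the combined object for later use with `load()`; both file names here are placeholders.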
