This gist helps researchers create a GENCODE transcript database using Bioconductor packages. It assumes that the “GenomicRanges” and “GenomicFeatures” packages are already installed. The script does the following:
- loads the required Bioconductor packages
- describes how to create the intermediate files needed to generate the database
- briefly explains each step in the procedure
- creates the transcript database, then saves it and reloads it when needed
- extracts each feature type (gene, CDS, transcript, exon, intron, intergenic region) as a ‘GRanges’ object, applying ‘sort’ when needed
- saves all the extracted features into a combined object that can be loaded later
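
The steps above can be sketched roughly as follows. This is a minimal illustration, not the gist's actual script: the function name `build_gencode_features`, its arguments, and all file paths are hypothetical, and the intergenic regions are derived here with `gaps()` on the strand-collapsed gene ranges, which is one common approach rather than necessarily the one the gist uses. In newer Bioconductor releases `makeTxDbFromGFF()` has moved to the `txdbmaker` package, so the call may need adjusting there.

```r
library(GenomicRanges)
library(GenomicFeatures)

# Hypothetical helper: builds (or reloads) a TxDb from a GENCODE GTF file,
# extracts each feature type as a sorted GRanges, and saves them together.
build_gencode_features <- function(gtf_file, txdb_file, features_file) {
  # Create the transcript database once, then reload it on later runs.
  if (file.exists(txdb_file)) {
    txdb <- AnnotationDbi::loadDb(txdb_file)
  } else {
    txdb <- makeTxDbFromGFF(gtf_file, format = "gtf", dataSource = "GENCODE")
    AnnotationDbi::saveDb(txdb, file = txdb_file)
  }

  # Extract each feature type as a GRanges object.
  genes_gr       <- sort(genes(txdb))
  transcripts_gr <- sort(transcripts(txdb))
  exons_gr       <- sort(exons(txdb))
  cds_gr         <- sort(cds(txdb))

  # intronsByTranscript() returns a GRangesList; flatten it to one GRanges.
  introns_gr <- sort(unlist(intronsByTranscript(txdb, use.names = TRUE)))

  # Intergenic regions (assumed approach): collapse genes across strands,
  # take the gaps, and keep only the unstranded gaps. Without chromosome
  # lengths in the TxDb, the region after the last gene is not included.
  gene_regions  <- reduce(genes_gr, ignore.strand = TRUE)
  intergenic_gr <- gaps(gene_regions)
  intergenic_gr <- intergenic_gr[strand(intergenic_gr) == "*"]

  # Combine everything into one named list and save it for later use.
  features <- list(
    genes       = genes_gr,
    transcripts = transcripts_gr,
    exons       = exons_gr,
    cds         = cds_gr,
    introns     = introns_gr,
    intergenic  = intergenic_gr
  )
  saveRDS(features, file = features_file)
  invisible(features)
}
```

A call might look like `build_gencode_features("gencode.annotation.gtf", "gencode_txdb.sqlite", "gencode_features.rds")` (all three paths are placeholders), and the combined object can be read back in a later session with `readRDS("gencode_features.rds")`.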