Retrieving transcriptome sequences for RNASeq analysis

[This article was first published on Recipes, scripts and genomics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

One approach for analyzing RNASeq data from an organism with a well-annotated genome, is to align the reads to mRNA (cDNA) sequences instead of the genome. To do that you need to extract the transcript sequences from a database. This is how to extract ensembl transcript sequences from UCSC from within R:

_________________________________________________

library(GenomicFeatures)
library(BSgenome.Hsapiens.UCSC.hg18)

tr <- makeTranscriptDbFromUCSC(genome="hg18", tablename="ensGene")
tr_seq <- extractTranscriptsFromGenome(Hsapiens, tr)
write.XStringSet(tr_seq, file=”hg18.ensgene.transcripts.fasta”, ‘fasta’, width=80, append=F)
_________________________________________________
Next steps can be to build a reference index for bowtie, perform the alignment, and count the number of reads aligned in R using table(). Differential expression analysis may be performed by DESeq.

To leave a comment for the author, please follow the link and comment on their blog: Recipes, scripts and genomics.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)