Exploring the human genome (Part 2) – Transcripts

[This article was first published on Shirin's playgRound, and kindly contributed to R-bloggers].

How many transcripts and proteins do genes have?

In Exploring the human genome (Part 1) – Gene Annotations I examined Ensembl, Entrez and HGNC gene annotations with AnnotationDbi via three R packages: org.Hs.eg.db, EnsDb.Hsapiens.v79 and TxDb.Hsapiens.UCSC.hg38.knownGene.

Now, I want to know how many transcripts there are for genes in these databases.

What is a transcript?

While a gene is defined as a unit of DNA information that encodes the production of a protein, it is really more a concept than an actual physical unit. Human genes consist of exons and introns, and the exons can often be joined in different combinations, a process called alternative splicing.

“While the concept of a gene has been helpful in defining the relationship of a portion of a genome to a phenotype, this traditional term may not be as useful as it once was. Currently, “gene” has come to refer principally to a genomic region producing a polyadenylated mRNA that encodes a protein. However, the recent emergence of a large collection of unannotated transcripts with apparently little protein coding capacity, collectively called transcripts of unknown function (TUFs), has begun to blur the physical boundaries and genomic organization of genic regions with noncoding transcripts often overlapping protein-coding genes on the same (sense) and opposite strand (antisense). Moreover, they are often located in intergenic regions, making the genic portions of the human genome an interleaved network of both annotated polyadenylated and nonpolyadenylated transcripts, including splice variants with novel 5′ ends extending hundreds of kilobase. This complex transcriptional organization and other recently observed features of genomes argue for the reconsideration of the term “gene” and suggests that transcripts may be used to define the operational unit of a genome.” Thomas Gingeras, Genome Res. 2007

org.Hs.eg.db

With org.Hs.eg.db I am using Entrez and Ensembl IDs to obtain Ensembl transcript IDs, which show splice variants of a gene, including protein-coding and non-coding transcripts.

library(AnnotationDbi)
library(org.Hs.eg.db)

ENTREZID_org <- keys(org.Hs.eg.db, keytype = "ENTREZID")
ENSEMBL_org <- keys(org.Hs.eg.db, keytype = "ENSEMBL")

# Summarize number of transcripts per gene Entrez ID
org_trans_entrez <- AnnotationDbi::select(org.Hs.eg.db, keys = ENTREZID_org, columns = c("ENSEMBLTRANS"), keytype = "ENTREZID")
org_transcript_num_table_entrez <- as.data.frame(table(org_trans_entrez$ENTREZID))
colnames(org_transcript_num_table_entrez) <- c("Entrez", "orgDb")

# Summarize number of transcripts per gene Ensembl ID
org_trans_ensembl <- AnnotationDbi::select(org.Hs.eg.db, keys = ENSEMBL_org, columns = c("ENSEMBLTRANS"), keytype = "ENSEMBL")
org_transcript_num_table_ensembl <- as.data.frame(table(org_trans_ensembl$ENSEMBL))
colnames(org_transcript_num_table_ensembl) <- c("Ensembl", "orgDb")

# how many NAs are in each column?
sapply(org_trans_entrez, function(x) sum(is.na(x)))
##     ENTREZID ENSEMBLTRANS 
##            0        52081

sapply(org_trans_ensembl, function(x) sum(is.na(x)))
##      ENSEMBL ENSEMBLTRANS 
##            0        17678

head(org_trans_entrez)
##   ENTREZID    ENSEMBLTRANS
## 1        1            <NA>
## 2        2            <NA>
## 3        3 ENST00000543404
## 4        3 ENST00000566278
## 5        3 ENST00000545343
## 6        3 ENST00000544183

head(org_trans_ensembl)
##           ENSEMBL    ENSEMBLTRANS
## 1 ENSG00000121410            <NA>
## 2 ENSG00000175899            <NA>
## 3 ENSG00000256069 ENST00000543404
## 4 ENSG00000256069 ENST00000566278
## 5 ENSG00000256069 ENST00000545343
## 6 ENSG00000256069 ENST00000544183

Strangely, some genes, like A1BG (Entrez ID 1), are listed with a gene ID but no Ensembl transcript ID. This is odd, especially since I happen to know that this particular gene has several transcripts. Note also that table() counts rows per gene ID, so every gene with only an NA entry is counted as having 1 transcript in the summary table. That is at least consistent with the fact that each gene must have at least one transcript.
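To make the NA behaviour concrete, here is a minimal base R sketch. The data frame `toy` and its transcript IDs are made up for illustration and only mimic the shape of `org_trans_entrez`; they are not taken from org.Hs.eg.db.

```r
# Toy data mimicking the shape of org_trans_entrez (IDs are invented)
toy <- data.frame(
  ENTREZID = c("1", "2", "3", "3", "3"),
  ENSEMBLTRANS = c(NA, NA, "ENST_a", "ENST_b", "ENST_c"),
  stringsAsFactors = FALSE
)

# table() counts rows per gene ID, so a gene with a single NA row gets a count of 1
rows_per_gene <- table(toy$ENTREZID)

# Counting only non-missing transcript IDs leaves unmapped genes at 0 instead
mapped_per_gene <- tapply(!is.na(toy$ENSEMBLTRANS), toy$ENTREZID, sum)

print(rows_per_gene)
print(mapped_per_gene)
```

Which of the two counts is more appropriate depends on whether a missing mapping should be read as "one unknown transcript" or as "no information".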

Let’s check other databases…

TxDb.Hsapiens.UCSC.hg38.knownGene

TxDb.Hsapiens.UCSC.hg38.knownGene only has Entrez IDs to identify genes and UCSC transcript IDs to identify transcripts.

library(TxDb.Hsapiens.UCSC.hg38.knownGene)

ENTREZID_TxDb <- keys(TxDb.Hsapiens.UCSC.hg38.knownGene, keytype = "GENEID")
TxDb_trans <- AnnotationDbi::select(TxDb.Hsapiens.UCSC.hg38.knownGene, keys = ENTREZID_TxDb, columns = c("TXID"), keytype = "GENEID")

# Summarize number of transcripts per gene Entrez ID
TxDb_transcript_num_table_entrez <- as.data.frame(table(TxDb_trans$GENEID))
colnames(TxDb_transcript_num_table_entrez) <- c("Entrez", "TxDb")

# how many NAs are in each column?
sapply(TxDb_trans, function(x) sum(is.na(x)))
## GENEID   TXID 
##      0      0

head(TxDb_trans)
##   GENEID   TXID
## 1      1 166436
## 2      1 166437
## 3      1 166438
## 4      1 166439
## 5      1 166440
## 6      1 166441

Here, we don’t have NAs in the data and, in contrast to org.Hs.eg.db, we find at least six transcripts for A1BG (head() only shows the first six rows).

EnsDb.Hsapiens.v79

Like org.Hs.eg.db, EnsDb.Hsapiens.v79 maps both Entrez and Ensembl gene IDs to Ensembl transcript IDs.

library(EnsDb.Hsapiens.v79)

ENSEMBL_EnsDb <- keys(EnsDb.Hsapiens.v79, keytype = "GENEID")
ENTREZ_EnsDb <- keys(EnsDb.Hsapiens.v79, keytype = "ENTREZID")

EnsDb_trans_ensembl <- ensembldb::select(EnsDb.Hsapiens.v79, keys = ENSEMBL_EnsDb, columns = c("TXID"), keytype = "GENEID")
EnsDb_trans_entrez <- ensembldb::select(EnsDb.Hsapiens.v79, keys = ENTREZ_EnsDb, columns = c("TXID"), keytype = "ENTREZID")

# somehow there are empty fields in the Entrez ID column, replacing them with NA
EnsDb_trans_entrez[EnsDb_trans_entrez == ""] <- NA

# how many NAs are in each column?
sapply(EnsDb_trans_entrez, function(x) sum(is.na(x)))
## ENTREZID     TXID 
##    47828        0

# and removing NA rows
EnsDb_trans_entrez <- EnsDb_trans_entrez[!is.na(EnsDb_trans_entrez$ENTREZID), ]

# order by Entrez ID to compare with other databases
EnsDb_trans_entrez <- EnsDb_trans_entrez[order(EnsDb_trans_entrez$ENTREZID), ]

head(EnsDb_trans_ensembl)
##            GENEID            TXID
## 1 ENSG00000000003 ENST00000373020
## 2 ENSG00000000003 ENST00000496771
## 3 ENSG00000000003 ENST00000494424
## 4 ENSG00000000003 ENST00000612152
## 5 ENSG00000000003 ENST00000614008
## 6 ENSG00000000005 ENST00000373031

head(EnsDb_trans_entrez)
##        ENTREZID            TXID
## 95915         1 ENST00000596924
## 95916         1 ENST00000263100
## 95917         1 ENST00000595014
## 95918         1 ENST00000598345
## 95919         1 ENST00000600966
## 135994       10 ENST00000286479

Here, we find five transcripts for A1BG.

# Summarize number of transcripts per gene Entrez ID
EnsDb_transcript_num_table_entrez <- as.data.frame(table(EnsDb_trans_entrez$ENTREZID))
colnames(EnsDb_transcript_num_table_entrez) <- c("Entrez", "EnsDb")

# Summarize number of transcripts per gene Ensembl ID
EnsDb_transcript_num_table_ensembl <- as.data.frame(table(EnsDb_trans_ensembl$GENEID))
colnames(EnsDb_transcript_num_table_ensembl) <- c("Ensembl", "EnsDb")

# In the Entrez column, there are some with multiple entries
# divide entries with multiple gene names into one row per gene/ entry
head(EnsDb_transcript_num_table_entrez[grep(";", EnsDb_transcript_num_table_entrez$Entrez), ])
##                                     Entrez EnsDb
## 13                     100033415;100033421     1
## 15                     100033417;100033419     2
## 18                     100033421;100033415     1
## 40                     100033446;100033449     1
## 42 100033448;100033803;100033817;100033810     2
## 43                     100033449;100033446     1

library(splitstackshape)
out <- as.data.frame(cSplit(EnsDb_transcript_num_table_entrez, splitCols = "Entrez", sep = ";", direction = "long"), stringsAsFactors = FALSE)
out$Entrez <- as.character(out$Entrez)

# remove duplicates and take the mean
library(plyr)
EnsDb_transcript_num_table_entrez <- ddply(out, "Entrez", numcolwise(mean))
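The split-and-average step can also be sketched in base R alone. This is not the code used above, just an illustration of the same idea on a made-up table (`toy_counts`, with invented IDs and counts): each semicolon-separated entry becomes one row per ID, and IDs that then occur more than once are averaged.

```r
# Toy count table with semicolon-separated IDs (values are invented)
toy_counts <- data.frame(
  Entrez = c("101;102", "102;101", "103", "101"),
  EnsDb = c(1, 3, 2, 5),
  stringsAsFactors = FALSE
)

# One row per individual ID: split on ";" and repeat each count accordingly
ids <- strsplit(toy_counts$Entrez, ";", fixed = TRUE)
long <- data.frame(
  Entrez = unlist(ids),
  EnsDb = rep(toy_counts$EnsDb, lengths(ids)),
  stringsAsFactors = FALSE
)

# Average the counts of IDs that now occur more than once
per_id <- aggregate(EnsDb ~ Entrez, data = long, FUN = mean)
print(per_id)
```

Note that averaging can produce non-integer transcript counts, which is worth keeping in mind when the values are later binned or plotted.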

Comparison of transcript numbers per database package

The majority of genes have only one transcript, and the number of genes falls as the number of transcripts per gene rises; this can be seen in the plots below.
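A quick way to check a claim like this without plotting is to tabulate the per-gene counts a second time. The sketch below uses an invented vector of gene IDs (`toy_gene_ids`), one entry per transcript, rather than any of the real annotation tables.

```r
# One entry per transcript; the gene IDs are invented
toy_gene_ids <- c("g1", "g2", "g2", "g3", "g4", "g4", "g4", "g5")

# First table(): number of transcripts per gene
tx_per_gene <- table(toy_gene_ids)

# Second table(): number of genes that have 1, 2, 3, ... transcripts
genes_per_count <- table(tx_per_gene)
print(genes_per_count)
```

Applied to a column such as `TxDb_trans$GENEID`, the same two calls would give the distribution that the plots visualise.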

Entrez IDs

# merging datasets by Entrez ID
library(dplyr)
transcript_num_table_entrez <- full_join(org_transcript_num_table_entrez, TxDb_transcript_num_table_entrez, by = "Entrez")
transcript_num_table_entrez <- full_join(transcript_num_table_entrez, EnsDb_transcript_num_table_entrez, by = "Entrez")

# gather for plotting
library(tidyr)
transcript_num_table_gather_entrez <- transcript_num_table_entrez %>%
  gather(DB, count, orgDb:EnsDb)

# How many counts are NA?
sapply(transcript_num_table_gather_entrez, function(x) sum(is.na(x)))
## Entrez     DB  count 
##      0      0  71066
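The NA counts arise because a full join keeps every Entrez ID that occurs in any of the tables, and fills in NA where a database has no entry for that ID. A minimal base R sketch of the same join-then-stack pattern, using two invented tables (`org_toy`, `txdb_toy`) and `merge()` in place of `full_join()`/`gather()`:

```r
# Two toy count tables that only partly share IDs (values are invented)
org_toy  <- data.frame(Entrez = c("1", "2", "3"), orgDb = c(1, 1, 4), stringsAsFactors = FALSE)
txdb_toy <- data.frame(Entrez = c("1", "3", "4"), TxDb = c(6, 2, 1), stringsAsFactors = FALSE)

# all = TRUE keeps IDs present in either table (a full outer join)
wide <- merge(org_toy, txdb_toy, by = "Entrez", all = TRUE)

# Stack the count columns into long format: one row per gene and database
long <- data.frame(
  Entrez = rep(wide$Entrez, times = 2),
  DB = rep(c("orgDb", "TxDb"), each = nrow(wide)),
  count = c(wide$orgDb, wide$TxDb),
  stringsAsFactors = FALSE
)

# IDs missing from one of the tables show up as NA counts
print(sum(is.na(long$count)))
```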

# removing rows with NA counts
transcript_num_table_gather_entrez <- transcript_num_table_gather_entrez[!is.na(transcript_num_table_gather_entrez$count), ]

# because there are only a handful of genes with many transcripts, they can't be plotted together with genes with few transcripts
# separating them in high and low
transcript_num_table_gather_entrez$group <- ifelse(transcript_num_table_gather_entrez$count > 25, "high", "low")

# setting factor levels
f <- c("low", "high")
transcript_num_table_gather_entrez <- within(transcript_num_table_gather_entrez, group <- factor(group, levels = f))
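Setting the factor levels explicitly matters because R would otherwise sort them alphabetically, putting "high" before "low". A small sketch with made-up counts (`toy_tx_counts`) shows the effect of the threshold and the level order:

```r
# Invented transcript counts; 25 is the same cut-off as used above
toy_tx_counts <- c(1, 3, 26, 25, 120)

# Counts strictly greater than 25 fall into "high", everything else into "low"
group <- ifelse(toy_tx_counts > 25, "high", "low")

# Explicit levels keep "low" first; the default would be alphabetical
group <- factor(group, levels = c("low", "high"))

print(levels(group))
print(table(group))
```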
<span class="c1"># setting my custom theme of choice
</span><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">

</span><span class="n">my_theme</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="n">base_family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"sans"</span><span class="p">){</span><span class="w">
  </span><span class="n">theme_grey</span><span class="p">(</span><span class="n">base_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">base_size</span><span class="p">,</span><span class="w"> </span><span class="n">base_family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">base_family</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme</span><span class="p">(</span><span class="w">
    </span><span class="n">axis.text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">12</span><span class="p">),</span><span class="w">
    </span><span class="n">axis.title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">14</span><span class="p">),</span><span class="w">
    </span><span class="n">panel.grid.major</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_line</span><span class="p">(</span><span class="n">colour</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"grey"</span><span class="p">),</span><span class="w">
    </span><span class="n">panel.grid.minor</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
    </span><span class="n">panel.background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_rect</span><span class="p">(</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"aliceblue"</span><span class="p">),</span><span class="w">
    </span><span class="n">strip.background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_rect</span><span class="p">(</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lightgrey"</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"grey"</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">),</span><span class="w">
    </span><span class="n">strip.text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="n">face</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"bold"</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">12</span><span class="p">,</span><span class="w"> </span><span class="n">colour</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"navy"</span><span class="p">),</span><span class="w">
    </span><span class="n">legend.position</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"bottom"</span><span class="p">,</span><span class="w">
    </span><span class="n">panel.spacing</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">unit</span><span class="p">(</span><span class="m">.05</span><span class="p">,</span><span class="w"> </span><span class="s2">"lines"</span><span class="p">),</span><span class="w">
    </span><span class="n">panel.border</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_rect</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"grey"</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NA</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.5</span><span class="p">)</span><span class="w">
  </span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span>
<span class="n">p</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">transcript_num_table_gather_entrez</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="nf">as.numeric</span><span class="p">(</span><span class="n">count</span><span class="p">)))</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"deepskyblue4"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">my_theme</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Histogram of number of transcripts per gene (Entrez ID)"</span><span class="p">,</span><span class="w"> 
       </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Number of transcripts per gene"</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Count"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">facet_wrap</span><span class="p">(</span><span class="n">DB</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">group</span><span class="p">,</span><span class="w"> </span><span class="n">scales</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"free"</span><span class="p">,</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w">

</span><span class="n">ann_text_entrez</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="m">300</span><span class="p">,</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="m">100</span><span class="p">,</span><span class="w"> </span><span class="m">20</span><span class="p">,</span><span class="w"> </span><span class="m">100</span><span class="p">),</span><span class="w">
                       </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">4000</span><span class="p">,</span><span class="w"> </span><span class="m">350</span><span class="p">,</span><span class="w"> </span><span class="m">40000</span><span class="p">,</span><span class="w"> </span><span class="m">15</span><span class="p">,</span><span class="w"> </span><span class="m">5000</span><span class="p">,</span><span class="w"> </span><span class="m">280</span><span class="p">),</span><span class="w">
                       </span><span class="n">group</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"low"</span><span class="p">,</span><span class="w"> </span><span class="s2">"high"</span><span class="p">),</span><span class="w"> </span><span class="m">3</span><span class="p">),</span><span class="w">
                       </span><span class="n">DB</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"EnsDb"</span><span class="p">,</span><span class="w"> </span><span class="s2">"orgDb"</span><span class="p">,</span><span class="w"> </span><span class="s2">"TxDb"</span><span class="p">),</span><span class="w"> </span><span class="n">each</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">),</span><span class="w">
                       </span><span class="n">labs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">paste</span><span class="p">(</span><span class="nf">length</span><span class="p">(</span><span class="n">which</span><span class="p">(</span><span class="n">transcript_num_table_gather_entrez</span><span class="o">$</span><span class="n">group</span><span class="w"> </span><span class="o">==</span><span class="w">  </span><span class="s2">"low"</span><span class="w"> </span><span class="o">&</span><span class="w"> 
                                                     </span><span class="n">transcript_num_table_gather_entrez</span><span class="o">$</span><span class="n">DB</span><span class="w"> </span><span class="o">==</span><span class="w">  </span><span class="s2">"EnsDb"</span><span class="p">))),</span><span class="w"> 
                                </span><span class="n">paste</span><span class="p">(</span><span class="nf">length</span><span class="p">(</span><span class="n">which</span><span class="p">(</span><span class="n">transcript_num_table_gather_entrez</span><span class="o">$</span><span class="n">group</span><span class="w"> </span><span class="o">==</span><span class="w">  </span><span class="s2">"high"</span><span class="w"> </span><span class="o">&</span><span class="w"> 
                                                     </span><span class="n">transcript_num_table_gather_entrez</span><span class="o">$</span><span class="n">DB</span><span class="w"> </span><span class="o">==</span><span class="w">  </span><span class="s2">"EnsDb"</span><span class="p">))),</span><span class="w">
                                </span><span class="n">paste</span><span class="p">(</span><span class="nf">length</span><span class="p">(</span><span class="n">which</span><span class="p">(</span><span class="n">transcript_num_table_gather_entrez</span><span class="o">$</span><span class="n">group</span><span class="w"> </span><span class="o">==</span><span class="w">  </span><span class="s2">"low"</span><span class="w"> </span><span class="o">&</span><span class="w"> 
                                                     </span><span class="n">transcript_num_table_gather_entrez</span><span class="o">$</span><span class="n">DB</span><span class="w"> </span><span class="o">==</span><span class="w">  </span><span class="s2">"orgDb"</span><span class="p">))),</span><span class="w"> 
                                </span><span class="n">paste</span><span class="p">(</span><span class="nf">length</span><span class="p">(</span><span class="n">which</span><span class="p">(</span><span class="n">transcript_num_table_gather_entrez</span><span class="o">$</span><span class="n">group</span><span class="w"> </span><span class="o">==</span><span class="w">  </span><span class="s2">"high"</span><span class="w"> </span><span class="o">&</span><span class="w"> 
                                                     </span><span class="n">transcript_num_table_gather_entrez</span><span class="o">$</span><span class="n">DB</span><span class="w"> </span><span class="o">==</span><span class="w">  </span><span class="s2">"orgDb"</span><span class="p">))),</span><span class="w">
                                </span><span class="n">paste</span><span class="p">(</span><span class="nf">length</span><span class="p">(</span><span class="n">which</span><span class="p">(</span><span class="n">transcript_num_table_gather_entrez</span><span class="o">$</span><span class="n">group</span><span class="w"> </span><span class="o">==</span><span class="w">  </span><span class="s2">"low"</span><span class="w"> </span><span class="o">&</span><span class="w"> 
                                                     </span><span class="n">transcript_num_table_gather_entrez</span><span class="o">$</span><span class="n">DB</span><span class="w"> </span><span class="o">==</span><span class="w">  </span><span class="s2">"TxDb"</span><span class="p">))),</span><span class="w"> 
                                </span><span class="n">paste</span><span class="p">(</span><span class="nf">length</span><span class="p">(</span><span class="n">which</span><span class="p">(</span><span class="n">transcript_num_table_gather_entrez</span><span class="o">$</span><span class="n">group</span><span class="w"> </span><span class="o">==</span><span class="w">  </span><span class="s2">"high"</span><span class="w"> </span><span class="o">&</span><span class="w"> 
                                                     </span><span class="n">transcript_num_table_gather_entrez</span><span class="o">$</span><span class="n">DB</span><span class="w"> </span><span class="o">==</span><span class="w">  </span><span class="s2">"TxDb"</span><span class="p">)))))</span><span class="w">

</span><span class="n">p</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">geom_text</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ann_text_entrez</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">labs</span><span class="p">,</span><span class="w"> </span><span class="n">group</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NULL</span><span class="p">),</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">8</span><span class="p">)</span><span class="w">
</span>

This time, orgDb has the most Entrez ID gene entries with corresponding transcript information. Most of them (59984 genes) have 25 or fewer transcripts; only 152 genes have more than 25. EnsDb and TxDb contain comparable numbers of gene entries, both for genes with few transcripts (23971 and 24669 genes, respectively) and for genes with many transcripts (566 and 552 genes).
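
The per-database gene counts quoted above are computed in the plotting code with one `length(which(...))` call per panel. As a minimal sketch of a more compact way to get the same kind of numbers, `dplyr::count()` can tabulate every DB/group combination at once. The `toy` data frame and its values are invented for illustration and only mimic the columns of `transcript_num_table_gather_entrez`; `gene_counts` is a hypothetical name.

```r
library(dplyr)

# Toy stand-in for transcript_num_table_gather_entrez (made-up values)
toy <- data.frame(
  Entrez = c("1", "2", "3", "1", "2", "4"),
  DB     = c("EnsDb", "EnsDb", "EnsDb", "orgDb", "orgDb", "TxDb"),
  count  = c(3, 40, 12, 5, 26, 25)
)

# Same rule as in the post: more than 25 transcripts is "high", otherwise "low"
toy$group <- factor(ifelse(toy$count > 25, "high", "low"),
                    levels = c("low", "high"))

# One row per DB/group combination that occurs; column n holds the number of genes
gene_counts <- toy %>% count(DB, group)
gene_counts
```

A table like `gene_counts` could, after adding `x` and `y` positions for each panel, serve as the `data` argument of `geom_text()` with `label = n`.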

Ensembl IDs

<span class="c1"># merging datasets by Ensembl ID
</span><span class="n">transcript_num_table_ensembl</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">full_join</span><span class="p">(</span><span class="n">org_transcript_num_table_ensembl</span><span class="p">,</span><span class="w"> </span><span class="n">EnsDb_transcript_num_table_ensembl</span><span class="p">,</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Ensembl"</span><span class="p">)</span><span class="w">

</span><span class="c1"># gather for plotting
</span><span class="n">transcript_num_table_gather_ensembl</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">transcript_num_table_ensembl</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">gather</span><span class="p">(</span><span class="n">DB</span><span class="p">,</span><span class="w"> </span><span class="n">count</span><span class="p">,</span><span class="w"> </span><span class="n">orgDb</span><span class="o">:</span><span class="n">EnsDb</span><span class="p">)</span><span class="w">

</span><span class="c1"># How many counts are NA?
</span><span class="n">sapply</span><span class="p">(</span><span class="n">transcript_num_table_gather_ensembl</span><span class="p">,</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="nf">sum</span><span class="p">(</span><span class="nf">is.na</span><span class="p">(</span><span class="n">x</span><span class="p">)))</span><span class="w">
</span>
## Ensembl      DB   count 
##       0       0   38659
<span class="c1"># removing rows with NA counts
</span><span class="n">transcript_num_table_gather_ensembl</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">transcript_num_table_gather_ensembl</span><span class="p">[</span><span class="o">!</span><span class="nf">is.na</span><span class="p">(</span><span class="n">transcript_num_table_gather_ensembl</span><span class="o">$</span><span class="n">count</span><span class="p">),</span><span class="w"> </span><span class="p">]</span><span class="w">

</span><span class="c1"># because there are only a handful of genes with many transcripts, they can't be plotted together with genes with few transcripts
# separating them in high and low
</span><span class="n">transcript_num_table_gather_ensembl</span><span class="o">$</span><span class="n">group</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ifelse</span><span class="p">(</span><span class="n">transcript_num_table_gather_ensembl</span><span class="o">$</span><span class="n">count</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="m">25</span><span class="p">,</span><span class="w"> </span><span class="s2">"high"</span><span class="p">,</span><span class="w"> </span><span class="s2">"low"</span><span class="p">)</span><span class="w">

</span><span class="c1"># setting factor levels
</span><span class="n">f</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"low"</span><span class="p">,</span><span class="w"> </span><span class="s2">"high"</span><span class="p">)</span><span class="w">
</span><span class="n">transcript_num_table_gather_ensembl</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">within</span><span class="p">(</span><span class="n">transcript_num_table_gather_ensembl</span><span class="p">,</span><span class="w"> </span><span class="n">group</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">group</span><span class="p">,</span><span class="w"> </span><span class="n">levels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">f</span><span class="p">))</span><span class="w">
</span>
<span class="n">p</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ggplot</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">transcript_num_table_gather_ensembl</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="nf">as.numeric</span><span class="p">(</span><span class="n">count</span><span class="p">)))</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">geom_histogram</span><span class="p">(</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"deepskyblue4"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">my_theme</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Histogram of number of transcripts per gene (Ensembl ID)"</span><span class="p">,</span><span class="w"> 
       </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Number of transcripts per gene"</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Count"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">facet_wrap</span><span class="p">(</span><span class="n">DB</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">group</span><span class="p">,</span><span class="w"> </span><span class="n">scales</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"free"</span><span class="p">,</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w">

</span><span class="n">ann_text_ensembl</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="m">75</span><span class="p">,</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="m">100</span><span class="p">),</span><span class="w">
                              </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">30000</span><span class="p">,</span><span class="w"> </span><span class="m">100</span><span class="p">,</span><span class="w"> </span><span class="m">15000</span><span class="p">,</span><span class="w"> </span><span class="m">100</span><span class="p">),</span><span class="w">
                              </span><span class="n">group</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"low"</span><span class="p">,</span><span class="w"> </span><span class="s2">"high"</span><span class="p">),</span><span class="w"> </span><span class="m">2</span><span class="p">),</span><span class="w">
                              </span><span class="n">DB</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"EnsDb"</span><span class="p">,</span><span class="w"> </span><span class="s2">"orgDb"</span><span class="p">),</span><span class="w"> </span><span class="n">each</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">),</span><span class="w">
                              </span><span class="n">labs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">paste</span><span class="p">(</span><span class="nf">length</span><span class="p">(</span><span class="n">which</span><span class="p">(</span><span class="n">transcript_num_table_gather_ensembl</span><span class="o">$</span><span class="n">group</span><span class="w"> </span><span class="o">==</span><span class="w">  </span><span class="s2">"low"</span><span class="w"> </span><span class="o">&</span><span class="w"> 
                                                            </span><span class="n">transcript_num_table_gather_ensembl</span><span class="o">$</span><span class="n">DB</span><span class="w"> </span><span class="o">==</span><span class="w">  </span><span class="s2">"EnsDb"</span><span class="p">))),</span><span class="w"> 
                                       </span><span class="n">paste</span><span class="p">(</span><span class="nf">length</span><span class="p">(</span><span class="n">which</span><span class="p">(</span><span class="n">transcript_num_table_gather_ensembl</span><span class="o">$</span><span class="n">group</span><span class="w"> </span><span class="o">==</span><span class="w">  </span><span class="s2">"high"</span><span class="w"> </span><span class="o">&</span><span class="w"> 
                                                            </span><span class="n">transcript_num_table_gather_ensembl</span><span class="o">$</span><span class="n">DB</span><span class="w"> </span><span class="o">==</span><span class="w">  </span><span class="s2">"EnsDb"</span><span class="p">))),</span><span class="w">
                                       </span><span class="n">paste</span><span class="p">(</span><span class="nf">length</span><span class="p">(</span><span class="n">which</span><span class="p">(</span><span class="n">transcript_num_table_gather_ensembl</span><span class="o">$</span><span class="n">group</span><span class="w"> </span><span class="o">==</span><span class="w">  </span><span class="s2">"low"</span><span class="w"> </span><span class="o">&</span><span class="w"> 
                                                            </span><span class="n">transcript_num_table_gather_ensembl</span><span class="o">$</span><span class="n">DB</span><span class="w"> </span><span class="o">==</span><span class="w">  </span><span class="s2">"orgDb"</span><span class="p">))),</span><span class="w"> 
                                       </span><span class="n">paste</span><span class="p">(</span><span class="nf">length</span><span class="p">(</span><span class="n">which</span><span class="p">(</span><span class="n">transcript_num_table_gather_ensembl</span><span class="o">$</span><span class="n">group</span><span class="w"> </span><span class="o">==</span><span class="w">  </span><span class="s2">"high"</span><span class="w"> </span><span class="o">&</span><span class="w"> 
                                                            </span><span class="n">transcript_num_table_gather_ensembl</span><span class="o">$</span><span class="n">DB</span><span class="w"> </span><span class="o">==</span><span class="w">  </span><span class="s2">"orgDb"</span><span class="p">)))))</span><span class="w">

</span><span class="n">p</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">geom_text</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ann_text_ensembl</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">labs</span><span class="p">,</span><span class="w"> </span><span class="n">group</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NULL</span><span class="p">),</span><span class="...

