(This article was first published on **R on Guangchuang YU**, and kindly contributed to R-bloggers)

I am using dotplot() to visualize results from enrichGO(), enrichDO(), enricher() and compareCluster() in the clusterProfiler R package. When specifying showCategory, I get the right number of categories, except with the results of compareCluster().

In my case, I use compareCluster() on a list of 3 elements:

```
str(ClusterList)
List of 3
 $ All: chr [1:1450] "89886" "29923" "100132891" "101410536" ...
 $ g1 : chr [1:858] "89886" "29923" "100132891" "101410536" ...
 $ g2 : chr [1:592] "5325" "170691" "29953" "283392" ...

CompareGO_BP=compareCluster(ClusterList, fun="enrichGO", pvalueCutoff=0.01, pAdjustMethod="BH", OrgDb=org.Hs.eg.db,ont="BP",readable=T)
dotplot(CompareGO_BP, showCategory=10, title="GO - Biological Process")
```

I ask for 10 categories, but I get 15 categories in All, 8 categories in g1 and 12 categories in g2. Neither any single count nor the sum of the counts is 10…

Is the option showCategory working in the case of comparison? Am I missing something here?

And which categories precisely will it plot? The most significant across all 3 cases, or the most significant within each case?

The question was posted on the Bioconductor support site. The behaviour seems quite confusing, so I think I need to write a post to clarify it.

In this case, we expect to get at most 10 of the most significant categories for each `Cluster`, since the user specified `showCategory=10`. This is exactly what the `dotplot` and `barplot` methods do for `compareClusterResult`. The `8` categories for `g1` are easy to explain: only `8` enriched terms were found for `g1`.

But why are there `15` for `All` and `12` for `g2`?

The `dotplot` and `barplot` methods implemented in `clusterProfiler` try to make the comparison among clusters more informative and reasonable. After extracting, e.g., 10 categories for each cluster, `clusterProfiler` collects the overlap of these categories among clusters.

For example, suppose `term A` is enriched in all the clusters (`All`, `g1` and `g2`) and is among the 10 most significant categories for `g1`, but not for `All` and `g2`. `clusterProfiler` captures this information and also includes `term A` in the `All` and `g2` clusters, which makes the comparison in `dotplot`/`barplot` more reasonable. This is why a cluster can show more than 10 categories.
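The selection logic described above can be sketched in base R. This is only an illustration on a made-up table (the `res` data frame, its values and the helper `top_n_per_cluster` are hypothetical), not the code that `clusterProfiler` actually runs: it takes the top `n` terms per cluster, pools them, and then keeps every cluster's row for any pooled term.

```r
# Toy enrichment table (hypothetical values, not real results).
res <- data.frame(
  Cluster  = c("All", "All", "All", "All", "g1", "g1", "g2", "g2", "g2"),
  Term     = c("A", "B", "C", "F", "C", "D", "B", "E", "C"),
  p.adjust = c(0.001, 0.002, 0.030, 0.040, 0.001, 0.004, 0.001, 0.002, 0.020),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the n most significant rows within each cluster.
top_n_per_cluster <- function(df, n) {
  pieces <- lapply(split(df, df$Cluster), function(d) {
    head(d[order(d$p.adjust), ], n)
  })
  do.call(rbind, pieces)
}

n <- 2
top <- top_n_per_cluster(res, n)

# Analogue of includeAll=FALSE: each cluster shows only its own top n terms.
own_only <- table(top$Cluster)

# Analogue of the default: a term that is in any cluster's top n is shown
# for every cluster in which it is enriched.
shown <- res[res$Term %in% unique(top$Term), ]
with_overlap <- table(shown$Cluster)

print(own_only)
print(with_overlap)
```

In this toy case every cluster has exactly 2 terms when only its own top terms are kept, while `All` and `g2` each gain term `C` once the overlap is included, because `C` is in the top 2 of `g1`. Term `F` is never shown, since it is not in any cluster's top 2.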

If users want to ignore this information, they can use `dotplot(..., includeAll=FALSE)`, which is actually not recommended.

The following example shows the difference:

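```
library(clusterProfiler)
data(gcSample)  # example list of gene clusters
x=compareCluster(gcSample, fun='enrichDO')
# first figure: only each cluster's own top 5 terms
dotplot(x, showCategory=5, includeAll=FALSE)
# second figure: default, overlapping terms are included
dotplot(x, showCategory=5)
```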

In the first figure, it seems that all these clusters are very different, with no overlapping terms enriched. This is not true as demonstrated in the second figure.

## Citation

**Yu G**, Wang L, Han Y and He Q\*. clusterProfiler: an R package for comparing biological themes among gene clusters. ***OMICS: A Journal of Integrative Biology***. 2012, 16(5):284-287.
