Functional Analysis with freeCount

[This article was first published on R – Myscape, and kindly contributed to R-bloggers].

Overview

Functional analysis is useful for determining the functions of differentially expressed genes. Genes can have multiple functional annotations, so we need to determine which ones are important.

What biological functions are driving the differences in gene expression?

The freeCount FA app will help you perform functional analysis of gene sets, which can be produced from differential expression or network analysis.

Learning Goals

  • Learn how to perform downstream functional analysis with topGO
  • Practice interpreting functional analysis results
  • Understand how to connect genes to functions

Related

This tutorial is the second in a series and directly follows the Making DE Gene Lists with freeCount tutorial.

topGO

The topGO R package provides tools for testing gene ontology (GO) terms while accounting for the topology of the GO graph.

One of the main advantages of topGO is the unified gene set testing framework it offers. There are a number of test statistics and algorithms dealing with the GO graph structure that are ready to use in topGO.
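As a rough illustration of what a GO enrichment test measures, the sketch below runs a one-sided Fisher's exact test for a single GO term in base R. The counts are made up for illustration, and this is not the freeCount or topGO code itself. topGO's algorithms additionally take the structure of the GO graph into account.

```r
# Toy counts (hypothetical, not taken from the tutorial data)
n_genes    <- 1000  # all annotated genes detected in the experiment
n_sig      <- 100   # genes that pass the DE filter
n_term     <- 50    # genes annotated to the GO term of interest
n_sig_term <- 15    # DE genes that are annotated to the GO term

# 2x2 contingency table: rows are term membership, columns are DE status
contingency <- matrix(
  c(n_sig_term,          n_sig - n_sig_term,
    n_term - n_sig_term, n_genes - n_sig - (n_term - n_sig_term)),
  nrow = 2,
  dimnames = list(c("in_term", "not_in_term"), c("DE", "not_DE"))
)

# One-sided test: are DE genes over-represented in the GO term?
p_value <- fisher.test(contingency, alternative = "greater")$p.value
print(contingency)
print(p_value)
```

By chance alone, about 5 of the 50 annotated genes would be expected among the DE genes, so observing 15 gives a small p-value.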

Gene Ontology

The Gene Ontology (GO) is a structured vocabulary that describes gene functions and the relationships between them. GO terms fall into three ontologies: molecular function (MF), the activities performed by gene products; biological process (BP), the pathways and larger programs that those activities carry out; and cellular component (CC), the cellular locations where these occur.

The GO annotations are traceable, evidence-based statements relating a specific gene product to a specific ontology term. The set of all GO annotations associated with a gene provides a description of its biological role.
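To make the link between genes and GO terms concrete, here is a minimal base R sketch that stores annotations as a gene-to-GO list and inverts it into a GO-to-gene list. The gene names are hypothetical, and the three GO IDs are simply the root terms of the BP, MF, and CC ontologies, used here as placeholders.

```r
# Hypothetical annotations: each gene maps to one or more GO terms
gene2go <- list(
  gene1 = c("GO:0008150", "GO:0003674"),
  gene2 = c("GO:0008150"),
  gene3 = c("GO:0005575", "GO:0003674")
)

# Invert the mapping so that each GO term lists its annotated genes
go2gene <- split(
  rep(names(gene2go), lengths(gene2go)),
  unlist(gene2go, use.names = FALSE)
)

print(go2gene)
```

The first list answers "what does this gene do?", while the inverted list answers "which genes share this function?", which is the view an enrichment test works from.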


Before Starting

The exercise in this tutorial uses the freeCount apps in RStudio on a personal computer. Make sure that you have the following tools downloaded, installed, and up to date on your personal computer:

  1. R software environment
  2. RStudio desktop application

For Windows users, additionally install RTools.

It is also possible to run the freeCount apps online through Posit Cloud. To see how, check out the freeCount Bioinformatics Analysis Apps on Posit Cloud tutorial.

Input Data

  1. Download the Tribolium DE gene lists file
  2. Download the Tribolium GO term annotations file

Tip! Right click and select Save As… to download the above files in the necessary formats.


The Analysis App

The following steps show you how to get and start running the freeCount functional analysis (FA) app.

  1. Download the freeCount R Shiny applications
    1. Go to https://github.com/ElizabethBrooks/freeCount
    2. Click the green < > Code button
    3. Click Download ZIP
  2. Extract the freeCount-main directory 
  3. Navigate to the apps directory
  4. Open the FA.R file in RStudio
  5. Click Install on the yellow banner to install the necessary R packages (or run the code on lines 10 to 20)
  6. Click the Run App button in the upper right corner of the source pane

Analysis Process

Perform the following steps to make a list of significant GO terms, which describe the biological functions driving the gene expression differences in your experiment.

  1. Upload the data and click Run Analysis
  2. Review the initial settings on the Analysis tab
  3. Explore the data and initial results for each ontology level
  4. Adjust the P-Value, Algorithm, or Test Statistic settings and click Update Analysis
  5. Create a curated list of GO terms by repeating steps 3 and 4
  6. Download the curated list of GO terms

1. Upload Data

Upload the data and click Run Analysis.

Input Data

  1. The first text box is the Statistic for Gene Scoring, which is the statistic used to filter your gene sets to focus on interesting genes (e.g., significantly DE). In this tutorial we need to set this to FDR.
  2. The second text box is the Expression for Gene Scoring that defines how to filter your gene sets using the specified statistic in the first text box. In this tutorial we need to set this to < 0.05.
  3. The first file is the Gene Score Table that has all genes detected in your experiment with gene-wise scores. In this tutorial we are using the Tribolium DE gene lists file.
  4. The second file is the Mappings Table with the GO term annotations for your experiment. In this tutorial we are using the Tribolium GO term annotations file.
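The way the two text boxes combine can be sketched in base R as follows. This is only an illustration under assumed names: the table, its columns, and the variables score_statistic and score_expression are hypothetical, and the app may apply the filter differently internally.

```r
# Hypothetical gene score table with an FDR column
gene_scores <- data.frame(
  gene  = c("gene1", "gene2", "gene3", "gene4"),
  logFC = c(2.1, -0.3, 1.5, -2.4),
  FDR   = c(0.001, 0.80, 0.04, 0.20),
  stringsAsFactors = FALSE
)

# Values as they would be typed into the two text boxes
score_statistic  <- "FDR"
score_expression <- "< 0.05"

# Combine the statistic and the expression into a logical filter
filter_text <- paste("gene_scores[[score_statistic]]", score_expression)
keep <- eval(parse(text = filter_text))

# Genes that count as interesting for the functional analysis
interesting_genes <- gene_scores$gene[keep]
print(interesting_genes)
```

With these toy values, only the genes with an FDR below 0.05 are kept as the gene set of interest, and all four genes remain available as the background.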

2. Review Initial Settings

Review the initial settings on the Analysis tab.

3. Explore Data

Explore the data and initial results for each ontology level (BP, MF, or CC) on the Exploration and Results tabs.

Check out the number of significant GO terms for each ontology level on the Exploration tab. The histogram shows the range of p-values, which allows you to see how many GO terms were found to be significant for your list of genes at the current analysis settings.

The table of Results for the Top Significant GO Terms on the Exploration tab shows the most significant GO terms for the selected ontology level, which are sorted by p-value.

Results from the GO term functional analysis may be viewed on the Results tab. The dot plot shows the most significant GO terms for each ontology level.

4. Adjust Settings

Adjust the P-Value, Algorithm, or Test Statistic settings and click Update Analysis.

Filtering Functional Analysis Results

Adjust cutoffs by…

  • Decreasing the P-Value threshold to keep only the most strongly supported GO terms
  • Changing the Algorithm or Test Statistic to make different assumptions and tests on the data
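The effect of tightening the P-Value setting can be sketched with a toy results table in base R. The term names and p-values below are invented for illustration and do not come from the app.

```r
# Hypothetical GO term results with made-up p-values
go_results <- data.frame(
  term    = paste0("term_", 1:7),
  p_value = c(0.0004, 0.003, 0.012, 0.03, 0.047, 0.2, 0.6),
  stringsAsFactors = FALSE
)

# Count how many terms remain significant at each candidate threshold
thresholds <- c(0.05, 0.01, 0.001)
n_significant <- sapply(thresholds, function(cutoff) {
  sum(go_results$p_value < cutoff)
})
names(n_significant) <- thresholds
print(n_significant)

# Keep the terms that pass the strictest threshold that is still informative
curated <- go_results[go_results$p_value < 0.01, ]
print(curated)
```

A stricter threshold gives a shorter list that is easier to review by hand, at the cost of possibly dropping terms with weaker support.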

Narrow down the results to the GO terms associated with biological functions that you think are driving the differences in gene expression and appear to be relevant to your experiment.

Verify the P-Value threshold, Algorithm, and Test Statistic by investigating the resulting GO terms for the different ontology levels. One approach is to search online databases (e.g., QuickGO) for more information about the significant GO terms. It can also be useful to use AI tools to begin exploring the GO term results.

Verify Analysis Settings

Verify that the analysis settings have updated by looking at the Current Analysis Settings on the left side of the app.

5. Create Curated List

Create a curated list of GO terms by repeating steps 3 and 4.

It may be necessary to repeatedly adjust the settings and inspect the functional analysis results to create a manageable list of GO terms relevant to your experiment.

Additionally, you may want to go back and adjust the DE analysis settings to create a more informative set of genes for the functional analysis.

6. Download Results

Download the curated list of GO terms.

The Table of GO Term Results has all of the GO terms in the analysis (significant or not) sorted by p-value.

The Table of Significant GO Term Results has all of the significant GO terms in the analysis sorted by p-value. These GO terms can be searched using the internet or AI tools to help identify the relevant terms for your experiment. Note that it is important to double-check the results from AI tools, since they can report erroneous functions for terms.

The Table of Gene IDs for GO Terms has only the gene IDs for each of the GO terms, which can be used in set operations to identify shared or unique sets of genes. This table can be input to the freeCount SO app to perform set operations.
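As a small illustration of the kind of set operations meant here, the base R sketch below compares the gene IDs of two GO terms. The gene IDs and term names are hypothetical, and this is not the freeCount SO app's code.

```r
# Hypothetical gene IDs annotated to two different GO terms
term_a_genes <- c("gene1", "gene2", "gene3")
term_b_genes <- c("gene2", "gene3", "gene4")

# Genes shared by both terms
shared <- intersect(term_a_genes, term_b_genes)

# Genes unique to each term
only_a <- setdiff(term_a_genes, term_b_genes)
only_b <- setdiff(term_b_genes, term_a_genes)

# All genes annotated to either term
either <- union(term_a_genes, term_b_genes)

print(list(shared = shared, only_a = only_a, only_b = only_b, either = either))
```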
