QTL Analysis in R

Posted on August 13, 2011 by Matt Bogard in R bloggers | 0 Comments

[This article was first published on Econometric Sense, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

See also: Part 1: QTL Analysis and Quantitative Genetics Part 2: Statistical Methods for QTL Analysis

The ‘qtl’ package in R allows you to implement QTL analysis using the methods I’ve previously discussed. The code below is adapted from Broman’s documentation “A Brief Tour of R/qtl.” ( http://www.rqtl.org/tutorials/rqtltour.pdf ) My code (which follows) is an even briefer tour, relating specifically to the general topics in QTL analysis that I have previously discussed in Part 1 and 2 of this 3 part series of posts. The data set is a simulated backcross data set. The ‘summary’ function provides some basic descriptive data, such as the number of individuals in the crosses, the number of mapped chromosomes, the number of phenotypes, and number of mapped markers.

Ex:
   Backcross

    No. individuals:    400

    No. phenotypes:     4
    Percent phenotyped: 100 100 100 100

    No. chromosomes:    19
        Autosomes:      1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

    Total markers:      91
    No. markers:        7 5 3 4 10 5 4 4 6 4 4 5 11 3 3 4 4 2 3
    Percent genotyped: 72.7
    Genotypes (%):      AA:50.1 AB:49.9

The plot function (see output below) provides 1) a genetic map with markers 2) phenotype distributions 3) missing data visualizations:

Recall, from the end of my discussion in Part 1 and my discussion in Part 2 I stated:

For maximum likelihood estimation, the following probability density of y_i can be stated as:

f(y_i) = Pr(y_i|Z_i=H_k) ‘ the probability of phenotype ‘y’ given genotype H_k’

= (1/ √ 2π σ) exp[ 1/2 σ ² (y_i-X_iβ +H_kγ )²

The hypothesis H₀: γ =0 can be tested using the likelihood ratio test:

λ = -2(L₀-L₁)

where L₀ represents the likelihood under a restricted model. This is equivalent to the general notion presented in Broman (1997) and my previous post:

LOD = Likelihood (effect occurs by QTL linkage) / Likelihood(effect occurs by chance)

The calc.genoprob, scanone, and sim.geno functions allow you to make these calculations with options for expectation maximization, Haley-Knott regression, or multiple imputation. ‘summary’ and ‘max’ functions allow you to identify the locations on the chromosomes where the LOD score is maximized and infer the potential location of a QTL. The plot function allows you to plot this data for specified chromosomes.

R Code: (note this code scrolls from left to right- click on the code and use the arrow keys or use the scroll bar at the bottom of the post )

#   ------------------------------------------------------------------

#  |PROGRAM NAME: EX_QTL_R

#  |DATE: 8-13-11

#  |CREATED BY: MATT BOGARD 

#  |PROJECT FILE:              

#  |----------------------------------------------------------------

#  | PURPOSE: DEMONSTRATE STATISTICAL METHODS FOR QTL ANALYSIS IN R        

#  | 

#  |------------------------------------------------------------------

#  | REFERENCE:  R code for "A brief tour of R/qtl"

#  |  Karl W Broman, kbroman.wisc.edu http://www.rqtl.org

#  |-----------------------------------------------------------------

 

 

library(qtl) # call package qtl

 

ls() 

 

############################################################

#  exploring backcross data

############################################################

 

 

data(fake.bc) # load simulated backcross data (default data in qtl package)

ls()

 

summary(fake.bc) # summary of info in fake.bc which is and object of class 'cross'

 

# you can also get this information with specific function calls:

nind(fake.bc)   # number of individuals

nphe(fake.bc)   # number of phenotypes

nchr(fake.bc)   # number of chromosomes

totmar(fake.bc) # number of total  markers

nmar(fake.bc)   # list markers? 

 

############################################################

#   plotting maps, genotypes, phenotypes, and markers

###########################################################

 

 

plot(fake.bc) # gives plots data

 

# you can also call for the plots individually:

plot.missing(fake.bc) # just plot missing genotypes

plot.map(fake.bc)     # just plot the genetic map 

plot.pheno(fake.bc, pheno.col=1) # just plot phenotype 1 'phe 1' 

plot.map(fake.bc, chr=c(1, 4, 6, 7, 15), show.marker.names=TRUE) # specific chromosomes and marker names

plot.missing(fake.bc, reorder=TRUE) # order NA genotypes based on value of phenotype

 

fake.bc <- drop.nullmarkers(fake.bc)  # drop obs with missing genotypes

totmar(fake.bc) # total # markers left

 

 

##################################################################

#  specifying and estimating the likelihood used for QTL mapping

#################################################################

 

# From Broman:

# The function calc.genoprob calculates QTL genotype probabilities, conditional

# on the available marker data. These are needed for most of the QTL mapping 

# functions. The argument step indicates the step size (in cM) at which the

# probabilities are calculated, and determines the step size at which later 

# LOD scores are calculated.

 

 

fake.bc <- calc.genoprob(fake.bc, step=1, error.prob=0.01)

 

# function scanone performs single-QTL genome scan with a normal model. 

#  methods include: maximum likelihood via the EM algorithm 

#  and Haley-Knott regression 

 

out.em <- scanone(fake.bc)

out.hk <- scanone(fake.bc, method="hk")

 

# multiple imputation method using sim.geno utilizing the joint 

# genotype distribution, given the observed marker data.

 

fake.bc <- sim.geno(fake.bc, step=2, n.draws=16, error.prob=0.01)

out.imp <- scanone(fake.bc, method="imp")

 

# get the maximum LOD score on each chromosome 

# can also specify a threshold for LOD

 

summary(out.em) 

summary(out.em, threshold=3)

summary(out.hk, threshold=3)

summary(out.imp, threshold=3)

 

# function max.scanone returns just the highest peak from output of scanone.

max(out.em) # based on expectation maximization

max(out.hk) # based on Haley-Knott regression

max(out.imp) # based on multiple imputation

 

##################################################################

#  plot LOD scores by chromosome for QTL mapping

#################################################################

 

 

plot(out.em, chr=c(2,5)) # just based on em method

plot(out.em, out.hk, out.imp, chr=c(2,5)) # all methods

plot(out.em, chr=c(2)) # zoom in on specified chromosome

To leave a comment for the author, please follow the link and comment on their blog: Econometric Sense.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

QTL Analysis in R

Related

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)