# QTL Analysis in R

August 13, 2011
By

(This article was first published on Econometric Sense, and kindly contributed to R-bloggers)

See also: Part 1: QTL Analysis and Quantitative Genetics  Part 2: Statistical Methods for QTL Analysis

The 'qtl' package in R allows you to implement QTL analysis using the methods I've previously discussed. The code below is adapted from Broman's documentation "A Brief Tour of R/qtl." ( http://www.rqtl.org/tutorials/rqtltour.pdf ) My code (which follows) is an even briefer tour, relating specifically to the general topics in QTL analysis that I have previously discussed in Part 1 and 2 of this 3 part series of posts. The data set is a simulated backcross data set. The 'summary' function provides some basic descriptive data, such as the number of individuals in the crosses, the number of mapped chromosomes, the number of phenotypes, and number of mapped markers.

Ex:
Backcross

No. individuals:    400

No. phenotypes:     4
Percent phenotyped: 100 100 100 100

No. chromosomes:    19
Autosomes:      1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Total markers:      91
No. markers:        7 5 3 4 10 5 4 4 6 4 4 5 11 3 3 4 4 2 3
Percent genotyped:  72.7
Genotypes (%):      AA:50.1  AB:49.9

The plot function (see output below) provides 1) a genetic map with markers 2) phenotype distributions 3) missing data visualizations:

Recall, from  the end of my discussion in Part 1 and my discussion in Part 2 I stated:

For maximum likelihood estimation, the following probability density of yi can be stated as:
f(yi) = Pr(y|Zi=Hk)  ‘ the probability of phenotype ‘y’ given genotype H
= (1/ √ 2π σ)  exp[ 1/2 σ 2 (yi-Xiβ +Hkγ )2
The hypothesis H0:  γ =0 can be tested using the likelihood ratio test:
λ = -2(L0-L1)
where L0 represents the likelihood under a restricted model. This is equivalent to the general notion presented in Broman (1997) and my previous post:

LOD = Likelihood (effect occurs by QTL linkage) / Likelihood(effect occurs by chance)
The calc.genoprob, scanone, and sim.geno functions allow you to make these calculations with options for expectation maximization, Haley-Knott regression, or multiple imputation.  'summary' and 'max' functions allow you to identify the locations on the chromosomes where the LOD score is maximized and infer the potential location of a QTL.  The plot function allows you to plot this data for specified chromosomes.

R Code: (note this code scrolls from left to right- click on the code and use the arrow keys or use the scroll bar at the bottom of the post )
#   ------------------------------------------------------------------
#  |PROGRAM NAME: EX_QTL_R
#  |DATE: 8-13-11
#  |CREATED BY: MATT BOGARD
#  |PROJECT FILE:
#  |----------------------------------------------------------------
#  | PURPOSE: DEMONSTRATE STATISTICAL METHODS FOR QTL ANALYSIS IN R
#  |
#  |------------------------------------------------------------------
#  | REFERENCE:  R code for "A brief tour of R/qtl"
#  |  Karl W Broman, kbroman.wisc.edu http://www.rqtl.org
#  |-----------------------------------------------------------------

library(qtl) # call package qtl

ls()

############################################################
#  exploring backcross data
############################################################

data(fake.bc) # load simulated backcross data (default data in qtl package)
ls()

summary(fake.bc) # summary of info in fake.bc which is and object of class 'cross'

# you can also get this information with specific function calls:
nind(fake.bc)   # number of individuals
nphe(fake.bc)   # number of phenotypes
nchr(fake.bc)   # number of chromosomes
totmar(fake.bc) # number of total  markers
nmar(fake.bc)   # list markers?

############################################################
#   plotting maps, genotypes, phenotypes, and markers
###########################################################

plot(fake.bc) # gives plots data

# you can also call for the plots individually:
plot.missing(fake.bc) # just plot missing genotypes
plot.map(fake.bc)     # just plot the genetic map
plot.pheno(fake.bc, pheno.col=1) # just plot phenotype 1 'phe 1'
plot.map(fake.bc, chr=c(1, 4, 6, 7, 15), show.marker.names=TRUE) # specific chromosomes and marker names
plot.missing(fake.bc, reorder=TRUE) # order NA genotypes based on value of phenotype

fake.bc <- drop.nullmarkers(fake.bc)  # drop obs with missing genotypes
totmar(fake.bc) # total # markers left

##################################################################
#  specifying and estimating the likelihood used for QTL mapping
#################################################################

# From Broman:
# The function calc.genoprob calculates QTL genotype probabilities, conditional
# on the available marker data. These are needed for most of the QTL mapping
# functions. The argument step indicates the step size (in cM) at which the
# probabilities are calculated, and determines the step size at which later
# LOD scores are calculated.

fake.bc <- calc.genoprob(fake.bc, step=1, error.prob=0.01)

# function scanone performs single-QTL genome scan with a normal model.
#  methods include: maximum likelihood via the EM algorithm
#  and Haley-Knott regression

out.em <- scanone(fake.bc)
out.hk <- scanone(fake.bc, method="hk")

# multiple imputation method using sim.geno utilizing the joint
# genotype distribution, given the observed marker data.

fake.bc <- sim.geno(fake.bc, step=2, n.draws=16, error.prob=0.01)
out.imp <- scanone(fake.bc, method="imp")

# get the maximum LOD score on each chromosome
# can also specify a threshold for LOD

summary(out.em)
summary(out.em, threshold=3)
summary(out.hk, threshold=3)
summary(out.imp, threshold=3)

# function max.scanone returns just the highest peak from output of scanone.
max(out.em) # based on expectation maximization
max(out.hk) # based on Haley-Knott regression
max(out.imp) # based on multiple imputation

##################################################################
#  plot LOD scores by chromosome for QTL mapping
#################################################################

plot(out.em, chr=c(2,5)) # just based on em method
plot(out.em, out.hk, out.imp, chr=c(2,5)) # all methods
plot(out.em, chr=c(2)) # zoom in on specified chromosome
Created by Pretty R at inside-R.org

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...