# QTL Analysis in R

[This article was first published on

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

See also: Part 1: QTL Analysis and Quantitative Genetics Part 2: Statistical Methods for QTL Analysis **Econometric Sense**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The ‘qtl’ package in R allows you to implement QTL analysis using the methods I’ve previously discussed. The code below is adapted from Broman’s documentation “A Brief Tour of R/qtl.” ( http://www.rqtl.org/tutorials/rqtltour.pdf ) My code (which follows) is an even briefer tour, relating specifically to the general topics in QTL analysis that I have previously discussed in Part 1 and 2 of this 3 part series of posts. The data set is a simulated backcross data set. The ‘summary’ function provides some basic descriptive data, such as the number of individuals in the crosses, the number of mapped chromosomes, the number of phenotypes, and number of mapped markers.

Ex:

Backcross

No. individuals: 400

No. phenotypes: 4

Percent phenotyped: 100 100 100 100

No. chromosomes: 19

Autosomes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Total markers: 91

No. markers: 7 5 3 4 10 5 4 4 6 4 4 5 11 3 3 4 4 2 3

Percent genotyped: 72.7

Genotypes (%): AA:50.1 AB:49.9

The plot function (see output below) provides 1) a genetic map with markers 2) phenotype distributions 3) missing data visualizations:

Recall, from the end of my discussion in Part 1 and my discussion in Part 2 I stated:

For maximum likelihood estimation, the following probability density of y

_{i}can be stated as:f(y

_{i}) = Pr(y_{i}|Z_{i}=H_{k}) ‘ the probability of phenotype ‘y’ given genotype H_{k}’= (1/ √ 2π σ) exp[ 1/2 σ

^{2}(y_{i}-X_{i}β +H_{k}γ )^{2}The hypothesis H

_{0}:_{ }γ =0 can be tested using the likelihood ratio test:λ = -2(L

_{0}-L_{1})where L

_{0}represents the likelihood under a restricted model. This is equivalent to the general notion presented in Broman (1997) and my previous post:LOD = Likelihood (effect occurs by QTL linkage) / Likelihood(effect occurs by chance)

The calc.genoprob, scanone, and sim.geno functions allow you to make these calculations with options for expectation maximization, Haley-Knott regression, or multiple imputation. ‘summary’ and ‘max’ functions allow you to identify the locations on the chromosomes where the LOD score is maximized and infer the potential location of a QTL. The plot function allows you to plot this data for specified chromosomes.

R Code: (note this code scrolls from left to right- click on the code and use the arrow keys or use the scroll bar at the bottom of the post )

# ------------------------------------------------------------------ # |PROGRAM NAME: EX_QTL_R # |DATE: 8-13-11 # |CREATED BY: MATT BOGARD # |PROJECT FILE: # |---------------------------------------------------------------- # | PURPOSE: DEMONSTRATE STATISTICAL METHODS FOR QTL ANALYSIS IN R # | # |------------------------------------------------------------------ # | REFERENCE: R code for "A brief tour of R/qtl" # | Karl W Broman, kbroman.wisc.edu http://www.rqtl.org # |----------------------------------------------------------------- library(qtl) # call package qtl ls() ############################################################ # exploring backcross data ############################################################ data(fake.bc) # load simulated backcross data (default data in qtl package) ls() summary(fake.bc) # summary of info in fake.bc which is and object of class 'cross' # you can also get this information with specific function calls: nind(fake.bc) # number of individuals nphe(fake.bc) # number of phenotypes nchr(fake.bc) # number of chromosomes totmar(fake.bc) # number of total markers nmar(fake.bc) # list markers? ############################################################ # plotting maps, genotypes, phenotypes, and markers ########################################################### plot(fake.bc) # gives plots data # you can also call for the plots individually: plot.missing(fake.bc) # just plot missing genotypes plot.map(fake.bc) # just plot the genetic map plot.pheno(fake.bc, pheno.col=1) # just plot phenotype 1 'phe 1' plot.map(fake.bc, chr=c(1, 4, 6, 7, 15), show.marker.names=TRUE) # specific chromosomes and marker names plot.missing(fake.bc, reorder=TRUE) # order NA genotypes based on value of phenotype fake.bc <- drop.nullmarkers(fake.bc) # drop obs with missing genotypes totmar(fake.bc) # total # markers left ################################################################## # specifying and estimating the likelihood used for QTL mapping ################################################################# # From Broman: # The function calc.genoprob calculates QTL genotype probabilities, conditional # on the available marker data. These are needed for most of the QTL mapping # functions. The argument step indicates the step size (in cM) at which the # probabilities are calculated, and determines the step size at which later # LOD scores are calculated. fake.bc <- calc.genoprob(fake.bc, step=1, error.prob=0.01) # function scanone performs single-QTL genome scan with a normal model. # methods include: maximum likelihood via the EM algorithm # and Haley-Knott regression out.em <- scanone(fake.bc) out.hk <- scanone(fake.bc, method="hk") # multiple imputation method using sim.geno utilizing the joint # genotype distribution, given the observed marker data. fake.bc <- sim.geno(fake.bc, step=2, n.draws=16, error.prob=0.01) out.imp <- scanone(fake.bc, method="imp") # get the maximum LOD score on each chromosome # can also specify a threshold for LOD summary(out.em) summary(out.em, threshold=3) summary(out.hk, threshold=3) summary(out.imp, threshold=3) # function max.scanone returns just the highest peak from output of scanone. max(out.em) # based on expectation maximization max(out.hk) # based on Haley-Knott regression max(out.imp) # based on multiple imputation ################################################################## # plot LOD scores by chromosome for QTL mapping ################################################################# plot(out.em, chr=c(2,5)) # just based on em method plot(out.em, out.hk, out.imp, chr=c(2,5)) # all methods plot(out.em, chr=c(2)) # zoom in on specified chromosome

To

**leave a comment**for the author, please follow the link and comment on their blog:**Econometric Sense**.R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.