Simple plots reveal interesting artifacts

March 14, 2012
By

(This article was first published on What You're Doing Is Rather Desperate » R, and kindly contributed to R-bloggers)

I’ve recently been working with methylation data; specifically, from the Illumina Infinium HumanMethylation450 bead chip. It’s a rather complex array which uses two types of probes to determine the methylation state of DNA at ~ 485 000 sites in the genome.

The Bioconductor project has risen to the challenge with a (somewhat bewildering) variety of packages to analyse data from this chip. I’ve used lumi, methylumi and minfi, but will focus here on just the latter.

Update: a related post from Oliver Hofmann (@fiamh).

The data came to me in a standard directory and file structure (created, I think, using BeadStudio), as raw IDAT files with a sample sheet. So, reading the data into R was straightforward:

library(minfi)
library(IlluminaHumanMethylation450kmanifest)
baseDir <- "path/to/datadir"
targets <- read.450k.sheet(baseDir)
RGset   <- read.450k.exp(base = baseDir, targets = targets)

This creates an RGChannelSet:

RGChannelSet (storageMode: lockedEnvironment)
assayData: 622399 features, 108 samples 
  element names: Green, Red 
protocolData: none
phenoData
  sampleNames: 6057841162_R01C01 6057841162_R02C01 ...
    6264488081_R06C02 (108 total)
  varLabels: Sample_Name Sample_Well ... filenames (9 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: IlluminaHumanMethylation450k

The minfi package provides a function to convert the red/green channel values to raw intensity measurements representing methylated or unmethylated:

raw <- preprocessRaw(RGset)

That results in a MethylSet:

MethylSet (storageMode: lockedEnvironment)
assayData: 485577 features, 108 samples 
  element names: Meth, Unmeth 
protocolData: none
phenoData
  sampleNames: 6057841162_R01C01 6057841162_R02C01 ...
    6264488081_R06C02 (108 total)
  varLabels: Sample_Name Sample_Well ... filenames (9 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: IlluminaHumanMethylation450k 
Preprocessing Raw (no normalization or bg correction) minfi version 1.0.0 
Manifest version 0.2.0

The functions getMeth() and getUnmeth() extract the methylated/unmethylated intensities, respectively, to a matrix:

meth <- getMeth(raw)
dim(meth)
## [1] 485577    108

The MethylSet object, raw, has the same structure as the more familiar eSet, so we can extract phenotype data using pData():

names(pData(raw))
## [1] "Sample_Name"  "Sample_Well"  "Sample_Plate" "Sample_Group" "Pool_ID"     
## [6] "Array"        "Slide"        "Basename"     "filenames"

It turns out that for my data, the 108 samples were run in 9 groups of 12 and are ordered as such in the sample sheet: that is samples 1-12 = Group1, 13-24 = Group2 and so on, up to 97-108 = Group9. It also turns out that the variable pData(raw)$Array is ordered in groups of 6. So for the first 18 samples:

head(pData(raw)$Array, 18)
##  [1] "R01C01" "R02C01" "R03C01" "R04C01" "R05C01" "R06C01" "R01C02" "R02C02"
##  [9] "R03C02" "R04C02" "R05C02" "R06C02" "R01C01" "R02C01" "R03C01" "R04C01"
## [17] "R05C01" "R06C01"

Keep those groups and orderings in mind, as we generate a boxplot of raw methylated intensities for the 108 samples:

library(RColorBrewer)
cols <- brewer.pal(9, "Set3")
groups.uniq <- unique(pData(raw)$Sample_Group)
m <- match(pData(raw)$Sample_Group, groups.uniq)
boxplot(meth, col = cols[m])

Result on the right.

meth

Boxplot of raw intensities using minfi getMeth() method, Illumina 450K, 108 samples


Interesting and evidently, something “non-random” is going on. It looks as though we have 2 artifacts to consider. First, for these samples, groups 6-9 differ from groups 1-5. More interestingly, within groups, we see a cyclic signal where intensity rises gradually across the first 6 arrays, then falls and rises again for the second set of 6 arrays.

I love that when I tweet about this technical, esoteric stuff, I get instant and useful responses. Thanks for following: @iameltonjohn, @tweetingtechno and @fiamh.

Take-home message: very simple plots can be informative. Now to figure out what it all means.


Filed under: bioinformatics, R, research diary, statistics Tagged: illumina 450k, methylation, microarray

To leave a comment for the author, please follow the link and comment on his blog: What You're Doing Is Rather Desperate » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.