methylKit v0.9.6

Posted on June 2, 2016 by altuna in R bloggers | 0 Comments

[This article was first published on Recipes, scripts and genomics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

We released a new version of methylKit, which is a package for DNA methylation analysis with bisulfite-seq data. This version comes with many changes summarized below. you can also have a look at the release notes. The vignette is now converted to HTML and also lives on rpubs.

Working with large and/or many files

This turns out to be more and more of a problem. When we first designed methylKit it was tested on RRBS data with couple of million CpGs covered. These days many people have either many RRBS or similar reduced representation sequencing results or a fair amount whole-genome bisulfite sequencing. Due to their size, it may be difficult to work with these kind of samples in-memory. We developed a flatfile database based on “tabix”(http://www.htslib.org/doc/tabix.html). This means large objects are written out as tabix files, they are compressed and indexed. We can work with them chunk-by-chunk in a parallelized manner. However, that comes with a cost. It will take time to initially create tabix files but you will be able to work with large data sets without so much of the memory bottleneck.

C/C++ based parsing of Bismark SAM/BAM files

We used to parse sorted Bismark SAM files with a perl script. Now, that script is converted to C/C++ using samtools libraries and Rcpp in the background. This made parsing faster also got rid of a system dependency. processBismarkAln() function is the one where this functionality is implemented. It can parse sorted BAM/SAM files.

More differential methylation tests and dealing with covariates

We added more differential methylation tests. Main improvement is that we can adjust for overdispersion within the logistic regression model. In addition, we can also add covariates such as age and sex into the model and take them into account when testing for differential methylation. We also added a beta-binomial distribution based test by wrapping the functions in the bioconductor DSS package.

Methylation profile segmentation

Regulatory regions can be discovered by analyzing the methylation patterns, segments with low methylation could indicate a regulatory region. We used a segmentation strategy based on change-point analysis, where change points in a signal across the genome is extracted and the genome is segmented to regions between two change points These methods are typically used in CNV (copy-number variation) detection but could have applications in methylation context as well. In the context of methylation, we first extract segments between change points of the signal (methylation profiles) then cluster those segments into groups based on the mean methylation values of the segments. In this case, a segment is a region of the genome where CpGs have similar methylation values. We cluster those segments into segment groups using mixture modeling. This functionality is also mentioned here in this blog before, now it is a part of the current version.

Reading processed Bismark files

Bismark aligner scripts can produce per base methylation files. These files, namely “cytosine report” and “coverage” files can be read in R using methylKit methRead() function. See Bismark manual for details on the files.

Function name changes

The following function names are changed for package wide naming consistency. The old names will continue to work for a while.
– read() to methRead()
– read.bismark to processBismarkAln()
– adjust.methylC() to adjustMethylC()
– get.methylDiff() to getMethylDiff()
– annotate.WithFeature() to annotateWithFeature()
– annotate.WithFeature.Flank() to annotateWithFeatureFlank()
– annotate.WithGenicParts() to annotateWithGenicParts()
– read.bed() to readBed()
– read.feature.flank() to readFeatureFlank()
– read.transcript.features() to readTranscriptFeatures()

To leave a comment for the author, please follow the link and comment on their blog: Recipes, scripts and genomics.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

methylKit v0.9.6

Working with large and/or many files

C/C++ based parsing of Bismark SAM/BAM files

More differential methylation tests and dealing with covariates

Methylation profile segmentation

Reading processed Bismark files

Function name changes

Related

Working with large and/or many files

C/C++ based parsing of Bismark SAM/BAM files

More differential methylation tests and dealing with covariates

Methylation profile segmentation

Reading processed Bismark files

Function name changes

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)