methylKit for bisulfite sequencing data analysis

[This article was first published on CHITKA, and kindly contributed to R-bloggers.]
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I have been relying on methylKit, an R package, for my RRBS (reduced representation bisulfite sequencing) data analysis. It is one of the most highly cited R packages for analysing bisulfite sequencing data. It is straightforward to install, and its vignette explains all the major steps of bisulfite analysis clearly. Altuna Akalin, the author of methylKit, has been actively helping users via Google Groups with the issues they run into when using it. Overall, methylKit can be used even with little knowledge of R; interestingly, working with it also helps laboratory researchers learn R.

As with other bisulfite sequencing data analysis packages, methylKit takes over once the bisulfite reads have been aligned to the genome. Here are the tasks one can carry out with methylKit:

  • Extract methylation information from alignments produced by mappers such as Bismark
  • Alternatively, read per-cytosine methylation calls from other mappers such as BSMAP, or from any other bisulfite mapper, as long as they are in a supported text format
  • Filter out CpGs with excess coverage caused by over-amplification/PCR duplication, and normalize read coverage between samples
  • Calculate the methylation level of each covered CpG (or of specified regions or tiles across the genome) and export it in BED or bedGraph format for visualization in a genome browser. methylKit can also merge the coverage from both strands of a CpG.
  • Handle replicates within the control and test groups
  • Calculate differential methylation between control and test samples, either for single CpGs or for regions/tiles
  • Run PCA and cluster analysis to see how the samples relate to each other based on their methylation profiles
  • Annotate differential methylation relative to CpG islands/shores and genic regions of interest such as promoters, exons, introns, etc.
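
Most of the per-CpG steps in this list come down to simple arithmetic on methylated (C) and unmethylated (T) read counts. The Python sketch below illustrates that arithmetic under stated assumptions; it is not methylKit's API, and every function name in it is hypothetical. It assumes methylKit's generic tab-separated text layout (chrBase, chr, base, strand as F/R, coverage, and freqC/freqT as percentages), an illustrative minimum of 10 reads and an upper cutoff at the 99.9th coverage percentile (similar in spirit to the `lo.count` and `hi.perc` arguments of methylKit's `filterByCoverage()`), reverse-strand calls lying one base to the right of their forward-strand partner, and 1,000 bp non-overlapping tiles. For two samples without replicates it applies a two-sided Fisher's exact test to the C/T counts, the test methylKit uses in that case; with replicates, methylKit uses logistic regression instead.

```python
import math
from collections import defaultdict

# Hypothetical toy input in methylKit's generic text layout (tab-separated):
# chrBase, chr, base, strand (F/R), coverage, freqC (%), freqT (%).
SAMPLE_TXT = "\n".join([
    "chrBase\tchr\tbase\tstrand\tcoverage\tfreqC\tfreqT",
    "chr21.100\tchr21\t100\tF\t12\t75.00\t25.00",
    "chr21.101\tchr21\t101\tR\t8\t50.00\t50.00",
    "chr21.1500\tchr21\t1500\tF\t4\t100.00\t0.00",
    "chr21.2000\tchr21\t2000\tF\t20\t10.00\t90.00",
])


def parse_records(text):
    """Return (chr, base, strand, numCs, numTs) tuples from the layout above."""
    records = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        _, chrom, base, strand, cov, freq_c, _ = line.split("\t")
        cov = int(cov)
        num_c = round(cov * float(freq_c) / 100)  # methylated read count
        records.append((chrom, int(base), strand, num_c, cov - num_c))
    return records


def quantile_type7(values, p):
    """Linear-interpolation quantile (the default method of R's quantile())."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def filter_by_coverage(records, lo_count=10, hi_perc=99.9):
    """Keep CpGs with at least lo_count reads and coverage no higher than the
    hi_perc percentile; extreme coverage suggests PCR over-amplification."""
    hi_cut = quantile_type7([c + t for *_, c, t in records], hi_perc / 100)
    return [r for r in records if lo_count <= r[3] + r[4] <= hi_cut]


def destrand(records):
    """Merge both strands of each CpG: a reverse-strand C lies one base to the
    right of its forward-strand partner, so shift it left by one and sum counts."""
    merged = defaultdict(lambda: [0, 0])
    for chrom, base, strand, c, t in records:
        counts = merged[(chrom, base - 1 if strand == "R" else base)]
        counts[0] += c
        counts[1] += t
    return {key: tuple(ct) for key, ct in sorted(merged.items())}


def percent_meth(c, t):
    """Methylation level of a CpG or region: methylated reads / all reads."""
    return 100.0 * c / (c + t)


def tile_counts(cpgs, win_size=1000):
    """Sum C/T counts of merged CpGs into non-overlapping 1-based windows."""
    tiles = defaultdict(lambda: [0, 0])
    for (chrom, base), (c, t) in cpgs.items():
        start = (base - 1) // win_size * win_size + 1
        counts = tiles[(chrom, start, start + win_size - 1)]
        counts[0] += c
        counts[1] += t
    return {key: tuple(ct) for key, ct in sorted(tiles.items())}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for [[a, b], [c, d]], e.g.
    [[C_ctrl, T_ctrl], [C_test, T_test]] at one CpG or tile."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum every table (same margins) no more probable than the observed one;
    # the small relative tolerance mirrors R's fisher.test().
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1))
                        if p <= p_obs * (1 + 1e-7)))
```

On the toy input, merging the two strands at chr21:100 gives 13 methylated and 7 unmethylated reads, i.e. a methylation level of 65%.
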
Any genomic analysis becomes highly customized after a certain number of basic steps. One has to build that customization from options spread across several packages, bridging the gaps between them by adapting the input and output file formats. methylKit does a fairly good job here by allowing methylKit objects to be coerced into GenomicRanges objects such as GRanges, which enables seamless integration with many Bioconductor packages. In future posts, I will detail some examples and R scripts that extend methylKit analysis.
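
Once per-CpG results are genomic intervals, annotation against promoters, exons, introns or CpG islands is essentially an interval-overlap query. In R, methylKit objects can be coerced with `as(obj, "GRanges")` and then passed to GenomicRanges functions such as `promoters()` and `findOverlaps()`. The Python sketch below shows the same idea in a language-neutral way; the `Interval` class and the `promoter()` and `annotate()` helpers are hypothetical stand-ins, not Bioconductor APIs, and the ±1,000 bp promoter window and gene coordinates are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Minimal GRanges-like record: 1-based, closed [start, end]."""
    chrom: str
    start: int
    end: int
    strand: str = "*"
    name: str = ""


def promoter(gene, upstream=1000, downstream=1000):
    """Strand-aware window around a gene's TSS (gene start on '+', end on '-')."""
    if gene.strand == "-":
        start, end = gene.end - downstream + 1, gene.end + upstream
    else:
        start, end = gene.start - upstream, gene.start + downstream - 1
    return Interval(gene.chrom, max(1, start), end, gene.strand, gene.name + "_promoter")


def overlaps(a, b):
    """True if two intervals share at least one base (strand is ignored)."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def annotate(sites, features):
    """Map each site to the names of overlapping features; a naive O(n*m)
    stand-in for an indexed overlap search."""
    return {s: [f.name for f in features if overlaps(s, f)] for s in sites}


# Hypothetical genes and differentially methylated CpGs (single-base intervals).
genes = [
    Interval("chr1", 5000, 9000, "+", "geneA"),
    Interval("chr1", 20000, 30000, "-", "geneB"),
]
features = [promoter(g) for g in genes] + genes
dmcs = [Interval("chr1", 4500, 4500), Interval("chr1", 29500, 29500),
        Interval("chr1", 12000, 12000)]

for site, hits in annotate(dmcs, features).items():
    print(site.chrom, site.start, hits or ["intergenic"])
```

Here the first site falls in geneA's promoter, the second in both geneB's promoter and its gene body, and the third in neither. For genome-scale data, an indexed overlap search (as GenomicRanges provides) replaces the nested loop.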
