I have been working on DNA methylation analysis using reduced representation bisulfite sequencing (RRBS). RRBS relies on digesting the whole genome with the restriction enzyme MspI. I wanted to perform an in silico MspI digestion of the mouse genome (mm10) so I could see the resulting digestion pattern. After searching the web, I settled on the Bioconductor package “Biostrings”, which did the job. Here I share the code for this task. Because Biostrings is an R package, it also let me run a variety of downstream analyses in R quickly.
The approach uses “Biostrings” to locate every MspI restriction site (CCGG) in the mouse genome, which is provided by the Bioconductor package BSgenome.Mmusculus.UCSC.mm10 loaded into R. The script below performs the following tasks:
- Load the required Bioconductor and R packages
- Identify the MspI restriction sites (genomic coordinates) on each chromosome
- Extract the start and end coordinates of the DNA fragments produced by the digestion (using gaps)
- Build a data frame of the fragment coordinates for each chromosome to simplify downstream analysis
- Plot the frequency of the digested fragment lengths with ggplot2
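The steps above can also be sketched outside R. The original workflow uses Biostrings to search each mm10 chromosome for CCGG and then takes the gaps between the hits; the block below is only a minimal plain-Python sketch of that same logic, not the author's script. It assumes each chromosome is available as a plain string in a dict (here a hypothetical toy genome with made-up names `chrA` and `chrB`), and the function names `find_sites`, `inter_site_gaps` and `digest_genome` are hypothetical. Like the gaps step, it reports the stretches between CCGG sites in 1-based inclusive coordinates. Since MspI actually cuts between the first C and the second C (C^CGG), the ends of the real fragments differ by a few bases from these inter-site intervals.

```python
import re

# MspI recognizes CCGG and cuts between the two Cs (C^CGG).
# CCGG is its own reverse complement, so scanning one strand finds every site.
MSPI_SITE = "CCGG"


def find_sites(seq):
    """Return 1-based, inclusive (start, end) coordinates of every CCGG site."""
    # Uppercase first so soft-masked (lowercase) bases are matched too.
    # CCGG cannot overlap itself, so non-overlapping regex matches miss no site.
    return [(m.start() + 1, m.end()) for m in re.finditer(MSPI_SITE, seq.upper())]


def inter_site_gaps(sites, seq_len):
    """Return the 1-based, inclusive intervals in 1..seq_len not covered by a site.

    Assumes `sites` is sorted and non-overlapping, which find_sites guarantees.
    """
    fragments = []
    prev_end = 0
    for start, end in sites:
        if start > prev_end + 1:  # at least one base between this site and the last
            fragments.append((prev_end + 1, start - 1))
        prev_end = end
    if prev_end < seq_len:  # trailing stretch after the last site
        fragments.append((prev_end + 1, seq_len))
    return fragments


def digest_genome(chromosomes):
    """Build one row per fragment: chromosome, start, end and width."""
    rows = []
    for chrom, seq in chromosomes.items():
        for start, end in inter_site_gaps(find_sites(seq), len(seq)):
            rows.append({"chrom": chrom, "start": start, "end": end,
                         "width": end - start + 1})
    return rows


if __name__ == "__main__":
    # Hypothetical toy "genome"; real use would load each mm10 chromosome sequence.
    toy_genome = {"chrA": "AACCGGTTTTCCGGA", "chrB": "CCGGATCCGGCCGGTT"}
    for row in digest_genome(toy_genome):
        print(row)
```

On the toy genome, `chrA` yields fragments 1-2, 7-10 and 15-15, and `chrB` yields 5-6 and 15-16. The back-to-back sites at 7-10 and 11-14 on `chrB` leave no fragment between them. The list of row dicts plays the role of the per-chromosome data frame.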
Figure: Frequency of the MspI-digested fragments (plotted with ggplot2)
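The figure summarizes how often each fragment length occurs. The author drew it with ggplot2; as a rough stand-in, the frequency table behind such a plot can be computed directly. This is a minimal Python sketch, assuming the fragment widths are already available as a list of positive integers (the values below are hypothetical) and using a hypothetical bin size of 50 bp. The names `length_frequency` and `print_histogram` are illustrative only.

```python
from collections import Counter


def length_frequency(widths, bin_size=50):
    """Count fragments per length bin.

    Bin k covers lengths k*bin_size + 1 .. (k + 1)*bin_size (inclusive),
    so with bin_size=50 the bins are 1-50, 51-100, and so on.
    Empty bins are omitted from the result.
    """
    counts = Counter((w - 1) // bin_size for w in widths)
    return {(k * bin_size + 1, (k + 1) * bin_size): counts[k] for k in sorted(counts)}


def print_histogram(freq, max_bar=40):
    """Print a text histogram; bar lengths are scaled to the most populated bin."""
    if not freq:
        return
    peak = max(freq.values())
    for (lo, hi), n in freq.items():
        bar = "#" * max(1, round(n / peak * max_bar))
        print(f"{lo:>6}-{hi:<6} {n:>6} {bar}")


if __name__ == "__main__":
    # Hypothetical fragment widths, e.g. the width column of a digest table.
    widths = [12, 40, 50, 51, 99, 100, 180, 1200]
    print_histogram(length_frequency(widths))
```

With these widths, the bins 1-50 and 51-100 each hold three fragments, and 151-200 and 1151-1200 hold one each. Binning keeps the table readable when fragment lengths span several orders of magnitude, as they do across a whole genome.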