Google Summer of Code is a global program that offers students stipends (USD 5,000) to write code for open-source projects.
Project 1: ExperimentHub
AnnotationHub and its supporting packages are well placed to support this project. AnnotationHub provides infrastructure that makes well-curated resources available to R clients, but it lacks a web interface through which users can contribute their own resources. Such an interface would also need to transform submitted data into formats that R clients can use directly.
Project 2: Shiny Bioconductor Objects
The Shiny package allows for easy creation of interactive web graphics from R objects. Bioconductor packages define many objects that represent biological data or results, and each of these objects has a typical set of visualizations that help users explore their data. Normally, users redraw these plots several times, tweaking parameters until the image conveys a specific insight. This project pairs these standard Bioconductor objects with more user-friendly Shiny visualizations via new display() methods.
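As a rough illustration of the idea, the sketch below defines a display() generic and one method that wraps an object in a small Shiny app with an adjustable parameter. The generic's signature, the choice of a plain numeric matrix as a stand-in for a Bioconductor object, and the histogram itself are all assumptions made for this example; the project description names only display().

```r
library(methods)
library(shiny)

# Hypothetical generic: the project text names display() but not its signature.
setGeneric("display", function(object, ...) standardGeneric("display"))

# Illustrative method for a plain numeric matrix, standing in for a
# Bioconductor object such as an expression matrix.
setMethod("display", "matrix", function(object, ...) {
  ui <- fluidPage(
    # The parameter a user would otherwise change by editing and re-running code.
    sliderInput("bins", "Approximate number of bins",
                min = 5, max = 50, value = 20),
    plotOutput("hist")
  )
  server <- function(input, output) {
    # Redrawn automatically whenever the slider moves.
    output$hist <- renderPlot({
      hist(as.vector(object), breaks = input$bins,
           main = "Distribution of values", xlab = "Value")
    })
  }
  shinyApp(ui = ui, server = server)
})

# Build the app object; assigning it does not launch anything.
m <- matrix(rnorm(1000), nrow = 100)
app <- display(m)
```

Calling runApp(app), or printing app at an interactive prompt, opens the app in a browser, where moving the slider replaces the usual edit-and-replot cycle.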
Project 3: BiocParallel / BatchJobs integration
High-throughput sequencing generates data sets consisting of hundreds of millions of sequence reads per sample. As with any large data set, timely processing depends on parallel computing. The Bioconductor project has developed the BiocParallel package, an abstraction over several parallel implementations in R. Its API is tailored to typical use cases in biological data analysis and integrates with existing Bioconductor data structures. Another package, BatchJobs, executes R functions as scheduled cluster jobs through an abstraction that has been implemented for several popular schedulers, including LSF, PBS and SGE.
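The abstraction BiocParallel provides can be seen in a short sketch: the same bplapply() call runs on different back ends, and only the BPPARAM argument changes. The function summarize_sample and the input sizes are placeholders invented for this example, and the calls reflect the current BiocParallel API rather than any particular release.

```r
library(BiocParallel)

# Placeholder for a per-sample computation, e.g. summarizing one sample's reads.
summarize_sample <- function(n) sum(seq_len(n))

sample_sizes <- c(10, 100, 1000)

# Serial back end: everything runs in the current R process.
res_serial <- bplapply(sample_sizes, summarize_sample,
                       BPPARAM = SerialParam())

# Socket-based back end with two worker processes; the call is otherwise
# identical, which is the point of the abstraction.
res_parallel <- bplapply(sample_sizes, summarize_sample,
                         BPPARAM = SnowParam(workers = 2))

# Both back ends should return the same list of results.
identical(res_serial, res_parallel)
```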
As sequencing data pipelines are typically executed on managed clusters, there is a need for BiocParallel to interact with cluster schedulers. We aim to add a new backend to BiocParallel that delegates to BatchJobs for this interaction.
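To make the delegation concrete, here is a minimal sketch of the BatchJobs calls such a backend might wrap. The helper batchjobs_lapply() is hypothetical and is not part of BiocParallel or BatchJobs; it assumes the standard BatchJobs registry workflow (makeRegistry, batchMap, submitJobs, waitForJobs, loadResults) and whatever scheduler the local BatchJobs configuration selects.

```r
library(BatchJobs)

# Hypothetical helper, not part of BiocParallel or BatchJobs: an lapply-like
# wrapper that runs FUN on each element of X as a separate BatchJobs job.
batchjobs_lapply <- function(X, FUN, ...) {
  # Each call gets a fresh registry in a temporary directory.
  reg <- makeRegistry(id = "bpsketch", file.dir = tempfile("bpsketch_"))
  # Define one job per element of X; extra arguments are forwarded to FUN.
  batchMap(reg, FUN, X, more.args = list(...))
  # Hand the jobs to whichever scheduler BatchJobs is configured to use.
  submitJobs(reg)
  # Block until all jobs have terminated.
  waitForJobs(reg)
  # Collect the results in job order as an unnamed list, like lapply().
  unname(loadResults(reg))
}

res <- batchjobs_lapply(1:3, function(x) x^2)
```

Without a cluster configuration, BatchJobs falls back to running jobs in the current session, so the sketch can be tried on a workstation. A real backend would also need to handle job failures, resource requests and cleanup of the registry directory, none of which is attempted here.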