Bioconductor looking for Google Summer of Code Applicants

April 17, 2013
The 177 mentoring organizations accepted for 2013’s Google Summer of Code program have been announced. We’re proud that Bioconductor is one of the organizations chosen.

Google Summer of Code is a global program that offers students stipends (USD 5,000) to write code for open source projects.

We’ve proposed three ideas. Students may also propose their own ideas.

Project 1: ExperimentHub

AnnotationHub and its supporting packages are primed to support such a project. AnnotationHub provides infrastructure to make well-curated resources available to R software clients, but it needs a web interface that lets users contribute their own resources, including transforming data into formats amenable to direct use by R clients.

Project 2: Shiny Bioconductor Objects

The Shiny package allows for easy creation of interactive web graphics from R objects. Bioconductor packages have many objects that represent biological data or results. For each of these Bioconductor objects, there exists a typical set of visualizations to help users explore their data. Normally, users replot these visualizations several times, tweaking parameters until the image conveys a specific insight. This project pairs these standard Bioconductor objects with more user-friendly Shiny visualizations via new display() methods.
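To make the idea concrete, here is a minimal sketch of what a display() method could look like. It is only an illustration: the display() generic is defined locally for this example, and a plain matrix stands in for a real Bioconductor object; neither is taken from the project proposal.

```r
library(shiny)
library(methods)

# Hypothetical generic for this sketch; not an existing Bioconductor API.
setGeneric("display", function(object, ...) standardGeneric("display"))

# A plain numeric matrix stands in for a Bioconductor object here.
setMethod("display", "matrix", function(object, ...) {
  ui <- fluidPage(
    # The slider replaces the manual "tweak a parameter and replot" loop.
    sliderInput("col", "Column to plot",
                min = 1, max = ncol(object), value = 1, step = 1),
    plotOutput("hist")
  )
  server <- function(input, output) {
    # Redrawn automatically whenever the slider value changes.
    output$hist <- renderPlot({
      hist(object[, input$col], main = paste("Column", input$col))
    })
  }
  # Returns an app object; printing it at the console launches the app.
  shinyApp(ui, server)
})
```

In an interactive session, calling display(matrix(rnorm(200), ncol = 4)) would open the app in a browser. A real method would dispatch on a Bioconductor class and offer the visualizations typical for that class.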

Project 3: BiocParallel / BatchJobs integration

High-throughput sequencing generates data sets consisting of hundreds of millions of sequence reads per sample. As with any large data set, timely processing depends on parallel computing. The Bioconductor project has developed the BiocParallel package, an abstraction around several parallel implementations in R. The API is tailored to typical use cases in biological data analysis and integrates with existing Bioconductor data structures. Another package, BatchJobs, executes R functions as scheduled cluster jobs, through an abstraction that has been implemented for several popular schedulers, including LSF, PBS and SGE.

As sequencing data pipelines are typically executed on managed clusters, there is a need for BiocParallel to interact with cluster schedulers. We aim to add a new backend to BiocParallel that delegates to BatchJobs for this interaction.
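For readers unfamiliar with BiocParallel, the sketch below shows why a new backend fits naturally: analysis code calls bplapply(), and the backend is selected by a single parameter object. The toy function count_even() and its inputs are made up for illustration, and SerialParam() is used only so the example runs anywhere; a scheduler-backed parameter object, as proposed here, would be passed in the same position.

```r
library(BiocParallel)

# Toy stand-in for per-sample work (hypothetical; real code would process reads).
count_even <- function(n) {
  sum(seq_len(n) %% 2 == 0)
}

sample_sizes <- c(10, 20, 30)

# The backend is chosen by one parameter object. SerialParam() evaluates
# everything in the current R session, which keeps this sketch portable.
param <- SerialParam()

# bplapply() behaves like lapply(); only BPPARAM decides where the work runs.
even_counts <- bplapply(sample_sizes, count_even, BPPARAM = param)
```

Because the choice of backend is isolated in param, moving the same analysis to a managed cluster should require changing only that one line.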

Contact us

Students who are interested in participating should contact us right away at the email addresses associated with each project.

