My newyeaRs resolution: slimming down (Seurat)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I’m currently developing a single-cell RNA sequencing pipeline for NetworkAnalyst, an open-source, web-based platform for comprehensive gene expression profiling and meta-analysis. The website uses mostly R code (via {Rserve}), Java for the heavy lifting and JavaScript for data visualization. While developing this pipeline I’ve been applying various profiling and timing tools to optimize the R code, and I would like to share my experiences.

We decided to use the {Seurat} package from the Satija Lab because it is one of the most comprehensive packages for end-to-end scRNA-Seq analysis (it includes tools for QC, analysis, visualization, clustering, DE analysis, analysis of spatial data, etc.).
When you’re hosting multiple concurrent users on a web-based platform, memory is of the utmost concern. Using the {pryr} package we can see the total memory used when we start R:
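A minimal sketch of that baseline measurement follows. The post refers to {pryr}, whose `mem_used()` function reports the memory used by R objects; the base-R `gc()` fallback and the `baseline_mb` name are my own additions so the snippet also runs where {pryr} is not installed.

```r
# Baseline memory in a fresh session, in megabytes.
# gc() returns a matrix whose second column is the memory in use (Mb)
# for Ncells and Vcells; summing the two rows gives the total.
baseline_mb <- sum(gc()[, 2])
cat("Baseline memory (gc):", round(baseline_mb, 1), "Mb\n")

# If {pryr} is available, print its figure as well.
if (requireNamespace("pryr", quietly = TRUE)) {
  print(pryr::mem_used())
}
```

The two figures will not match exactly, but either one gives a baseline to compare against after packages are loaded.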

When a user starts a session, any packages loaded will persist in memory — even if you try to unload them!

What if we try to detach all of them the same way? The memory is still not released (this happens even with the force = TRUE parameter set). What about unloadNamespace()? That won’t work either; the memory footprint will persist.
The way we get around this with large packages that are only used at one step of the pipeline (let’s use {DESeq2}, which is used for DE analysis, as an example) is to use {RSclient}, which creates a microservice. Essentially, it allows another R session to start new Rserve sessions, pass data and functions between the sessions, and close the session when the task is done.
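The experiment can be sketched as below. This is an illustration rather than the author's original code: `mem_mb()` and `load_unload_footprint()` are hypothetical helpers, and the small {splines} package stands in for Seurat so the snippet runs on any R installation. With a package that small the differences may be negligible; the effect described here shows up with large packages.

```r
# Memory currently in use, in megabytes (sum of Ncells and Vcells).
mem_mb <- function() sum(gc()[, 2])

# Attach a package, then try to get rid of it again, recording memory
# at each step. `pkg` is the package name as a string.
load_unload_footprint <- function(pkg) {
  baseline <- mem_mb()

  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
  loaded <- mem_mb()

  # Detach from the search path and ask R to unload the namespace too.
  detach(paste0("package:", pkg), character.only = TRUE,
         unload = TRUE, force = TRUE)
  detached <- mem_mb()

  # Explicitly unload the namespace if it is somehow still loaded.
  if (isNamespaceLoaded(pkg)) unloadNamespace(pkg)
  unloaded <- mem_mb()

  c(baseline = baseline, loaded = loaded,
    detached = detached, unloaded = unloaded)
}

footprint <- load_unload_footprint("splines")
print(round(footprint, 1))
```

Replacing `"splines"` with `"Seurat"` (assuming it is installed) repeats the comparison for the package discussed in this post.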
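Here is a rough sketch of that pattern, not the actual NetworkAnalyst code. It assumes an Rserve instance is already listening on the given host and port, that {DESeq2} is installed on the worker side, and that the sample table has a column called `condition`; the function name `run_de_in_worker()` and its arguments are hypothetical.

```r
# Run a DESeq2 analysis in a separate Rserve session so that DESeq2 is
# never loaded into the calling R session.
run_de_in_worker <- function(counts, col_data,
                             host = "localhost", port = 6311L) {
  # Open a connection to the worker session.
  conn <- RSclient::RS.connect(host = host, port = port)
  # Always close the worker session, even if the analysis fails.
  on.exit(RSclient::RS.close(conn), add = TRUE)

  # Copy the inputs into the worker's workspace.
  RSclient::RS.assign(conn, "counts", counts)
  RSclient::RS.assign(conn, "col_data", col_data)

  # Send an unevaluated expression; it is evaluated in the worker and
  # only the resulting data frame travels back.
  RSclient::RS.eval(conn, quote({
    dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                          colData = col_data,
                                          design = ~ condition)
    dds <- DESeq2::DESeq(dds)
    as.data.frame(DESeq2::results(dds))
  }), lazy = FALSE)
}
```

The calling session only ever holds the count matrix and the results table, so the memory cost of the large package stays with the short-lived worker.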

However, Seurat is used throughout the pipeline, so a different strategy is required. It would be beneficial to reduce the size of the Seurat package, as many concurrent users on the server would create a large memory footprint.
Seurat has many functions and analysis methods; you can think of it as a Swiss Army knife, and a really, really big one.

Many of these features won’t be of interest to our users (e.g. Multiple Dataset Integration and Label Transfer), so we can get rid of them (and their dependencies) to slim down the package. We want to end up with a smaller, yet still multi-purpose, binary of Seurat.

If users need or want more functionality, they can always install Seurat and work through things locally themselves. After all, one of the benefits of web-based servers like NetworkAnalyst and Galaxy is the low barrier to entry for wet-lab biologists, undergraduate students, clinicians, etc.

After removing more than 75 functions (over 21k lines of code) and reducing the dependencies from 46 -> 24, I reduced the memory footprint by about 1/3 (~145 MB -> ~95 MB). [link to Github]
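One way to check a dependency count like this is to read the package's DESCRIPTION fields. The helper below, `direct_deps()`, is a hypothetical name of my own; it counts direct dependencies only (Depends, Imports and LinkingTo) and is shown on the base {stats} package so it runs anywhere. I am assuming, not asserting, that this matches how the numbers above were counted.

```r
# List the direct dependencies declared in an installed package's
# DESCRIPTION file.
direct_deps <- function(pkg, fields = c("Depends", "Imports", "LinkingTo")) {
  desc <- unclass(utils::packageDescription(pkg, fields = fields))
  raw <- unlist(desc[fields], use.names = FALSE)
  raw <- raw[!is.na(raw)]                       # drop fields that are absent
  deps <- unlist(strsplit(raw, ",", fixed = TRUE))
  deps <- sub("\\([^)]*\\)", "", deps)          # strip version requirements
  deps <- gsub("\\s+", "", deps)                # strip whitespace and newlines
  sort(unique(setdiff(deps[nzchar(deps)], "R")))  # R itself is not a package
}

stats_deps <- direct_deps("stats")
print(stats_deps)
cat("Number of direct dependencies:", length(stats_deps), "\n")
```

Calling `direct_deps("Seurat")` before and after trimming (assuming both builds are installed in turn) gives a quick before/after comparison.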



If you find this article useful, feel free to share it with others or recommend it!
As always, if you have any questions or comments, feel free to leave your feedback below, or you can always reach me on LinkedIn. Till then, see you in the next post!
