By Nick Eagles
As part of recent LIBD work with spatial gene expression, I was recommended the tool Space Ranger, which provides software pipelines that walk Visium spatial RNA-seq samples through the steps we ultimately need to explore gene expression coupled with spatial information. In this blog post, I’ll explain how to start using Space Ranger at JHPCE, focusing heavily on the set-up details relevant to this cluster in particular.
What is Space Ranger?
In practice, there are a fairly large number of computational steps we’d need to perform to produce spatial information about gene expression for a multiple-sample experiment, given just microscope images and Visium RNA-seq output. To start, we’d want our data in FASTQ format; then we’d have to worry about aligning reads to a reference genome, producing gene counts, normalizing data, and so on. Thankfully, Space Ranger bundles these steps together into three simple utilities. We won’t focus too much on how to use these individual utilities or the various features of Space Ranger, documented in detail here; rather, this blog post will describe how to get Space Ranger up and running at the JHPCE cluster.
spaceranger module at JHPCE
We make regular use of lmod environment modules at JHPCE, as a means of loading and running software without worrying about user set-up differences, manually modifying your PATH, or other nasty considerations. While some sets of modules are available system-wide (for any user), others are not accessible unless you specifically “use” them. To make LIBD-specific modules like spaceranger available, you must “use” the set of modules explicitly:
module use /jhpce/shared/jhpce/modulefiles/libd
If you want to avoid typing this every time you want to use an LIBD module, consider the .bashrc trick described here.
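As a rough sketch of what such a set-up could look like, assuming the trick amounts to placing the "module use" line in your ~/.bashrc: the helper below (the name add_libd_modules is hypothetical, not part of any JHPCE tooling) appends the line to a startup file only when it is not already present, so running it twice does not create duplicates.

```shell
# Hypothetical helper: append the "module use" line to a startup file only if
# it is not already there, so repeated runs do not create duplicate entries.
add_libd_modules() {
    local rc_file="$1"
    local line='module use /jhpce/shared/jhpce/modulefiles/libd'
    # Create the file if it does not exist yet
    touch "$rc_file"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF "$line" "$rc_file"; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}

# Typical call (uncomment to apply it to your own startup file):
# add_libd_modules "$HOME/.bashrc"
```

The change only takes effect in new login sessions, or after you run `source ~/.bashrc` in the current one.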
Next, let’s load the spaceranger module in particular:
module load spaceranger
Note: the above code loads the default version of the spaceranger module currently available. You can see which versions are available with:
module avail spaceranger

# Example output may look like this:
##-------------------------- /jhpce/shared/jhpce/modulefiles/libd ---------------------------
## spaceranger/1.1.0
##

# You may also load a specific version of the module:
module load spaceranger/1.1.0
Next, let’s run a test of the Space Ranger software on example data they provide. We will write a bash script to load the spaceranger module as above, and call the executable. We could easily have qrsh’d into a compute node and run the few lines of code interactively, but I recommend writing a bash script, which we will qsub, for a few reasons:
- A script documents the code you have run, allowing others to see and reproduce the work you’ve done.
- When we qsub the script, we include arguments regarding memory and other hardware resources, which you otherwise would have to remember or estimate each time you interactively run this or similar code.
- qsub allows long-running code to continue without having to worry about keeping your session running and network-connected. This example won’t take long to run, but Space Ranger on real experiments likely will.
Let’s start by writing the “skeleton” of our script, including only the basic required code before worrying about memory, logging, or other more complicated issues. Note that this will create a directory called “tiny” with the example outputs in the current working directory. I’m opening a new file I’ll call spaceranger_test.sh, and the contents should look something like this:
# Make LIBD modules available, and load the "spaceranger" module
module use /jhpce/shared/jhpce/modulefiles/libd
module load spaceranger

# Test Space Ranger on already-installed example data
spaceranger testrun --id=tiny
If you qsub this script as-is, it will produce two log files in your home directory, containing verbose and somewhat cryptic errors. We’d prefer a single clearly-named log file written to the same directory as our bash script, and of course to fix the source of the Space Ranger error. In this case, we simply need to provide more memory to fix the main error.
Below, we flesh out spaceranger_test.sh with arguments to qsub which will improve logging and provide sufficient memory. These arguments are indicated by lines beginning with #$:
# Specify memory and other details below. In order:
#   "-cwd": write the log file to the current working directory
#   "-o" and "-e": combine 'STDOUT' and 'STDERR' messages into the same log file
#   "-l mem_free=20G,h_vmem=20G": request 20G of memory free, which may not be exceeded
#$ -cwd
#$ -o spaceranger_test.txt
#$ -e spaceranger_test.txt
#$ -l mem_free=20G,h_vmem=20G

# Make LIBD modules available, and load the "spaceranger" module
module use /jhpce/shared/jhpce/modulefiles/libd
module load spaceranger

# Test Space Ranger on already-installed example data
spaceranger testrun --id=tiny
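Because these directives look like ordinary comments, a typo in the prefix is easy to miss. One way to double-check what a script will pass to qsub is to list the lines that start with "#$". The sketch below is only illustrative, and the helper name list_qsub_directives is made up for this post.

```shell
# Hypothetical helper: print the embedded qsub directives of a job script,
# i.e. the lines that start with the "#$" prefix.
list_qsub_directives() {
    local script="$1"
    # The dollar sign is escaped so that grep matches it literally
    grep '^#\$' "$script"
}

# Typical call before submitting:
# list_qsub_directives spaceranger_test.sh
```

Ordinary comments (lines starting with "# ") are not printed, so anything missing from the output will not be seen by qsub as an argument.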
Now, we can actually submit the script and wait for the job to complete:
qsub spaceranger_test.sh
If you open spaceranger_test.txt after the job completes, you should see that the test was successful. However, there is a worrying warning suggesting that Space Ranger is not properly made aware of the memory to which it should have access:
Martian Runtime - v4.0.0
2020-10-19 15:48:59 [jobmngr] WARNING: configured to use 453GB of local memory, but only 331.3GB is currently available.
2020-10-19 15:48:59 [jobmngr] WARNING: The current virtual address space size limit is too low. Limiting virtual address space size interferes with the operation of many common libraries and programs, and is not recommended. Contact your system administrator to remove this limit.
Rather than using 20GB of memory, Space Ranger believes it has a whopping 453GB of memory to work with, though only ~331GB are actually free. In the next section we will communicate memory and even CPU constraints to Space Ranger with arguments to the spaceranger command.
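Warnings like these are easy to overlook in a long log. As a small convenience, and assuming the log keeps the format of the excerpt above (each warning on a line containing the word WARNING), a helper along these lines prints just those lines and a count. The function name show_log_warnings is hypothetical.

```shell
# Hypothetical helper: print the WARNING lines of a job log, then a summary
# line with their count. Assumes the log format shown in the excerpt above.
show_log_warnings() {
    local log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "Cannot read log file: $log_file" >&2
        return 1
    fi
    # Print each line containing WARNING; "|| true" keeps a log without
    # warnings from being treated as a failure
    grep 'WARNING' "$log_file" || true
    # grep -c counts matching lines and still prints 0 when there are none
    local count
    count="$(grep -c 'WARNING' "$log_file" || true)"
    echo "Found ${count} warning line(s) in ${log_file}"
}

# Typical call after the job finishes:
# show_log_warnings spaceranger_test.txt
```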
Exploring memory and parallelization options
Below, we will construct another bash script to submit with qsub, demonstrating how to properly specify memory and number of CPUs for a hypothetical dataset. Suppose we have an experiment with multiple FASTQ files and a microscope slide image. We would like to call the spaceranger count command on this input data, making use of parallelization for speed. Let’s use 5 CPU cores and a total of 60GB of memory. Following the documentation here, we can create the template script we’ll call SR_count_example.sh, appropriate for running at JHPCE:
# Specify memory and other details. Note that 'mem_free' and 'h_vmem' specify
# per-core memory (12G * 5 cores = 60GB total, as we want), as indicated here:
# https://jhpce.jhu.edu/knowledge-base/how-to/#multicore
#$ -cwd
#$ -o SR_count_example.txt
#$ -e SR_count_example.txt
#$ -l mem_free=12G,h_vmem=12G
#$ -pe local 5

# Make LIBD modules available, and load the "spaceranger" module
module use /jhpce/shared/jhpce/modulefiles/libd
module load spaceranger

# The main Space Ranger command. Notes on the last three arguments (a comment
# cannot follow a line-continuation backslash, so they are listed up front):
#   --jobmode=local: we will use one "node" of the cluster, which has many cores available
#   --localcores=5: we requested 5 cores at the top
#   --localmem=54: 60GB * 0.9 = 54GB; using 90% of total memory requested is recommended
spaceranger count \
    --id=<SOME RUN ID HERE> \
    --fastqs <LIST OF FASTQ PATHS HERE> \
    --image <IMAGE PATH HERE> \
    --jobmode=local \
    --localcores=5 \
    --localmem=54
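In practice, you’d fill in the run ID (--id), the FASTQ paths (--fastqs), and the microscope image (--image) in the above script for your experiment, along with any other inputs the Space Ranger documentation requires, such as the reference transcriptome. Then simply submit the script as a job!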
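If you change the total memory or the number of cores, three numbers in the script have to stay consistent: the per-core value given to mem_free and h_vmem, the value for --localcores, and the 90% figure for --localmem. The arithmetic can be sketched as below. The helper name sr_memory_plan is hypothetical, and it assumes whole-gigabyte values with a total that divides evenly by the number of cores.

```shell
# Hypothetical helper: given the total memory (in GB) and the number of cores,
# print the per-core value for mem_free/h_vmem and the 90% value for --localmem.
sr_memory_plan() {
    local total_gb="$1"
    local cores="$2"
    # Integer division: pick a total that divides evenly by the core count
    local per_core_gb=$(( total_gb / cores ))
    # 90% of the total, rounded down by integer arithmetic
    local localmem_gb=$(( total_gb * 9 / 10 ))
    echo "mem_free=${per_core_gb}G,h_vmem=${per_core_gb}G localcores=${cores} localmem=${localmem_gb}"
}

# The example from this section: 60GB in total across 5 cores
sr_memory_plan 60 5
```

For the values used in this section, this prints `mem_free=12G,h_vmem=12G localcores=5 localmem=54`, matching the script above.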
Note: you might also be interested in sgejobs, which we explored in an LIBD rstats club session. You can use it to create SGE job scripts.
This blog post was made possible thanks to:
## ─ Session info ─────────────────────────────────────────────
##  setting  value
##  version  R version 4.0.2 (2020-06-22)
##  os       macOS Catalina 10.15.7
##  system   x86_64, darwin17.0
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       America/New_York
##  date     2020-10-21
##
## ─ Packages ─────────────────────────────────────────────────
##  package       * version date       lib source
##  assertthat      0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
##  bibtex          0.4.2.3 2020-09-19 [1] CRAN (R 4.0.2)
##  BiocManager     1.30.10 2019-11-16 [1] CRAN (R 4.0.0)
##  BiocStyle     * 2.17.1  2020-09-24 [1] Bioconductor
##  blogdown        0.21.19 2020-10-21 [1] Github (rstudio/blogdown)
##  bookdown        0.21    2020-10-13 [1] CRAN (R 4.0.2)
##  cli             2.1.0   2020-10-12 [1] CRAN (R 4.0.2)
##  colorout      * 1.2-2   2020-05-18 [1] Github (jalvesaq/colorout)
##  crayon          1.3.4   2017-09-16 [1] CRAN (R 4.0.0)
##  digest          0.6.26  2020-10-17 [1] CRAN (R 4.0.2)
##  evaluate        0.14    2019-05-28 [1] CRAN (R 4.0.0)
##  fansi           0.4.1   2020-01-08 [1] CRAN (R 4.0.0)
##  generics        0.0.2   2018-11-29 [1] CRAN (R 4.0.0)
##  glue            1.4.2   2020-08-27 [1] CRAN (R 4.0.2)
##  htmltools       0.5.0   2020-06-16 [1] CRAN (R 4.0.2)
##  httr            1.4.2   2020-07-20 [1] CRAN (R 4.0.2)
##  jsonlite        1.7.1   2020-09-07 [1] CRAN (R 4.0.2)
##  knitcitations * 1.0.10  2019-09-15 [1] CRAN (R 4.0.0)
##  knitr           1.30    2020-09-22 [1] CRAN (R 4.0.2)
##  lubridate       1.7.9   2020-06-08 [1] CRAN (R 4.0.2)
##  magrittr        1.5     2014-11-22 [1] CRAN (R 4.0.0)
##  plyr            1.8.6   2020-03-03 [1] CRAN (R 4.0.0)
##  R6              2.4.1   2019-11-12 [1] CRAN (R 4.0.0)
##  Rcpp            1.0.5   2020-07-06 [1] CRAN (R 4.0.2)
##  RefManageR      1.2.12  2019-04-03 [1] CRAN (R 4.0.0)
##  rlang           0.4.8   2020-10-08 [1] CRAN (R 4.0.2)
##  rmarkdown       2.5     2020-10-21 [1] CRAN (R 4.0.2)
##  sessioninfo   * 1.1.1   2018-11-05 [1] CRAN (R 4.0.0)
##  stringi         1.5.3   2020-09-09 [1] CRAN (R 4.0.2)
##  stringr         1.4.0   2019-02-10 [1] CRAN (R 4.0.0)
##  withr           2.3.0   2020-09-22 [1] CRAN (R 4.0.2)
##  xfun            0.18    2020-09-29 [1] CRAN (R 4.0.2)
##  xml2            1.3.2   2020-04-23 [1] CRAN (R 4.0.0)
##  yaml            2.2.1   2020-02-01 [1] CRAN (R 4.0.0)
##
## [1] /Library/Frameworks/R.framework/Versions/4.0branch/Resources/library