Using Space Ranger at JHPCE

[This article was first published on rstats | LIBD rstats club, and kindly contributed to R-bloggers].

By Nick Eagles

As part of recent LIBD work with spatial gene expression, I was recommended Space Ranger, a tool that provides software pipelines for walking Visium spatial RNA-seq samples through the steps we ultimately need to explore gene expression coupled with spatial information. In this blog post, I’ll explain how to start using Space Ranger at JHPCE, focusing heavily on the set-up details relevant to this cluster in particular.

What is Space Ranger?

In practice, there are a fairly large number of computational steps we’d need to perform to produce spatial information about gene expression for a multi-sample experiment, given just microscope images and Visium RNA-seq output. To start, we’d want our data in FASTQ format; then we’d have to worry about aligning reads to a reference genome, producing gene counts, normalizing data, and so on. Thankfully, Space Ranger bundles these steps together into three simple utilities. We won’t focus too much on how to use these individual utilities or the various features of Space Ranger, documented in detail here; rather, this blog post will describe how to get Space Ranger up and running at the JHPCE cluster.

Using the spaceranger module at JHPCE

We make regular use of lmod environment modules at JHPCE, as a means of loading and running software without worrying about user set-up differences, manually modifying your PATH, or other nasty considerations. While some sets of modules are available system-wide (for any user), others are not accessible unless you specifically “use” them. To make LIBD-specific modules like spaceranger available, you must “use” the set of modules explicitly:

module use /jhpce/shared/jhpce/modulefiles/libd

If you want to avoid typing this every time you want to use an LIBD module, consider the .bashrc trick described here.
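
As a minimal sketch of what such a set-up could look like (an assumption on my part, not necessarily the exact trick from the linked post), you can append the “module use” line to your ~/.bashrc only if it is not already there. The helper name add_line_once is hypothetical:

```shell
#  Hypothetical helper: append a line ($1) to a file ($2) unless the file
#  already contains exactly that line
add_line_once() {
    touch "$2"
    #  -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

#  Example (uncomment to modify your own ~/.bashrc):
#  add_line_once 'module use /jhpce/shared/jhpce/modulefiles/libd' ~/.bashrc
```

Running the helper twice leaves a single copy of the line, so it is safe to re-run. New login sessions will then have the LIBD modules available without any extra typing.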

Next, let’s load the spaceranger module in particular.

module load spaceranger

Note: the above code loads the default version of the spaceranger module currently available. You can see which versions are available with:

module avail spaceranger

# Example output may look like this: 
##-------------------------- /jhpce/shared/jhpce/modulefiles/libd ---------------------------
##   spaceranger/1.1.0
##

# You may also load a specific version of the module:
module load spaceranger/1.1.0
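
After loading the module, it can be reassuring to confirm that the executable is actually on your PATH. The following is a small sketch using the standard “command -v” shell builtin; the helper name have_cmd is hypothetical:

```shell
#  Hypothetical helper: succeed only if the command named in $1 is on the PATH
have_cmd() {
    command -v "$1" > /dev/null 2>&1
}

if have_cmd spaceranger; then
    echo "spaceranger found at: $(command -v spaceranger)"
else
    echo "spaceranger not found; did 'module load spaceranger' succeed?"
fi
```

You can also run “module list” to see which modules are currently loaded in your session.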

First script

Next, let’s run a test of the Space Ranger software on the example data that ships with it. We will write a bash script to load the spaceranger module as above, and call the executable. We could easily have qrsh’d into a compute node and run the few lines of code interactively, but I recommend writing a bash script, which we will qsub, for a few reasons:

  • A script documents the code you have run, allowing others to see and reproduce the work you’ve done.
  • When we qsub the script, we include arguments regarding memory and other hardware resources, which you otherwise would have to remember or estimate each time you interactively run this or similar code.
  • Using qsub allows long-running code to continue without having to worry about keeping your session running and network-connected. This example won’t take long to run, but Space Ranger on real experiments likely will.

Let’s start by writing the “skeleton” of our script, including only the basic required code before worrying about memory, logging, or other more complicated issues. Note that this will create a directory called “tiny” with the example outputs in the current working directory. I’m opening a new file I’ll call spaceranger_test.sh, and the contents should look something like this:

#  Make LIBD modules available, and load the "spaceranger" module
module use /jhpce/shared/jhpce/modulefiles/libd
module load spaceranger

#  Test Space Ranger on already-installed example data
spaceranger testrun --id=tiny

If you qsub this script as-is, it will produce two log files in your home directory, containing verbose and somewhat cryptic errors. We’d prefer a single clearly-named log file written to the same directory as our bash script, and of course to fix the source of the Space Ranger error. In this case, we simply need to provide more memory to fix the main error.

Below, we flesh out spaceranger_test.sh with arguments to qsub which will improve logging and provide sufficient memory. These arguments are indicated by lines beginning with #$.

#  Specify memory and other details below. In order:
#    "-cwd": run the job from the current working directory, so the log file is written there
#    "-o" and "-e": combine 'STDOUT' and 'STDERR' messages into the same log file
#    "-l mem_free=20G,h_vmem=20G": request 20G of free memory, with a hard limit of 20G that the job may not exceed

#$ -cwd
#$ -o spaceranger_test.txt
#$ -e spaceranger_test.txt
#$ -l mem_free=20G,h_vmem=20G

#  Make LIBD modules available, and load the "spaceranger" module
module use /jhpce/shared/jhpce/modulefiles/libd
module load spaceranger

#  Test Space Ranger on already-installed example data
spaceranger testrun --id=tiny

Now, we can actually submit the script and wait for the job to complete.

qsub spaceranger_test.sh

If you open spaceranger_test.txt after the job completes, you should see that the test was successful. However, there is a worrying warning suggesting that Space Ranger is not properly made aware of the memory to which it should have access:

Martian Runtime - v4.0.0
2020-10-19 15:48:59 [jobmngr] WARNING: configured to use 453GB of local memory, but only 331.3GB is currently available.
2020-10-19 15:48:59 [jobmngr] WARNING: The current virtual address space size
                              limit is too low.
    Limiting virtual address space size interferes with the operation of many
    common libraries and programs, and is not recommended.
    Contact your system administrator to remove this limit.

Rather than using 20GB of memory, Space Ranger believes it has a whopping 453GB of memory to work with, though only ~331GB are actually free. In the next section we will communicate memory and even CPU constraints to Space Ranger with arguments to the spaceranger command.
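
Since warnings like these are easy to miss in a long log, here is a small sketch for pulling them out with grep. The helper name show_warnings is hypothetical, and the default file name simply matches the “-o” option used in the script above:

```shell
#  Hypothetical helper: count and print the WARNING lines in a log file
show_warnings() {
    log="${1:-spaceranger_test.txt}"
    echo "Found $(grep -c 'WARNING' "$log") warning line(s) in $log"
    #  -n: prefix each matching line with its line number in the log
    grep -n 'WARNING' "$log"
}

#  Example (run from the directory containing the log):
#  show_warnings spaceranger_test.txt
```

Note that this only finds the first line of each multi-line warning, so it is a quick check rather than a replacement for reading the log.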

Exploring memory and parallelization options

Below, we will construct another bash script to submit with qsub, demonstrating how to properly specify memory and number of CPUs for a hypothetical dataset. Suppose we have an experiment with multiple FASTQ files and a microscope slide image. We would like to call the spaceranger count command on this input data, making use of parallelization for speed. Let’s use 5 CPU cores and a total of 60GB of memory. Following the documentation here, we can create a template script, which we’ll call SR_count_example.sh, appropriate for running at JHPCE:

# Specify memory and other details. Note that 'mem_free' and 'h_vmem' specify
# per-core memory (12G * 5 cores = 60GB total, as we want), as indicated here:
# https://jhpce.jhu.edu/knowledge-base/how-to/#multicore

#$ -cwd
#$ -o SR_count_example.txt
#$ -e SR_count_example.txt
#$ -l mem_free=12G,h_vmem=12G
#$ -pe local 5

#  Make LIBD modules available, and load the "spaceranger" module
module use /jhpce/shared/jhpce/modulefiles/libd
module load spaceranger

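#  The main Space Ranger command. Fill in the values for "--id", "--fastqs",
#  and "--image" for your experiment. The remaining options are:
#    "--jobmode=local": we will use one "node" of the cluster, which has many cores available
#    "--localcores=5": we requested 5 cores at the top
#    "--localmem=54": 60GB * 0.9 = 54GB; using 90% of total memory requested is recommended
#  (Comments cannot follow a line-continuation backslash, so they are kept up here.)
spaceranger count \
    --id= \
    --fastqs  \
    --image  \
    --jobmode=local \
    --localcores=5 \
    --localmem=54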

In practice, you’d specify an --id, the FASTQ paths --fastqs, and the microscope image --image in the above script, for your experiment. Then simply submit the script as a job!

qsub SR_count_example.sh
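
If you change the number of cores or the per-core memory, the values passed to --localcores and --localmem need to change with them. The following is a rough sketch of that arithmetic, not part of Space Ranger itself. The helper name sr_resources is hypothetical, and it assumes the per-core memory is given as a whole number of GB:

```shell
#  Hypothetical helper: print Space Ranger resource flags given the number
#  of cores ($1) and the per-core memory in GB ($2) requested from SGE
sr_resources() {
    cores="$1"
    mem_per_core_gb="$2"
    total_mem_gb=$(( cores * mem_per_core_gb ))
    #  90% of the total, using integer arithmetic (rounds down)
    local_mem_gb=$(( total_mem_gb * 9 / 10 ))
    echo "--localcores=${cores} --localmem=${local_mem_gb}"
}

#  Inside an SGE job, NSLOTS holds the number of slots granted by "-pe";
#  fall back to 5 when it is unset (for example, when testing interactively)
sr_resources "${NSLOTS:-5}" 12
```

For the example above, “sr_resources 5 12” prints “--localcores=5 --localmem=54”, matching the values hard-coded in SR_count_example.sh. The per-core memory still has to be kept in sync with the “#$ -l” line by hand.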

Note: you might also be interested in sgejobs, which we explored in an LIBD rstats club session. You can use it to create SGE bash scripts.

Reproducibility

## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 4.0.2 (2020-06-22)
##  os       macOS Catalina 10.15.7      
##  system   x86_64, darwin17.0          
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       America/New_York            
##  date     2020-10-21                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package       * version date       lib source                            
##  assertthat      0.2.1   2019-03-21 [1] CRAN (R 4.0.0)                    
##  bibtex          0.4.2.3 2020-09-19 [1] CRAN (R 4.0.2)                    
##  BiocManager     1.30.10 2019-11-16 [1] CRAN (R 4.0.0)                    
##  BiocStyle     * 2.17.1  2020-09-24 [1] Bioconductor                      
##  blogdown        0.21.19 2020-10-21 [1] Github (rstudio/blogdown)         
##  bookdown        0.21    2020-10-13 [1] CRAN (R 4.0.2)                    
##  cli             2.1.0   2020-10-12 [1] CRAN (R 4.0.2)                    
##  colorout      * 1.2-2   2020-05-18 [1] Github (jalvesaq/colorout)        
##  crayon          1.3.4   2017-09-16 [1] CRAN (R 4.0.0)                    
##  digest          0.6.26  2020-10-17 [1] CRAN (R 4.0.2)                    
##  evaluate        0.14    2019-05-28 [1] CRAN (R 4.0.0)                    
##  fansi           0.4.1   2020-01-08 [1] CRAN (R 4.0.0)                    
##  generics        0.0.2   2018-11-29 [1] CRAN (R 4.0.0)                    
##  glue            1.4.2   2020-08-27 [1] CRAN (R 4.0.2)                    
##  htmltools       0.5.0   2020-06-16 [1] CRAN (R 4.0.2)                    
##  httr            1.4.2   2020-07-20 [1] CRAN (R 4.0.2)                    
##  jsonlite        1.7.1   2020-09-07 [1] CRAN (R 4.0.2)                    
##  knitcitations * 1.0.10  2019-09-15 [1] CRAN (R 4.0.0)                    
##  knitr           1.30    2020-09-22 [1] CRAN (R 4.0.2)                    
##  lubridate       1.7.9   2020-06-08 [1] CRAN (R 4.0.2)                    
##  magrittr        1.5     2014-11-22 [1] CRAN (R 4.0.0)                    
##  plyr            1.8.6   2020-03-03 [1] CRAN (R 4.0.0)                    
##  R6              2.4.1   2019-11-12 [1] CRAN (R 4.0.0)                    
##  Rcpp            1.0.5   2020-07-06 [1] CRAN (R 4.0.2)                    
##  RefManageR      1.2.12  2019-04-03 [1] CRAN (R 4.0.0)                    
##  rlang           0.4.8   2020-10-08 [1] CRAN (R 4.0.2)                    
##  rmarkdown       2.5     2020-10-21 [1] CRAN (R 4.0.2)                    
##  sessioninfo   * 1.1.1   2018-11-05 [1] CRAN (R 4.0.0)                    
##  stringi         1.5.3   2020-09-09 [1] CRAN (R 4.0.2)                    
##  stringr         1.4.0   2019-02-10 [1] CRAN (R 4.0.0)                    
##  withr           2.3.0   2020-09-22 [1] CRAN (R 4.0.2)                    
##  xfun            0.18    2020-09-29 [1] CRAN (R 4.0.2)                    
##  xml2            1.3.2   2020-04-23 [1] CRAN (R 4.0.0)                    
##  yaml            2.2.1   2020-02-01 [1] CRAN (R 4.0.0)                    
## 
## [1] /Library/Frameworks/R.framework/Versions/4.0branch/Resources/library
