Sherlock Software Setup Guide

bnprks edited this page Oct 5, 2021 · 9 revisions

When using Sherlock, you will probably have to work with several different package management systems to install the software you need. If you are coding with Python or R, you will need to download libraries, and you may also want to use other software which is pre-installed on Sherlock but not accessible unless you specifically load it. This guide covers how to set up your workspace so you have access to the common software you might need day-to-day in the lab.

The three package management tools that are covered are:

  1. module load -- the tool for accessing pre-installed software on Sherlock.
  2. conda -- a tool for managing python libraries, and downloading other software tools or system libraries that Sherlock does not provide.
  3. R -- R library management is easy compared to python, and the language built-ins will get you a long way.

Prerequisite: SSH setup

Typing in your password every time you want to start a new ssh session is really a pain, and life is a lot easier if you can start up new ssh windows as fast as possible. Add the following to the file ~/.ssh/config on your laptop/desktop computer:

Host sherlock 
    User [YOUR USERNAME]
    HostName login.sherlock.stanford.edu
    ControlMaster auto
    ControlPersist yes
    ControlPath ~/.ssh/%l%r@%h:%p

The first three lines mean you can type ssh sherlock and it will act the same as if you typed ssh [YOUR USERNAME]@login.sherlock.stanford.edu. The last three lines with the Control settings will save your running session to a file in your ~/.ssh folder, and next time you type ssh sherlock it will reuse that existing connection and skip asking for a password. Once this saved connection expires you will have to retype your password -- to avoid dropping this connection I set my computer to turn off the display but not sleep automatically when it's plugged in.

tl;dr now you only have to type in your password for ssh sherlock the first time, and all the later times will connect you automatically.
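
If the saved connection ever gets stuck or stale, OpenSSH lets you inspect and close it by hand. A minimal sketch, assuming the sherlock alias from the config above; the two function names are made up for illustration, while -O check and -O exit are standard ssh control commands:

```shell
# Hypothetical helper names; "sherlock" is the Host alias from ~/.ssh/config.

# Report whether a shared master connection is currently running.
sherlock_check() {
    ssh -O check sherlock
}

# Close the shared connection; the next "ssh sherlock" will ask for
# your password again and start a fresh one.
sherlock_reset() {
    ssh -O exit sherlock
}
```

You could paste these into the shell startup file on your laptop, or just run the underlying ssh commands directly when you need them.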

Bonus: you can also use this alias in place of user@login.sherlock.stanford.edu when you are using other ssh-based commands such as scp, sshfs, and rsync.
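
As a rough sketch of that tip, the two helpers below wrap scp and rsync around the sherlock alias. The function names and the example paths are placeholders I made up; only the alias itself comes from the config above.

```shell
# Hypothetical helpers; "sherlock" is the Host alias from ~/.ssh/config.

# Copy one local file to a path on Sherlock (relative to your home directory).
to_sherlock() {
    scp "$1" "sherlock:$2"
}

# Mirror a remote directory into a local one, keeping timestamps and permissions.
from_sherlock() {
    rsync -av "sherlock:$1/" "$2/"
}

# Example usage (paths are placeholders):
#   to_sherlock counts.tsv projects/demo/
#   from_sherlock projects/demo/results ./results
```

Because both commands go through the alias, they reuse the saved connection and should not prompt for a password while it is alive.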

The ~/.bashrc file

This guide will sometimes ask you to add a line to your ~/.bashrc file. Your bashrc is a short bash script that is run every time you log in to a node on Sherlock. It usually holds configuration commands that load libraries or programs you want fast access to. You can edit your bashrc by running nano ~/.bashrc on Sherlock.
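
If you would rather add lines from the command line than open an editor, something like the sketch below avoids ending up with duplicate lines when you run it twice. add_line_once is a hypothetical helper name, not a standard command, and it defaults to ~/.bashrc only as a convenience.

```shell
# Append a line to a file only if that exact line is not already present.
# add_line_once is a made-up helper name for this guide.
add_line_once() {
    line=$1
    file=${2:-$HOME/.bashrc}
    # Create the file if it does not exist yet.
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat the text as a literal string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example usage:
#   add_line_once 'ml biology bowtie2 bwa samtools bedtools bcl2fastq'
```

Changes to ~/.bashrc only take effect in new login sessions, or after you run source ~/.bashrc in the current one.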

Module Load

The module load command is also abbreviated ml, and it lets you load any tools or system libraries that the Sherlock administrators (a.k.a Killian) have pre-installed. The selection is a little random, but has many useful packages including some paid software that Stanford licenses. You can see a list of the available software packages here: https://www.sherlock.stanford.edu/docs/software/list/

For now, we will just load a few command-line tools for handling genomic data.

Add the following lines to your ~/.bashrc

# Load genomic tools
ml biology bowtie2 bwa samtools bedtools bcl2fastq

Handy ml commands:

  • Load packages: ml [package1] [package2] ...
  • Unload packages: ml unload [package1] [package2] ...
  • List loaded packages: ml list
  • Search for packages: ml spider [search_term]
  • Full command help: ml help

Conda and Python libraries

Conda is a general-purpose package manager, but you see it used most frequently for managing python libraries. Most scientific python packages are available through conda, and a lot of non-python software and system libraries are easy to install through conda as well. We will be installing the miniconda distribution, which just gives us the conda command without a ton of pre-installed packages.

Install conda on your Sherlock account

  1. SSH into Sherlock
  2. Run wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    (this is the link listed on the main miniconda install page)
  3. Run the installer with bash Miniconda3-latest-Linux-x86_64.sh. When it prompts for an install location, set it to $OAK/$USER/software/miniconda3 or any other location of your choosing (I recommend Oak to avoid running out of the 15GB of home directory storage)
  4. Agree to have the installer modify your .bashrc; this makes conda activate by default when you log in. (This is generally good, but I have found that some R packages won't install properly unless I run conda deactivate first.)
  5. (Optional) Run conda env update -f $PI_HOME/resources/software/base_conda_environment_v2.yml. This will get you up and running quickly by installing some common python packages and bioinformatics tools to your base environment. It might take a long time though, so you can also just install the packages you care about manually.
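
The download and install steps can be sketched as one small function. This is an illustration under assumptions, not the only way to do it: install_miniconda is a made-up name, and the -p flag is the installer's option for presetting the install prefix, so the location prompt should already show your chosen path.

```shell
# Sketch of the download and install steps; install_miniconda is a hypothetical name.
install_miniconda() {
    prefix=$1
    installer=Miniconda3-latest-Linux-x86_64.sh
    # Download the installer script (same URL as in the list above).
    wget "https://repo.anaconda.com/miniconda/$installer" || return 1
    # -p presets the install location; the installer still asks you to
    # accept the license and whether to modify your .bashrc.
    bash "$installer" -p "$prefix"
}

# Example usage on Sherlock:
#   install_miniconda "$OAK/$USER/software/miniconda3"
```

The installer does not accept a prefix containing spaces, which is one more reason to stick with a simple path like the one suggested above.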

Handy conda commands:

  • Search for a package: conda search [package-name]
  • Install a package: conda install [package-name]
  • Uninstall a package: conda uninstall [package-name]
  • Activate a conda environment: conda activate [name]
    (defaults to the base environment, and will show the name of your current environment at the start of the command prompt)
  • Deactivate a conda environment: conda deactivate
  • List your environments: conda env list
  • Get help on the conda command: conda help or go to the online user guide

Installing packages through pip: While you're using conda it's generally safest to install packages through the conda command, but if a package is not available you can install it through pip and usually not run into issues. Most packages are available through conda though (check the conda-forge or bioconda channels for a wider selection of packages than default conda).
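
One way to follow that conda-first, pip-second habit is sketched below. install_pkg is a hypothetical helper name; it asks conda to look in the conda-forge and bioconda channels and only falls back to pip if the conda install fails. Note that conda can fail for reasons other than a missing package, so treat the fallback as a convenience rather than a guarantee.

```shell
# Hypothetical helper: try conda (with extra channels) first, then pip.
install_pkg() {
    pkg=$1
    # -y skips the confirmation prompt; -c adds a channel to search.
    if conda install -y -c conda-forge -c bioconda "$pkg"; then
        echo "installed $pkg with conda"
    else
        # Fall back to pip inside the currently active conda environment.
        pip install "$pkg" && echo "installed $pkg with pip"
    fi
}

# Example usage:
#   install_pkg samtools
```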

A note on Python 2: Most python code runs just fine in Python 3, so in general don't worry about the 2 vs. 3 debate, just use Python 3 and don't look back. Occasionally you might find an old package that only supports Python 2, though. If you need access to install Python 2 packages, create a python 2 environment using conda create --name py2 python=2.7, which you can access using conda activate py2.

Why not use virtualenv: You may have heard of people using virtualenv to manage their environments. virtualenv is a handy tool, but it has a couple drawbacks compared to conda. First, it can only handle python packages, and can't help with managing other software and system libraries. Second, it can be cumbersome to manage multiple environments because you need to remember where you saved them. By contrast, conda lets you activate any environment with one simple command, and you can use the environments for much more than just python packages.

R Libraries

Managing R libraries on Sherlock is pretty simple. The only problem I have run into is that sometimes you will need to run conda deactivate before installing R packages, since otherwise they can get confused when they try to compile using half libraries from conda and half from Sherlock.

First, create the directory you want to use to hold your R libraries: (I recommend using Oak to avoid running out of your 15GB home folder storage)

mkdir -p $OAK/$USER/software/R

Next, load a few modules using ml and set a variable that points to where you want to install your R libraries (the directory created above, $OAK/$USER/software/R).

Add the following lines to your ~/.bashrc

# Load libraries that some R libraries depend on
ml hdf5 gsl mariadb 
# Load rstudio, then reload R so we get version 4.0
ml rstudio R/4.0.2
# Set R library location
export R_LIBS_USER=$OAK/$USER/software/R

Then you can install R packages to your heart's content by running the R command from the terminal, then running install.packages() with the packages you want to install.
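
If you prefer to install a package without opening an R session, a sketch like the following works from the shell. install_r_pkg is a made-up helper name and ggplot2 in the usage line is only an example package; the sketch assumes R_LIBS_USER is set as in the ~/.bashrc lines above.

```shell
# Hypothetical helper: install one CRAN package from the shell.
install_r_pkg() {
    if [ -z "$R_LIBS_USER" ]; then
        echo "R_LIBS_USER is not set" >&2
        return 1
    fi
    # R silently ignores R_LIBS_USER if the directory does not exist.
    mkdir -p "$R_LIBS_USER" || return 1
    # Rscript is non-interactive, so the CRAN mirror has to be named explicitly.
    Rscript -e "install.packages('$1', repos = 'https://cloud.r-project.org')"
}

# Example usage:
#   install_r_pkg ggplot2
```

If the install fails while compiling, try running conda deactivate first, as mentioned at the start of this section.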