How to run parcels on Lorenz or Gemini
Lorenz is the IMAU oceanography group's server and the preferred machine for running Parcels simulations.
When working at the university, go directly to step 2.
- In a terminal, type ssh yoursolisid@gemini.science.uu.nl and enter your solis-password.
- On gemini, type ssh yoursolisid@lorenz.science.uu.nl and enter your solis-password.
- To disconnect, type exit once (at the office) or twice (at home).
Unfortunately, this login procedure is rather cumbersome, especially from home. Luckily it's possible to store most details, saving keystrokes and human memory use. Read more on these workflow improvements below.
You can automate the login procedure with the following steps:
- On your local machine, edit the file ~/.ssh/config (create it if it doesn't exist).
- Add the following lines:
Host gemini
hostname gemini.science.uu.nl
user <yoursolisid>
Host lorenz
hostname lorenz.science.uu.nl
user <yoursolisid>
ProxyJump gemini
- Save file and exit.
From now on, you can connect with just ssh lorenz. The connection will be routed through gemini automatically, thanks to ProxyJump.
*NOTE: If you use an application like ncview, you should add the following two lines to each of the Hosts listed above:
ForwardX11 yes
ForwardX11Trusted yes
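For example, with X11 forwarding enabled the lorenz entry would look like this (the same two lines also go under Host gemini):
Host lorenz
hostname lorenz.science.uu.nl
user <yoursolisid>
ProxyJump gemini
ForwardX11 yes
ForwardX11Trusted yes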
If you don't want to enter your password every time, add your public ssh key to ~/.ssh/authorized_keys
on lorenz, or follow these instructions (Linux/OSX):
- On your local machine, type ssh-keygen (the default filename id_rsa is OK, a passphrase is optional).
- Type ssh-copy-id gemini and enter your solis-password when asked.
- Type ssh-copy-id lorenz and enter your solis-password when asked.
Windows users may want to use PuTTYgen. Alternatively, if you use Windows 10, you can follow these instructions:
- On your local machine, open PowerShell (in Administrator mode) and type ssh-keygen (the default filename id_rsa is OK, a passphrase is optional).
- Type cat ~/.ssh/id_rsa.pub and copy the output.
- Type ssh gemini, and once connected type nano ~/.ssh/authorized_keys (creating the file if necessary).
- Paste the output from step (2).
- Repeat steps (3) and (4), replacing gemini with lorenz.
If you only want to run parcels, you can do module load parcels and you will get the latest release of parcels.
However, if you also want to use other python packages, it is best to create your own conda environment. To do that, follow these steps:
- On the head node (i.e., not a compute node) first load miniconda with
module load miniconda
- Then, install parcels and all its dependencies with
conda create --prefix ~/parcels_env -c conda-forge parcels
- Update your bash settings with conda init bash. Re-log in so that the changes to your ~/.bashrc file take effect.
The next time you log in, the only steps to take are
module load miniconda
conda activate ~/parcels_env
You could also add these last two commands to your ~/.bashrc file so that they are automatically executed when you log in.
*NOTE: these two commands must be placed below the # <<< conda initialize <<<
block of commands.
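For example, the tail of your ~/.bashrc could then look like this (a sketch; the conda initialize block is generated by conda init bash and should not be edited by hand):
# >>> conda initialize >>>
# ... lines written automatically by 'conda init bash' ...
# <<< conda initialize <<<
module load miniconda
conda activate ~/parcels_env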
If you want to use the developer version of parcels (because you want/need features that are not yet in a release, or because you want to make fixes yourself), you can follow these steps:
- On the head node (i.e., not a compute node) first load miniconda with
module load miniconda
- Clone the parcels git repository with
git clone https://github.com/OceanParcels/parcels.git
- Go into the new parcels directory with
cd parcels
- Create a conda environment in your home directory with
conda env create -f environment.yml --prefix ~/parcels_env
- Activate the new environment with
conda activate ~/parcels_env
- Finish the installation with
pip install --no-build-isolation --no-deps -e .
Don't worry if it seems that version 0.0.0 has been installed; that is a fluke. You can check whether your developer installation works as follows:
- First check that python points to the new environment: which python should return ~/parcels_env/bin/python
- Then go into the python command line (by simply calling python), run import parcels and then print(parcels.__file__). That last command should return /storage/home/USERNAME/parcels/parcels/__init__.py
The next time you log in, the only steps to take are
module load miniconda
conda activate ~/parcels_env
You could also add these last two commands to your ~/.bashrc file so that they are automatically executed when you log in.
This setup will give you access to the latest master
branch of parcels, and you can also pull, push and change branches with git commands in your parcels
directory.
The hydrodynamic data on lorenz is stored at /storage/shared/oceanparcels. If you need access to more datasets there, let Erik know. See here for information on how to use the MOi data, including some example code for creating a FieldSet.
GitHub facilitates basic interaction with repositories (pull/push) over two different protocols: SSH and HTTPS. The following instructions set you up for automatic authentication for HTTPS repos; for SSH (using ssh keys), see here.
- Generate a Personal Access Token (PAT) on this page. Name it "lorenz". Regarding options, select "repo", "read:org" and "workflow". Copy the resulting token to your clipboard.
- On Lorenz, load parcels-dev if you haven't already (module load parcels-dev). You'll need this for the gh command in step 3.
- Log in with your new PAT using gh auth login. Choose the options GitHub.com, HTTPS, GitHub Credentials, Authentication Token. In the last step, paste the token from step 1.
- From now on, authentication should be automatic (read: no passwords required) when interacting with GitHub HTTPS repositories, for instance the Parcels clone that you made in the preceding section.
- In case it still doesn't work, you may need to add the following lines to the file ~/.gitconfig:
[credential "https://github.com"]
helper = !gh auth git-credential
The Lorenz cluster is composed of one main node (where you login) and several compute nodes. It is strongly preferred to do all compute-intensive or data-intensive work on a compute node. You request an interactive compute-node allocation as follows:
srun -n 1 -t 2:00:00 --pty bash -il
This will request a single CPU core (-n 1) for 2 hours (-t 2:00:00). When you're done working, please release your allocation by typing exit.
A more advanced script for requesting interactive nodes.
You can save the following script as request_interactive_node.sh or something similar, so that you do not need to remember the above line anymore. Running bash request_interactive_node.sh will automatically log you into a compute node. You can still request a specific partition, node or a different runtime using the options [-p partition] [-n node_name] [-t runtime].
#!/bin/bash
# Default partition is "normal"
partition="normal"
# Default node name is unset
node_name=""
# Default runtime is 8 hours for "normal" partition
runtime="08:00:00"
# Parse command line arguments
while getopts "p:n:t:" opt; do
case ${opt} in
p ) partition="$OPTARG"
;;
n ) node_name="$OPTARG"
if [ "$node_name" == "node09" ]; then
partition="short"
runtime="03:00:00"
fi
;;
t ) runtime="$OPTARG"
;;
\? ) echo "Usage: request_interactive.sh [-p partition] [-n node_name] [-t runtime]"
exit 1
;;
esac
done
if [ "$partition" == "short" ]; then
echo "Requesting interactive node on short partition"
runtime="03:00:00"
node_name="node09"
else
echo "Requesting interactive node on normal partition"
fi
# Request an interactive session
if [ -z "$node_name" ]; then
srun --partition="$partition" --time="$runtime" --pty bash -il
else
srun --partition="$partition" --nodelist="$node_name" --time="$runtime" --pty bash -il
fi
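For example, to request a 4-hour interactive session on node02 (replace the node name and runtime with what you need):
bash request_interactive_node.sh -n node02 -t 04:00:00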
See https://github.com/IMAU-oceans/Lorenz for more information. Note that most Parcels jobs use only one core, so set #SBATCH -n 1 in your job script; a sketch of such a script is shown below.
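A minimal sketch of such a job script, assuming the ~/parcels_env environment from above (the job name, output file and experiment script are placeholders):
#!/bin/bash
#SBATCH -J parcels_run            # job name (placeholder)
#SBATCH -n 1                      # most Parcels jobs need only one core
#SBATCH -p normal                 # partition
#SBATCH -t 24:00:00               # wall-clock time limit
#SBATCH -o parcels_run_%j.out     # file for stdout/stderr
# load miniconda and activate the parcels environment created above
module load miniconda
conda activate ~/parcels_env
# run your experiment (placeholder script name)
python my_experiment.py
Submit it with sbatch jobscript.sh and check its status with squeue -u yoursolisid.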
Another way to develop, debug and analyze your work on Lorenz is through VS Code. VS Code supports interactive Spyder-like development, debugging, Git, and Jupyter notebooks. It also has a built-in file browser. Lastly, VS Code supports many plug-ins, among which GitHub Copilot: an AI assistant that can give code suggestions (similar to ChatGPT). GitHub Copilot is free for students, teachers and open source developers. UU students (including PhD candidates) can apply for a Student Developer Pack by sending an email to info.rdm@uu.nl. Post-docs and staff can apply for Teacher Benefits via GitHub Education. See this page for more info about GitHub @UtrechtUniversity.
To use VS Code on Lorenz:
- Install VS Code on your own computer.
- Set up SSH authentication properly. See the instructions at the top of this page, under 'Connecting' (Workflow improvement 1: store SSH hostnames and Workflow improvement 2: automated SSH authentication).
- In VS Code, install the Remote - SSH extension.
- On the bottom-left, there is now a green remote connection button. If you click on it, a menu pops up where you can click Connect to Host. This will be used to connect to the compute nodes later.
- For Python development, install the Python, Pylance, and Jupyter extensions. Optionally install the GitHub Copilot AI extension.
- It is important to always use VS Code on a compute node, so that adequate resources are reserved for you and you don't slow down the main login node. You need to add the compute nodes as available hosts to your SSH config. In VS Code, open the command palette (CMD/CTRL + SHIFT + P) and type ssh config.
- Choose your configuration file (usually the first).
- Add the (short) compute nodes to your config file:
Host lorenz-compute
ForwardAgent yes
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
ProxyCommand ssh lorenz "/opt/slurm/bin/salloc --nodes=1 --mem=64G --cpus-per-task=1 --partition=normal --time=8:00:00 /bin/bash -c 'nc \$SLURM_NODELIST 22'"
User 1234567
Host lorenz-short
ForwardAgent yes
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
ProxyCommand ssh lorenz "/opt/slurm/bin/salloc --nodes=1 --mem=64G --cpus-per-task=1 --partition=short --time=3:00:00 /bin/bash -c 'nc \$SLURM_NODELIST 22'"
User 1234567
Make sure to choose the right user name (replace 1234567). *NOTE: If you are on a Windows machine, remove the backslash before $SLURM_NODELIST in the above config lines.
Now you can choose the compute nodes on the normal and short partitions as available hosts in VS Code: connect to lorenz-compute or lorenz-short, never simply to lorenz. Note that in the above example a memory restriction of 64GB is set, and you'll run on one CPU (you can change this if you need multiple CPUs; see the example below).
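For instance, a variant entry that reserves four CPUs and more memory could look like this (a sketch; adjust the numbers to your needs and again replace 1234567 with your own user name):
Host lorenz-compute-4cpu
ForwardAgent yes
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
ProxyCommand ssh lorenz "/opt/slurm/bin/salloc --nodes=1 --mem=128G --cpus-per-task=4 --partition=normal --time=8:00:00 /bin/bash -c 'nc \$SLURM_NODELIST 22'"
User 1234567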
If you are done using the remote connection, close it properly by clicking 'file > close remote connection'. If you don't do this, processes can linger around on the compute node. If for some reason the connection closed unexpectedly, you'll have to clean up the remaining server processes yourself:
- Open a terminal and type ssh lorenz
- Log into the compute node that you were on, e.g. ssh node01
- Check if there are processes running with ps -ef | grep SOLISID (replacing SOLISID with your Solis-ID).
- This should show if you still have running processes, which will include .vscode-server in them.
- Kill these processes using their PID (second column), e.g. kill 12345. It is also possible to kill all your open processes using killall -u SOLISID, but note that this will also kill processes that are not related to VS Code. So don't use this option if you have important computations running on this node.
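Put together, the cleanup comes down to something like this (node01 and the PID 12345 are placeholders):
ssh lorenz
ssh node01                  # the compute node you were working on
ps -ef | grep SOLISID       # look for leftover .vscode-server processes
kill 12345                  # kill them by PID (second column)
exit                        # leave the compute node again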
Lorenz has a power saving feature: nodes that have been idle for a while will be turned off. In principle, the SSH configuration that we've specified above should force SLURM to automatically find a suitable node for your interactive session. If there is no powered-on node with the right resources, it will boot up a new node and redirect VS Code to it. However, VS Code is impatient and doesn't like to wait for the node to be powered on. Instead it will return a timeout error.
There are 2 ways to circumvent this issue:
- You can change the amount of time that VS Code will wait before returning a timeout error. This can give SLURM the time to allocate resources and boot up a node. To do so, go to settings (gear icon in the bottom left), search for ssh and change the Remote.SSH: Connect Timeout setting to a higher number, for instance 60 seconds.
- You can also turn on a powered-off node yourself, so that once VS Code tries to find a powered-on interactive node, it can immediately reach it. You can turn on a node by requesting a 'dummy' interactive session. To do so, open a plain terminal and SSH to Lorenz. Check which nodes are idle or in use with sinfo and/or squeue. Then use the request_interactive_node.sh script (mentioned above under "A more advanced script for requesting interactive nodes") to request an interactive session on a node that needs to be powered on. For instance, request_interactive_node.sh -n node02 requests an interactive session on node02. If node02 has been powered off, requesting a session will boot it. Once you're on the node, type exit to leave the interactive session and release the associated resources. Now that at least one node (node02) is powered on, you should be able to request an interactive session using VS Code again.
In case you have other issues connecting to Lorenz via VS Code, first check whether you have too many jobs running. At present, the limit on the number of concurrent jobs is 3. Sometimes, when VS Code starts an interactive job but doesn't connect correctly (patchy wifi, lost connection etc.), the job remains running for the allocated time requested. To check this, open a terminal (or PowerShell if using Windows) and type ssh lorenz. Once connected, type squeue -u solisid, where solisid is replaced with your own solisid. If there are several interactive jobs running, you can terminate them using the command scancel jobid, where jobid is the job you want to cancel.
Running an interactive Jupyter notebook is a great way to work and can easily be done via VS Code, which handles the SSH tunneling for you.
- Connect to a compute node through the VS Code Remote SSH extension.
- Run conda activate ~/parcels_env (assuming you followed the setup instructions above), and install Jupyter Lab if you haven't already (conda install -c conda-forge jupyterlab).
- cd into your folder of choice and run jupyter lab.
- Click the link printed in your terminal. This will open a browser on your local machine connected to the Jupyter Lab instance on the server.
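In the VS Code terminal on the compute node, these steps come down to something like this (the project folder is a placeholder):
module load miniconda
conda activate ~/parcels_env
conda install -c conda-forge jupyterlab   # only needed the first time
cd ~/my_project                           # placeholder: your folder of choice
jupyter lab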
Gemini is the name of the UU ssh-server that can be used by students and staff to run simulations.
- In your terminal/command prompt, type ssh yoursolisid@gemini.science.uu.nl and enter your solis-password.
- If you don't want to do this again every time, create a config file in ~/.ssh/
- To disconnect, type exit
You can create a conda environment, and set up parcels within it by completing the following steps:
- Load miniconda by typing module load miniconda/3. Put this line into your ~/.bashrc so that it is executed automatically every time on startup. Make sure that your ~/.bash_profile says source $HOME/.bashrc
- Git clone parcels into your home directory, i.e. go to your home directory and type git clone https://github.com/OceanParcels/parcels.git. You can also put it somewhere else of course, but then you need to point to that path by setting the PYTHONPATH variable later.
- Go into the newly created parcels folder with cd parcels.
- Install the needed environment (if it does not exist yet from previous parcels versions) by typing conda env create -f environment.yml. Note that you have very limited space on gemini, so you cannot have many environments. If you want to delete an old one, check out this page.
- Activate the environment: conda activate parcels. To be able to use conda activate you might need to first initialize the conda commands by typing conda init bash.
- Still in the parcels directory, type pip install --no-build-isolation --no-deps -e .
parcels is now set up. To run it, always activate the parcels environment first with conda activate parcels
You can run your code with python just as usual: python your_file.py. However, this will run the file on the front node, which is not good practice. You can use it for a very short test, but not for a large run. For larger runs, submit a job with qsub (http://gridscheduler.sourceforge.net/htmlman/htmlman1/qsub.html); see this example. To submit a job, use qsub -V examplejob.sh (the -V is necessary!).
Sometimes, you may need to adapt the parameters of qsub to run larger experiments. A more advanced script can look like this:
#!/bin/bash
# SGE Options
#$ -S /bin/bash
# Shell environment forwarding
#$ -V
# Job Name
#$ -N <YOUR_EXPERIMENT_NAME_HERE>
# Notifications
#$ -M <YOUR_UU_MAIL_ADDRESS_HERE>
# When notified (b : begin, e : end, s : error)
#$ -m es
# Set memory limit (important when you use a lot of field or particle data)
# Guideline: try with 20G - if you get a mail with subject "Job <some_number_here> (<EXPERIMENT_NAME>) Aborted",
# then check the output line "Max vmem". If that is bigger than what you typed in here, your experiment ran
# out of memory. In that case, raise this number a bit.
# (1) 'h_vmem' cannot be >256G (hard limit); (2) Do not start with 200G or more - remember that you
# share this computer with your next-door-neighbour and colleague.
#$ -l h_vmem=20G
# Set runtime limit
#$ -l h_rt=24:00:00
# run the job on the queue for long-running processes: $ -q long.q
echo 'Running Stommel example ...'
cd ${HOME}/parcels/parcels/examples
python3 example_stommel.py
echo 'Finished computation.'
You can put this directly into a shell script, e.g. experiment.sh, replace the python3 example_stommel.py line with your own experiment and the placeholders with your own information, and then run it via qsub -V experiment.sh.
Warning for Windows users: If you create a shell script on your own Windows device, the job submission might return an error starting with "failed searching requested shell because:". If so, make sure the line endings in your shell script do not include the "^M" character. To check this you can use vim as described here and solve the problem with the answers here.
Once a job is submitted, you can check for warnings and errors in the error file (named <jobname>.e<jobID> by default), where the job ID is the number you see when submitting a job. Note that there is an unsolved issue with bash that always leads to the following error:
/bin/bash: module: line 1: syntax error: unexpected end of file
/bin/bash: error importing function definition for `BASH_FUNC_module'
This should not cause any problems for your python script however, so you can simply ignore it.
- qstat displays the jobs you have submitted
- qdel 123 kills the job with id 123 (you can see the id when using qstat). You can of course only kill your own jobs.
Our data is stored in /data/oceanparcels/input_data
. If you encounter an error due to not having permission to access the data, contact e.vansebille@uu.nl
You have a maximum of 15 GB of space in your home directory on gemini. To check how much storage you have used, you can type du -s
in your home directory. If you write large files, write them to your scratch directory: /scratch/yourname
. If this directory does not exist, create it with mkdir /scratch/yourname
. The files in your scratch folder may be removed after a few weeks of inactivity. For long term storage of output files use /data/oceanparcels/output_data/data_<YOUR_NAME>/
.
Use scp or rsync. For example, on your local computer type scp wichm003@gemini.science.uu.nl:/home/staff/wichm003/file ./ to copy the file file to your computer.
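An equivalent rsync command, which has the advantage that it can resume interrupted transfers, would be:
rsync -avP wichm003@gemini.science.uu.nl:/home/staff/wichm003/file ./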
There are two nodes available on gemini, called science-bs35 and science-bs36. After connecting to gemini, you will be logged in to bs35. To switch to bs36, type ssh science-bs36
, and exit
to go back. Importantly, you have different scratch-directories on both nodes. If you just submit your job as explained above, either one of the nodes could be used, so check both scratch-directories if you are missing output files. If you want to submit to a specific node, use the -l hostname=<nodename>
option: qsub -l hostname=science-bs36 -V TestSubmit.sh
. Warning: The science-bs35 node does not execute scripts with the option -q long.q. If you use the parcels module, note that it might be necessary to load the parcels module from the same terminal window on the node you want to execute from. Also note that the parcels versions on both nodes are not necessarily the same. Check this beforehand - it might lead to errors otherwise!
There are two options to work with parcels on gemini: with your local parcels version or with the parcels module available on gemini. In either case, jobs need to be submitted with the -V
option: qsub -V jobscript.sh
.
- Use the parcels module available on gemini. Type (or add to your .bashrc) the command module load parcels. To display the available modules, type module avail. Other options for modules are displayed after logging in to gemini.
- Use your local version, i.e. just as on your own computer. In order to do that, you need to add the parcels directory to your PYTHONPATH in your ~/.bashrc: export PYTHONPATH="$PYTHONPATH:/home/staff/wichm003/parcels".
Using Jupyter with Gemini can easily be done with VS Code, as detailed above; however, this time you connect to Gemini instead of a Lorenz compute node.
Bitvise SSH Client is a program that makes it easy to access gemini on Windows. It has a file browser and a terminal. Use host gemini.science.uu.nl and port 22, and log in with your solis-id password.
If you have problems, ask the other people in the group. If they don't know, ask Carel van der Werf.
The Cartesius cluster no longer exists; the instructions are kept for reference only.
- You need an account. Ask the PI of your project to request one by e-mail to helpdesk@surfsara.nl.
- Connect via
ssh yourcartesiusname@cartesius.surfsara.nl
Use the sbatch command for job submission. There is very good documentation available here: https://userinfo.surfsara.nl/systems/cartesius/usage. Read the important parts about the number of cores per node and the parameters that should be in your script.
- This is similar to gemini, but you load anaconda in the following way (put it in your ~/.bashrc):
module load 2019
module load Anaconda3/2018.12
- The rest is completely similar to gemini.
- There are many modules available on cartesius, which have to be loaded manually. For example, type module load nco. The documentation of surfsara is very good! If you miss a module, just google how to load it on cartesius.
- Note that ncdump will work within your parcels environment.
There are two scratch directories: local and shared. You can create your own directory in the scratches as explained here: https://userinfo.surfsara.nl/systems/cartesius/filesystems. If you write data to scratch, write it to scratch-shared/yourname (not to scratch-local, it does not work).
So far, parcels can only be run in serial mode. During compilation, some files are created in your local scratch for each execution of parcels, and it turned out that there can be conflicts when several runs are started at once. This is solved by putting a sleep command between the different executions. In the example sketched below, we do five runs, each on a different core (note the & at the end of the lines).
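The original example script is not reproduced here; a minimal sketch of the idea (the run scripts, core count and sleep duration are placeholders) could look like this:
#!/bin/bash
#SBATCH -n 5                 # five cores, one per run
#SBATCH -t 24:00:00
# start five independent parcels runs in the background (note the &),
# staggered with sleep to avoid conflicts between the compiled files in scratch
python run_experiment_1.py &
sleep 60
python run_experiment_2.py &
sleep 60
python run_experiment_3.py &
sleep 60
python run_experiment_4.py &
sleep 60
python run_experiment_5.py &
wait                         # wait until all background runs have finished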
Our data is stored at /projects/0/topios
.
Surfsara has a very good helpdesk where you receive an answer usually within a few hours: helpdesk@surfsara.nl.