8. FAQ
The official Snakemake tutorial is available here: https://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html
This is the link to the NBIS reproducible research workshop, with a Snakemake tutorial: https://nbis-reproducible-research.readthedocs.io/en/latest/snakemake/
The pipeline runs for a very long time, so it is necessary to send the Snakemake process to the background so that you don't have to keep the terminal open until it is finished. To do that, you can use terminal multiplexers such as tmux or screen. Here is a link to a crash-course in tmux: https://robots.thoughtbot.com/a-tmux-crash-course
To run GenErode, you actually only need the very basic commands:
- check which sessions are currently running:
$ tmux ls
- start a new session with the name "mysession":
$ tmux new -s mysession
- detach from a running session: press CTRL + b, then d
- re-attach to the session:
$ tmux a -t mysession
- kill a session:
$ tmux kill-session -t mysession
If the pipeline run failed, you will get an error message in the log file (i.e. the standard output that we redirect to a file) that reads like this: "Error in rule XYZ". Each rule generates a log file that you can find in the directory results/logs and the subdirectories therein. If you are using a system with the slurm workload manager, you will find the slurm job ID of the corresponding cluster job a few lines below the error message, along with the path to the slurm file "slurm-1234567.out", where you will hopefully find another error message explaining why the job failed.
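As a minimal sketch of the search described above, the following helper prints every "Error in rule" message in the redirected Snakemake output together with the lines that follow it, which is where the slurm job ID and slurm file path should appear. The function name, the log file name and the default of 15 context lines are assumptions made for illustration, not part of GenErode.

```shell
# Hypothetical helper (not part of GenErode): show each "Error in rule"
# message in a redirected Snakemake log plus the lines right after it.
show_rule_errors() {
    logfile=$1
    context=${2:-15}   # number of lines to print after each match (assumed default)
    grep -n -A "$context" "Error in rule" "$logfile"
}

# Example call; the file name is a placeholder for your own log file:
# show_rule_errors generode_run.out
```

The line numbers printed by grep make it easier to jump to the same spot in a long log file with a text editor or pager.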
I changed a metadata table and now GenErode attempts to rerun everything from the start. How do I rerun GenErode only for the new samples, or only for the set of samples remaining in the metadata table and for the remaining rules?
Snakemake changed its rerun behaviour in version 7.8 (see https://github.com/snakemake/snakemake/issues/1694). This means that when changing metadata tables, Snakemake will now run everything from the beginning, stating "Set of input files has changed since last execution". To get around this, add --rerun-triggers mtime to the Snakemake command when starting the pipeline from the command line. This also applies to any local changes in code or other parameters.
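One way to apply this consistently is a small wrapper that always appends the flag. This is only a sketch: the function name is hypothetical, and the options you pass through (here --cores 1 and -n for a dry run) should be replaced by whatever you normally use to start GenErode.

```shell
# Hypothetical wrapper (not part of GenErode): run Snakemake so that only
# file modification times trigger reruns. The flag is placed last because
# --rerun-triggers accepts several values and could otherwise swallow a
# following target name.
rerun_mtime_only() {
    snakemake "$@" --rerun-triggers mtime
}

# Dry run first to check which jobs would be executed:
# rerun_mtime_only --cores 1 -n
```

Checking the dry-run output before the real run helps confirm that only the jobs for the new or remaining samples are scheduled.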
- Adjust the number of cores under set-threads, and under set-resources and cpus_per_task
- Adjust the memory under set-resources and mem_mb
- Adjust the duration under set-resources and runtime
Please note that in several cases, rules were grouped together to be run as one job on the cluster. In that case, you need to adjust the parameters for the entire group ID in the file slurm/config.yaml.
Some rules (in the .smk files within the workflow/rules/ directory) have a default number of threads specified under threads that corresponds to the number under set-threads in the slurm/config.yaml file. If you change this number in the slurm/config.yaml file, the number of threads should be adjusted automatically.
Each step of GenErode depends on most previous steps (except the mitogenome mapping, which depends on the fastq file processing, but is not automatically loaded for the subsequent steps). The pipeline is written in a way so that all required steps are automatically included if you set a step to True in the config file. If you set several steps to True at the same time, Snakemake therefore tries to include the same steps multiple times and throws this warning message.
I want to rerun the pipeline with changed parameters settings, but I get the message "nothing to be done". How do I fix that?
GenErode checks the presence of the final output file of each step to decide if it should rerun the analyses. For most steps of the pipeline, these are the output files of the MultiQC analysis. You can find them in the stats directory of the step you were running, and therein. Deleting, moving or renaming these files will force the pipeline to rerun the analyses leading to these files, using the parameters specified in the config file.
For downstream analyses (mlRho, PCA, ROH, snpEff, gerp), delete, move or rename the final output files (tables, figures) to trigger a rerun (see next question).
Alternatively, you can add the flag -R path/to/file.out to the Snakemake command to start the pipeline, with path/to/file.out being the file you want to re-create.
Is it possible to rerun the pipeline with a different optional filtering step in the same directory, or will it overwrite everything? For example, would the output from a run without subsampling be overwritten when rerunning the pipeline with subsampling?
Subsampled BAM files, the resulting mlRho output and BCF files per individual have different file names than the same files without subsampling, so a rerun would not overwrite the non-subsampled files. The merged BCF files and all downstream files, however, have the same file name regardless of the individual filtering applied, so they would be overwritten. If it is important to keep both versions, please rename the file that should be protected from overwriting before rerunning the pipeline (or move it to a new directory).
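The renaming step could be sketched as follows. The function name and the default suffix are hypothetical and chosen only for illustration; the helper refuses to proceed if the renamed copy already exists, so an earlier protected version is never replaced.

```shell
# Hypothetical helper (not part of GenErode): rename a file so that a
# rerun of the pipeline cannot overwrite it.
protect_from_overwrite() {
    file=$1
    suffix=${2:-no_subsampling}   # label for the run to keep (assumed default)
    if [ -e "$file.$suffix" ]; then
        echo "refusing to overwrite existing $file.$suffix" >&2
        return 1
    fi
    mv -- "$file" "$file.$suffix"
}

# Example call; the path is a placeholder for the file you want to keep:
# protect_from_overwrite path/to/file.out no_subsampling
```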
I want to keep intermediate files that would be otherwise automatically deleted by Snakemake (marked as "temporary"). How do I do that?
This is only recommended when you have double-checked that you have enough storage space to keep intermediate files, as GenErode creates a very large number of (large) files. Also, please remove them as soon as you don't need them anymore. If you are really sure you want to prevent intermediate files from being deleted, run the pipeline from the command line, adding the flag --notemp.
When you want to remove all temporary files at once, you can start a run with the additional flag --delete-temp-output. It is recommended to do a dry run first to see which files will be deleted.
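The dry run and the actual deletion can be combined in one step, sketched below. The function name and the confirmation prompt are assumptions for illustration; the arguments are passed through to Snakemake and should match the options you normally use.

```shell
# Hypothetical helper (not part of GenErode): preview the temporary files
# with a dry run, then delete them only after confirmation.
delete_temp_files() {
    # Dry run (-n): nothing is deleted in this call.
    snakemake "$@" --delete-temp-output -n || return 1
    printf 'Delete these files? [y/N] ' >&2
    read -r answer
    if [ "$answer" = "y" ]; then
        snakemake "$@" --delete-temp-output
    fi
}

# Example call:
# delete_temp_files --cores 1
```

Any answer other than "y" leaves the temporary files in place.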
Snakemake tells me that the working directory is locked by another Snakemake process. I've tried to run --unlock but the error message remains.
GenErode expects the config file to be config/config.yaml and can't unlock the working directory if you saved it under a different name. To unlock the working directory, type snakemake --unlock --cores 1 --configfile config/my_config.yaml (or the file name you chose).
This seemed to be a bug related to certain browsers in GenErode versions prior to 0.5.0. When trying to access the MultiQC files from a report downloaded to a Mac, this happened with Chrome and Firefox, but it worked with Safari.