Add support for using mamba with conda #2030

Merged on May 14, 2021 (11 commits)

Member

@abhi18av abhi18av commented Apr 11, 2021

Initiating the work for mamba support in conda.

Planned features are:

A mamba directive with a micromamba option, so that users can run micromamba, which doesn't rely on a base environment - see here

We have decided to implement this as a config setting for the conda scope.

conda {
   useMamba = true
}

@abhi18av abhi18av force-pushed the abhinav/mamba-support branch from f99e9fd to 091c5ad Compare April 11, 2021 23:56
@abhi18av abhi18av self-assigned this Apr 12, 2021
@pditommaso
Member

Is the mamba CLI compatible with the conda CLI? In that case, wouldn't it make sense just to use it in place of conda in the current implementation (with an option to use mamba instead of conda)?

if( isYamlFilePath(condaEnv) ) {
    cmd = "conda env create --prefix ${Escape.path(prefixPath)} --file ${Escape.path(makeAbsolute(condaEnv))}"
}
else if( isTextFilePath(condaEnv) ) {
    cmd = "conda create $opts--mkdir --yes --quiet --prefix ${Escape.path(prefixPath)} --file ${Escape.path(makeAbsolute(condaEnv))}"
}
else {
    cmd = "conda create $opts--mkdir --yes --quiet --prefix ${Escape.path(prefixPath)} $condaEnv"
}
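
A minimal sketch of what that substitution could look like, assuming the three command lines differ only in the binary name. The helper createCommand and its parameters are hypothetical, the file-type checks are simplified stand-ins for isYamlFilePath and isTextFilePath, and path escaping and $opts are left out for brevity:

```groovy
// Hypothetical helper, not the Nextflow implementation: it only shows the
// binary name being swapped while the rest of each command line is kept.
String createCommand(String binary, String prefixPath, String condaEnv) {
    // Simplified stand-ins for isYamlFilePath / isTextFilePath
    boolean isYaml = condaEnv.endsWith('.yml') || condaEnv.endsWith('.yaml')
    boolean isText = condaEnv.endsWith('.txt')

    if( isYaml )
        return "${binary} env create --prefix ${prefixPath} --file ${condaEnv}"
    if( isText )
        return "${binary} create --mkdir --yes --quiet --prefix ${prefixPath} --file ${condaEnv}"
    return "${binary} create --mkdir --yes --quiet --prefix ${prefixPath} ${condaEnv}"
}

// Same environment spec, two different binaries
println createCommand('conda', '/tmp/env', 'bwa samtools')
println createCommand('mamba', '/tmp/env', 'bwa samtools')
```

Passing the binary as a plain string keeps the existing branching untouched, which is the appeal of treating mamba as a drop-in replacement.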

@abhi18av
Member Author

Yes Paolo, the mamba CLI aims to be a superset of the conda CLI.

I put forward a couple of approaches to adding mamba support earlier, in this comment: #1819 (comment)

Please let me know which approach makes more sense to be used for the integration :)

@phue
Contributor

phue commented Apr 12, 2021

It seems that mamba is CLI-compatible with conda, but micromamba is not. This probably makes the proposed condaBin implementation a bit tricky.

Also note that mamba env create currently falls back to conda to create the environment (See here: mamba-org/mamba#633)
This means that it would speed up the solving part, but would not fix #1819 because downloading and caching would still be handled by conda.

However, micromamba could be used for creating environments from YAML files using something like this (the same syntax as for text files; note that there is no micromamba env command):

        if( isYamlFilePath(mambaEnv) ) {
            cmd = "micromamba create -p ${Escape.path(prefixPath)} -f ${Escape.path(makeAbsolute(mambaEnv))}"
        }

@abhi18av
Member Author

> It seems that mamba is CLI-compatible with conda, but micromamba is not. This probably makes the proposed condaBin implementation a bit tricky.

Yes, the direction we have decided to take is to implement it as a directive, which leaves enough room to support sibling projects of mamba, such as quetz and boa, in the future.

Without the micromamba: true option, mamba would fall back to the conda binary by default.

process foo {
  mamba 'bwa samtools multiqc', micromamba: true

  '''
  your_command --here
  '''
}

Thoughts?

@phue
Contributor

phue commented Apr 14, 2021

I think that is a great solution 👍

@pditommaso
Member

I'm not convinced by this approach, which introduces a new mamba-specific feature and all the related options and variables, e.g. -with-mamba, NXF_MAMBA_CACHEDIR, the mamba directive, etc.

Since mamba is a drop-in replacement for Conda, as a user I would expect to be able to use the usual Conda options and just have the mamba binary used in place of conda behind the scenes.

I think we need to introduce a useMamba option in the conda context and, when set, replace conda command lines with the corresponding mamba ones.

@drpatelh
Contributor

> I think we need to introduce a useMamba option in the conda context and, when set, replace conda command lines with the corresponding mamba ones.

Yep, I agree. If possible, recycling the current Conda parameters and env variables would be much cleaner, with the hope that once this stabilises a little we can just set useMamba: true by default?

It also means we don't need to add more syntactic clutter to module files here specifically for mamba because useMamba can be set via a config if required.

@pditommaso
Member

Yes, I agree with all of it apart from having useMamba: true by default. I think it should be an opt-in feature.

@drpatelh
Contributor

> I think it should be an opt-in feature.

Given how slow Conda is I think you would have to have way too much time on your hands not to opt in to using mamba by default 😅

@abhi18av
Member Author

abhi18av commented Apr 19, 2021

> I think we need to introduce a useMamba option in the conda context and, when set, replace conda command lines with the corresponding mamba ones.

Cool, I'll refactor accordingly 👍

@abhi18av abhi18av changed the title from "Add support for mamba" to "Add support for using mamba with conda" Apr 19, 2021
@abhi18av abhi18av force-pushed the abhinav/mamba-support branch from a4adb89 to 43a9a70 Compare May 5, 2021 12:41
@abhi18av
Member Author

abhi18av commented May 5, 2021

Okay, so I've refactored the PR code to rely on the useMamba option for the conda directive.

One place where I need some guidance is how I should accommodate this option within the BashWrapperBuilder. I tried to find a way to derive it via the TaskBean / TaskProcessor / TaskRun classes, but it felt like retrofitting.

Thoughts?

@abhi18av abhi18av force-pushed the abhinav/mamba-support branch 3 times, most recently from f021153 to 6bc4b4b Compare May 12, 2021 12:26
@abhi18av abhi18av marked this pull request as ready for review May 12, 2021 18:28
@abhi18av
Member Author

abhi18av commented May 12, 2021

Update:

I'm now able to create the environments using mamba with the following config:

conda {
   useMamba = true
}

Please let me know your thoughts on the current solution.

@@ -209,20 +215,20 @@ class CondaCache {
      * @return the conda environment prefix {@link Path}
      */
     @PackageScope
-    Path createLocalCondaEnv(String condaEnv) {
+    Path createLocalCondaEnv(String condaEnv, String binaryName = "conda") {
Member

Can we keep binaryName as a class attribute instead of passing it as an argument? Or can both conda and mamba be used in the same class instance?

Member Author

> Can we keep binaryName as a class attribute instead of passing it as an argument?

Hmm, do you think that it's better to use the following at class level

        private useMamba = false
        private binaryName = useMamba ? "mamba" : "conda"

and then simply use the binaryName within the methods instead of argument?

> Or can both conda and mamba be used in the same class instance?

I am not sure how to interpret that, but from what I understand, it's enough to rely on the mamba CLI (i.e. a single binary), since it'll automatically locate conda if needed, without accommodating this in the Groovy code.

Member

> Hmm, do you think that it's better to use the following at class level

Let's have binaryName default to conda and switch to mamba when the flag is enabled.

Member Author

Okay, I have accommodated the change request :)

Comment on lines 75 to 78
@PackageScope String binaryName() {
    useMamba ? "mamba" : "conda"
}

Member

If you rename this to getBinaryName, it becomes a synthetic property that you can access as binaryName instead of binaryName().
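
For illustration, a minimal Groovy sketch of that tip. The class name CondaCacheSketch is hypothetical and this is not the code merged in the PR; it only shows that a getBinaryName() method can be read with property syntax:

```groovy
// Hypothetical stand-in for the real class, reduced to the relevant parts.
class CondaCacheSketch {
    boolean useMamba   // primitive boolean, so it defaults to false

    // Groovy exposes a getXxx() method as a read-only property named xxx
    String getBinaryName() {
        useMamba ? 'mamba' : 'conda'
    }
}

def cache = new CondaCacheSketch()
println cache.binaryName        // property-style access, no parentheses

cache.useMamba = true
println cache.binaryName        // re-evaluated on every access
```

Because the getter is evaluated on each access, binaryName always reflects the current value of useMamba, which a field initialised once at construction time would not.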

Member Author

Interesting - thanks for the tip 😊

@@ -60,6 +60,8 @@ class CondaCache {

private String createOptions

private Boolean useMamba = false
Member

Minor: a null value is useless in this case, so it's better to declare it as private boolean useMamba (false is the default).
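
To illustrate the point about defaults, a small Groovy sketch (the class Flags and its field names are hypothetical): an uninitialised Boolean field is null, while a primitive boolean field is false.

```groovy
// Hypothetical class used only to compare the two declarations.
class Flags {
    Boolean boxed        // wrapper type: can hold true, false or null
    boolean primitive    // primitive type: can only hold true or false
}

def flags = new Flags()
println "boxed     = ${flags.boxed}"
println "primitive = ${flags.primitive}"
```

With the primitive type there is no third "unset" state to handle, so the explicit = false initialiser becomes redundant.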

Member Author

Hmm, good to know!

I've refactored accordingly.

@@ -90,6 +96,10 @@ class CondaCache {

if( config.cacheDir )
    configCacheDir0 = (config.cacheDir as Path).toAbsolutePath()

if( config.useMamba )
    useMamba = config.useMamba as Boolean
Member

Use boolean instead of Boolean here, to match the type declaration.

Member Author

Copy that 👍

Member

@pditommaso pditommaso left a comment

OK, we are almost there. Have you tried it? Also, a note should be added to the conda docs explaining this functionality, its benefits, how to use it, and that it's an experimental feature.

Please also sign the commits so that the DCO check turns green. See the contribution readme for details.

abhi18av added 4 commits May 14, 2021 12:18
Signed-off-by: Abhinav Sharma <abhi18av@gmail.com>
abhi18av added 5 commits May 14, 2021 12:18
@abhi18av abhi18av force-pushed the abhinav/mamba-support branch from 3ced9ed to 4dd48a3 Compare May 14, 2021 10:18
@abhi18av
Member Author

abhi18av commented May 14, 2021

Okay, so here's the current status

  • I have updated the docs ✅
  • tested locally with nextflow-io/rnaseq-nf

Summary:

The user time

  • without mamba is 2m30.046s
  • with mamba is 1m36.336s
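
As a quick check on these figures, the `time` readings can be converted to seconds and compared. This is only arithmetic on the numbers reported in this comment (a single run each), and the helper toSeconds is hypothetical. It works out to about 53.7 s less user time with mamba (roughly 36%), while the wall-clock (real) readings differ by under a second.

```groovy
// Convert a `time` reading such as "2m30.046s" into seconds.
BigDecimal toSeconds(String reading) {
    def m = (reading =~ /(\d+)m([\d.]+)s/)
    assert m.matches() : "unexpected time format: ${reading}"
    return new BigDecimal(m.group(1)) * 60 + new BigDecimal(m.group(2))
}

// Readings copied from the two runs reported in this comment
def userConda = toSeconds('2m30.046s')
def userMamba = toSeconds('1m36.336s')
def realConda = toSeconds('3m6.182s')
def realMamba = toSeconds('3m5.407s')

println "user time saved: ${userConda - userMamba}s"
println "real time saved: ${realConda - realMamba}s"
```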

Details:

On my humble local machine, the result without useMamba is

(base) Abhinavs-MacBook-Pro:rnaseq-nf eklavya$ time ../../launch.sh main.nf -profile conda
N E X T F L O W  ~  version 21.05.0-edge
Launching `main.nf` [adoring_brahmagupta] - revision: 4ab4121ede
 R N A S E Q - N F   P I P E L I N E
 ===================================
 transcriptome: /Users/eklavya/projects/code/nextflow/_resources/rnaseq-nf/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa
 reads        : /Users/eklavya/projects/code/nextflow/_resources/rnaseq-nf/data/ggal/*_{1,2}.fq
 outdir       : results

[-        ] process > RNASEQ:INDEX  -
executor >  local (6)
[c1/99de7b] process > RNASEQ:INDEX (ggal_1_48850000_49020000) [100%] 1 of 1 ✔
[7c/381b31] process > RNASEQ:FASTQC (FASTQC on ggal_liver)    [100%] 2 of 2 ✔
[3b/b06222] process > RNASEQ:QUANT (ggal_gut)                 [100%] 2 of 2 ✔
[94/569293] process > MULTIQC                                 [100%] 1 of 1 ✔

Done! Open the following report in your browser --> results/multiqc_report.html

Completed at: 14-May-2021 12:10:10
Duration    : 2m 52s
CPU hours   : (a few seconds)
Succeeded   : 6



real	3m6.182s
user	2m30.046s
sys	0m22.124s

The result with mamba is shown below


(base) Abhinavs-MacBook-Pro:rnaseq-nf eklavya$ time ../../launch.sh main.nf -profile mamba
N E X T F L O W  ~  version 21.05.0-edge
Launching `main.nf` [compassionate_colden] - revision: 4ab4121ede
 R N A S E Q - N F   P I P E L I N E
 ===================================
 transcriptome: /Users/eklavya/projects/code/nextflow/_resources/rnaseq-nf/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa
 reads        : /Users/eklavya/projects/code/nextflow/_resources/rnaseq-nf/data/ggal/*_{1,2}.fq
 outdir       : results

[-        ] process > RNASEQ:INDEX  -
executor >  local (6)
[9b/a2900c] process > RNASEQ:INDEX (ggal_1_48850000_49020000) [100%] 1 of 1 ✔
[1f/b4e9cc] process > RNASEQ:FASTQC (FASTQC on ggal_gut)      [100%] 2 of 2 ✔
[59/d564f1] process > RNASEQ:QUANT (ggal_gut)                 [100%] 2 of 2 ✔
[0e/3bd045] process > MULTIQC                                 [100%] 1 of 1 ✔

Done! Open the following report in your browser --> results/multiqc_report.html

Completed at: 14-May-2021 12:14:11
Duration    : 2m 51s
CPU hours   : (a few seconds)
Succeeded   : 6



real	3m5.407s
user	1m36.336s
sys	0m24.346s


@abhi18av abhi18av linked an issue May 14, 2021 that may be closed by this pull request
abhi18av added 2 commits May 14, 2021 13:43
Member

@pditommaso pditommaso left a comment

Awesome! nice contribution @abhi18av!

@pditommaso pditommaso merged commit 10c8385 into nextflow-io:master May 14, 2021
@abhi18av abhi18av deleted the abhinav/mamba-support branch January 27, 2022 09:42
Development

Successfully merging this pull request may close these issues.

Failure resolving conda envs in parallel
4 participants