Mamba #2

Closed
spietras opened this issue Jul 10, 2021 · 11 comments · Fixed by #22

Comments

@spietras
Owner

Think about using mamba instead of conda. That might speed things up.

@spietras spietras self-assigned this Jul 10, 2021
@spietras spietras added the feature New feature or request label Jul 10, 2021
@spietras
Owner Author

It's not so straightforward: there could be problems with installation and availability on different platforms.

@GabrielDougherty
Contributor

GabrielDougherty commented Oct 26, 2021

I suggest the following approach:

  1. Install mamba into the base environment of conda. This puts mamba at $CONDA_ROOT/condabin/mamba.
  2. If installation fails (because no build is available for the platform), fall back to regular conda as the output of the installation rule.
  3. Call mamba directly, without running conda activate (since that doesn't work, according to your issues page).
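
The three steps above, including the fallback in step 2, might be sketched roughly as follows. This is plain Python rather than Starlark, and only an illustration: the function name `install_mamba_or_fall_back`, the injectable `run` parameter, and the use of a non-zero exit code as the "no build available" signal are all assumptions, not part of rules_conda.

```python
import subprocess
from pathlib import Path


def install_mamba_or_fall_back(conda_root, run=subprocess.run):
    """Return the executable to call: mamba if it installs, otherwise conda.

    Hypothetical helper; `run` is injectable so the flow can be exercised
    without a real conda installation.
    """
    condabin = Path(conda_root) / "condabin"
    conda = condabin / "conda"

    # Step 1: install mamba into the base environment of this conda root.
    result = run(
        [str(conda), "install", "mamba", "-n", "base", "-c", "conda-forge", "-y"]
    )

    # Step 2: assume a non-zero exit code means mamba could not be installed,
    # and fall back to plain conda.
    if result.returncode != 0:
        return conda

    # Step 3: the caller invokes the returned path directly, with no activation.
    return condabin / "mamba"
```

Whatever path comes back is then called directly, for example as the first element of a later `env create` command.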

The happy path for creating an environment (without error handling) would look like this:

"$CONDA_ROOT"/condabin/conda install mamba -n base -c conda-forge
"$CONDA_ROOT"/condabin/mamba env create -f "$ENVIRONMENT_FILE"

I just made up the $CONDA_ROOT and $ENVIRONMENT_FILE variables, but I believe you have equivalents in your Starlark rules. $CONDA_ROOT would basically be rctx.attr.conda_repo + //: + rctx.attr.conda_dir from this line:

conda_label = Label("@{}//:{}/condabin/conda{}".format(rctx.attr.conda_repo, rctx.attr.conda_dir, CONDA_EXT_MAP[get_os(rctx)]))
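
To make the pattern concrete, here is a small plain-Python sketch of how the same format string could yield a label for either executable. The contents of `CONDA_EXT_MAP`, the OS names, and the `tool` parameter are hypothetical stand-ins, since the real map and `get_os` are not shown in this thread; whether mamba uses the same extension as conda on every platform is also an assumption.

```python
# Hypothetical stand-in for the CONDA_EXT_MAP used in the rules;
# the real keys and values are not shown in this thread.
CONDA_EXT_MAP = {"Windows": ".bat", "Linux": "", "macOS": ""}


def condabin_label(conda_repo, conda_dir, os_name, tool="conda"):
    """Build a label string for an executable inside condabin.

    Mirrors the format string in the line above, with the executable
    name made configurable (an assumption for illustration).
    """
    return "@{}//:{}/condabin/{}{}".format(
        conda_repo, conda_dir, tool, CONDA_EXT_MAP[os_name]
    )


print(condabin_label("conda", "conda", "Linux"))
print(condabin_label("conda", "conda", "Linux", tool="mamba"))
```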

Let me know if you would accept a Pull Request implementing this design. Or if you would prefer a different design. Thanks.

@spietras
Owner Author

Thanks a lot for your input!

One question (because I don't remember how things work exactly): if the user already has a global conda installation then using the local downloaded conda and installing something in the base environment will only modify the base environment in the local installation and the base environment in the global installation will remain intact, right?

@GabrielDougherty
Contributor

> Thanks a lot for your input!

Thanks for writing these rules!

> if the user already has a global conda installation then using the local downloaded conda and installing something in the base environment will only modify the base environment in the local installation and the base environment in the global installation will remain intact, right?

Yes, I tested it manually just to be sure.

I have bazel aliased to bazel --output_user_root=/var/tmp/gabrield_bazel_root

$ cd path/to/my/bazel/repo/that/uses/rules_conda
$ bazel build //... # install local conda
$ conda env list # find the bazel conda root using my existing conda install
$ /var/tmp/gabrield_bazel_root/6886cfff64e6893fa4f77ca8f6b3d57a/external/conda/conda/condabin/conda install mamba -n base -c conda-forge
$ file /var/tmp/gabrield_bazel_root/6886cfff64e6893fa4f77ca8f6b3d57a/external/conda/conda/condabin/mamba
# outputs:
/var/tmp/gabrield_bazel_root/6886cfff64e6893fa4f77ca8f6b3d57a/external/conda/conda/condabin/mamba: symbolic link to `../bin/mamba'

So mamba gets installed to the bazel cache.

@spietras
Owner Author

Alright, great, so that's not an issue.

However, I see another one: we set up conda in one repository rule (load_conda, which is where I think we will install mamba) and create the environment in another repository rule (conda_create). Depending on the configuration (whether it's determined by the user's system or by explicit settings, because I think users should be able to choose whether they want to use mamba), we will need to execute different things (conda or mamba). So we need a way for conda_create to know which executable to call.

This complicates things because I don't remember any way of passing info between rules. The only workaround that comes to my mind is making a symlink with a constant name (let's say snake for example) that will point to either conda or mamba, depending on the configuration, and expose that symlink for other rules to use. Then conda_create can simply call $CONDA_REPO/conda/condabin/snake env create ... instead of $CONDA_REPO/conda/condabin/conda env create .... Do you have any other ideas?
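
The symlink workaround could be sketched as follows, in plain Python rather than Starlark. The helper name `expose_snake` is hypothetical, and the sketch assumes a platform where creating symlinks is permitted.

```python
from pathlib import Path


def expose_snake(condabin, use_mamba):
    """Point condabin/snake at either mamba or conda.

    Hypothetical helper: other rules would always call `snake`
    and never need to know which tool is behind it.
    """
    target = "mamba" if use_mamba else "conda"
    link = Path(condabin) / "snake"

    # Remove a stale link from a previous configuration, if any.
    if link.is_symlink() or link.exists():
        link.unlink()

    # Use a relative target so the link stays valid if the directory moves.
    link.symlink_to(target)
    return link
```

With this in place, the environment-creating rule would call `snake env create ...` regardless of the configuration, at the cost of hiding which tool is actually used.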

@GabrielDougherty
Contributor

The only alternative I can think of is to have load_conda accept a boolean option install_mamba and for conda_create to accept a boolean option use_mamba. It would be up to the user to ensure that both options are the same value:

USE_MAMBA = True

load_conda(
    install_mamba = USE_MAMBA,
    mamba_version = "0.17.0",
    quiet = False,  # use True to hide conda output
    version = "4.10.3",  # optional, defaults to 4.10.3
)

conda_create(
    name = "my_env",
    use_mamba = USE_MAMBA,
    timeout = 600,  # each execute action can take up to 600 seconds
    clean = False,  # use True if you want to clean conda cache (less space taken, but slower subsequent builds)
    environment = "@//:environment.yml",  # label pointing to environment.yml file
    quiet = False,  # use True to hide conda output
)

Pros:

  • In the future, if multiple conda_creates are allowed, the user could disable mamba for environments that don't work well with mamba but keep using mamba for other environments
  • May be simpler to implement
  • Rules are more referentially transparent: they make it obvious that mamba is being used in conda_create

Cons:

  • The USE_MAMBA flag could get out of sync between rules and we would need to output an error

@spietras
Owner Author

Ok, fine, let's do it that way. So:

  • load_conda will get two new parameters:
    • install_mamba (default: False) - whether or not to install mamba
    • mamba_version (default: 0.17.0) - which mamba version to install (ignored if install_mamba is False)
  • conda_create will get one new parameter:
    • use_mamba (default: False) - whether or not to use mamba to create the environment

If install_mamba is True in load_conda then we will install mamba inside the base environment (probably as the next step after _update_conda in conda.bzl). If for some reason mamba can't be installed then we should end with an error. Simply falling back to conda might be dangerous here, because if multiple people on different systems use the same WORKSPACE file (let's say they are working on the same project with a shared repository) and one of them can install mamba while the other can't, then they silently end up with de facto different environments. In that case, they should make a conscious decision not to use mamba at all.

If use_mamba is True in conda_create then we will use mamba instead of conda for creating the environment. If the user passed install_mamba=True and use_mamba=False then nothing really happens as it's just a decision to not use mamba for that particular environment. If the user passed install_mamba=False and use_mamba=True then mamba is not available and we should end with an error.

Sounds good?
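
The combinations described above can be summarized in a small plain-Python sketch. The function name `pick_executable` is hypothetical, and how conda_create would actually learn whether mamba was installed is left open here; the sketch simply takes it as a boolean.

```python
def pick_executable(mamba_installed, use_mamba):
    """Decide which tool creates the environment.

    mamba_installed: whether load_conda was asked to install mamba
                     (and succeeded; a failed install is an error there).
    use_mamba:       the per-environment choice made in conda_create.
    """
    if use_mamba and not mamba_installed:
        # install_mamba=False with use_mamba=True: mamba is not available.
        raise ValueError(
            "use_mamba=True requires install_mamba=True in load_conda"
        )
    # install_mamba=True with use_mamba=False is allowed: mamba is simply
    # not used for this particular environment.
    return "mamba" if use_mamba else "conda"


for installed in (False, True):
    for wanted in (False, True):
        try:
            outcome = pick_executable(installed, wanted)
        except ValueError as error:
            outcome = "error: {}".format(error)
        print(installed, wanted, outcome)
```

Raising instead of silently falling back keeps people who share one WORKSPACE file from ending up with differently built environments.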

@GabrielDougherty
Contributor

Sorry for the late reply, I was sick. Yes, that sounds like a good plan. I will begin work on this sometime this week. I will open a PR when I have something to show.

@spietras
Owner Author

spietras commented Nov 2, 2021

> Sorry for the late reply, I was sick.

No problem, hope you are feeling better now. Matter of fact, I was sick too D:

> I will open a PR when I have something to show.

Great, looking forward to it. No pressure though, as there is not much going on anyway, so it's done when it's done.

And while you are at it, take a look at this: mamba-org/mamba#633. It might be that we need a little workaround to benefit from mamba. Namely, we probably need to create an empty environment first and then update it using the configuration file instead of doing it in one step.
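
A rough sketch of that two-step workaround, in plain Python, is shown below. It only builds the command lines. The helper name `two_step_commands`, the choice of conda for the empty create step, and the use of `--name` rather than a prefix path are assumptions for illustration, not what rules_conda does.

```python
def two_step_commands(condabin, env_name, environment_file):
    """Return the two commands: create an empty environment, then update it.

    Hypothetical helper; it does not run anything.
    """
    # Step 1: create an environment with no packages in it.
    create = [condabin + "/conda", "create", "--name", env_name, "--yes"]

    # Step 2: let mamba resolve and install everything from the file.
    update = [
        condabin + "/mamba", "env", "update",
        "--name", env_name,
        "--file", environment_file,
    ]
    return [create, update]


for command in two_step_commands("conda/condabin", "my_env", "environment.yml"):
    print(" ".join(command))
```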

@jiawen
Contributor

jiawen commented Nov 6, 2021

I recently discovered mamba as well and man is it fast!

Related - perhaps this should be a separate issue - any interest in adding support for installing using miniforge and/or mambaforge?

@spietras
Owner Author

spietras commented Nov 6, 2021

This is interesting. For now, let's keep the current flow (as miniconda should be supported anyway) and move it to a separate issue to discuss it there.

@spietras spietras mentioned this issue Nov 6, 2021
GabrielDougherty added a commit to GabrielDougherty/rules_conda that referenced this issue Nov 8, 2021
@spietras spietras linked a pull request Nov 8, 2021 that will close this issue
@spietras spietras mentioned this issue Nov 9, 2021