Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the Code data plugin #5510

Merged
merged 7 commits into from
May 18, 2022
Merged

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented May 2, 2022

Fixes #5509
Fixes #4088

The original design of the Code plugin, by using a single class to represent two different types of codes, has often caused a lot of confusion for users, but also made maintenance complicated. Here I refactor it in a fully backwards-compatible way into two separate classes: RemoteCode and LocalCode, that implement the AbstractCode abstract class. This refactoring makes the code a lot cleaner and easier to understand/use. It also allows making the generating of CLI commands to create new instances fully dynamic. This will make the commands a lot easier to maintain and to use for users. Especially now that we are considering adding other types of codes, such as containerized codes (see #5507)

@ltalirz
Copy link
Member

ltalirz commented May 4, 2022

I've only read through the commit messages so far, but this looks great to me - thanks a lot!

Given that this is a diff of >1k lines, are there any particular open questions that you would like people to review?

From my side, one question would be whether we deprecate the local code as well as, to my knowledge, nobody is using it.

@sphuber
Copy link
Contributor Author

sphuber commented May 4, 2022

There aren't any particular questions that I have. The new interface for users would look as follows:

LocalCode = DataFactory('core.code.local')
RemoteCode = DataFactory('core.code.remote')
remote = RemoteCode(computer=computer, filepath_executable='/usr/bin/bash')
local = LocalCode(filepath_files='/some/folder/code', filepath_executable='program.exe')

from the command line that would be

verdi code create core.code.remote
verdi code create core.code.local

I have named it verdi code create because there is no way to elegantly reuse verdi code setup without breaking it. We might still want to consider to just repurpose the setup command since it is just the CLI that we would be breaking. Since each AbstractCode implementation now has its own CLI endpoint (dynamically generated) there are no longer options that are just there to determine which options actually need to be filled, like --on-computer/--store-in-db.

Given that the LocalCode is now fully decoupled and is no longer complicating things, I would actually keep it around for now. It no longer complicates the code base or its interface.

@sphuber
Copy link
Contributor Author

sphuber commented May 5, 2022

Just discussed with @unkcpz on how this PR can be used to simplify the containerized code implementation and we realized that it might be a good moment to rename the concept of "remote" and "local" codes, since they always have been confusing. After a bit of brainstorming, we came up with the following names:

  • RemoteCode -> PreInstalledCode
  • LocalCode -> PortableCode

We think this indicates the different at more clearly. It would also work well with the containerized versions:

  • PreInstalledContainerCode
  • PortableContainerCode

For the former, a pre-existing container with the target executable contained within it, will be run on a Computer. The latter will take a folder from the local file system and upload it to AiiDA's storage, and when run, the code will be uploaded to the target computer and run within the specified container.

What do you think of the names pre-installed and portable to convey the semantic difference? Any other suggestions?

@ltalirz
Copy link
Member

ltalirz commented May 5, 2022

I wonder whether we can find a more compact name for the PreInstalledCode since this is the default for the vast majority of use cases today. Can one make it work to keep the name Code for this case or is this impossible?

PortableCode sounds good to me.

For containerized codes, one could use ContainerizedCode over ContainerCode but that's a question of taste.

@sphuber
Copy link
Contributor Author

sphuber commented May 5, 2022

Can one make it work to keep the name Code for this case or is this impossible?

Probably not if we want to the current one working in a backwards-compatible way. We could call the class Code and just put it in a different module. We just won't be able to bubble it up to aiida.orm since that would then class with the legacy Code class.

Besides that, the entry point name would also be problematic since core.code is already taken by the legacy Code and for backwards compatibility cannot hijack it. It might not be ideal if the class name and entry point are not consistent, i.e. having Code | core.code.something but PortableCode | core.code.portable.

For containerized codes, one could use ContainerizedCode over ContainerCode but that's a question of taste.

Containerized would also be fine for me.

@ltalirz
Copy link
Member

ltalirz commented May 9, 2022

Maybe InstalledCode would suffice already.

I still would prefer Code, even if it is less consistent, but I understand that it can cause problems with trying to maintain backwards compatibility in this case.
Please go ahead with the way you consider best.

@sphuber sphuber force-pushed the fix/5509/refactor-code branch from 5fc6ecc to 77f791e Compare May 13, 2022 18:08
Copy link
Member

@unkcpz unkcpz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sphuber thanks a lot! I have only some minor requests and some questions. The logic is all very clear to me.

If all is good, we can also consider to update the documentation and CI test to use verdi code create.

aiida/engine/processes/calcjobs/calcjob.py Outdated Show resolved Hide resolved
aiida/orm/nodes/data/code/abstract.py Outdated Show resolved Hide resolved
aiida/orm/nodes/data/code/local.py Outdated Show resolved Hide resolved
aiida/orm/nodes/data/code/local.py Outdated Show resolved Hide resolved
aiida/orm/nodes/data/code/remote.py Outdated Show resolved Hide resolved

table.append(['Prepend text', code.get_prepend_text()])
table.append(['Append text', code.get_append_text()])
table.append(['Default plugin', code.default_calc_job_plugin])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this means less code information is shown by verdi code show? Can we dynamically show depending on code type or from cli option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I removed it for now because it was hard-coding a bunch of stuff. But you are right, I think we might be able to get this behavior dynamically using the cli option definitions. There are some caveats though. As long as the cli options are stored as simple attributes, the automatic introspection may fail if there are more complex data types that are being returned, or they don't have a direct property to access it.

aiida/cmdline/groups/dynamic.py Outdated Show resolved Hide resolved
aiida/cmdline/groups/dynamic.py Outdated Show resolved Hide resolved
aiida/cmdline/groups/dynamic.py Outdated Show resolved Hide resolved
docs/source/reference/command_line.rst Show resolved Hide resolved
@sphuber sphuber force-pushed the fix/5509/refactor-code branch 4 times, most recently from 0085950 to a3068b0 Compare May 16, 2022 21:36
sphuber added 6 commits May 17, 2022 12:07
Historically, the `Code` data plugin came in two flavors: codes that
represent an executable on an associated `Computer`, and codes that were
included in the repository of the node instance itself. For the latter,
when used in a `CalcJob` its files would be uploaded to the remote
working directory.

The use of a single class for these two different types of entities has
caused quite a bit of confusion, not only for end-users, but also for
developers as it leads to convoluted code.

Here, we add two new data plugins, `InstalledCode` and `PortableCode` that
are designed to implement the exact same behavior, but do so in a more
user-friendly and maintainable way by having them as separate classes.
The both subclass from `AbstractCode` which provides the functionality
that is shared between them.

For backwards compatibility, the original `Code` implementation is kept,
but it is moved to the `aiida.orm.nodes.data.code.legacy` module. The
idea is that this class is deprecated and users are encouraged to start
using the new classes. In a future version, the `Code` class will be
decommissioned and an automatic migration will be included to migrate
its existing entries to a `InstalledCode` or `PortableCode` instance.
The `AbstractCode` class implements the `get_cli_options` method which
will return an `OrderedDict` of the dictionary returned by the
`_get_cli_options`. This dict contains a spec definition of the CLI
options a command should have to construct an instance of the relevant
data plugin class.

The `InstalleCode` and `PortableCode` add their specific options to this
by overriding the `_get_cli_options` method. With this approach, the
`verdi` CLI could dynamically add commands that allow to create their
instances by discovering them through their entry points and building up
the command options by the spec returned by `get_cli_options`.

Each entry in the dictionary returned by `get_cli_options` will
essentially be passed to the `click.option` decorator, so its keys
should match the spec of that method.
This custom implementation of `click.Group` will allow a CLI command to
dynamically generate its subcommands. The subcommands will be based on the
installed entry points for a specific entry point group. Optionally, the
entry points can be filtered using a regex pattern.

This approach was already in use to dynamically provide the commands to
configure computers with the various transport type plugins. The
functionality is generalized and added as a single class to the
`aiida.cmdline.groups.dynamic` module.

The code base already had one other implementation of `Group`, namely the
`VerdiCommandGroup`. For consistency, this is now also moved from
`aiida.cmdline.commands.cmd_verdi` to `aiida.cmdline.groups.verdi`.
This is the successor of `verdi code setup`. The difference being that
where `verdi code setup` hardcoded the options and was designed to setup
both remote and local codes, `verdi code create` is a command group
that dynamically loads subcommands for each installed entry point that
is a subclass of the `AbstractCode` data plugin.

The advantage of the new approach is that it is much clearer for both
users as well as developers. The code and user interface for the code
setup command was getting quite complicated because a single command was
being used for two different purposes. By decoupling them, the interface
is much more intuitive and the code is much more easy to maintain.
The `Code` plugin is used a lot and so cannot just be removed. Instead,
we keep it for now and have it subclass the new `AbstractCode`. This
way, the class already has the interface that it will have once the
deprecated interface is removed. This will allow users to start updating
their use of it in a backwards-compatible manner.

The codebase is updated to replace the use of the deprecated interface
with that of the new interface. No new instances of `Code` are created,
only instances of the new `RemoteCode` and `LocalCode`.

When the time comes to drop the legacy `Code` class, all that is needed
is a data migration that will update existing instances to either an
instance of `InstalledCode` or `PortableCode` depending on the attributes
that the instance defines.

The legacy methods and properties of `Code` plugin are deprecated.
The CLI command is replaced with `verdi code create`. The `CodeBuilder`
is no longer necessary because code instances can now easily be created
through the constructor.
@sphuber sphuber force-pushed the fix/5509/refactor-code branch from a3068b0 to a8e504e Compare May 17, 2022 10:43
It also provides some historical context as to why it replaces the old
`Code` and when that will be removed.
@sphuber sphuber force-pushed the fix/5509/refactor-code branch from a8e504e to 2d3cf68 Compare May 17, 2022 10:51
@sphuber sphuber requested a review from unkcpz May 17, 2022 10:53
@sphuber
Copy link
Contributor Author

sphuber commented May 17, 2022

Thanks for the review @unkcpz . Comments are addressed and I have added documentation and tests. Have also performed the renaming to InstalledCode and PortableCode.

Copy link
Member

@unkcpz unkcpz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sphuber thanks! All good to me.

@sphuber sphuber merged commit bb59a70 into aiidateam:main May 18, 2022
@sphuber sphuber deleted the fix/5509/refactor-code branch May 18, 2022 08:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor the Code class to have separate classes for remote and local codes Code class not being tested
3 participants