Restore grouped operations by default in status output #773

Closed
javierbg opened this issue Oct 6, 2023 · 0 comments · Fixed by #774

javierbg commented Oct 6, 2023

Description

Since #725 was implemented, groups no longer appear by default in the project status output. This is because the operations and groups selected with the -o option are now resolved with _gather_flow_groups. If you try to include a singleton operation along with a group that contains it, you get an error (shown below).

_gather_flow_groups avoids returning overlapping groups and operations because it is also used in other places, such as run, where you naturally want to avoid executing the same operation twice. I think groups were simply forgotten when #725 was implemented. _fetch_status jumps through a lot of hoops to reconcile singleton operations with grouped operations, all of which was discussed in #547 and implemented in #593, and it would be nice to recover this default behaviour.
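To make the conflict concrete, here is a toy model of the kind of overlap check described above. It does not use flow's internals: `GROUPS` and `check_no_overlap` are hypothetical stand-ins for the project's groups and for the check inside _gather_flow_groups. Each selected name expands to the set of operations it would run, and any operation reached twice raises an error. That is exactly what run needs, but status does not.

```python
from itertools import combinations

# Hypothetical model: each selectable name maps to the operations it runs.
# A singleton operation is treated as a group containing only itself.
GROUPS = {
    "op1": {"op1"},
    "op2": {"op2"},
    "opgroup": {"op1", "op2"},
}


def check_no_overlap(names):
    """Raise ValueError if two selected names would run the same operation.

    Simplified stand-in for the compatibility check performed when groups
    are gathered; this is not flow's actual implementation.
    """
    for a, b in combinations(names, 2):
        shared = GROUPS[a] & GROUPS[b]
        if shared:
            raise ValueError(
                f"{a!r} and {b!r} both include {sorted(shared)}; "
                "an operation cannot be included twice."
            )
    return [GROUPS[name] for name in names]
```

With this model, `check_no_overlap(["op1", "op2"])` succeeds, while `check_no_overlap(["op1", "opgroup"])` raises, mirroring `python project.py status -o op1 opgroup`.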

I see several different options for solving this:

  • Switch from _gather_flow_groups to the groups property, taking care to select only the groups named with -o, if any.
  • Since overlap is not always a problem, a parameter could be added to _gather_flow_groups to skip the _verify_group_compatibility check (which would stay enabled by default).
  • Perhaps the _verify_group_compatibility check does not really belong in _gather_flow_groups. In that case it could simply be taken out and called only where necessary, although this should be evaluated by someone with more experience with the flow codebase than me.
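The first two options can be sketched together in a toy model. `ALL_GROUPS` is a hypothetical stand-in for the groups property, and `select_status_groups` with its `check_overlap` flag is a hypothetical illustration of option 2's opt-out parameter, not an existing flow function. With no names given, status reports every singleton and every group; the overlap check runs only when explicitly requested (as run would do).

```python
# Hypothetical model of a project's groups: singleton operations and
# multi-operation groups, keyed by name.
ALL_GROUPS = {
    "op1": {"op1"},
    "op2": {"op2"},
    "opgroup": {"op1", "op2"},
}


def select_status_groups(names=None, check_overlap=False):
    """Select groups for status output (illustrative sketch only).

    names=None mirrors omitting -o: every singleton and group is reported.
    check_overlap is the hypothetical opt-out parameter from option 2;
    status would leave it off, run would turn it on.
    """
    if names is None:
        selected = dict(ALL_GROUPS)
    else:
        unknown = [n for n in names if n not in ALL_GROUPS]
        if unknown:
            raise KeyError(f"Unknown operation or group: {unknown}")
        selected = {n: ALL_GROUPS[n] for n in names}
    if check_overlap:
        seen = set()
        for name, ops in selected.items():
            repeated = seen & ops
            if repeated:
                raise ValueError(f"{name!r} repeats operations {sorted(repeated)}")
            seen |= ops
    return selected
```

For example, `select_status_groups(["op1", "opgroup"])` returns both entries for display, whereas the same call with `check_overlap=True` raises.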

To reproduce

Initialize a signac project with signac init.

Add a couple of jobs:

import signac

# Run this from inside the initialized project directory.
project = signac.get_project()

project.open_job({
    'a': 1
}).init()

project.open_job({
    'a': 2
}).init()

In project.py, register a couple of operations and a group:

from flow import FlowProject


opgroup = FlowProject.make_group('opgroup')

@opgroup
@FlowProject.operation
def op1(job):
    pass


@opgroup
@FlowProject.operation
def op2(job):
    pass

FlowProject().main()

Run python project.py status

Error output

The output looks like this; opgroup is missing:

Overview: 2 jobs/aggregates, 2 jobs/aggregates with eligible operations.

label
-------


operation/group      number of eligible jobs  submission status
-----------------  -------------------------  -------------------
op1                                        2  [U]: 2
op2                                        2  [U]: 2

If you try to include a singleton operation along with a group that contains it (e.g. python project.py status -o op1 opgroup), you get the following error:

ERROR:flow.project:Error during status update: Cannot specify groups or operations that will be included twice when using the -o/--operation option.
Use '--ignore-errors' to complete the update anyways.
Traceback (most recent call last):
  File "/home/javier/git_repos/signac-flow/mytesting/duplicate_operation_reproduction.py", line 18, in <module>
    FlowProject().main()
  File "/home/javier/git_repos/signac-flow/flow/project.py", line 5165, in main
    args.func(args)
  File "/home/javier/git_repos/signac-flow/flow/project.py", line 4799, in _main_status
    raise error
  File "/home/javier/git_repos/signac-flow/flow/project.py", line 4793, in _main_status
    self.print_status(jobs=aggregates, **args)
  File "/home/javier/git_repos/signac-flow/flow/project.py", line 3011, in print_status
    status_results, job_labels, individual_jobs = self._fetch_status(
  File "/home/javier/git_repos/signac-flow/flow/project.py", line 2724, in _fetch_status
    status_groups = set(self._gather_flow_groups(names))
  File "/home/javier/git_repos/signac-flow/flow/project.py", line 3795, in _gather_flow_groups
    raise ValueError(
ValueError: Cannot specify groups or operations that will be included twice when using the -o/--operation option.

A related bug

Once this problem is solved (e.g., by switching from _gather_flow_groups to groups in _fetch_status), a bug appears when reporting two or more singleton operations together with a group that contains them: only one of the operations is reported, and it counts duplicate eligible/queued/etc. jobs (one duplicate per "sibling" operation). This happens because, in this line, the same operation_status dictionary is reused for every operation in the group; later, when a display name is assigned to it in this line, each assignment overwrites the previous one in the shared dictionary. This is easily fixed by using a separate copy for each operation.
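The aliasing can be reproduced with plain dictionaries. In this sketch, the `status` argument stands in for the operation_status dictionary and the loop stands in for iterating over a group's operations; `collect` and `eligible_by_name` are hypothetical helpers, and the structure is assumed rather than copied from _fetch_status. Reusing one dictionary makes every result the same object carrying the last display name, so the tally collapses onto a single operation with doubled counts. Copying per operation fixes it.

```python
from collections import Counter


def collect(operations, status, copy_per_op):
    """Attach a display name to a status entry for each operation."""
    results = []
    for op in operations:
        # Reusing the shared dict reproduces the bug; a shallow copy fixes it.
        entry = dict(status) if copy_per_op else status
        entry["name"] = op  # display name assigned per operation
        results.append(entry)
    return results


def eligible_by_name(results):
    """Sum eligible jobs per displayed operation name."""
    counts = Counter()
    for r in results:
        counts[r["name"]] += r["eligible"]
    return dict(counts)


buggy = collect(["op1", "op2"], {"eligible": 2, "queued": 0}, copy_per_op=False)
fixed = collect(["op1", "op2"], {"eligible": 2, "queued": 0}, copy_per_op=True)

print(eligible_by_name(buggy))  # {'op2': 4}
print(eligible_by_name(fixed))  # {'op1': 2, 'op2': 2}
```

A shallow `dict(...)` copy suffices here because only a top-level key is reassigned; nested mutable values would still be shared between copies.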

I decided to include this in the same issue because of the close coupling between these two problems (this bug only arises once the other functionality is restored).

System configuration

Please complete the following information:

  • Operating System [e.g. macOS]: tested in Debian 11 and Arch Linux 2023.09.01
  • Version of Python [e.g. 3.7]: 3.8
  • Version of signac [e.g. 1.0]: 2.1.0
  • Version of signac-flow: 0.26.1