More efficient scheduling of large ensembles #107

Closed
cylc opened this issue Sep 4, 2012 · 4 comments

Comments

@cylc
Collaborator

cylc commented Sep 4, 2012

This is just an idea for consideration. See also #108.

Currently, if a "family trigger" is used in the graph to trigger a large ensemble, the family trigger is replaced by the equivalent expression involving all the ensemble members, and at run time every member is represented by its own dedicated task proxy. The efficiency of the scheduling algorithm depends on the size of the task pool, and large ensembles are the most likely cause of task pool bloat. If all ensemble members trigger at once, and downstream tasks trigger off the entire ensemble, the suite would be just as well served by a single task proxy representing the whole ensemble. This family proxy would have to receive messages from each of its members, track each member's state, know how to submit all of them at once, and so on. For suites with very large ensembles this could give a big performance boost, because the task pool would hold one proxy per ensemble instead of one per member.
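A rough Python sketch of the family proxy idea, assuming the members trigger together and downstream tasks only care about the ensemble as a whole. `FamilyProxy`, its methods, and the state names are hypothetical illustrations, not Cylc's actual classes or API.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical state names, for illustration only.
WAITING, SUBMITTED, SUCCEEDED, FAILED = "waiting", "submitted", "succeeded", "failed"


@dataclass
class FamilyProxy:
    """A single pool entry standing in for a whole ensemble (hypothetical)."""
    name: str
    members: list
    states: dict = field(init=False)

    def __post_init__(self):
        # Every member starts out waiting.
        self.states = {m: WAITING for m in self.members}

    def submit_all(self):
        # All members trigger at once, so one call submits the whole ensemble.
        for m in self.members:
            self.states[m] = SUBMITTED
        return list(self.members)

    def handle_message(self, member, state):
        # Messages still arrive per member; the family proxy records each one.
        if member not in self.states:
            raise KeyError(f"{member} is not a member of {self.name}")
        self.states[member] = state

    def summary(self):
        # Counts per state, e.g. for a GUI summary view.
        return dict(Counter(self.states.values()))

    def succeeded(self):
        # Downstream tasks that trigger off the whole ensemble check one
        # object instead of N separate member proxies.
        return all(s == SUCCEEDED for s in self.states.values())
```

The pool then holds one entry per ensemble, while per-member state is still kept for messaging and display.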

@cylc
Collaborator Author

cylc commented Sep 4, 2012

Complications:

  1. If an individual member is singled out in a non-family dependency relationship, it could be removed from family control and given its own task proxy. Alternatively, the family proxy could manage these relationships too.

  2. How should these aggregate families be displayed in gcontrol etc.? Perhaps clicking on a family would just bring up a summary of the member states.

  3. What about cylc's internal queues, which work with individual tasks?

  4. What about internal dependence among ensemble members?

@cylc
Collaborator Author

cylc commented Sep 5, 2012

Here's an idea: keep the individual member proxies, but have them interact only amongst themselves in a separate pool (to handle any internal dependence) while they are represented in the main pool by a single family proxy.
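A minimal Python sketch of the two-tier pool, assuming internal dependence can be expressed as a simple member-to-prerequisites map. `SubPool` and `FamilyEntry` are hypothetical names, not part of Cylc.

```python
class SubPool:
    """Member proxies that resolve dependencies only among themselves (hypothetical)."""

    def __init__(self, deps):
        # deps maps each member to the members it depends on (internal dependence).
        self.deps = deps
        self.done = set()

    def ready(self):
        # Members whose internal prerequisites are all done, and that are not done themselves.
        return sorted(m for m, pre in self.deps.items()
                      if m not in self.done and set(pre) <= self.done)

    def complete(self, member):
        self.done.add(member)


class FamilyEntry:
    """The single object the main pool sees for the whole ensemble (hypothetical)."""

    def __init__(self, name, subpool):
        self.name = name
        self.subpool = subpool

    def succeeded(self):
        # The family counts as finished once every member in the sub-pool is done.
        return self.subpool.done == set(self.subpool.deps)
```

Only `FamilyEntry` would sit in the main pool, so the main scheduling loop never iterates over individual members; member-to-member matching happens inside the smaller sub-pool.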

@hjoliver
Member

hjoliver commented Jun 15, 2016

It was suggested at a @cylc/core meeting that dependency matching could work with shared "dependency objects" rather than individual task proxies. Members of graphed families would automatically share the same dependency objects. This would probably solve this issue and #1776 in one whack (if the same shared objects could also be used to define graph edges). Bumping up to 'soon' on that basis...
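A small Python sketch of what shared dependency objects might look like, assuming prerequisites can be indexed by the upstream output that satisfies them. `Prerequisite`, `TaskProxy`, and `satisfy` are hypothetical illustrations, not Cylc's real classes.

```python
class Prerequisite:
    """A dependency object shared by every task that needs it (hypothetical)."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.satisfied = False


class TaskProxy:
    """A task holding references to shared prerequisites, not private copies."""

    def __init__(self, name, prereqs):
        self.name = name
        self.prereqs = prereqs

    def ready(self):
        return all(p.satisfied for p in self.prereqs)


def satisfy(index, output):
    # One lookup and one update, however many tasks share the object.
    prereq = index.get(output)
    if prereq is not None:
        prereq.satisfied = True
```

For example, 100 ensemble members that all depend on the same upstream output can share one `Prerequisite`; a single call to `satisfy` makes them all ready, instead of matching the output against each member separately.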

@hjoliver hjoliver modified the milestones: soon, later Jun 15, 2016
@hjoliver
Member

The new idea (previous comment) is superior; closing this and moving that to a new issue.

@hjoliver hjoliver removed this from the soon milestone Jun 19, 2016