cylc dependencies: report prerequisites that cannot be met #1585

Closed
arjclark opened this issue Aug 28, 2015 · 7 comments

@arjclark
Contributor

Recently we've seen a number of situations where combinations of suite changes with warm starts and restarts have caused suites to stall, because some prerequisites cannot be met given the contents of the task pool, i.e. the tasks they depend on are missing.

It would be useful to be able to highlight these dependencies in some way.

One suggestion, based on a user request, would be a command line tool that lists all the currently unfulfillable dependencies for tasks in the pool. There should also be some form of highlighting in the GUI via the view prerequisites dialog.

What may prove difficult is avoiding false positives in the returned values.

For example, it's easy enough to inspect a task pool in this simple situation:

[[R/PT6H]]
    graph = foo[-PT6H] => bar

as all we have to do is look in the task pool at bar, identify that it's looking for foo at -PT6H, and check the task pool for its presence.
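The simple lookup described above could be sketched roughly as follows. This is only an illustration, not cylc code: the pool is modelled as a plain set of (task name, cycle point) pairs, and `upstream_present` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for the task pool: (task name, cycle point) pairs.
pool = {
    ("bar", datetime(2010, 1, 1, 6)),
    ("foo", datetime(2010, 1, 1, 0)),
}


def upstream_present(pool, upstream_name, point, offset):
    """Return True if upstream_name exists in the pool at point + offset."""
    return (upstream_name, point + offset) in pool


# bar at 06Z depends on foo[-PT6H], i.e. foo at 00Z, which is in the pool.
print(upstream_present(pool, "foo", datetime(2010, 1, 1, 6), timedelta(hours=-6)))
# bar at 12Z would need foo at 06Z, which is not in this pool.
print(upstream_present(pool, "foo", datetime(2010, 1, 1, 12), timedelta(hours=-6)))
```

A direct membership test like this is exactly what produces false positives in the harder cases below, since it knows nothing about tasks that have yet to be spawned.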

However, this sort of thing is slightly more difficult:

[[R/PT6H]]
    graph = foo[-PT6H] => bar

[[R/P1D]]
    graph = foo[-PT6H] => baz

as baz will be in the pool before foo at -PT6H relative to baz is added.

Even more complex, we've seen suites where there is graphing of this form:

[[T06, T18]]
    graph = foo[-PT12H] => foo

[[T00]]
    graph = foo[-P1D] => foo

where identically named tasks are run, but as part of the triggering of different models grouped together in one suite. This case is harder still, because simply checking for the presence of foo in the task pool at an earlier cycle point would not highlight a problem until cycling had progressed sufficiently far.

Finally, there is also this sort of situation:

[[R/PT6H]]
    graph = foo[-PT6H] => foo
[[R1/<final cycle point>]] # Sorry can't remember the nice syntax!
    graph = foo => final_task

where final_task could be hanging around for a long period without its immediate requirement being meetable. Any tracking back of dependencies could be potentially expensive here.

In some respects this relates to #1540, though I have seen it more recently with restarts following a warm start, or where a restart has resulted in changed graphing.

arjclark self-assigned this Aug 28, 2015
arjclark added this to the soon milestone Aug 28, 2015
@matthewrmshin
Contributor

A distant relative to this issue is #1286.

@arjclark
Contributor Author

Of relevance, it looks like Hilary suggested something similar in #1286 (comment) on that thread:

If no tasks are submitted or running, examine all 'waiting' tasks. A waiting task is ok if all of its unsatisfied prerequisites are on other waiting task proxies that either exist already or have an earlier instance that exists and will spawn into the right cycle point. But if any task depends on success of a task that has failed (or on failure of a task that has succeeded, etc.) or has an unsatisfied prerequisite on a task that does not exist and has no predecessor that exists and will spawn into the right cycle, then the suite is stalled (assuming that nothing is submitted or running at this time).
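The rule quoted above might be sketched along these lines. Everything here is an assumption made for illustration: the `Task` class, integer cycle points, the fixed per-task `interval`, and the function name `find_unmeetable` are hypothetical and do not reflect how cylc represents its task pool.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    point: int        # integer cycle point, to keep the sketch short
    state: str        # e.g. "waiting", "running", "succeeded", "failed"
    unmet: list = field(default_factory=list)  # (name, point, output) triples
    interval: int = 1  # cycling interval of this task


def find_unmeetable(pool):
    """List 'task - prerequisite' strings that nothing in the pool can satisfy."""
    if any(t.state in ("submitted", "running") for t in pool):
        return []  # active tasks may still generate outputs, so not stalled
    by_id = {(t.name, t.point): t for t in pool}
    problems = []
    for task in pool:
        if task.state != "waiting":
            continue
        for name, point, output in task.unmet:
            upstream = by_id.get((name, point))
            if upstream is not None:
                # An existing waiting upstream may still run; a finished one
                # that left the prerequisite unsatisfied never will.
                ok = upstream.state == "waiting"
            else:
                # Otherwise look for an earlier instance that spawns into point.
                ok = any(
                    t.name == name
                    and t.point < point
                    and (point - t.point) % t.interval == 0
                    for t in pool
                )
            if not ok:
                problems.append(f"{task.name}.{task.point} - {name}.{point}:{output}")
    return problems


pool = [
    Task("foo", 2, "waiting", unmet=[("qux", 2, "succeed")]),  # qux is missing
    Task("baz", 4, "waiting", unmet=[("foo", 4, "succeed")]),  # foo.2 will spawn on
    Task("bar", 2, "waiting", unmet=[("foo", 2, "succeed")]),  # foo.2 is waiting
]
print(find_unmeetable(pool))  # ['foo.2 - qux.2:succeed']
```

Only the immediate prerequisite is inspected, so bar and baz are not reported even though they are stuck behind foo; that keeps the check cheap but leaves any tracing back through the graph to the operator.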

@arjclark
Contributor Author

I'll look into this on my return from holiday.

@hjoliver
Member

In some respects this relates to #1540 ...

It seems to me that if a suite does not rely on ignoring pre-initial triggers - #1603 - any combination of restart and warm start will work correctly, except when a manual intervention results in a task whose dependencies can't be met (e.g. one or more tasks were cylc removed from the suite). And in a manual intervention users should be vigilant for this sort of problem. So, if we deprecate ignoring of pre-initial triggers as I'm advocating, would it be fair to say this issue will be much less of a problem? i.e. although we could potentially mitigate the effects of user error in removing tasks etc., only the problems with warm starting etc. in the presence of pre-initial trigger ignoring are technically a bug in cylc.

@arjclark
Contributor Author

It seems to me, that if a suite does not rely on ignoring pre-initial triggers - #1603 - any combination of restart and warm start will work correctly

Yeah #1603 would address most of this. The remaining situation would be where a restart or reload introduces a new task and dependencies on it but the operator forgets/doesn't know to insert the new task into the pool - #774 should prevent this though if it's possible. The other case would be where a new dependency on an existing task is added but the task has already cycled out of the pool prior to the reload.

arjclark modified the milestones: soon, later May 26, 2016
@arjclark
Contributor Author

With the introduction of stalled handlers in #1848, I think a number of the problems that create the need for this are addressed, i.e. if you're worried about something getting stuck you can pop in a stalled handler to fire off if it does, at which point you can go in and inspect things for yourself.

I'm struggling to see how we could do this in a way that isn't compute intensive. Perhaps something that could be tagged on to the new stalled handler functionality is an inspection of (non clock trigger dependent) tasks that are still in a waiting state, reporting any unmet prereqs in the error log as warnings. Maybe something like this would be the thing to do:

# suite.rc
[cylc]
    [[event hooks]]
        report unmet prereqs on stall = True

resulting in:

# suite.err
2016-05-23T12:58:40Z WARNING - suite stalled with following unmet prereqs:
 * foo.20100101T0000Z - bar.20100101T0000Z:succeed
 * baz.20100101T0000Z - bar.20100101T0000Z:succeed

@hjoliver - thoughts?

@matthewrmshin
Contributor

I think we should just report unmet prerequisites on stalled regardless. There is no need to do any computation, because by definition all waiting tasks stall when the suite stalls.
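As a rough sketch of reporting on stall with no extra analysis, the function below just walks the waiting tasks and writes their unsatisfied prerequisites in the warning layout proposed earlier in the thread. The input mapping, the function name `report_unmet_prereqs`, and the logger name are assumptions for illustration, not cylc's API.

```python
import logging


def report_unmet_prereqs(waiting, logger):
    """Log every unsatisfied prerequisite of every waiting task as one warning."""
    lines = ["suite stalled with following unmet prereqs:"]
    for task_id, prereqs in waiting.items():
        for prereq in prereqs:
            lines.append(f" * {task_id} - {prereq}")
    message = "\n".join(lines)
    logger.warning(message)
    return message


logging.basicConfig(format="%(asctime)s %(levelname)s - %(message)s")

# Hypothetical snapshot of waiting tasks and their unsatisfied prerequisites.
waiting = {
    "foo.20100101T0000Z": ["bar.20100101T0000Z:succeed"],
    "baz.20100101T0000Z": ["bar.20100101T0000Z:succeed"],
}
report_unmet_prereqs(waiting, logging.getLogger("suite"))
```

Because nothing is filtered, the report includes prerequisites that are merely waiting on other stuck tasks as well as those that can never be met, which is the trade-off of skipping the computation.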

matthewrmshin modified the milestones: next release, later Jun 20, 2016