Parallel cylc trigger edit problem #3578

Closed
dpmatthews opened this issue Apr 21, 2020 · 3 comments
@dpmatthews
Contributor

Cylc can get into an inconsistent state if you do two trigger edits on the same task in parallel.
This is reproducible using the following trivial workflow:

[scheduling]
    [[dependencies]]
        graph = hello
[runtime]
    [[hello]]
        script = "sleep 20; exit 1"
        [[[remote]]]
            host = cylcdev

  1. Run the workflow & let task hello fail.
  2. Do a Trigger (edit run) on task hello - save the file but leave the "Trigger edited task hello.1" prompt alone.
  3. Do another Trigger (edit run) on task hello - save the file and say yes to the "Trigger edited task hello.1" prompt.
  4. Once the task is running say no to the original "Trigger edited task hello.1" prompt.

This results in task hello stuck in the running state.
Relevant entries from the suite log:

2020-04-21T12:35:32+01:00 INFO - Command succeeded: dry_run_tasks([u'hello.1'], check_syntax=False)
2020-04-21T12:35:32+01:00 INFO - Processing 1 queued command(s)
	+	dry_run_tasks([u'hello.1'], check_syntax=False)
2020-04-21T12:35:45+01:00 INFO - Command succeeded: dry_run_tasks([u'hello.1'], check_syntax=False)
2020-04-21T12:35:45+01:00 INFO - Processing 1 queued command(s)
	+	dry_run_tasks([u'hello.1'], check_syntax=False)
2020-04-21T12:35:51+01:00 INFO - Command succeeded: trigger_tasks(['hello.1'], back_out=False)
2020-04-21T12:35:51+01:00 INFO - Processing 1 queued command(s)
	+	trigger_tasks(['hello.1'], back_out=False)
2020-04-21T12:35:51+01:00 INFO - [hello.1] -submit-num=03, owner@host=cylcdev
2020-04-21T12:35:53+01:00 INFO - [hello.1] status=ready: (internal)submitted at 2020-04-21T12:35:52+01:00 for job(03)
2020-04-21T12:35:54+01:00 INFO - [hello.1] status=submitted: (received)started at 2020-04-21T12:35:53+01:00 for job(03)
2020-04-21T12:35:58+01:00 INFO - Command succeeded: trigger_tasks(['hello.1'], back_out=True)
2020-04-21T12:35:58+01:00 INFO - Processing 1 queued command(s)
	+	trigger_tasks(['hello.1'], back_out=True)
2020-04-21T12:36:15+01:00 WARNING - [hello.1] status=running: (received-ignored)failed/EXIT at 2020-04-21T12:36:13+01:00 for job(03) != current job(02)

The back-out of the first trigger edit results in cylc thinking the running job is submit number 2 rather than 3.
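
To make the sequence easier to follow, here is a toy model of what the log suggests is happening. It is not Cylc's code: `ToyTask`, `trigger`, `back_out` and `receive` are hypothetical names, and the assumption that back-out simply decrements the submit number is inferred from the log above rather than taken from the source.

```python
class ToyTask:
    """Hypothetical stand-in for a task proxy; not a Cylc class."""

    def __init__(self, submit_num):
        self.submit_num = submit_num
        self.status = "failed"

    def trigger(self):
        # Assumption: triggering bumps the submit number and starts a job.
        self.submit_num += 1
        self.status = "running"
        return self.submit_num  # submit number carried by the new job

    def back_out(self):
        # Assumption: back-out rolls the counter back unconditionally,
        # without checking whether a newer job is already running.
        self.submit_num -= 1

    def receive(self, message, job_submit_num):
        # Messages from a job other than the "current" one are ignored.
        if job_submit_num != self.submit_num:
            return "ignored"
        self.status = message
        return "accepted"


task = ToyTask(submit_num=2)
job_num = task.trigger()                    # second trigger edit: "yes"
task.back_out()                             # first trigger edit: "no"
outcome = task.receive("failed", job_num)   # job 3 reports failure
```

In this model the failure message from job 3 is ignored because the task now believes its current job is 2, so its status never leaves `running`.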

Note that the task has to run on a remote host so that no log files are written locally. Otherwise cylc trigger can't remove the log directory and the submit number doesn't get changed.

The back_out functionality was introduced in #2461.
Tested with cylc 7.8.4.

dpmatthews added the bug label Apr 21, 2020
dpmatthews added this to the cylc-7.8.x milestone Apr 21, 2020
@oliver-sanders
Member

This could be solved by sending the "current" submission number with all trigger requests (this would also prevent accidental re-triggering by an out-of-date client).
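
A minimal sketch of that idea, assuming the server keeps one submit number per task and rejects any request whose number does not match. `handle_trigger` and the `state` dict are hypothetical and not part of Cylc; the increment and decrement behaviour is an assumption modelled on the log in the issue description.

```python
def handle_trigger(state, task_id, client_submit_num, back_out=False):
    """Apply a trigger or back-out only if the client's view is current.

    Hypothetical helper: `state` maps task IDs to the server's current
    submit number. Returns True if the request was applied.
    """
    current = state[task_id]
    if client_submit_num != current:
        # The client saw an older submit number: reject as stale.
        return False
    # Assumption: a trigger increments the number, a back-out decrements it.
    state[task_id] = current - 1 if back_out else current + 1
    return True


state = {"hello.1": 2}

# Both clients read the same submit number before their prompts appear.
seen_by_a = state["hello.1"]
seen_by_b = state["hello.1"]

# Client B answers "yes" first; client A answers "no" afterwards.
accepted_b = handle_trigger(state, "hello.1", seen_by_b)
accepted_a = handle_trigger(state, "hello.1", seen_by_a, back_out=True)
```

Here client A's late back-out is rejected because its submit number is out of date, so the server still treats job 3 as the current job.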

@oliver-sanders
Member

Trigger edit is going to have to be completely rewritten for Cylc 8 anyway, so I would suggest we leave this as a documented "known bug" for Cylc 7.

@oliver-sanders
Member

Note: This problem has not been fixed in Cylc 7; however, it is no longer present in Cylc 8.
