Sign every compute task with run ID to correlate response #7463

Merged: 19 commits, merged Jan 17, 2023


@hendrikmakait (Member) commented Jan 9, 2023

Supersedes #7372

  • Tests added / passed
  • Passes pre-commit run --all-files

github-actions bot (Contributor) commented Jan 9, 2023

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

24 files (±0), 24 suites (±0), duration 10h 18m 43s (+5m 18s)
3 315 tests (+1): 3 208 passed (+3), 105 skipped (±0), 2 failed (-2)
39 084 runs (+12): 37 193 passed (+16), 1 889 skipped (-2), 2 failed (-2)

For more details on these failures, see this check.

Results for commit 4c4663c. ± Comparison against base commit 6dd3c70.

♻️ This comment has been updated with latest results.

@@ -536,6 +536,16 @@ async def release_all_futures():
(f3.key, "executing", "released", "cancelled", {}),
(f3.key, "cancelled", "fetch", "resumed", {}),
(f3.key, "resumed", "memory", "memory", {}),
(
@hendrikmakait (Member Author) commented:

The scheduler should reject the result of the resumed task because its run ID is stale; the rejection causes the task to be released and recomputed.
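A minimal sketch of the run-ID correlation described in this comment, under stated assumptions: the class `RunIdTracker`, its methods, and the message dicts are hypothetical and are not the API of `distributed.scheduler`. Each compute-task message is stamped with a fresh, monotonically increasing run ID; the worker is assumed to echo that ID when it reports the task finished, and the scheduler accepts the report only if the ID matches the latest one it issued for that key. A report carrying an older ID is rejected.

```python
import itertools


class RunIdTracker:
    """Hypothetical sketch: correlate task-finished reports with compute requests."""

    def __init__(self):
        # One global counter so every compute-task message gets a unique run ID.
        self._run_ids = itertools.count(1)
        # Latest run ID issued per task key; any other ID for that key is stale.
        self.current = {}

    def compute_task(self, key):
        """Build a compute-task message stamped with a fresh run ID."""
        run_id = next(self._run_ids)
        self.current[key] = run_id
        return {"op": "compute-task", "key": key, "run_id": run_id}

    def task_finished(self, key, run_id):
        """Decide what to do with a worker's report for `key` under `run_id`."""
        expected = self.current.get(key)
        if expected is None:
            # The key is not tracked (never issued or already forgotten).
            return {"op": "release", "key": key}
        if run_id == expected:
            # The report answers the latest request: accept the result.
            return {"op": "accept", "key": key, "run_id": run_id}
        # Stale report from an earlier attempt: reject it. The worker is assumed
        # to drop the old result and compute again under the current run ID.
        return {"op": "reject", "key": key, "stale_run_id": run_id, "run_id": expected}


if __name__ == "__main__":
    tracker = RunIdTracker()
    first = tracker.compute_task("f3")  # run ID 1
    tracker.compute_task("f3")  # re-issued (e.g. after release/resume): run ID 2
    print(tracker.task_finished("f3", first["run_id"]))  # stale run ID 1 is rejected
    print(tracker.task_finished("f3", 2))  # current run ID 2 is accepted
```

In this sketch the run ID, not the task key, is what distinguishes attempts: both attempts share the key `"f3"`, so a check on the key alone could not tell a late result from the one currently expected.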

@hendrikmakait (Member Author) commented:

The failures in test_deadlock_cancelled_after_inflight_before_gather_from_worker and test_scheduler_story_stimulus_success seem related to this change.

@hendrikmakait (Member Author) commented:

From what I can tell, the failures are likely unrelated and caused by general CI flakiness.

@hendrikmakait marked this pull request as ready for review on January 11, 2023 at 10:47.
@@ -4646,6 +4676,24 @@ def stimulus_task_finished(self, key=None, worker=None, stimulus_id=None, **kwar
"stimulus_id": stimulus_id,
}
]
elif ts.run_id != run_id:
@hendrikmakait (Member Author) commented Jan 11, 2023:

The clauses in stimulus_task_finished could likely be improved, but doing so would require breaking changes to the transition logic, so it should happen in a separate PR focused on that.
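To illustrate why the order of these clauses matters, here is a hypothetical decision chain in the spirit of the `elif ts.run_id != run_id:` branch in the hunk above. The function name, the states it checks, and the returned labels are assumptions for illustration; this is not the actual body of `stimulus_task_finished`. Because the clauses form a single `if`/`elif` chain, a report for a task the scheduler has already released is handled by the first clause before its run ID is ever compared.

```python
def classify_task_finished(state, expected_run_id, reported_run_id):
    """Hypothetical ordered decision chain for a task-finished report.

    state: the scheduler's current state for the key, or None if unknown.
    expected_run_id: run ID of the latest compute-task message for the key.
    reported_run_id: run ID echoed back by the worker.
    """
    if state is None or state == "released":
        # Checked first: the task is no longer wanted, so the run ID is irrelevant.
        return "release-worker-data"
    elif expected_run_id != reported_run_id:
        # The report belongs to an earlier attempt; reject the stale result.
        return "reject-stale"
    elif state == "processing":
        # The current attempt finished as expected.
        return "accept"
    else:
        # Any other state is left untouched in this sketch.
        return "ignore"


if __name__ == "__main__":
    # A stale report for a released task never reaches the run-ID clause.
    print(classify_task_finished("released", 2, 1))
    print(classify_task_finished("processing", 2, 1))
    print(classify_task_finished("processing", 2, 2))
```

In this sketch, moving the run-ID check ahead of the first clause would change which outcome a released task receives, which is the kind of transition-logic change the comment defers to a dedicated PR.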

Review thread on distributed/scheduler.py: outdated, resolved.