Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix regression where unknown tasks were allowed to be stolen #5392

Merged
merged 2 commits into from
Oct 13, 2021

Conversation

fjetter
Copy link
Member

@fjetter fjetter commented Oct 5, 2021

This has been marked flaky in #3574. The problem is that the test is written on such a high level that it may be true regardless of whether we allow unknown functions or not. In particular, we're setting duration estimates for every task once they are in state processing. If we do not explicitly verify if a task is unknown, we cannot blacklist it from stealing.

@fjetter fjetter force-pushed the delete_tests_steal_unknown_functions branch from 85a0180 to b9d7ad7 Compare October 5, 2021 15:14
@crusaderky crusaderky changed the title Fix regression where unknown tasks where allowed to be stolen Fix regression where unknown tasks were allowed to be stolen Oct 6, 2021
@fjetter fjetter force-pushed the delete_tests_steal_unknown_functions branch from 359f3bb to 3dd94ac Compare October 6, 2021 16:51
Comment on lines -2752 to -2772
if s:
for tts in s:
if tts._processing_on is not None:
wws = tts._processing_on
comm: double = self.get_comm_cost(tts, wws)
old: double = wws._processing[tts]
new: double = avg_duration + comm
diff: double = new - old
wws._processing[tts] = new
wws._occupancy += diff
self._total_occupancy += diff
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regression was partially introduced since the occupancy update here would not put a previously unknown task into the stealing whitelist. I didn't wan tto add on top of this function and decided to go for a refactoring instead

steal = state.extensions.get("stealing")
if not steal:
return
if ws._occupancy > old * 1.3 or old > ws._occupancy * 1.3:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are also cases where the occupancy actually significantly reduced but we might still want to reapply stealing. Until #5379 is fixed, this might increase the likelihood of that deadlock but we should allow deviations in both directions

@fjetter fjetter force-pushed the delete_tests_steal_unknown_functions branch from 68e2731 to 9db9d9a Compare October 7, 2021 09:15
@fjetter fjetter mentioned this pull request Oct 7, 2021
3 tasks
@fjetter
Copy link
Member Author

fjetter commented Oct 7, 2021

Test failures appear to be unrelated

@fjetter fjetter force-pushed the delete_tests_steal_unknown_functions branch from 15a14a5 to 7303aa2 Compare October 13, 2021 11:53
@fjetter
Copy link
Member Author

fjetter commented Oct 13, 2021

There are the known cythonization errors in #5406 and a hard timeout with an interpreter shutdown in a windows run. I'll go on and merge since I don't think either is related

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants