fixing walk order to resolve priority in multi-sink pipelines #120
Merged
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -165,9 +165,11 @@ | |
total_length += len(next_df) | ||
|
||
def enqueue_tasks(self): | ||
# Work through the graph in reverse order, submitting any tasks as | ||
# needed. Reverse order ensures we prefer to send tasks that are closer | ||
# to the end of the pipeline and only feed as necessary. | ||
# helper to make submission decision of a single task based on the batch | ||
# size, exhaustion conditions, and whether the implementation deems it | ||
# submittable. Returns flags (eligible, submitted) to indicate whether | ||
# it was eligible to be submitted based on input queue and batch size, | ||
# and whether it was actually submitted. | ||
def _handle_one_task(task, rank): | ||
eligible = submitted = False | ||
if len(task.data_in) == 0: | ||
|
@@ -179,34 +181,42 @@ | |
self.logger.debug(f"Enqueueing split for <{task.name}>[bs={batch_size}]") | ||
task.split_pending.appendleft(self.split_batch_submit(batch, batch_size)) | ||
|
||
while len(task.data_in) > 0: | ||
num_to_merge = deque_num_merge(task.data_in, batch_size) | ||
if num_to_merge == 0: | ||
# If the feed is terminated and there are no more tasks that | ||
# will feed to this one, submit everything | ||
if self.source_exhausted and self.task_exhausted(task): | ||
num_to_merge = len(task.data_in) | ||
else: | ||
break | ||
eligible = True | ||
if not self.task_submittable(task.task, rank): | ||
break | ||
merged = [task.data_in.pop().data for i in range(num_to_merge)] | ||
self.logger.debug(f"Enqueueing merged batches <{task.name}>[n={len(merged)};bs={batch_size}]") | ||
task.pending.appendleft(self.task_submit(task.task, merged)) | ||
task.counter += 1 | ||
submitted = True | ||
num_to_merge = deque_num_merge(task.data_in, batch_size) | ||
if num_to_merge == 0: | ||
# If the feed is terminated and there are no more tasks that | ||
# will feed to this one, submit everything | ||
if self.source_exhausted and self.task_exhausted(task): | ||
num_to_merge = len(task.data_in) | ||
else: | ||
return (eligible, submitted) | ||
eligible = True | ||
if not self.task_submittable(task.task, rank): | ||
return (eligible, submitted) | ||
|
||
merged = [task.data_in.pop().data for _ in range(num_to_merge)] | ||
self.logger.debug(f"Enqueueing merged batches <{task.name}>[n={len(merged)};bs={batch_size}]") | ||
task.pending.appendleft(self.task_submit(task.task, merged)) | ||
task.counter += 1 | ||
submitted = True | ||
return (eligible, submitted) | ||
|
||
# proceed through all non-source tasks, which will be handled separately | ||
# below due to the need to feed from generator. | ||
rank = 0 | ||
for task in self.stream_graph.walk_back(sort_key=lambda x: x.counter): | ||
if task in self.stream_graph.source_tasks: | ||
continue | ||
eligible, _ = _handle_one_task(task, rank) | ||
if eligible: # update rank of this task if it _could_ be done, whether or not it was | ||
rank += 1 | ||
# below due to the need to feed from generator. We walk backwards, | ||
# re-evaluating the sort order of tasks of same depth after each single | ||
# submission, implementing a kind of "fair" submission, while still | ||
# prioritizing tasks closer to the sink. | ||
submitted = True | ||
while submitted: | ||
rank = 0 | ||
submitted = False | ||
Comment on lines +208 to +211
Below this is unchanged. This loop is here to implement fairness: as long as anything has been submitted, we keep re-walking the graph from the top, but with the new sort order. Once nothing is submitted, either there is no room or the tasks need more input from the sources.
||
for task in self.stream_graph.walk_back(sort_key=lambda x: x.counter): | ||
if task in self.stream_graph.source_tasks: | ||
continue | ||
eligible, submitted = _handle_one_task(task, rank) | ||
if eligible: # update rank of this task if it _could_ be done, whether or not it was | ||
rank += 1 | ||
if submitted: | ||
break | ||
|
||
# Source as many inputs as can fit on source tasks. We prioritize flushing the | ||
# input queue and secondarily on number of invocations in case batch sizes differ. | ||
|
FYI: the change in this block simply removes the `while` loop, so that only a single invocation of the task is submitted (if eligible) at a time, which enables the fairness re-evaluation. The `break` -> `return` change makes the diff look like more than whitespace.
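To illustrate the fairness re-evaluation described in this PR, here is a minimal, self-contained sketch. It is not the project's code: `Task`, `walk_back`, `handle_one`, `enqueue_fair`, the `depth` field, and the `free_slots` capacity model are all hypothetical stand-ins, and the `rank` bookkeeping and batch merging from the real `enqueue_tasks` are left out. It only shows the control flow: submit at most one invocation, then restart the walk so that tasks of equal depth are re-sorted by their submission counter.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical stand-in for a pipeline task.
    name: str
    depth: int  # larger value = closer to the sink (assumption)
    queue: deque = field(default_factory=deque)  # pending inputs
    counter: int = 0  # number of invocations submitted so far


def walk_back(tasks):
    # Hypothetical stand-in for stream_graph.walk_back(sort_key=...):
    # deepest tasks first; ties broken by fewest submissions so far.
    return sorted(tasks, key=lambda t: (-t.depth, t.counter))


def handle_one(task, capacity):
    # Submit at most ONE invocation; return (eligible, submitted).
    eligible = submitted = False
    if not task.queue:
        return eligible, submitted
    eligible = True
    if capacity["free"] == 0:  # stands in for task_submittable(...)
        return eligible, submitted
    task.queue.pop()
    capacity["free"] -= 1
    task.counter += 1
    submitted = True
    return eligible, submitted


def enqueue_fair(tasks, free_slots):
    # Re-walk the graph after every single submission so the sort
    # order reflects the updated counters.
    capacity = {"free": free_slots}
    order = []
    submitted = True
    while submitted:
        submitted = False
        for task in walk_back(tasks):
            _, submitted = handle_one(task, capacity)
            if submitted:
                order.append(task.name)
                break
    return order


# Two sinks at the same depth share four free slots.
a = Task("a", depth=1, queue=deque([1, 2, 3]))
b = Task("b", depth=1, queue=deque([1, 2, 3]))
print(enqueue_fair([a, b], free_slots=4))  # ['a', 'b', 'a', 'b']
```

Without the restart after each submission, the first sink in the walk would drain its whole queue before the second one got a turn; with it, the two sinks alternate until capacity runs out.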