-
Notifications
You must be signed in to change notification settings - Fork 355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove failed task distributor #687
Remove failed task distributor #687
Conversation
Codecov Report
@@ Coverage Diff @@
## perf_coll_fw_enhance #687 +/- ##
========================================================
- Coverage 70.96% 70.92% -0.05%
========================================================
Files 161 160 -1
Lines 15290 15240 -50
Branches 1872 1868 -4
========================================================
- Hits 10851 10809 -42
+ Misses 3821 3814 -7
+ Partials 618 617 -1
|
d84cc70
to
6a4cc41
Compare
6a4cc41
to
5a889c2
Compare
LGTM |
@@ -108,3 +108,6 @@ def _handle_task_failure(self, start_time, end_time): | |||
FailedTask.retry_count.name: 0, | |||
FailedTask.executor.name: self.executor} | |||
db.failed_task_create(self.ctx, failed_task) | |||
self.metric_task_rpcapi.assign_failed_job(self.ctx, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This message will help to assign failed job in same executor.. But with removal failed_job distributor scanning there no way to assign failed_job in case of a node down/restart .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so we should add another mechanism to recover the failed job which belongs to this executor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* Make job scheduler local to task process (#674) * Make job scheduler local to task process * Notify distributor when a new task added (#678) * Remove db-scan for new task creation (#680) * Use consistent hash to manage the topic (#681) * Remove the periodically call from task distributor (#686) * Start one historic collection immediate when a job is rescheduled (#685) * Start one historic collection immediate when a job is rescheduled * Remove failed task distributor (#687) * Improving Failed job handling and telemetry job removal (#689) Co-authored-by: ThisIsClark <liuyuchibubao@gmail.com> Co-authored-by: Ashit Kumar <akopensrc@gmail.com>
What this PR does / why we need it:
Remove failed task distributor
Which issue this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close that issue when PR gets merged): fixes #Special notes for your reviewer:
Release note: