
Remove failed task distributor #687

Conversation

ThisIsClark (Collaborator)

What this PR does / why we need it:
Remove failed task distributor

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

Release note:

codecov bot commented on Sep 6, 2021

Codecov Report

Merging #687 (5a889c2) into perf_coll_fw_enhance (cd62e0b) will decrease coverage by 0.04%.
The diff coverage is 57.14%.

@@                   Coverage Diff                    @@
##           perf_coll_fw_enhance     #687      +/-   ##
========================================================
- Coverage                 70.96%   70.92%   -0.05%     
========================================================
  Files                       161      160       -1     
  Lines                     15290    15240      -50     
  Branches                   1872     1868       -4     
========================================================
- Hits                      10851    10809      -42     
+ Misses                     3821     3814       -7     
+ Partials                    618      617       -1     
Impacted Files Coverage Δ
...in/leader_election/distributor/task_distributor.py 85.18% <ø> (-0.53%) ⬇️
delfin/task_manager/scheduler/schedule_manager.py 83.82% <25.00%> (+12.05%) ⬆️
...dulers/telemetry/performance_collection_handler.py 100.00% <100.00%> (+3.70%) ⬆️
delfin/task_manager/metrics_rpcapi.py 50.00% <0.00%> (-14.71%) ⬇️
delfin/context.py 74.00% <0.00%> (-6.00%) ⬇️
delfin/rpc.py 73.68% <0.00%> (-2.64%) ⬇️
delfin/db/sqlalchemy/api.py 71.72% <0.00%> (-0.13%) ⬇️

@sushanthakumar (Collaborator)

LGTM

@@ -108,3 +108,6 @@ def _handle_task_failure(self, start_time, end_time):
                        FailedTask.retry_count.name: 0,
                        FailedTask.executor.name: self.executor}
         db.failed_task_create(self.ctx, failed_task)
+        self.metric_task_rpcapi.assign_failed_job(self.ctx,
Member

This message will help assign the failed job to the same executor. But with the removal of the failed-job distributor's scanning, there is no way to assign a failed job when a node goes down or restarts.
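
For context on the line under discussion: the new call hands the freshly created failed-task record straight to the owning executor's task manager over RPC, instead of leaving it for the (now removed) failed-task distributor to find in a periodic DB scan. Below is a minimal sketch of what such a client-side RPC method could look like with oslo.messaging; the class name, topic names, and payload shape are assumptions for illustration, not the actual code in delfin/task_manager/metrics_rpcapi.py.

```python
# Illustrative sketch only: shows how a client-side RPC API could cast an
# "assign_failed_job" message to the executor that owns the failed task.
# Class name, topic names, and payload shape are assumptions, not the actual
# delfin/task_manager/metrics_rpcapi.py implementation.
import oslo_messaging as messaging
from oslo_config import cfg


class MetricsTaskAPI(object):
    """Client-side RPC API used by the collection handler (sketch)."""

    RPC_API_VERSION = '1.0'

    def __init__(self):
        target = messaging.Target(topic='delfin-metrics-task',
                                  version=self.RPC_API_VERSION)
        transport = messaging.get_rpc_transport(cfg.CONF)
        self.client = messaging.RPCClient(transport, target)

    def assign_failed_job(self, context, failed_task, executor):
        # Cast to the executor's own topic so the failed job stays on the
        # node that originally ran the task (assumed routing scheme).
        cctxt = self.client.prepare(topic=executor, version='1.0')
        return cctxt.cast(context, 'assign_failed_job',
                          failed_task=failed_task)
```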

Collaborator Author

OK, so we should add another mechanism to recover the failed jobs that belong to this executor.
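
One possible shape for such a recovery mechanism, sketched under the assumption that an executor can look up its own failed-task records at startup and reschedule their retries locally; the db helper and retry callable named here are illustrative, not necessarily what the project implemented:

```python
# Hypothetical recovery sketch: on executor start/restart, re-own the failed
# tasks recorded for this node and reschedule their retries locally.
# db.failed_task_get_all() and retry_callable are assumed names, not
# necessarily the helpers delfin exposes.
from delfin import db


def recover_failed_jobs(ctx, executor, scheduler, retry_callable):
    """Re-register retry jobs for failed tasks that belong to this executor."""
    for failed_task in db.failed_task_get_all(ctx):
        if failed_task['executor'] != executor:
            continue
        # A periodic retry job fills the collection gap left by the restart;
        # the retry callable is expected to remove the job once the retry
        # budget is exhausted or the collection succeeds.
        scheduler.add_job(retry_callable,
                          'interval',
                          seconds=failed_task['interval'],
                          args=[ctx, failed_task['id']],
                          id=str(failed_task['id']))
```

Per the commit list further down, failed-job handling was subsequently revisited in #689 (Improving Failed job handling and telemetry job removal).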

@NajmudheenCT (Member) left a comment

LGTM

@NajmudheenCT merged commit 38adb19 into sodafoundation:perf_coll_fw_enhance on Sep 7, 2021
kumarashit added a commit that referenced this pull request Sep 14, 2021
* Make job scheduler local to task process (#674)

* Make job scheduler local to task process

* Notify distributor when a new task added (#678)

* Remove db-scan for new task creation (#680)

* Use consistent hash to manage the topic (#681)

* Remove the periodically call from task distributor (#686)

* Start one historic collection immediate when a job is rescheduled (#685)

* Start one historic collection immediate when a job is rescheduled

* Remove failed task distributor (#687)

* Improving Failed job handling and telemetry job removal (#689)

Co-authored-by: ThisIsClark <liuyuchibubao@gmail.com>
Co-authored-by: Ashit Kumar <akopensrc@gmail.com>