Make job scheduler local to task process #674

NajmudheenCT · 2021-08-23T13:16:01Z

What this PR does / why we need it:
Change the performance metrics collection framework so that jobs are scheduled locally on every worker node.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #670

Special notes for your reviewer:

Release note:

ThisIsClark · 2021-08-24T07:08:53Z

delfin/leader_election/distributor/telemetry_failed_job_distributor.py

+LOG = log.getLogger(__name__)
+
+
+class FailedTelemetryJob(object):


IMO, TelemetryFailedJob is better than FailedTelemetryJob

Done. Changed it to TaskDistributor.

ThisIsClark · 2021-08-24T12:16:18Z

I think we should remove the word Telemetry from telemetry_failed_job_distributor.py and telemetry_job_distributor.py (both the file names and the class names). We will notify the job distributor of task creation, deletion, and modification, so these no longer belong to 'telemetry'.

codecov · 2021-08-25T13:12:50Z

Codecov Report

Merging #674 (7227911) into perf_coll_fw_enhance (baa386e) will increase coverage by 0.02%.
The diff coverage is 71.95%.

@@                   Coverage Diff                    @@
##           perf_coll_fw_enhance     #674      +/-   ##
========================================================
+ Coverage                 70.15%   70.18%   +0.02%     
========================================================
  Files                       156      159       +3     
  Lines                     14801    14936     +135     
  Branches                   1822     1822              
========================================================
+ Hits                      10384    10483      +99     
- Misses                     3816     3846      +30     
- Partials                    601      607       +6

Impacted Files	Coverage Δ
delfin/cmd/task.py	`0.00% <0.00%> (ø)`
delfin/leader_election/factory.py	`43.75% <0.00%> (-2.92%)`	⬇️
delfin/task_manager/manager.py	`0.00% <ø> (ø)`
delfin/task_manager/metrics_manager.py	`0.00% <0.00%> (ø)`
delfin/task_manager/metrics_rpcapi.py	`70.00% <70.00%> (ø)`
...ager/scheduler/schedulers/telemetry/job_handler.py	`76.87% <76.87%> (ø)`
...er_election/distributor/failed_task_distributor.py	`84.84% <84.84%> (ø)`
...in/leader_election/distributor/task_distributor.py	`90.00% <90.00%> (ø)`
delfin/db/sqlalchemy/models.py	`99.66% <100.00%> (+<0.01%)`	⬆️
delfin/task_manager/scheduler/schedule_manager.py	`56.52% <100.00%> (+0.96%)`	⬆️
... and 12 more

sushanthakumar · 2021-08-25T17:41:10Z

delfin/leader_election/distributor/telemetry_failed_job_distributor.py

+                         '%s' % job['id'])
+                self.task_rpcapi.assign_failed_job(self.ctx, job)
+
+                LOG.debug('Assigned failed task for  id: '


I think line 58 can be debug and 62 can be info

Done. This code might change depending on the distributor implementation; currently it is a pool-based distributor.

sushanthakumar · 2021-08-25T17:46:02Z

delfin/task_manager/metrics_rpcapi.py

+        return call_context.cast(context, 'remove_job',
+                                 job=job)
+
+    def assign_failed_job(self, context, job):


Can we have just two APIs, assign and remove, that take the job name as a parameter? These methods differ only in the job name.

Actually, the handlers for the two messages are different. If we merge them, we need to pass an extra argument to switch between the types. Since the messages are distributed over the network, we want to keep them small; currently we pass only task_id as a parameter.
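The design described in this reply can be sketched roughly as follows. This is only an illustration under stated assumptions, not the delfin implementation: `InProcessClient`, `MetricsServer`, `TaskAPI`, the `assign_job` method, and the parameter names are hypothetical, and only the method name `assign_failed_job` is taken from the diff above. It shows how one method per message type lets the receiver route by method name, so each message carries just an id and no type discriminator.

```python
class InProcessClient:
    """Hypothetical stand-in for an RPC client: delivers casts in-process."""

    def __init__(self, server):
        self.server = server

    def cast(self, context, method, **kwargs):
        # A real client would serialize this and send it over the network.
        getattr(self.server, method)(context, **kwargs)


class MetricsServer:
    """Receiver side: one handler per message type, no type switch needed."""

    def __init__(self):
        self.handled = []

    def assign_job(self, context, task_id):
        self.handled.append(("telemetry", task_id))

    def assign_failed_job(self, context, failed_task_id):
        self.handled.append(("failed", failed_task_id))


class TaskAPI:
    """Sender side: each method carries only an id as its payload."""

    def __init__(self, client):
        self.client = client

    def assign_job(self, context, task_id):
        return self.client.cast(context, "assign_job", task_id=task_id)

    def assign_failed_job(self, context, failed_task_id):
        return self.client.cast(context, "assign_failed_job",
                                failed_task_id=failed_task_id)


server = MetricsServer()
api = TaskAPI(InProcessClient(server))
api.assign_job(None, "task-1")
api.assign_failed_job(None, "failed-7")
print(server.handled)
```

A merged `assign(job_type, id)` method would work too, but the receiver would then need a branch on `job_type` for every message.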

sushanthakumar · 2021-08-25T17:55:47Z

delfin/task_manager/scheduler/schedulers/telemetry/job_handler.py

+                                'last_run_time': last_run_time}
+            db.task_update(self.ctx, self.task_id, update_task_dict)
+            LOG.info('Periodic collection tasks scheduled for for job id: '
+                     '%s ' % self.task_id)


self.job_ids.add(job_id) is needed here, right?

Done, thanks.
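Why the missing `self.job_ids.add(job_id)` matters can be shown with a small sketch. Everything here is an assumption made for illustration: `FakeScheduler` is a stand-in for whatever scheduler the handler uses, and the `JobHandler` shape is hypothetical apart from the `job_ids` and `task_id` attribute names seen in the review. The point is that a handler can only remove the jobs it recorded when scheduling them.

```python
class FakeScheduler:
    """Hypothetical stand-in scheduler that just stores jobs by id."""

    def __init__(self):
        self.jobs = {}

    def add_job(self, func, job_id):
        self.jobs[job_id] = func

    def remove_job(self, job_id):
        del self.jobs[job_id]


class JobHandler:
    """Schedules jobs for one task and remembers their ids."""

    def __init__(self, scheduler, task_id):
        self.scheduler = scheduler
        self.task_id = task_id
        self.job_ids = set()

    def schedule_job(self, func, job_id):
        self.scheduler.add_job(func, job_id)
        # Record the id; without this line remove_jobs() would find nothing
        # and the job would keep running after the task is removed.
        self.job_ids.add(job_id)

    def remove_jobs(self):
        for job_id in list(self.job_ids):
            self.scheduler.remove_job(job_id)
        self.job_ids.clear()


handler = JobHandler(FakeScheduler(), task_id="task-1")
handler.schedule_job(lambda: None, job_id="job-a")
handler.schedule_job(lambda: None, job_id="job-b")
handler.remove_jobs()
print(handler.scheduler.jobs)
```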

ThisIsClark · 2021-08-26T13:27:59Z

leader_election/distributor needs an __init__.py to make it a module.

NajmudheenCT · 2021-08-26T13:32:15Z

It's there!

ThisIsClark · 2021-08-26T13:42:47Z

I tried to find it but could not. Only the test folder has an __init__.py.

NajmudheenCT · 2021-08-27T05:47:34Z

You are right, it was missing in this location. Added now.
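For context on why the missing file matters: in Python 3 a directory without `__init__.py` can still be imported as a namespace package, but discovery helpers such as the standard library's `pkgutil.iter_modules` skip it. The sketch below uses throwaway directory names (`with_init`, `without_init`) that are purely illustrative and unrelated to the delfin tree.

```python
import pathlib
import pkgutil
import tempfile


def discovered_packages(root):
    """Return the names pkgutil finds directly under root."""
    return sorted(m.name for m in pkgutil.iter_modules([str(root)]))


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)

    # A regular package: the directory contains __init__.py.
    (root / "with_init").mkdir()
    (root / "with_init" / "__init__.py").write_text("")
    (root / "with_init" / "mod.py").write_text("")

    # Same layout, but without __init__.py.
    (root / "without_init").mkdir()
    (root / "without_init" / "mod.py").write_text("")

    found = discovered_packages(root)

print(found)
```

Only `with_init` is reported, which is one way a package can go unnoticed by tooling even though the source files are present.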

ThisIsClark · 2021-08-27T06:31:19Z

delfin/leader_election/distributor/failed_task_distributor.py

+
+    def __call__(self):
+        """
+        :return:


Please remove the useless comments

ThisIsClark · 2021-08-27T06:32:57Z

delfin/task_manager/scheduler/schedulers/telemetry/job_handler.py

+    def schedule_job(self, task_id):
+
+        if self.stopped:
+            """If Job is stopped return immediately"""


For a single-line comment, please use #.
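The difference behind this request can be checked with the standard `ast` module. In the sketch below, the two source strings are simplified, hypothetical versions of the reviewed function. A bare string inside an `if` body is parsed as an expression statement (it is not a docstring there), while a `#` comment leaves no node at all.

```python
import ast

WITH_STRING = '''
def schedule_job(stopped):
    if stopped:
        """If job is stopped return immediately"""
        return
'''

WITH_COMMENT = '''
def schedule_job(stopped):
    if stopped:
        # If job is stopped return immediately
        return
'''


def if_body_node_types(source):
    """Return the AST node type names inside the function's if-body."""
    func = ast.parse(source).body[0]
    if_stmt = func.body[0]
    return [type(node).__name__ for node in if_stmt.body]


print(if_body_node_types(WITH_STRING))   # ['Expr', 'Return']
print(if_body_node_types(WITH_COMMENT))  # ['Return']
```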

ThisIsClark · 2021-08-27T06:36:47Z

delfin/task_manager/metrics_manager.py

+# limitations under the License.
+"""
+
+**periodical task manager for metric collection tasks**


The comment style should be the same as in other files, such as metrics_rpcapi.py.

ThisIsClark · 2021-08-27T06:37:00Z

delfin/task_manager/metrics_manager.py

+
+from delfin import manager
+from delfin.task_manager.scheduler import schedule_manager
+


No blank line here

ThisIsClark

LGTM

sushanthakumar · 2021-08-28T04:03:11Z

LGTM

* Make job scheduler local to task process (#674)
* Make job scheduler local to task process
* Notify distributor when a new task added (#678)
* Remove db-scan for new task creation (#680)
* Use consistent hash to manage the topic (#681)
* Remove the periodically call from task distributor (#686)
* Start one historic collection immediate when a job is rescheduled (#685)
* Start one historic collection immediate when a job is rescheduled
* Remove failed task distributor (#687)
* Improving Failed job handling and telemetry job removal (#689)

Co-authored-by: ThisIsClark <liuyuchibubao@gmail.com>
Co-authored-by: Ashit Kumar <akopensrc@gmail.com>

Make job scheduler local to task process

fda091d

NajmudheenCT changed the base branch from master to perf_coll_fw_enhance August 24, 2021 03:45

ThisIsClark reviewed Aug 24, 2021

Adding UTs

e26291a

sushanthakumar reviewed Aug 25, 2021

NajmudheenCT added 3 commits August 26, 2021 11:47

Adress review comments

3b40fc6

Adding UT for job_distributor

29055ab

Adding UT for failed_job_distributor

a6743f8

NajmudheenCT changed the title ~~[WIP]Make job scheduler local to task process~~ Make job scheduler local to task process Aug 26, 2021

Adding init.py for distributor module

634ae05

ThisIsClark reviewed Aug 27, 2021

NajmudheenCT added 2 commits August 27, 2021 18:00

Adding Schedule_boot_job function to hanlde node restart usecase

84a9fd0

correcting comments formatting errors

7227911

ThisIsClark approved these changes Aug 28, 2021

NajmudheenCT merged commit 903138a into sodafoundation:perf_coll_fw_enhance Aug 28, 2021
