Make job scheduler local to task process #674
LOG = log.getLogger(__name__)


class FailedTelemetryJob(object):
IMO, TelemetryFailedJob is a better name than FailedTelemetryJob.
Done. Renamed it to TaskDistributor.
I think we should remove the word
Codecov Report
@@                    Coverage Diff                     @@
##           perf_coll_fw_enhance     #674      +/-   ##
========================================================
+ Coverage                  70.15%   70.18%   +0.02%
========================================================
  Files                        156      159       +3
  Lines                      14801    14936     +135
  Branches                    1822     1822
========================================================
+ Hits                       10384    10483      +99
- Misses                      3816     3846      +30
- Partials                     601      607       +6
'%s' % job['id'])
self.task_rpcapi.assign_failed_job(self.ctx, job)

LOG.debug('Assigned failed task for id: '
I think line 58 can be debug and 62 can be info
Done. This code might change depending on the distributor implementation; currently it is a pool-based distributor.
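For illustration only, here is a minimal sketch of a pool-based failed-job distributor that follows the logging advice in this thread: per-job detail at DEBUG, one summary line at INFO. It assumes "pool based" means handing jobs to a fixed pool of executors in rotation; the class name `RoundRobinFailedJobDistributor` and the executor names are hypothetical, not delfin's implementation. It also passes log arguments to the logger instead of formatting with `%` up front, so the string is only built when DEBUG is enabled.

```python
import itertools
import logging

LOG = logging.getLogger(__name__)


class RoundRobinFailedJobDistributor:
    """Hypothetical pool-based distributor: hands failed jobs to executors in turn."""

    def __init__(self, executors):
        if not executors:
            raise ValueError('executor pool must not be empty')
        # Cycle endlessly over the pool so each new job goes to the next executor.
        self._pool = itertools.cycle(executors)

    def distribute(self, failed_jobs):
        assignments = {}
        for job in failed_jobs:
            executor = next(self._pool)
            assignments[job['id']] = executor
            # Per-job detail is noisy, so it is logged at DEBUG.
            LOG.debug('Assigned failed task for id: %s to %s', job['id'], executor)
        # A single summary line per run is useful operationally, so it is logged at INFO.
        LOG.info('Distributed %d failed jobs', len(assignments))
        return assignments
```

With a pool of `['node-a', 'node-b']`, three failed jobs land on `node-a`, `node-b`, `node-a` in that order.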
return call_context.cast(context, 'remove_job',
                         job=job)

def assign_failed_job(self, context, job):
Can we have just two APIs, assign and remove, that take the job name as a parameter, since they differ only by job name?
Actually, the handlers for the two messages are different. If we merged them, we would need to pass an extra argument to switch between job types. Since this is distribution over the network, we want to keep the message size small; currently we pass only task_id as a parameter.
ok
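The trade-off discussed above (separate RPC methods versus one merged method with a type switch) can be sketched as follows. The handler bodies, the `job_type` field and the JSON envelope are hypothetical stand-ins; the real RPC layer serializes messages differently, so the byte counts only indicate the direction of the difference, not its real size.

```python
import json


# Hypothetical handlers named after the RPC methods discussed in this thread.
def assign_job(task_id):
    return 'scheduled %s' % task_id


def assign_failed_job(task_id):
    return 'rescheduled failed %s' % task_id


# Separate APIs: the method name alone selects the handler.
HANDLERS = {'assign_job': assign_job, 'assign_failed_job': assign_failed_job}


def dispatch(message):
    return HANDLERS[message['method']](**message['args'])


def dispatch_merged(message):
    # A merged 'assign' API must branch on an extra field carried by every message.
    args = message['args']
    if args['job_type'] == 'failed':
        return assign_failed_job(args['task_id'])
    return assign_job(args['task_id'])


def payload_size(message):
    # JSON is only a stand-in for the real RPC serialization.
    return len(json.dumps(message).encode('utf-8'))
```

Both designs reach the same handler, but every merged message carries the extra discriminator, which is the overhead the reply above wants to avoid.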
'last_run_time': last_run_time}
db.task_update(self.ctx, self.task_id, update_task_dict)
LOG.info('Periodic collection tasks scheduled for job id: '
         '%s ' % self.task_id)
Isn't self.job_ids.add(job_id) needed here?
Done, thanks.
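The fix agreed above, recording each scheduled job in `job_ids`, can be sketched with a small local scheduler. This is a hypothetical illustration: the class name `LocalJobScheduler` and the dict standing in for `db.task_update` are assumptions, not delfin code.

```python
import time


class LocalJobScheduler:
    """Hypothetical per-process scheduler that remembers which jobs it owns."""

    def __init__(self, task_store):
        # Plain dict used as a stand-in for the database layer (assumption).
        self.task_store = task_store
        self.job_ids = set()

    def schedule_job(self, task_id, now=None):
        last_run_time = int(now if now is not None else time.time())
        # Persist the run time, mirroring the task_update call in the diff.
        self.task_store[task_id] = {'last_run_time': last_run_time}
        # Record ownership so that removal and rescheduling can find the job later.
        self.job_ids.add(task_id)
        return last_run_time

    def remove_job(self, task_id):
        # Forget the job locally and drop its stored state, if any.
        self.job_ids.discard(task_id)
        self.task_store.pop(task_id, None)
```

Without the `add` call, `remove_job` would have nothing to match against, which is the gap the reviewer pointed out.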
leader_election/distributor needs an __init__.py to make it a module.
It's there!
def __call__(self):
    """
    :return:
Please remove the useless comments
done
def schedule_job(self, task_id):

    if self.stopped:
        """If Job is stopped return immediately"""
For single-line comments, please use #.
done
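Applying the two review comments above (a `#` comment instead of a one-line docstring, and no placeholder `:return:` docstring on `__call__`), a stopped-job guard might look like the sketch below. `PeriodicCollectionJob`, `collect` and `runs` are hypothetical names used only for illustration; the sketch places the guard in a callable job object, whereas the diff has it in `schedule_job`.

```python
class PeriodicCollectionJob:
    """Hypothetical job object that a scheduler invokes as a callable."""

    def __init__(self, collect):
        self.collect = collect  # function that performs one collection run
        self.stopped = False
        self.runs = 0

    def __call__(self):
        # If the job is stopped, return immediately without collecting.
        if self.stopped:
            return None
        self.runs += 1
        return self.collect()

    def stop(self):
        self.stopped = True
```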
# limitations under the License.
"""

**periodical task manager for metric collection tasks**
The comment style should match that of other files, such as metrics_rpcapi.py.
done

from delfin import manager
from delfin.task_manager.scheduler import schedule_manager

No blank line here
done
LGTM
LGTM
* Make job scheduler local to task process (#674)
* Make job scheduler local to task process
* Notify distributor when a new task added (#678)
* Remove db-scan for new task creation (#680)
* Use consistent hash to manage the topic (#681)
* Remove the periodically call from task distributor (#686)
* Start one historic collection immediate when a job is rescheduled (#685)
* Start one historic collection immediate when a job is rescheduled
* Remove failed task distributor (#687)
* Improving Failed job handling and telemetry job removal (#689)

Co-authored-by: ThisIsClark <liuyuchibubao@gmail.com>
Co-authored-by: Ashit Kumar <akopensrc@gmail.com>
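One of the squashed commits uses a consistent hash to manage the topic. The sketch below shows the general technique, not delfin's actual code: a hash ring with virtual points per node, where each task ID is routed to the first node point at or after its hash. `HashRing`, `replicas` and the worker names are hypothetical. The property that matters for per-node scheduling is that removing a node only moves the tasks that node owned.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring that maps keys (e.g. task IDs) to nodes."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node, smooths the spread
        self._ring = []           # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode('utf-8')).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash('%s#%d' % (node, i)), node))

    def remove_node(self, node):
        # Only this node's points disappear; every other point keeps its place.
        self._ring = [point for point in self._ring if point[1] != node]

    def get_node(self, key):
        if not self._ring:
            raise LookupError('hash ring is empty')
        # First point at or after the key's hash, wrapping past the end of the ring.
        idx = bisect.bisect_left(self._ring, (self._hash(key),))
        return self._ring[idx % len(self._ring)][1]
```

After `remove_node('worker-2')`, every task that was on `worker-1` or `worker-3` stays where it was, and only `worker-2`'s tasks are redistributed.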
What this PR does / why we need it:
Enhance (change) the performance metrics collection framework to schedule jobs locally on all worker nodes.
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #670
Special notes for your reviewer:
Release note: