-
Notifications
You must be signed in to change notification settings - Fork 201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only restore jobIds added within last 3 days #983
Only restore jobIds added within last 3 days #983
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General question here -- why do we need this if jobs are auto-archived? Won't archived jobs fail to load during the restore process and drop off the monitored list?
@@ -46,7 +47,8 @@ | |||
PLUGIN_DESCRIPTION = "Monitors a list of jobs" | |||
PLUGIN_PROVIDES = "MonitorJobsDockWidget" | |||
REGEX_EMPTY_STRING = re.compile("^$") | |||
|
|||
TIME_DELTA = 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This constant name is a bit generic IMO and it's generally preferred for the name to contain the time unit if it's a duration -- maybe something like JOB_RESTORE_THRESHOLD_DAYS
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was suggesting to rename TIME_DELTA
-> JOB_RESTORE_THRESHOLD_DAYS
. I'm confused now what each of these do.
Based on the code it looks like these two lines should be:
# Maximum number of jobs that will be restored on CueGUI restart.
JOB_RESTORE_MAX_JOBS = 200
# Maximum age for a job to be restored on CueGUI restart.
JOB_RESTORE_THREADHOLD_DAYS = 3
Do I have that right?
""" | ||
today = datetime.datetime.now() | ||
limit = JOB_LOAD_LIMIT if len(jobIds) > JOB_LOAD_LIMIT else len(jobIds) | ||
msg = """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: For multiline strings I think we should use implicit joins instead of the triple quotes -- this adds extra whitespace to the log message, right?
Maybe something like:
msg = ('Unable to load previously loaded job since it was moved '
'to the historical database: {0}')
(jobs are moved to historical database) | ||
|
||
:param jobIds: monitored jobs ids and their timestamp from previous working state (loaded from config.ini file) | ||
:ptype: list of tuples (ex: [("Job.f156be87-987a-48b9-b9da-774cd58674a3", 1612482716.170947),... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I believe the format for argument types should be :type jobIds: list[tuple]
. Additional explanation, examples, etc. should be either on the :param
line or above in the description.
@bcipriano - to answer your two questions: Cuegui uses the Some more context to this is that there is another PR that I still need to add, is that there is a bug in the autoload on reopen, currently, opencue doesn't autoload jobs that are not owned by the user because the Although this a "fix" the real issue is that the restore state needs to be redesigned as well as not to prevent the app from launching if it can't load the jobids. As well, I am okay if we defer to this PR to creating another issue for redesigning the restore state. Thoughts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for that, I get it now -- the extra load is just during startup, not during the entire runtime. I agree this makes sense as a stopgap fix.
Approved, but looks like there's a unit test failing.
Thanks Brian. |
Ping -- any assistance needed to resolve this test failure? |
1150174
to
86ad2b3
Compare
@bcipriano - I fixed the pylint errors and the pipeline now passes. |
Thanks! There are a couple of open comments in the code review, could you take a look at those please? |
@bcipriano - will do, apologies that I missed them previously! |
440ae37
to
9d995fb
Compare
cuegui/cuegui/JobMonitorTree.py
Outdated
@@ -226,23 +226,41 @@ def setLoadMine(self, value): | |||
@type value: boolean or QtCore.Qt.Checked or QtCore.Qt.Unchecked""" | |||
self.__loadMine = (value is True or value == QtCore.Qt.Checked) | |||
|
|||
def addJob(self, job): | |||
def addJob(self, job, timestamp=None): | |||
"""Adds a job to the list. With locking" | |||
@param job: Job can be None, a job object, or a job name. | |||
@type job: job, string, None""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a line to the docstring explaining what the new timestamp
param does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just added timestamp to the docstring
@@ -46,7 +47,8 @@ | |||
PLUGIN_DESCRIPTION = "Monitors a list of jobs" | |||
PLUGIN_PROVIDES = "MonitorJobsDockWidget" | |||
REGEX_EMPTY_STRING = re.compile("^$") | |||
|
|||
TIME_DELTA = 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was suggesting to rename TIME_DELTA
-> JOB_RESTORE_THRESHOLD_DAYS
. I'm confused now what each of these do.
Based on the code it looks like these two lines should be:
# Maximum number of jobs that will be restored on CueGUI restart.
JOB_RESTORE_MAX_JOBS = 200
# Maximum age for a job to be restored on CueGUI restart.
JOB_RESTORE_THREADHOLD_DAYS = 3
Do I have that right?
@bcipriano, sorry about that, yes my previous commit change was incorrect. I updated the constants, one for days and the other for limit. I hope this fixes the issue. |
@bcipriano I noticed that the build tests are failing on:
I think it might be related to: sphinx-doc/sphinx#9841 Thanks in advance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks!
Yeah, that documentation build failure is due to an upstream issue. Tracked by #1065, we'll get that fixed up soon.
#1065 is fixed now, I re-ran the checks in this PR but it's still failing. I think you'll just need to sync from |
Changed constant to more descriptive name, changed multi line text to use implicit joins, fixed arg type.
Previous commit mixed constant job threshold's limit with days, corrected it. Added comment description for timestamp in addJob.
d23b978
to
e313aff
Compare
@bcipriano - thank you! I rebased the code from master and all tests pass. |
Summarize your change.
Common complaint from our users is that if they added 1000s of jobs to their previous Job Monitor Tree session, opencuetopia would take a very long time to load, especially if the jobs were moved to the historical database since their last session (they would also receive many warnings in the shell which further bogs down the app).
To avoid the restore lag for such scenarios:
startTime()
because a job could be running on the farm for 3+ days or be dependent on other jobs, meaning a job could have a db start time > 3 (and hence not restored but could still be in progress and of interest to the user)