-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Event handler enhancements #1503
Conversation
(This is not ready yet, but it needs some initial sound bites.) |
5f29071
to
562beb1
Compare
# N.B. "scp" does not have a "max-size" option. | ||
check_call([ | ||
"rsync", "-a", "--rsh=" + ssh_tmpl, "--max-size=" + opts.max_size, | ||
source + "/", target]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to worry about this: #1494 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be OK with our normal naming conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can now have ":" in filenames: YYYY-MM-DDThh:mm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we only use the basic format YYMMDDThhmm
under log/job/
, or don't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
% find ~/cylc-run/bar/log/job/
/home/oliverh/cylc-run/bar/log/job/2015-08-08T00:00
/home/oliverh/cylc-run/bar/log/job/2015-08-08T00:00/foo
/home/oliverh/cylc-run/bar/log/job/2015-08-08T00:00/foo/NN
/home/oliverh/cylc-run/bar/log/job/2015-08-08T00:00/foo/01
/home/oliverh/cylc-run/bar/log/job/2015-08-08T00:00/foo/01/job.status
/home/oliverh/cylc-run/bar/log/job/2015-08-08T00:00/foo/01/job.err
/home/oliverh/cylc-run/bar/log/job/2015-08-08T00:00/foo/01/job.out
/home/oliverh/cylc-run/bar/log/job/2015-08-08T00:00/foo/01/job
/home/oliverh/cylc-run/bar/log/job/2015-08-08T00:00/foo/01/job-activity.log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you have something like [cylc]cycle point format
defined in your suite?
(The :
character would probably make PATH-like search rather interesting?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cycle point format = %Y-%m-%dT%H:%M
That's the official "extended format" with minutes under ISO 8601. Which is nice and readable. My point is we're not disallowing this, or transforming in log paths (maybe we should?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll create a test for this to ensure this is okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test added. I think rsync
only has a problem if we have a colon right after a string that can be a valid URI scheme. It should not happen in our case here.
I get a traceback on shutdown:
|
'mail retry delays' : vdr( vtype='interval_minutes_list', default=[] ), | ||
'mail smtp' : vdr( vtype='string' ), | ||
'mail to' : vdr( vtype='string' ), | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is like the original cylc event hooks config - one handler and list of events for it - which we changed because it doesn't allow different handlers for different events. Is this intended to replace the current system (just left in for back compat) or do you want to keep both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intend to keep both, so we can have the best of both worlds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add setting of default values for "handlers" and "handler events" as has been done recently for hooks under #1501
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in theory. However, this change is an add-on to the [runtime][TASK][event hooks]
section, which does not currently have the equivalent of #1501, so I am a bit reluctant to bundle this in as part of this change. I agree that we should have a follow on to implement #1501 functionality for [runtime][TASK][event hooks]
and [runtime][TASK][events]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raised as #1529
All problems raised should be addressed. I'll document the proposed change and add some new tests next. |
Matt, is this now done as far as you're concerned? If so, I'll take another look (was distracted by zombie processes today, sorry). |
The functionality should be OK. I have yet to find time to add something the CUG and some automated tests for the new functionality. |
9c515cd
to
fc86552
Compare
Branch re-based. I have now added docs and some automated tests. |
@hjoliver this is now ready for review. |
(sorry for the delay - the authentication branch just absorbed another day ... I should get on to this tomorrow) |
Don't worry. It has taken me forever to get this branch done in the first place. |
Quick summary of what's new: [runtime]
[[root]]
script = true
[[[remote]]]
host = my-favourite-host
# Switch on job log retrieval for this family/task
retrieve job logs = True
# Specify initial + retry delays, default is to run immediately
retrieve job logs retry delays = PT5S, PT30S, PT1M
[[[events]]]
# Email on these events
mail events = submission failed, submission retry, failed, retry
# You can even modify "from:", "to:" and the SMTP server's "host:port"
# mail from = notifications@$HOSTNAME
# mail to = $USER
# mail smtp = localhost:25
# Can specify an initial + retry delays
# mail retry delays = PT0S
#
# Generic event handlers for specified events
handler = hello-event-handler '%(point)s' '%(name)s' '%(submit_num)s' '%(event)s'
handler events = succeeded, failed
# Can specify an initial + some retry delays
# This setting applies to specific event handlers as well.
# handler retry delays = PT0S Each event handler can be a template script or a command like before. For a template, the following sub-strings will be substituted:
(The classic interface is equivalent to |
(Just broken the tests slightly by my latest change. Will fix soon.) |
Branch re-based, broken tests fixed. |
The test battery passes in my environment. Although ... |
the suite host. | ||
although they can be retrieved by right-clicking on the task in the GUI. If you | ||
want the job logs pulled back to the suite host automatically, you can set | ||
\lstinline@[[[remote]]]retrieve job log=True@: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks funny when process because it is directly above a long-format code block. Maybe expand it more in text, "you can set 'retrieve job log = True' under the task [[[remote]]] section:"?
Long hyperlink fixed. Branch re-based. |
I have updated my what's new quick summary. |
@arjclark please review-2. |
Ensure that event handler entries go to the job-activity.log with the correct submit number.
@@ -173,9 +173,69 @@ class CylcSuiteDAO(object): | |||
MAX_TRIES = 100 | |||
TABLE_BROADCASTS = "broadcasts" | |||
TABLE_TASK_JOBS = "task_jobs" | |||
TABLE_TASK_JOB_LOGS = "task_job_logs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this new table going to pose a problem for restarts on older suites, or will it be created automatically if missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tables are created automatically if they are missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thought so but needed to confirm.
Getting an error from the test battery under
|
^ presumably accidentally broken by matthewrmshin@c015453 |
Test fixed. |
New and additional settings to
suite.rc
:Improve diagnostics to
job-activity.log
.Suite run time database, new
task_job_logs
table:rose suite-hook
rose_prune
androse suite-log --update
, and viewed by Rose Bush.Other re-factor and tidy up:
cylc.mp_pool
) in a single object.This closes #181, addresses bullet points 1 to 3 of #992, and begins the 1st step for #1052.