-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
graph parsing: space removal bug #2534
Comments
This is where the problem is: https://github.com/cylc/cylc/blob/4e985c3dda52540fa2c2c5cc32accb65373f6b60/lib/cylc/graph_parser.py#L166-L167 |
Yes, for this reason: |
@hjoliver I am unable to think of a quick fix for this one. Are you able to do so? If not, I am tempted to drop this to a later milestone. In the longer term, I'd like to take a good look at our parsers, and probably try to separate the logic into stages, e.g. by adding a lexical analyser for the raw input, before parsing the tokens into a data structure and for further processing. |
Same, no quick fix, sorry. At least users are unlikely to hit this bug (for the reason given in the description above). |
I'm not going to explore this properly, but, could you just do something like: if not re.match("^((?!\w+\s+\w+).)*$", line):
Throw an exception or spit out a warning or however you want to handle it
# Apparently this is the fastest way to strip all whitespace!:
line = "".join(line.split()) I don't know if any extra characters would need to be explored there (or how you would deal with something like foo => bar \
baz which I think would have to rely on strict validation to catch unles you've joined lines in advance? |
@ColemanTom - thanks for the suggestion, we'll look it at. We don't need to consider line continuation markers - those lines are concatenated first up when the file is read in. |
Maybe I am over-thinking things here, but the issue is a bit more complicated because task names can contain non-alphanumeric characters. (My fault, I suppose. Should have left things simple.) |
Ok. I assume there is a list of unacceptable characters? e.g. So Tried it on
|
The good set of characters for a task name can be found in this module: The problem is that we also have a lot of grammar in a graph node. E.g. We can have cycle offsets, parameters and trigger qualifiers. (The most problematic being the hyphen character |
This may be a touch crude, but: Or, a bit easier to follow... Construct the regex string import re
taskname_pattern = "\s*\w[\w\-+@%]*"
task_suffix_options = "(\s*<[^>]+>\s*)?(\s*\[[^\]]+\]\s*)?(\s*:\s*\w[\s\w\-+]+)?"
lhs_task_pattern = standard_pattern + task_suffix_options
rhs_task_pattern = "\s*!?" + lhs_task_pattern
multiple_task_pattern = "[\s(]*{pattern}(\s*[&|]{pattern}[\s)]*)*"
lhs_multiple_task_pattern = multiple_task_pattern.format(pattern=lhs_task_pattern)
rhs_multiple_task_pattern = multiple_task_pattern.format(pattern=rhs_task_pattern)
final_pattern = "\A" + lhs_multiple_task_pattern + "(\s*=\s*>\s*" + rhs_multiple_task_pattern + ")*\Z"
checker = re.compile(final_pattern) Functions used to test: def check_pass():
for p in pass_list:
if not checker.match(p):
print("Pass did not pass...: " + p)
def check_fail():
for f in fail_list:
if checker.match(f):
print("Fail should not have passed...: " + f) Syntax that you would expect to fail: fails = """ foo => !
foo => @
foo => +
foo => %
foo => @b
foo => +b
foo => %b
foo => bar baz
get<run pets> => feed<run,pets>
get<run,pets> => feed<run pets>
OBS_GET<run>: => OBS_PROC<run>
OBS_GET<run>:succeed-all => OBS_PROC<>
foo[] => foo""" Syntax you would expect to pass
Results: >>> check_fail()
Fail should not have passed...: get<run pets> => feed<run,pets>
Fail should not have passed...: get<run,pets> => feed<run pets>
>>> check_pass()
>>> So, not quite perfectly capturing the expected failures because I didn't put super logic inside the I'm sure there are other syntax I haven't accounted for, but following the format/functions above, it would be easy to create a function to do this stuff and add more strings to a pytest framework. I know this isn't a high priority thing, but its a good excuse for me to practice and improve my regex knowledge. |
@ColemanTom - thanks for this. I haven't had time to look at it in detail, sorry, and am about to go away for a week. But we can probably use what you've done if you can accommodate those two failing cases (and if the massive regex doesn't create a perfomance problem... and maybe it won't, in context). |
Sure thing. Feel free to assign this Issue to me if you like and I'll fork/send a merge request when I find the time to finish this and adding tests. I am guessing here - https://github.com/cylc/cylc/tree/master/tests/validate ? |
Hi @ColemanTom Please feel free to raise a pull request. Yes, please add a test under https://github.com/cylc/cylc/blob/master/tests/validate/ - e.g. https://github.com/cylc/cylc/blob/master/tests/validate/64-circular.t should give you a good starting point. (Tests are shell scripts that outputs results in the Test Anything Protocol.) |
I've done more exploration of this. I'm very disappointed with the regex ability in python. No (natural) atomic groups or possessive quantifiers which would have made things a bit cleaner. But, with some pseudo-atomic groups which Python can handle, it is quite quick. I have discovered two bugs in my logic which I still need to fix. And some behaviour in cylc that I was not expecting. I am wondering if it is intentional and I just didn't realise, or its a quirk? graph = """
a => b
=> c
""" I expected that would fail as c is not following anything, but it works and is treated as |
I can answer my own question, yes, so long as the |
@ColemanTom Yes, this is deliberate - to allow users to easily manipulate continuation lines. |
See #1900 |
(graph lines can get very long, but backslash continuation is unsafe - trailing whitespace breaks it). |
Another question, this time about triggering off of remote tasks: https://cylc.github.io/cylc/html/single/cug-html.html#12.26 It doesn't say explicitly in the docs, but the validation fails, so I'm assuming it cannot be applied directly with any other graph node suffix (e.g. time triggers, parameters, family triggers), they have to be completely independent? I just want to make sure my understanding of things is correct. |
Thanks for pointing that out, I'll get the docs updated.
Task state triggers do work, e.g. "poll-foo<other::foo:succeed>:fail => bar" (i.e. trigger task bar if polling of other::foo eventually fails).
Technically, cycle point offsets should work with the suite-polling syntax, but they don't. However it doesn't matter because you can just put your polling task on the same cycle point sequence as the target task in the remote suite (which is recommended in the docs).
Technically, family triggers and parameters should work with suite-polling syntax, but it would not be a sensible thing to do: it would imply all family member/parameter-expanded tasks separately doing exactly the same polling operation, which is unnecessary and inefficient. (Interestingly, these do not fail validation - the polling syntax comes first - but the suite-polling scripting does not get applied to the member tasks).
So there's a couple of things here that should be done better, but probably not worth the effort because suite-state polling tasks will become redundant pretty soon as a result of #2423
(and for your purposes, no need to support family and parameter syntax next to suite-polling)
|
This:
should be an error, but it is parsed as equivalent to:
Not likely to cause a problem (so long as strict validation, or rose suite-run, is used) because
barbaz
would have to be defined under[runtime]
... but still a bug.The text was updated successfully, but these errors were encountered: