graph parsing: space removal bug #2534

hjoliver · 2018-01-09T10:02:54Z

This:

foo => bar baz

should be an error, but it is parsed as equivalent to:

foo => barbaz

Not likely to cause a problem (so long as strict validation, or rose suite-run, is used) because barbaz would have to be defined under [runtime] ... but still a bug.

The text was updated successfully, but these errors were encountered:

matthewrmshin · 2018-01-09T10:27:46Z

This is where the problem is: https://github.com/cylc/cylc/blob/4e985c3dda52540fa2c2c5cc32accb65373f6b60/lib/cylc/graph_parser.py#L166-L167

hjoliver · 2018-01-09T10:32:49Z

Yes, for this reason:
https://github.com/cylc/cylc/blob/4e985c3dda52540fa2c2c5cc32accb65373f6b60/lib/cylc/graph_parser.py#L153-L155
(which sounds good in most respects, but it might be nasty to fix)

matthewrmshin · 2018-01-26T10:12:44Z

@hjoliver I am unable to think of a quick fix for this one. Are you able to do so? If not, I am tempted to drop this to a later milestone.

In the longer term, I'd like to take a good look at our parsers, and probably try to separate the logic into stages, e.g. by adding a lexical analyser for the raw input, before parsing the tokens into a data structure and for further processing.

hjoliver · 2018-01-26T10:35:57Z

Same, no quick fix, sorry. At least users are unlikely to hit this bug (for the reason given in the description above).

ColemanTom · 2018-01-31T03:21:36Z

I'm not going to explore this properly, but, could you just do something like:

    if not re.match("^((?!\w+\s+\w+).)*$", line):
        Throw an exception or spit out a warning or however you want to handle it
    # Apparently this is the fastest way to strip all whitespace!: 
    line = "".join(line.split())

I don't know if any extra characters would need to be explored there (or how you would deal with something like

foo => bar \
    baz

which I think would have to rely on strict validation to catch unles you've joined lines in advance?

hjoliver · 2018-01-31T04:40:37Z

@ColemanTom - thanks for the suggestion, we'll look it at. We don't need to consider line continuation markers - those lines are concatenated first up when the file is read in.

matthewrmshin · 2018-01-31T08:59:24Z

Maybe I am over-thinking things here, but the issue is a bit more complicated because task names can contain non-alphanumeric characters. (My fault, I suppose. Should have left things simple.)

ColemanTom · 2018-01-31T21:41:41Z

Ok. I assume there is a list of unacceptable characters? e.g. = > | & ( ) :?

So ^((?!\w+\s+\w+).)*$ could be ^((?![^=>|&():]+\s+[^=>|&():]+).)*$ adding any other unacceptable characters in? It feels like it should be that simple (unless of course there is a whole heap of extra complexity in the namespace definition/acceptance).

Tried it on

foo => bar baz #failed
foo => bar & baz #passed
foo => ba3$r& baz@!^ #passed
foo => (ba3$r baz@!^) #failed
foo => (ba3$r| baz@!^) #passed

matthewrmshin · 2018-02-01T08:22:50Z

The good set of characters for a task name can be found in this module:
https://github.com/cylc/cylc/blob/master/lib/cylc/task_id.py

The problem is that we also have a lot of grammar in a graph node. E.g. We can have cycle offsets, parameters and trigger qualifiers. (The most problematic being the hyphen character -, which can be used in task names and parameters values, but is also used as the minus sign in a cycle offset, and in trigger qualifiers such as :finish-all.)

ColemanTom · 2018-02-02T02:32:58Z

This may be a touch crude, but:
\A[\s(]*\s*\w[\w\-+@%]*(\s*<[^>]+>\s*)?(\s*\[[^\]]+\]\s*)?(\s*:\s*\w[\s\w\-+]+)?(\s*[&|]\s*\w[\w\-+@%]*(\s*<[^>]+>\s*)?(\s*\[[^\]]+\]\s*)?(\s*:\s*\w[\s\w\-+]+)?[\s)]*)*(\s*=\s*>\s*[\s(]*\s*!?\s*\w[\w\-+@%]*(\s*<[^>]+>\s*)?(\s*\[[^\]]+\]\s*)?(\s*:\s*\w[\s\w\-+]+)?(\s*[&|]\s*!?\s*\w[\w\-+@%]*(\s*<[^>]+>\s*)?(\s*\[[^\]]+\]\s*)?(\s*:\s*\w[\s\w\-+]+)?[\s)]*)*)*\Z

Or, a bit easier to follow...

Construct the regex string

import re
taskname_pattern = "\s*\w[\w\-+@%]*"
task_suffix_options = "(\s*<[^>]+>\s*)?(\s*\[[^\]]+\]\s*)?(\s*:\s*\w[\s\w\-+]+)?"
lhs_task_pattern =  standard_pattern + task_suffix_options
rhs_task_pattern = "\s*!?" + lhs_task_pattern
multiple_task_pattern = "[\s(]*{pattern}(\s*[&|]{pattern}[\s)]*)*"
lhs_multiple_task_pattern = multiple_task_pattern.format(pattern=lhs_task_pattern)
rhs_multiple_task_pattern = multiple_task_pattern.format(pattern=rhs_task_pattern)
final_pattern = "\A" + lhs_multiple_task_pattern + "(\s*=\s*>\s*" + rhs_multiple_task_pattern + ")*\Z"
checker = re.compile(final_pattern)

Functions used to test:

def check_pass():
  for p in pass_list:
      if not checker.match(p):
          print("Pass did not pass...:  " + p)

def check_fail():
  for f in fail_list:
      if checker.match(f):
          print("Fail should not have passed...:  " + f)

Syntax that you would expect to fail:

fails = """    foo => !
    foo => @
    foo => +
    foo => %
    foo => @b
    foo => +b
    foo => %b
    foo => bar baz
    get<run pets> => feed<run,pets>
    get<run,pets> => feed<run pets>
    OBS_GET<run>: => OBS_PROC<run>
    OBS_GET<run>:succeed-all => OBS_PROC<>
    foo[] => foo"""

Syntax you would expect to pass

passes = """    pre => init<run>
    pre => init< run>
    pre => init<run >
    get<run,pets> => feed<run,pets>
    get< run,pets> => feed<run,pets>
    get<run, pets> => feed<run,pets>
    get<run,pets > => feed<run,pets>
    get<run , pets> => feed<run,pets>
    get <run,pets> => feed<run,pets>
    get<run,pets-1> => get<run,pets>
    get<run,pets -1> => get<run,pets>
    get<run,pets- 1> => get<run,pets>
    get<run,pets - 1> => get<run,pets>
    get<run =1,pets=dogs> => boo
    get<run= 1,pets=dogs> => boo
    get<run = 1,pets=dogs> => boo
    get < run = 1 , pets = dogs > => boo
    get<run=1,pets=dogs>=>boo
    OBS_GET<run>:succeed-all => OBS_PROC<run>
    OBS_GET<run> :succeed-all => OBS_PROC<run>
    OBS_GET<run>: succeed-all => OBS_PROC<run>
    OBS_GET<run> : succeed-all => OBS_PROC<run>
    OBS_GET<run>:succeed -all => OBS_PROC<run>
    OBS_GET<run>:succeed- all => OBS_PROC<run>
    OBS_GET<run> : succeed - all => OBS_PROC<run>
    foo => bar
    foo => b-
    foo => b@
    foo => b+
    foo => b%
    foo => bar& baz
    foo => bar &baz
    foo => bar & baz
    foo => bar| baz
    foo => bar |baz
    foo => bar | baz
    model[-P1M] => model
    model[- P1M] => model
    model[-P1M ] => model
    model [ - P1M ] => model
    Analysis[+PT12H] => ObSensitivity
    Analysis[ +PT12H] => ObSensitivity
    Analysis[+ PT12H] => ObSensitivity
    Analysis[ + PT12H ] => ObSensitivity
    copy:expired => !proc
    copy:expired => ! proc
    foo
    foo => bar => test
    foo => _
    foo => bar_baz
    foo[-P2M]:out2 => baz
    foo[ - P2M ] : out2 => baz
    foo[ -P2M]:out2 => baz
    foo[- P2M]:out2 => baz
    foo[-P2M] :out2 => baz
    foo[-P2M]: out2 => baz
    foo[ -P2M ]:out2 => baz
    sim<r,m,c=2>[-P1Y] => sim<r,m,c=0> | bar<g,h,i=3>
    OBS_GET<run>:succeed - all&foo => OBS_PROC<run> => hello => test | boo"""

Results:

>>> check_fail()
Fail should not have passed...:      get<run pets> => feed<run,pets>
Fail should not have passed...:      get<run,pets> => feed<run pets>
>>> check_pass()
>>>

So, not quite perfectly capturing the expected failures because I didn't put super logic inside the <...> or [....] portions of a task regex to account for param space param. It wouldn't be too hard to do, but I don't fully know the syntax permissible for a parameter name for example, or the time stuff inside the [...]. Those probably get captured fine anyway because a parameter wouldn't exist that is expected, or a time syntax wouldn't be accepted.

I'm sure there are other syntax I haven't accounted for, but following the format/functions above, it would be easy to create a function to do this stuff and add more strings to a pytest framework.

I know this isn't a high priority thing, but its a good excuse for me to practice and improve my regex knowledge.

hjoliver · 2018-02-07T08:44:41Z

@ColemanTom - thanks for this. I haven't had time to look at it in detail, sorry, and am about to go away for a week. But we can probably use what you've done if you can accommodate those two failing cases (and if the massive regex doesn't create a perfomance problem... and maybe it won't, in context).

ColemanTom · 2018-02-08T07:50:31Z

Sure thing. Feel free to assign this Issue to me if you like and I'll fork/send a merge request when I find the time to finish this and adding tests. I am guessing here - https://github.com/cylc/cylc/tree/master/tests/validate ?

matthewrmshin · 2018-02-08T09:19:47Z

Hi @ColemanTom Please feel free to raise a pull request. Yes, please add a test under https://github.com/cylc/cylc/blob/master/tests/validate/ - e.g. https://github.com/cylc/cylc/blob/master/tests/validate/64-circular.t should give you a good starting point. (Tests are shell scripts that outputs results in the Test Anything Protocol.)

ColemanTom · 2018-02-19T22:35:24Z

I've done more exploration of this. I'm very disappointed with the regex ability in python. No (natural) atomic groups or possessive quantifiers which would have made things a bit cleaner. But, with some pseudo-atomic groups which Python can handle, it is quite quick.

I have discovered two bugs in my logic which I still need to fix. And some behaviour in cylc that I was not expecting. I am wondering if it is intentional and I just didn't realise, or its a quirk?

graph = """
    a => b
        => c
"""

I expected that would fail as c is not following anything, but it works and is treated as a => b => c. I could not see that as an example in the user guide. So I'm not entirely sure how to handle that. It implies that on a given line you don't need a LHS node, so I could make that LHS optional I guess, change a + to a *, not hard, but I wanted to make sure it is expected and reasonable behaviour. If it were only the line => c, and no line above, cylc aborts. So I am guessing it is expected?

ColemanTom · 2018-02-20T10:24:24Z

I can answer my own question, yes, so long as the => isn't the leading thing on the first line, it can be on subsequent lines. Noticed it in the code.

matthewrmshin · 2018-02-20T10:34:59Z

@ColemanTom Yes, this is deliberate - to allow users to easily manipulate continuation lines.

hjoliver · 2018-02-20T11:16:34Z

See #1900

hjoliver · 2018-02-20T11:17:45Z

(graph lines can get very long, but backslash continuation is unsafe - trailing whitespace breaks it).

ColemanTom · 2018-02-21T10:16:05Z

Another question, this time about triggering off of remote tasks: https://cylc.github.io/cylc/html/single/cug-html.html#12.26

It doesn't say explicitly in the docs, but the validation fails, so I'm assuming it cannot be applied directly with any other graph node suffix (e.g. time triggers, parameters, family triggers), they have to be completely independent? I just want to make sure my understanding of things is correct.

hjoliver · 2018-02-21T21:20:24Z

Thanks for pointing that out, I'll get the docs updated. Task state triggers do work, e.g. "poll-foo<other::foo:succeed>:fail => bar" (i.e. trigger task bar if polling of other::foo eventually fails). Technically, cycle point offsets should work with the suite-polling syntax, but they don't. However it doesn't matter because you can just put your polling task on the same cycle point sequence as the target task in the remote suite (which is recommended in the docs). Technically, family triggers and parameters should work with suite-polling syntax, but it would not be a sensible thing to do: it would imply all family member/parameter-expanded tasks separately doing exactly the same polling operation, which is unnecessary and inefficient. (Interestingly, these do not fail validation - the polling syntax comes first - but the suite-polling scripting does not get applied to the member tasks). So there's a couple of things here that should be done better, but probably not worth the effort because suite-state polling tasks will become redundant pretty soon as a result of #2423 (and for your purposes, no need to support family and parameter syntax next to suite-polling)

hjoliver added the bug Something is wrong :( label Jan 9, 2018

hjoliver added this to the soon milestone Jan 9, 2018

matthewrmshin changed the title ~~graph parsing bug~~ graph parsing: space removal bug Jan 18, 2018

matthewrmshin modified the milestones: soon, later Jan 26, 2018

This was referenced Jan 31, 2018

Add, subtract, or override semantics for repeated config items #1363

Open

Python API #1962

Open

hjoliver mentioned this issue Feb 22, 2018

Amend suite state polling task documentation. #2583

Merged

matthewrmshin mentioned this issue Mar 6, 2018

Fix ref by value of parameters with plus/minus #2596

Merged

hjoliver mentioned this issue Mar 8, 2018

Separation of functionality in cylc.config #2392

Open

matthewrmshin mentioned this issue Apr 18, 2018

WIP: Fix space in graph line string merge bug #2628

Closed

matthewrmshin modified the milestones: later, next release Apr 18, 2018

ColemanTom mentioned this issue Apr 30, 2018

Fix space in graph line string simpler approach #2644

Merged

hjoliver mentioned this issue May 1, 2018

Fix graph_parser validation for incorrect syntax order in a node #2646

Merged

hjoliver closed this as completed in #2644 May 3, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

graph parsing: space removal bug #2534

graph parsing: space removal bug #2534

hjoliver commented Jan 9, 2018

matthewrmshin commented Jan 9, 2018

hjoliver commented Jan 9, 2018 •

edited

Loading

matthewrmshin commented Jan 26, 2018

hjoliver commented Jan 26, 2018

ColemanTom commented Jan 31, 2018

hjoliver commented Jan 31, 2018

matthewrmshin commented Jan 31, 2018

ColemanTom commented Jan 31, 2018

matthewrmshin commented Feb 1, 2018

ColemanTom commented Feb 2, 2018

hjoliver commented Feb 7, 2018

ColemanTom commented Feb 8, 2018

matthewrmshin commented Feb 8, 2018

ColemanTom commented Feb 19, 2018

ColemanTom commented Feb 20, 2018

matthewrmshin commented Feb 20, 2018

hjoliver commented Feb 20, 2018

hjoliver commented Feb 20, 2018

ColemanTom commented Feb 21, 2018

hjoliver commented Feb 21, 2018 via email •

edited

Loading

graph parsing: space removal bug #2534

graph parsing: space removal bug #2534

Comments

hjoliver commented Jan 9, 2018

matthewrmshin commented Jan 9, 2018

hjoliver commented Jan 9, 2018 • edited Loading

matthewrmshin commented Jan 26, 2018

hjoliver commented Jan 26, 2018

ColemanTom commented Jan 31, 2018

hjoliver commented Jan 31, 2018

matthewrmshin commented Jan 31, 2018

ColemanTom commented Jan 31, 2018

matthewrmshin commented Feb 1, 2018

ColemanTom commented Feb 2, 2018

hjoliver commented Feb 7, 2018

ColemanTom commented Feb 8, 2018

matthewrmshin commented Feb 8, 2018

ColemanTom commented Feb 19, 2018

ColemanTom commented Feb 20, 2018

matthewrmshin commented Feb 20, 2018

hjoliver commented Feb 20, 2018

hjoliver commented Feb 20, 2018

ColemanTom commented Feb 21, 2018

hjoliver commented Feb 21, 2018 via email • edited Loading

hjoliver commented Jan 9, 2018 •

edited

Loading

hjoliver commented Feb 21, 2018 via email •

edited

Loading