Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't include self-references in dependency graph #458

Open
lognaturel opened this issue Jul 12, 2019 · 12 comments · Fixed by #523
Open

Don't include self-references in dependency graph #458

lognaturel opened this issue Jul 12, 2019 · 12 comments · Fixed by #523

Comments

@lognaturel
Copy link
Member

lognaturel commented Jul 12, 2019

All expressions that include references to form nodes are included in a dependency graph. This makes it possible to only recompute expressions when their dependencies are updated which is an important performance measure for forms with a lot of dependent logic. See the W3C XForms documentation, IDag and Triggerable.

Currently, when a calculation in a particular node refers to that same node, that relationship is captured by the dependency graph. This blocks usage of self-referencing expressions because the dependency graph then has a cycle in it.

The W3C XForms spec says the following:

The depList for a vertex v is assigned to be the vertices other than v whose computational expressions reference v (described below). Vertex v is excluded from its own depList to allow self-references to occur without causing a circular reference exception.

See #440 and #441 for examples of useful self-referencing expressions.

I'm not exactly sure whether all expressions (calculate, relevant, etc) on a particular node are already recomputed when that node changes so that will be something to explore.

Related to #440, I believe that when expressions are in repeats or depend on values in repeats, the repeat path without any predicates is used in the dependency graph. So a change to any repeat instance will trigger recomputation. This is why computations in repeats are slow.

@lognaturel
Copy link
Member Author

@tiritea I hope this captures the underlying issue that #440 and #441 get at and the work that would need to be done to address them. If so, I propose we close those two issues in favor of this one.

@tiritea
Copy link
Contributor

tiritea commented Jul 14, 2019

I'm not sure... I opened 440 and 441 as separate issues to reflect two somewhat different behaviors, and because I wasn't sure if they'd necessarily be susceptible to the same solution. 441 reflects Validate correctly identifying a cyclic dependency/self reference - in coalesce(.,${foo}) - but which is actually legitimate (ie doesnt represent infinite recursion). Whereas 440 reflects Validate incorrectly identifying a cyclic dependency when in fact, due to the actual context in which the XPath expression will be evaluated - namely a specific indexed nodeset of a repeat group - it is not in fact cyclic.

I think excluding self- references may well address 441, but I'm not sure in the general case it will solve XPath expressions referencing into indexed repeat groups? (I still need to wrap my head around what it'll do in this case...)

@tiritea
Copy link
Contributor

tiritea commented Jul 14, 2019

How about we close 441 in favor of this one, and wait and see if it fixes 440 too?

@lognaturel
Copy link
Member Author

Sounds good to me! I'm pretty sure that with repeats, the unqualified (no predicates) path is used as the node in the dependency graph. In fact, I'm not sure that there's an alternative because predicates could include complex expressions. That means that for node N in a repeat, any expression including a reference to node N looks like a DAG cycle no matter what the predicate.

@tiritea
Copy link
Contributor

tiritea commented Jul 15, 2019

Does this 'fix' mean that something like . + 3 , or (not uncommon mistake) Q6: relevant = ${Q6} != '' will not be flagged as a error by Validate?

[I'm guessing probably 90% of the 'cyclic dependency' errors that Validate picks up are when the form writer inadvertently includes a self-reference in a relevant or calculate... So its actually nice if Validate could still flag these somehow, even as warnings]

@lognaturel
Copy link
Member Author

lognaturel commented Jul 16, 2019

Yes, following the W3C algorithm, all self-references are allowed.

I agree completely that it's rare for users to truly want self-references. In fact, I wouldn't be surprised if you're the first one! Of course that would change if once is deprecated. I don't think Validate can do the warnings but perhaps pyxform could.

@tiritea
Copy link
Contributor

tiritea commented Jul 16, 2019

Sorry to be the fly in the Validate ointment... ;-)

So if Validate will no longer detect these self-references at all - because DAG will start allowing them - and it is quite possible that Validate's current type static checking will also have to be backed off somewhat (I cant remember what that bug # is...) (#405) that may significantly reduce what logical (ie non-syntax) error checks Validate is actually doing... :-/ So I agree probably needs discussion since its a not insignificant change to Validate's behavior (even if it is strictly correct spec-wise).

@tiritea
Copy link
Contributor

tiritea commented Jul 16, 2019

Note, I think @yanokwa may have added this (ie Validate) as possible agenda topic for upcoming convening. So if we cant "sort it out amongst ourselves" here, we perhaps we can all have a, umm, 'healthy disputation' about Validate in person (over a beer, of course...) :-)

My preference: Validate throws a specific warning when it detects self references (ala status quo) because these may well be indicative of a user/form design error that could potentially result in re-calculations that will never quiesce... [but strictly speaking its impossible infeasible to definitively determine at the time whether they will or not]. I think such warnings are probably ok within a UI (eg pyxform, XLSForm online/offline validation, Kobo preview, ...) but they may not be possible where javaRosa's DAG check is currently being exploited as an effective 'gatekeeper' (specifcially ODK Collect, ODK Aggregate upload). eg I'm not sure how (or if?) in ODK Collect you want to throw a warning of the ilk "This form may contain bad calculations. Are you sure you want to continue?"... Nor how you can throw a 'warning' in Aggregate (Central?) when you try to upload an XML form with potentially bad self-references...

:-\

@tiritea
Copy link
Contributor

tiritea commented Jul 16, 2019

[not sure how to tap @MartijnR in github, but I'd like his view on all this...]

@tiritea
Copy link
Contributor

tiritea commented Jul 22, 2019

So the W3C spec is pretty cut-n-dry on this... :-| So do you think its still productive to discuss this change further here, or is it really just time to raise it to TSC level for discussion as to wider impacts, and how we might mitigate them?

I do feel this it the 'right' way forward - and 2 wrongs (precluding self references + once) dont make a right - and we should instead think of perhaps how we might better warn users of potentially 'bad' (eg non-aquiescing) calculations. But should we push on, or wait on making any change before having an reasonable alternative?

I also dont know what additional testing might now be necessary to ensure Collect behaves nicely should it get loaded with a self-referencing, non-aquiescing form. I believe Enketo forgoes DAG in favor of re-running calculations a limited number of times (3?) before giving up if they haven't aquiesced yet... That may become necessary to do in Collect?

@MartijnR
Copy link

In Enketo Validate we chose to completely remove self-referencing checks as well, after realizing we couldn't do them well enough and that it was therefore better not to do them at all.

I believe Enketo forgoes DAG in favor of re-running calculations a limited number of times (3?) before giving up if they haven't aquiesced yet... That may become necessary to do in Collect?

It's actually browsers themselves (at least Chrome for sure) that terminate infinite loops. There is no code in Enketo that keeps track of this.

@ggalmazor
Copy link
Contributor

ggalmazor commented Dec 20, 2019

Just a refresher on latest work on this front:

We have improved the test suite around the DAG, which includes specific cases of cycles in the DAG due to self-references and chains of dependencies between vertices in the DAG (triggerables).

We had to come up with specific scenarios to describe the DAGs behavior, which surfaced some inconsistencies that we should discuss:

  • We found that backward references between repeat group siblings work (i.e. depending on some value in the previous repeat instance), but forward references don't (i.e. depending on the next repeat instance).

    An exception to this would involve combining backward reference and a self-reference. A real-world use case for this would be an autoincremental field, as in:

    <bind nodeset="/data/group/sequence" type="int" calculate="/data/group[position() = (position(current()) - 1)]/sequence + 1"/>

    This is not supported currently because it's interpreted as a self-reference DAG cycle.

    Even though it's hard to come up with a real-world use for forward references between repeat group siblings, we should discuss if this is something we should support for completeness sake.

  • We were able to show in a test how a relevance condition in a group could affect to calculate expressions in fields inside the group. (See TriggerableDagTest.verify_relation_between_calculate_expressions_and_relevancy_conditions)

    @tiritea shared some links to the specs that explain how relevance should work in terms of calculate expressions at Describe and document which expressions are evaluated when #516 (comment)

    In our test, we have a field, and a group with two fields, both with calculate expressions. The first field inside the group depends on the field outside of it. The second field inside the group depends on the first field inside the group. The group itself has a relevance condition. We found that the first field in the group will always get its value calculated, but the second won't until the group is relevant.

  • We established that constraint conditions are not part of the DAG. The FormController evaluates them each time a question is answered. (See FormEntryController.answerQuestion)

  • We tried to define scenarios where a cycle could be defined between relevant, required, or readonly conditions and arrived at some interesting insights:

    • Lacking of xpath functions such as isRelevant(node), isRequired(node), and isReadOnly(node), it's impossible to create cycles using these conditions

    • It's not that self-references create cycles and, thus, must be rejected. It's that using self-references in these conditions doesn't make sense.

      To reject them, I'm more inclined to preemptively detect this and produce an exception even before the DAG gets built because it's less misleading.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants