Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xpath: support for indexed-repeat function #195

Merged
merged 16 commits into from
Aug 28, 2024

Conversation

eyelidlessness
Copy link
Member

Replaces #150. The previous approach incorrectly contextualized relative arguments to their preceding result node-sets. This approach is corrected based on:

Thanks @lognaturel for helping me understand the expected behavior! This one was a doozy, relative to what I thought I understood going in.

A nice side effect of all this clarification is that I think the implementation itself is much simpler.

I did not decide to go the alternative route discussed in #150 comments (providing a mechanism for a function to act as a syntax alias or sugar). That would certainly be interesting, but I decided to go with this approach for a couple of reasons:

  1. The alternative approach would add conceptual abstractions we'd want to validate thoroughly, which this approach does not.
  2. I think we benefit from continuing to exercise the existing @getodk/xpath concepts and internals for as many use cases as they address. This keeps the base of functionality contained, especially as compelling motivations for refactoring mount1.

Footnotes

  1. Such as the oft-cited Decouple @odk/xpath API from XML/browser DOM #39

…e usage

As was done in the first pass on this functionality in #150
… and include notes on the error condition test which still fails.
… and eliminate glob imports for the bulk migrated ORXE tests (these were convenient during rapid dev on the xpath functionality, but now they just make ongoing maintenance a little extra annoying)
This is copied verbatim from 56df0ea#diff-ca14e9532b1b772a6f0453cf9ff885d0fdd7c088df02bd17bfc911f03f4e9cd4 where they were originally introduced. This demonstrates the previous, wrong, understanding of how arguments are expected to be contextualized. The tests will be updated in subsequent commit(s) to show the difference in behavior from #150.
Copy link

changeset-bot bot commented Aug 22, 2024

🦋 Changeset detected

Latest commit: bbbd2a2

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages
Name Type
@getodk/scenario Minor
@getodk/xpath Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@eyelidlessness
Copy link
Member Author

I want to call out the series of commits around unit-ish tests. These were carried over from #150 and adapted to match the corrected behavior. I don't think the tests themselves have as much value in this PR as they did in #150, but I included them anyway because:

  • The commits updating them tell a good story of how the behavior changed.

  • I still think it's good for consistency to have thorough coverage at the @getodk/xpath package level. The existing tests have been incredibly valuable for building confidence in everything higher in the Web Forms stack.

@eyelidlessness
Copy link
Member Author

I'm not sure who to tag to review this one! I think @lognaturel has a good idea what to expect from a behavior perspective. I expect @sadiqkhoja would benefit from getting a bit more insight into @getodk/xpath implementation aspects (and I expect that context will be beneficial for anticipated incoming changes).

* be** the error thrown by `@getodk/xpath` for missing function support,
* so we start with an expected failure throughout the suite.
*
* - Bind for `calc` is updated to reference `/data/calc`, as is clearly
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird, I just get in a relative mindset, I guess. 😬 Will fix in JR.

Copy link
Member

@lognaturel lognaturel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some opportunities to clean up some comments made stale by getodk/javarosa@9748bea and a couple questions. Generally this looks ready to go to me. 🎉

/**
* **PORTING NOTES**
*
* - Updated body reference to `/data/repeat2/inside2`, as seems the likely
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to fix.

for (const pair of pairs) {
const { position } = pair;
const repeats = pair.repeats.filter((repeat) => {
return repeatContextNode?.contains(repeat) !== false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means invalid parameter pairs will be silently ignored, right? It feels like it would be helpful for the issue to be surfaced in case it's a typo but that could come later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but I can see why it would seem that way! The reasoning for this was also less clear to me reading it today than when I wrote it.

It's a special case for the first iteration of the loop. I hope 7742cb9 is clarifying.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helpful for the first iteration, yes! What happens if I do indexed-repeat(/some/repeat, /some/other_repeat, 3)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm about to wrap up for the day so I can test tomorrow to say for sure, but I believe it will produce an empty node-set/blank string.

Ah, and now I understand the question better!

The contains filters aren't meant (only) as a guard against invalid structural references. They are also a result filter to roughly achieve the semantics of the the non-shorthand's LocationPath Step equivalent—i.e. /data/subtree/leaf produces a context of the leaf nodes contained by (well, in this case children of) /data/subtree.

You can think of it as conceptually similar to how some custom CSS selector engines work: they'll often work right to left (well, hierarchically bottom up), matching the deepest part of the selector first and then working hierarchically up the selector to refine the result set from there. (As described here, it's usually a performance optimization technique; and it's one we might also want to think about someday)

But yes, it will also filter out invalid hierarchies as in your example, that's just not all it's doing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks! How about tracking the unrelated reference case as part of future error work? It would be nice to give explicit feedback as some sort of error if possible.

return 0;
};

export const indexedRepeat = new NodeSetFunction(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be StringFunction, in the spec return type of indexedRepeat is string

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, thanks for flagging this, I wondered about it briefly when reviewing but didn't come back to it. I've never seen attempted usage that results in a nodeset but it certainly could be useful. I'm not sure what Collect will do in that case. I think it would be fine as long as you use it in a context where a nodeset is allowed like count(indexed-repeat(/data/repeat1/repeat2/foo, /data/repeat1, 3)) so it might be the spec that needs updating. We should try something like that in Collect (I can later if no one beats me to it)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I made a case for this in #150. I am open to changing it, but I do want to make the case for it again here.

My interpretation of the spec is that the function...

  1. is called in a node-set context (implicit, true of all XPath expressions)
  2. produces a new node-set context (intermediate result) for each repeatN/indexN pair [in containment depth order]
  3. resolves to a node-set for the arg target (this is the result produced by the spec example's non-shortcut equivalent)
  4. ultimately casts that resolved node-set to a string (suggested by spec language: "single node value")

Importantly: XPath semantics only has a concept of node-set value/result type, not a single node. Any single node result—whether returned by a function, or produced by a LocationPath Step—is a node-set with a context size of 1.

In this case, the production of a "single node value" occurs in step 4, where XPath casting semantics produces the string value of the first node in a node-set (regardless of its context size).

The tests clearly demonstrate that behavior: anywhere a string value is consumed, the expected string value is produced from the single (first) node value in the resulting node-set.

In @getodk/xpath this casting from node-set to string occurs further up the call stack, applied by intentionally isolated casting logic, and applied consistently:

  • to any XPath result
  • of any type
  • whether an intermediate result of a sub-expression, or the ultimate result produced by an evaluate call
  • whether the result is produced by a FunctionImplementation, any part of a LocationPath, or any other result from any other XPath syntax

Our isolation of casting logic is important. XPath casting semantics have a fair bit of nuance which can easily be lost or become inconsistent otherwise (and did earlier in implementation!). That isolation is also crucial to our ability to support some ODK XForms functionality, i.e. date / time / datetime data types.


In terms of spec conformance: the only way to deviate from the spec behavior is to intentionally consume the return value as a node-set, as in @lognaturel's count example, or by calling evaluate with any of the XPathResult types producing one or more nodes **and then explicitly accessing the evaluated result's node APIs, or by calling any of our explicitly node-returning evaluate* convenience methods; any of which are explicit and intentional. Doing so does go beyond the spec, but it does not conflict with the spec. This is the same reasoning I applied to implementing the function with a variadic signature (rather than restricting calls to a maximum of three repeatN/indexN pairs as in the spec).

Since the spec signature returns a string, there's no precedent for different behavior when an indexed-repeat return value is deliberately used as a node-set: you can't cast a string to a node-set.

The only conflict this interpretation could introduce with the spec is if we think it's more valuable to users to actively prevent the beyond-spec usage than to (quietly, apart from this discussion) permit it.

Copy link
Member

@lognaturel lognaturel Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, it feels fine to leave as-is to me. 👍 @sadiqkhoja?

Our isolation of casting logic is important.

Strong agree.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just saw the test below and the edited comment. Ok, seems like there's more to consider!

const targetNodes = evaluateArgumentToFilterableNodes(context, target);

// Filter only the target nodes contained by the deepest repeat context node.
return targetNodes.filter((targetNode) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find wouldn't be a better function here? We need to pick " a single node value "

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll refer to the casting details in my comment the other thread, with emphasis on how node-sets are cast to strings. This is also how XPath semantics addresses node-set values.

Anyway, as I said in that thread: I'm open to making this less capable if that's what we decide we really want.

But to be clear, a NodeSetFunction implementation can't return a single node, it must return Iterable<Node>. So if we decide that the less capable thing we're after is guaranteeing single node-ness (rather than the value-ness, which is already handled by casting up the stack), we'd still need to either rewrap the single node result in an array (empty if null) or convert this to a generator and yield the non-null result.

@sadiqkhoja
Copy link
Contributor

Following adhoc test fails:

it('adhoc', () => {
		const xml = html(
			head(
				title('indexed-repeat form'),
				model(
					mainInstance(
						t(
							'data id="indexed-repeat-form"',
							t('some-group',
								t('item', t('value', 'john')),
								t('item', t('value', 'jane')),
								t('item',
									t('subgroup',t('value', 'val1')),
									t('subgroup',t('value', 'val2'))),
								t('ctx'))
						)
					)
				)
			),
			body()
		).asXml();

		console.log(xml);

		testContext = createXFormsTestContext(xml, {
			getRootNode: (doc) => doc.querySelector('instance')!,
		});

		const contextNode = testContext.document.querySelector('ctx');

		const expression = `concat(indexed-repeat(//value, /data/some-group/item, 3))`;

		testContext.assertStringValue(expression, "val1", { contextNode });
	});

While this was more concise, it was less clear when the optional case applies.

This is an odd case where TypeScript doesn’t have a great way to model a let binding which **will ultimately** be assigned by a loop, but is undefined for the loop’s first iteration.

I would normally prefer to express this with either:

- an initial non-null value assigned to the let binding (but there isn’t an appropriate one in this case; we could lie and assign the root node, but that would be a lot less clear IMO)
- reduce (but I suspect that would make this implementation a little bit harder for most readers who don’t also have an affinity for reduce)
@eyelidlessness
Copy link
Member Author

eyelidlessness commented Aug 27, 2024

Following adhoc test fails: [...]

Unless I'm mistaken, this looks like the same feedback as the other two comments: whether or not to go beyond the spec and permit a node-set result.

If I am mistaken and there's a different issue here that I'm not seeing, please feel free to clarify so I can address it separately.

Edit: after giving this some more thought, I think this might be a compelling argument for enforcing the “single node” aspect at the function implementation level, rather than just relying on casting up the stack as currently implemented. At least as a cautious first step, leaving it open to spec expansion in the future. But it definitely is an example of a potential case where we might unintentionally introduce unexpected behavior.

I think (as @lognaturel suggested) it’ll be interesting to see what JavaRosa produces for this case. I’ll check that in the morning!

@sadiqkhoja
Copy link
Contributor

I tried the attached Form in Collect and Enketo, both are producing multiple nodes so this implementation matches the JR's behaviour.

indexed-repeat.xlsx

@lognaturel
Copy link
Member

both are producing multiple nodes

Not what I expected! That said, it makes sense given that they both essentially construct XPath path expressions and then evaluate them. Let’s update the spec. 👍

@eyelidlessness
Copy link
Member Author

Thanks for putting a form together @sadiqkhoja! I've added two Scenario tests specifically exercising this behavior, with notes there to capture the review discussion around this, and confirming that the test have the same behavior in JavaRosa.

I'm really glad we took the time to make sure this is the right call, and to compare the JavaRosa behavior. In hindsight, this is a good reminder (to me!) not to take spec language too literally, and to think really carefully about likely intent. It's also really encouraging that it's now so quick and easy to compare Web Forms and JavaRosa behavior, so we can go to JR as a source of truth when there's doubt about spec minutiae.

I agree we should update the spec language, and I'll also re-emphasize that it would be a great opportunity to refine the example, and maybe add more examples as well.

@eyelidlessness eyelidlessness merged commit ce0090d into main Aug 28, 2024
86 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants