Improved yaml parsing by adding an extended parser subclass able to inline anchors #502
+232
−0
This pull request proposes a workaround for the current inability to validate YAML files that use anchors. The solution is simply to inline all anchors.
What's the matter with the upstream YAMLParser?
The YAMLParser currently produces JsonEvents representing a String and sets an additional status that a consumer has to retrieve by calling an extra method. However, YAMLSchema validation currently reuses the existing JSONSchema validation code, which doesn't know about this extra method. As a result, validating a YAML file that uses anchors and aliases fails due to the lack of anchor support.
The backlog already contains some issues discussing the need for a larger refactoring to fully support anchors and aliases.
What is the workaround?
This pull request adds a YAMLParserExt class (I didn't have a good idea for how to name it) and a factory to make use of it. As events are requested, the implementation remembers the events produced by YAML content that has an anchor. Later, when an alias is found and its anchor exists, it simply returns the same events that were part of the anchored content. As a result, the schema validator sees the document as if the anchored content had been inlined.
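To make the record-and-replay idea concrete, here is a minimal, self-contained sketch (Java 16+). It is not the code from this pull request: the `Kind`, `Ev`, `Recording` and `inline` names are hypothetical, and the flat event list is an assumed simplification of what the real parser produces. It only illustrates remembering the events of anchored content and returning them again when an alias is encountered.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AnchorInliningSketch {

    // Hypothetical, simplified event kinds; the real parser has its own event types.
    enum Kind { START_MAP, END_MAP, START_SEQ, END_SEQ, SCALAR, ALIAS }

    // anchor: name after '&' (or null). For ALIAS events, value holds the name after '*'.
    record Ev(Kind kind, String value, String anchor) {
        static Ev of(Kind kind, String value) { return new Ev(kind, value, null); }
    }

    // One in-progress recording of the events belonging to an anchored node.
    private static final class Recording {
        final String name;
        final List<Ev> events = new ArrayList<>();
        int depth = 0; // number of currently open containers inside the anchored node
        Recording(String name) { this.name = name; }
    }

    // Returns the event stream with every alias replaced by the recorded events of its anchor.
    static List<Ev> inline(List<Ev> source) {
        Map<String, List<Ev>> anchors = new HashMap<>();
        Deque<Recording> open = new ArrayDeque<>();
        List<Ev> out = new ArrayList<>();

        for (Ev ev : source) {
            if (ev.kind() == Kind.ALIAS) {
                List<Ev> replay = anchors.get(ev.value());
                if (replay == null) {
                    throw new IllegalStateException("Unknown anchor: " + ev.value());
                }
                // Replay the remembered events as if the content were written here.
                for (Ev r : replay) emit(r, out, open);
            } else {
                if (ev.anchor() != null) open.push(new Recording(ev.anchor()));
                // Emit without the anchor so consumers see a plain event.
                emit(Ev.of(ev.kind(), ev.value()), out, open);
            }
            // A recording is complete once its node is balanced again.
            while (!open.isEmpty() && open.peek().depth == 0) {
                Recording done = open.pop();
                anchors.put(done.name, done.events);
            }
        }
        return out;
    }

    // Appends the event to the output and to every recording that is still open.
    private static void emit(Ev ev, List<Ev> out, Deque<Recording> open) {
        out.add(ev);
        int delta = switch (ev.kind()) {
            case START_MAP, START_SEQ -> 1;
            case END_MAP, END_SEQ -> -1;
            default -> 0;
        };
        for (Recording r : open) {
            r.events.add(ev);
            r.depth += delta;
        }
    }

    public static void main(String[] args) {
        // Events for:  base: &b {a: 1}   copy: *b
        List<Ev> source = List.of(
            Ev.of(Kind.START_MAP, null),
            Ev.of(Kind.SCALAR, "base"),
            new Ev(Kind.START_MAP, null, "b"),
            Ev.of(Kind.SCALAR, "a"),
            Ev.of(Kind.SCALAR, "1"),
            Ev.of(Kind.END_MAP, null),
            Ev.of(Kind.SCALAR, "copy"),
            Ev.of(Kind.ALIAS, "b"),
            Ev.of(Kind.END_MAP, null));
        List<Ev> inlined = inline(source);
        System.out.println(source.size() + " events in, " + inlined.size() + " events out");
    }
}
```

In this sketch the nine source events become twelve, because the four events of the anchored map are emitted a second time in place of the alias. A consumer such as a schema validator never sees an alias event at all.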
What's the risk of pulling this in?
Repeating the events might have some unknown side effects: a document will appear larger in terms of produced events than it actually is, and code that expects events to be unique might not work. I also haven't looked into whether there might be issues determining the position (line and column number) in the original document of whatever produced a replayed event. However, it works for us to validate our YAML files, and using a separate class to implement this option doesn't break any existing code.
Some thoughts: this resembles $ref from JSON, but I can imagine that YAML aliases can be positioned more flexibly.
In case you consider pulling this in, please let me know where I can find an example of a unit test; I am more than happy to provide one. The approach I took in our internal project was simply to run the validation across some YAML documents containing anchors and check the validation results.