Ensure consistent Serialized DAG hashing #42517
Draft
+66
−5
The serialized DAG dictionary is not consistently ordered when hashes are created. This produces inconsistent hashes, which in turn lead to frequent updates of the serialized DAG table.
Changes:
Implemented sorting for serialized DAG dictionaries and nested structures to ensure a consistent and predictable serialization order for hashing. Using sort_keys in json.dumps is not enough to sort the nested structures in the serialized DAG.
Added serialize and deserialize methods for DagParam and ArgNotSet to allow for more structured serialization.
Updated serialize_template_field to handle objects that implement a serialize method. This was needed because DagParam and ArgNotSet can appear in template fields. Previously such a field was serialized as an object; with this change it serializes to a consistent representation.
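
For illustration only, here is a minimal sketch of the idea behind these changes. It is not the code in this PR. The names normalize, stable_hash, and ArgNotSetSketch are hypothetical, SHA-256 is an arbitrary choice of hash, and the sketch assumes that list order carries no meaning for hashing purposes. It shows that sort_keys orders dictionary keys but leaves list elements in their original order, and that an object exposing a serialize method can be reduced to a fixed payload before hashing.

```python
import hashlib
import json


class ArgNotSetSketch:
    """Hypothetical stand-in for a sentinel type; not Airflow's ArgNotSet."""

    def serialize(self):
        # A fixed payload, unlike the default repr, which embeds a memory address.
        return {"__type": "arg_not_set"}


def normalize(value):
    """Recursively convert a value into a deterministic, JSON-friendly form."""
    serialize = getattr(value, "serialize", None)
    if callable(serialize):
        # Objects that know how to serialize themselves are reduced first.
        return normalize(serialize())
    if isinstance(value, dict):
        # Key order is handled later by json.dumps(sort_keys=True).
        return {key: normalize(item) for key, item in value.items()}
    if isinstance(value, (list, tuple, set, frozenset)):
        # Assumption: element order is irrelevant for hashing, so sort
        # elements by their canonical JSON text.
        items = [normalize(item) for item in value]
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    return value


def stable_hash(data):
    """Hash a nested structure independently of dict and list ordering."""
    payload = json.dumps(normalize(data), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same content, different ordering of keys and list elements.
first = {"tasks": [{"id": "b"}, {"id": "a"}], "params": {"x": ArgNotSetSketch()}}
second = {"params": {"x": ArgNotSetSketch()}, "tasks": [{"id": "a"}, {"id": "b"}]}

# sort_keys alone does not reorder list elements.
plain_first = json.dumps({"tasks": first["tasks"]}, sort_keys=True)
plain_second = json.dumps({"tasks": second["tasks"]}, sort_keys=True)
```

In this sketch plain_first and plain_second differ because only the dictionary keys are sorted, while stable_hash(first) and stable_hash(second) are equal. Sorting every list is a simplification; a real implementation would have to decide which lists are order-sensitive.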