Ensure consistent Serialized DAG hashing #42517

Draft: wants to merge 3 commits into main


@ephraimbuddy (Contributor) commented Sep 26, 2024

The serialized DAG dictionary is not ordered deterministically when its hash is computed, so the same DAG can produce different hashes from one parse to the next. This leads to frequent, unnecessary updates of the serialized DAG table.
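A minimal sketch of the symptom, using only the standard library and made-up dictionaries rather than real serialized DAG output. The helper `naive_hash` is hypothetical and SHA-256 is an arbitrary choice of digest; the point is only that the hash is computed over a JSON string, so any ordering difference in that string changes the hash even when the content is logically the same.

```python
import hashlib
import json


def naive_hash(serialized: dict) -> str:
    """Hypothetical helper: hash a serialized-DAG-like dict via json.dumps(sort_keys=True)."""
    payload = json.dumps(serialized, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Differing dict key order is harmless: sort_keys=True normalizes it.
print(naive_hash({"a": 1, "b": 2}) == naive_hash({"b": 2, "a": 1}))  # True

# The same tags emitted in a different order (e.g. after passing through a set)
# form a different JSON string, so the hash changes on an otherwise unchanged DAG.
first = {"dag_id": "example", "tags": ["etl", "daily"]}
second = {"dag_id": "example", "tags": ["daily", "etl"]}
print(naive_hash(first) == naive_hash(second))  # False
```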

Changes:

Implemented sorting for serialized DAG dictionaries and their nested structures so that serialization order, and therefore the hash, is consistent and predictable. Using `sort_keys` in `json.dumps` is not enough: it orders dictionary keys only and does not reorder list elements, so nested structures in the serialized DAG can still vary in order.
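One way to canonicalize nested structures before hashing is to walk them recursively, sorting dict keys and list elements. This is a minimal sketch assuming JSON-compatible input with string keys; `sort_nested` and `stable_hash` are hypothetical names, not the functions added in this PR.

```python
import hashlib
import json
from typing import Any


def sort_nested(value: Any) -> Any:
    """Return a copy of ``value`` with dict keys and list elements in a deterministic order.

    Hypothetical helper for illustration; assumes JSON-compatible data.
    """
    if isinstance(value, dict):
        # Rebuild the dict with keys in sorted order, canonicalizing each value.
        return {key: sort_nested(value[key]) for key in sorted(value)}
    if isinstance(value, (list, tuple, set)):
        items = [sort_nested(item) for item in value]
        # Elements may be of mixed types (dicts, strings, numbers) that cannot be
        # compared directly, so order them by their canonical JSON encoding instead.
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    return value


def stable_hash(serialized: dict) -> str:
    """Hash the canonicalized structure so equivalent inputs give the same digest."""
    canonical = json.dumps(sort_nested(serialized), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"dag_id": "example", "tags": ["etl", "daily"], "params": {"b": 2, "a": 1}}
b = {"params": {"a": 1, "b": 2}, "tags": ["daily", "etl"], "dag_id": "example"}
print(stable_hash(a) == stable_hash(b))  # True
```

Sorting every list assumes list order carries no meaning. Any list whose order is significant would have to be excluded, so a real implementation needs to decide which fields are safe to reorder.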

Added `serialize` and `deserialize` methods to `DagParam` and `ArgNotSet` so that they can be serialized in a structured way.
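The PR description does not show the method signatures, so the following is a hypothetical sketch of the pattern only: classes that expose a `serialize()` method returning a plain, JSON-compatible dict and a `deserialize()` classmethod that rebuilds the object. The class names `NotSetMarker` and `ParamRef` and the dict layout are invented for illustration; they are not Airflow's `ArgNotSet` or `DagParam`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


class NotSetMarker:
    """Hypothetical sentinel, loosely analogous to a "no argument given" marker."""

    def serialize(self) -> dict[str, Any]:
        # A fixed, content-only representation: no memory address, no repr().
        return {"__type": "not_set"}

    @classmethod
    def deserialize(cls, data: dict[str, Any]) -> NotSetMarker:
        if data.get("__type") != "not_set":
            raise ValueError(f"not a serialized NotSetMarker: {data!r}")
        return cls()


@dataclass
class ParamRef:
    """Hypothetical reference to a DAG-level param by name, with a default."""

    name: str
    default: Any = None

    def serialize(self) -> dict[str, Any]:
        return {"__type": "param_ref", "name": self.name, "default": self.default}

    @classmethod
    def deserialize(cls, data: dict[str, Any]) -> ParamRef:
        if data.get("__type") != "param_ref":
            raise ValueError(f"not a serialized ParamRef: {data!r}")
        return cls(name=data["name"], default=data.get("default"))


ref = ParamRef("run_date", "2024-01-01")
print(ParamRef.deserialize(ref.serialize()) == ref)  # True
```

Because `serialize()` here depends only on the object's content, two equivalent instances always produce the same output, which is the property a stable hash needs.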

Updated `serialize_template_field` to handle objects that implement a `serialize` method. This was needed because `DagParam` and `ArgNotSet` can appear in template fields. Previously, such values were serialized as a plain object representation rather than a stable value; with this change, they serialize consistently.
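The dispatch described above can be sketched as follows. `serialize_template_value` is a hypothetical, simplified stand-in for `serialize_template_field` (whose real signature and fallback rules are not shown in this PR description), and `Marker`/`Opaque` are toy classes. The sketch prefers an object's own `serialize()` method and only falls back to `str()`, which for a plain object includes its memory address and so differs between processes.

```python
from typing import Any


def serialize_template_value(value: Any) -> Any:
    """Hypothetical simplification of template-field serialization."""
    serialize = getattr(value, "serialize", None)
    # Use the object's own serialize() if it has one (skip classes themselves,
    # whose serialize attribute is an unbound function).
    if callable(serialize) and not isinstance(value, type):
        return serialize()
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    if isinstance(value, (list, tuple)):
        return [serialize_template_value(item) for item in value]
    if isinstance(value, dict):
        return {str(key): serialize_template_value(item) for key, item in value.items()}
    # Fallback: str() of a plain object embeds its memory address, so it is unstable.
    return str(value)


class Marker:
    def serialize(self) -> dict:
        return {"__type": "marker"}


class Opaque:
    pass


print(serialize_template_value({"op_args": [Marker(), 1]}))
# {'op_args': [{'__type': 'marker'}, 1]}
print(serialize_template_value(Opaque()))  # unstable: contains the object's address
```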

@ephraimbuddy changed the title from "Ensure consistent Serialized DAG hashing with deterministic serialization" to "Ensure consistent Serialized DAG hashing" on Sep 26, 2024