-
Notifications
You must be signed in to change notification settings - Fork 297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
unify the image spec hash function #2593
Conversation
Signed-off-by: Kevin Su <pingsutw@apache.org>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2593 +/- ##
==========================================
+ Coverage 78.91% 81.43% +2.51%
==========================================
Files 316 281 -35
Lines 24965 23424 -1541
Branches 4012 4012
==========================================
- Hits 19702 19076 -626
+ Misses 4548 3653 -895
+ Partials 715 695 -20 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
@thomasjpfan mind taking a look again, thanks! |
flytekit/image_spec/image_spec.py
Outdated
image_spec_dict = asdict(spec, dict_factory=lambda x: {k: v for (k, v) in x if v is not None}) | ||
image_spec_bytes = image_spec_dict.__str__().encode("utf-8") | ||
tag = ( | ||
base64.urlsafe_b64encode(hashlib.md5(image_spec_bytes).digest()) | ||
.decode("ascii") | ||
.rstrip("=") | ||
.replace("-", "_") | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this works:
image_spec_dict = asdict(spec, dict_factory=lambda x: {k: v for (k, v) in x if v is not None}) | |
image_spec_bytes = image_spec_dict.__str__().encode("utf-8") | |
tag = ( | |
base64.urlsafe_b64encode(hashlib.md5(image_spec_bytes).digest()) | |
.decode("ascii") | |
.rstrip("=") | |
.replace("-", "_") | |
) | |
tag = spec.id.replace("-", "_") |
At this point, can we have .replace("-", "_")
in .id
?
Signed-off-by: Kevin Su <pingsutw@apache.org>
Tracking issue
NA
Why are the changes needed?
Problem 1
There are three hash functions in the image spec
__hash__
- used by the lru cache_calculate_deduped_hash_from_image_spec
- used as an identifier in the image config in the serialization contextcalculate_hash_from_image_spec
- used as an image tagwe should be able to unify 1 and 2
Probelm 2
Local and remote environments are generating different hash values from the same ImageSpec, which causes dynamic tasks to create tasks with the wrong image. Additionally, this discrepancy indirectly causes is_container always to return false, because the hash value is different from
_F_IMG_ID
For example:
In flytekit 1.3.14, we added entrypoint to the ImageSpec. If a user registers a workflow with Flytekit 1.3.14 but uses Flytekit 1.12.0 in the ImageSpec, the
_F_IMG_ID
will always be different from the hash of the ImageSpec. The reason is that we calculate the hash from asdict(image_spec)The solution is to calculate the hash only from the non-None values in the ImageSpec to ensure the hash is consistent across different flytekit versions.
What changes were proposed in this pull request?
Add a few methods to ImageSpec
id
-> It returns a unique hash as identifier of the ImageSpec. it will be used to identify the imageSpec in the ImageConfig or check if the current container image in the pod is built from this image spec inis_container()
.__hash__
-> it will return the hash of ImageSpec IDtag
-> the tag of the image.Remove
How was this patch tested?
Setup process
Screenshots
Check all the applicable boxes
Related PRs
NA
Docs link
NA