-
Notifications
You must be signed in to change notification settings - Fork 659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revert "Improve execution name readability" #5740
Conversation
cc @eapolinario |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5740 +/- ##
==========================================
+ Coverage 34.44% 36.31% +1.87%
==========================================
Files 1139 1304 +165
Lines 102613 110016 +7403
==========================================
+ Hits 35341 39948 +4607
- Misses 63609 65911 +2302
- Partials 3663 4157 +494
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
cc @ddl-ebrown does it work for your use case? |
We can probably make this work, yes. You decided not to enhance the existing approach to reduce the collision risk? When we did the math for our variant, we calculated the 4 character suffix in combination with the number of terms used was enough to prevent collisions. |
We did the math for 4 words + the 6 character suffix, and found it still has a high possibility of having collisions when you have more than 1 million workflows. @eapolinario can share more details. |
Signed-off-by: Kevin Su <pingsutw@apache.org>
cc @eapolinario can we merge this |
In the original PR I made a comment about the need to use 4 words, which would give us about 4.3B unique ids. Compare that to the original method to produce execution ids uses the k8s rand package to generate 19-character strings using rand.String. Notice how these alphanumeric strings generated exclude vowels, so that would give us 21 consonants + 10 digits, so This problem of calculating the expected number of N-bit hashes before a collision is also known as the Birthday problem. And here's the crucial result:
In other words, in the proposed change we're starting from In the original code that number is ridiculously higher: We could devise a middle ground for this problem, but instead of spending time tuning that, we're going with a simpler solution that does not guarantee uniqueness for these friendly names, while we keep the strong guarantees provided by the original scheme. We're going to share the details in a later PR, but the tldr is that we're going to provide those friendly names only as part of the execution object (so they will act as a visual cue once rendered in the console), but can't be used to search for such executions (for that you'll still need to use the original execution id). |
for sure. I'll do it asap. |
Yup @eapolinario we did similar math. Our variant of friendly names we're using includes a 4 character nanoid appended to the name (i.e. 64^4 - https://github.com/matoous/go-nanoid/blob/main/gonanoid.go#L10) We end up with roughly 888 trillion unique ids. |
@eapolinario as a follow-on to this execution friendly naming, would you be opposed to doing the same thing for workflow versions? That problem is a little thornier b/c you probably want some sort of monotonically increasing sequence by default to order the workflows -- but I think it would also be useful since current workflow versions are cumbersome for end users. Thanks! |
Hello! I contributed to the previous PR and have been focusing on other work, but I’d love to know the current progress on this. Is @eapolinario already working on a new PR for this issue? Regarding the use of a 4-character nanoid, based on my calculations, it would yield 2^32 * 64^4 = 2^(32+24) = 2^56. This seems to be insufficient compared to N = 95 from the original hashes. Does this mean we will proceed with the middle-ground solution? |
Hi @eapolinario and @pingsutw, Before I start working on this task, I wanted to confirm a few details to ensure my approach aligns with your expectations:
Let me know your thoughts so I can proceed accordingly! |
Tracking issue
NA
Why are the changes needed?
Using a friendly name has a high possibility of collision.
Note: We plan to add another column to the execution table to store this friendly name. FlyteConsole can get the friendly name from GetExecutionRequest/ListExecutionRequest and display it in a separate column on the execution page.
What changes were proposed in this pull request?
Revert the commits
How was this patch tested?
uni test
Setup process
Screenshots
Check all the applicable boxes
Related PRs
NA
Docs link
NA