Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Improve execution name readability" #5740

Merged
merged 5 commits into from
Sep 26, 2024
Merged

Conversation

pingsutw
Copy link
Member

Tracking issue

NA

Why are the changes needed?

Using a friendly name has a high possibility of collision.

Note: We plan to add another column to the execution table to store this friendly name. FlyteConsole can get the friendly name from GetExecutionRequest/ListExecutionRequest and display it in a separate column on the execution page.

What changes were proposed in this pull request?

Revert the commits

How was this patch tested?

uni test

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

NA

Docs link

NA

@pingsutw pingsutw changed the title Revert humanid Revert "Improve execution name readability" Sep 11, 2024
@pingsutw
Copy link
Member Author

cc @eapolinario

Copy link

codecov bot commented Sep 11, 2024

Codecov Report

Attention: Patch coverage is 76.92308% with 3 lines in your changes missing coverage. Please review.

Project coverage is 36.31%. Comparing base (c01a059) to head (193c662).
Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
flyteadmin/scheduler/executor/executor_impl.go 50.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5740      +/-   ##
==========================================
+ Coverage   34.44%   36.31%   +1.87%     
==========================================
  Files        1139     1304     +165     
  Lines      102613   110016    +7403     
==========================================
+ Hits        35341    39948    +4607     
- Misses      63609    65911    +2302     
- Partials     3663     4157     +494     
Flag Coverage Δ
unittests-datacatalog 51.37% <ø> (ø)
unittests-flyteadmin 55.62% <76.92%> (+<0.01%) ⬆️
unittests-flytecopilot 12.17% <ø> (ø)
unittests-flytectl 62.21% <ø> (?)
unittests-flyteidl 7.12% <ø> (ø)
unittests-flyteplugins 53.35% <ø> (ø)
unittests-flytepropeller 41.89% <ø> (ø)
unittests-flytestdlib 55.37% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Kevin Su <pingsutw@apache.org>
@pingsutw
Copy link
Member Author

pingsutw commented Sep 11, 2024

Note: We plan to add another column to the execution table to store this friendly name. FlyteConsole can get the friendly name from GetExecutionRequest/ListExecutionRequest and display it in a separate column on the execution page.

cc @ddl-ebrown does it work for your use case?

@pingsutw pingsutw mentioned this pull request Sep 11, 2024
3 tasks
@ddl-ebrown
Copy link
Contributor

Note: We plan to add another column to the execution table to store this friendly name. FlyteConsole can get the friendly name from GetExecutionRequest/ListExecutionRequest and display it in a separate column on the execution page.

cc @ddl-ebrown does it work for your use case?

We can probably make this work, yes.

You decided not to enhance the existing approach to reduce the collision risk? When we did the math for our variant, we calculated the 4 character suffix in combination with the number of terms used was enough to prevent collisions.

@pingsutw
Copy link
Member Author

We did the math for 4 words + the 6 character suffix, and found it still has a high possibility of having collisions when you have more than 1 million workflows. @eapolinario can share more details.

Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
@pingsutw
Copy link
Member Author

cc @eapolinario can we merge this

@eapolinario
Copy link
Contributor

We did the math for 4 words + the 6 character suffix, and found it still has a high possibility of having collisions when you have more than 1 million workflows. @eapolinario can share more details.

In the original PR I made a comment about the need to use 4 words, which would give us about 4.3B unique ids. Compare that to the original method to produce execution ids uses the k8s rand package to generate 19-character strings using rand.String. Notice how these alphanumeric strings generated exclude vowels, so that would give us 21 consonants + 10 digits, so 31^19 ~= 2.16 * 10^28 (as per https://www.wolframalpha.com/input?i=31%5E19).

This problem of calculating the expected number of N-bit hashes before a collision is also known as the Birthday problem. And here's the crucial result:

the expected number of N-bit hashes that can be generated before getting a collision is not 2^N, but rather only 2^(N⁄2).

In other words, in the proposed change we're starting from 256^4 = 2^32 unique ids, so N = 32, so in an ideal world, we're expected to generate a collision after 2^16 ids, or ~65K executions.

In the original code that number is ridiculously higher: 31^19 ~= 1.09 * 2^94, so N = 95, which gives the expected number of hashes to be about 2^47 which is about ~140 trillion ids before we see a collision.

We could devise a middle ground for this problem, but instead of spending time tuning that, we're going with a simpler solution that does not guarantee uniqueness for these friendly names, while we keep the strong guarantees provided by the original scheme. We're going to share the details in a later PR, but the tldr is that we're going to provide those friendly names only as part of the execution object (so they will act as a visual cue once rendered in the console), but can't be used to search for such executions (for that you'll still need to use the original execution id).

@eapolinario
Copy link
Contributor

cc @eapolinario can we merge this

for sure. I'll do it asap.

@eapolinario eapolinario merged commit c0cb4d1 into master Sep 26, 2024
55 checks passed
@eapolinario eapolinario deleted the revert-humanid branch September 26, 2024 21:36
@ddl-ebrown
Copy link
Contributor

Yup @eapolinario we did similar math. Our variant of friendly names we're using includes a 4 character nanoid appended to the name (i.e. 64^4 - https://github.com/matoous/go-nanoid/blob/main/gonanoid.go#L10)

We end up with roughly 888 trillion unique ids.

@ddl-ebrown
Copy link
Contributor

@eapolinario as a follow-on to this execution friendly naming, would you be opposed to doing the same thing for workflow versions? That problem is a little thornier b/c you probably want some sort of monotonically increasing sequence by default to order the workflows -- but I think it would also be useful since current workflow versions are cumbersome for end users.

Thanks!

@wayner0628
Copy link
Contributor

Hello! I contributed to the previous PR and have been focusing on other work, but I’d love to know the current progress on this. Is @eapolinario already working on a new PR for this issue?

Regarding the use of a 4-character nanoid, based on my calculations, it would yield 2^32 * 64^4 = 2^(32+24) = 2^56. This seems to be insufficient compared to N = 95 from the original hashes. Does this mean we will proceed with the middle-ground solution?

@wayner0628
Copy link
Contributor

Hi @eapolinario and @pingsutw,

Before I start working on this task, I wanted to confirm a few details to ensure my approach aligns with your expectations:

  • For the single execution page, would you like the friendly name to be included as part of the metadata, or would you prefer it to be displayed next to the hashed execution name?
    Screenshot 2024-10-22 at 5 23 09 PM

  • For the executions overview page, should the friendly name be displayed as shown in the image? I’m concerned it might compete for space with other information on the page.
    Screenshot 2024-10-22 at 5 26 47 PM

Let me know your thoughts so I can proceed accordingly!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants