Simplify staging zone readiness to size only #535
Thank you @neelvirdy for doing this. I have always wanted some form of deterministic behavior for the staging zone, and this PR does just that - awesome! I have always been of the opinion that the min staging bucket size should match the individual deal threshold size: it keeps things consistent (it guarantees the same min deal size across aggregated and non-aggregated content deals), intuitive, and deterministic. I agree with removing the time-based (closing and keep-alive) rules, as they can easily yield content sizes that do not satisfy the min ask piece size and/or do not give SPs a consistent min deal size (I believe Estuary has, from the onset, wanted to give SPs some guarantee that they get almost 3.6GB per deal). Some suggestions
PS: I am available to work with you on this, let me know if you need anything.
d9f46f3 to ce56f6e
Thanks for the thorough review @en0ma! I incorporated most of your suggestions and have successfully uploaded 20k small files (190KB each) on my local estuary and reached I also realized that the staging buckets are modeled and auto-recovered in the contents table here. I believe we can avoid having a
Lmk your thoughts! Ideally I'd like to make the bucket persistence changes in a follow-up PR so we can improve the SP experience ASAP. Will look into some cleanup tweaks tomorrow as well, particularly around constant names and redundancy.
Great to see that it worked, awesome job!
Let's discuss this in the follow-up PR. In summary, this PR addresses the SPs' pain; the follow-up PR will address RAM usage (staging zone metadata is kept in memory and can build up quickly, which we have no control over) and improve deployment by not rebuilding staging zones, which blocks deal processing.
LGTM - great job!
I revisited the tuning of minStagingZoneSizeLimit and maxStagingZoneSizeLimit. The previous range (13.4GB-14.4GB) seemed fairly tight and hard to hit. IMO the range should correlate to the typical piece size range an SP would accept. From some brief Slack searching, it seems like 32GiB should be our absolute max, and some common min piece sizes are 16GB, 1GB, and 256 bytes. I settled on 3.6GB for the min since it will yield consistent deal sizes by matching the individual deal threshold (also 3.6GB), and it strikes a balance between being usable in FE and not being irrelevantly small for SPs. This way, SPs should see deals from Estuary consistently in the range of 3.6GB-31.6GB, likely skewed towards 3.6GB.
Associated FE PR since some metadata that is rendered in FE is removed by this PR: application-research/estuary-www#97
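For readers skimming the thread, the size-only readiness rule discussed above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the constant names come from the discussion, but their byte values are assumptions derived from the 3.6GB-31.6GB range quoted above, and `stagingZone`, `isReady`, and `tryAdd` are hypothetical names.

```go
package main

// Sizes are in bytes. The constant names appear in the PR discussion; the
// values are illustrative assumptions based on the quoted 3.6GB-31.6GB range.
const (
	minStagingZoneSizeLimit int64 = 3_600_000_000  // assumed: matches the individual deal threshold
	maxStagingZoneSizeLimit int64 = 31_600_000_000 // assumed: upper end of the quoted range
)

// stagingZone is a hypothetical, simplified stand-in for a staging zone.
// It tracks only the total size of the content it holds.
type stagingZone struct {
	curSize int64
}

// isReady reports whether the zone can be aggregated into a deal.
// Readiness depends on size alone: no closing time, no keep-alive.
func (z *stagingZone) isReady() bool {
	return z.curSize >= minStagingZoneSizeLimit
}

// tryAdd adds content of the given size if the zone would stay within the
// max limit. It returns false (and leaves the zone unchanged) otherwise.
func (z *stagingZone) tryAdd(size int64) bool {
	if size <= 0 || z.curSize+size > maxStagingZoneSizeLimit {
		return false
	}
	z.curSize += size
	return true
}
```

Under these assumed values, a zone fed only 190KB files (taking 190KB as 190,000 bytes) stays not ready until its running total crosses 3.6GB, so the 20k-file upload mentioned earlier in the thread (about 3.8GB in total) would be enough to make one zone ready.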