Add content hash to Dora #17361
Automated checks report:
Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks.
Automated checks report:
All checks passed!
Currently, when complete is called on a file in Alluxio, a fingerprint of the file is created by performing a GetStatus on the file on the UFS. If, due to a concurrent write, the state of the file on the UFS differs from what was written through Alluxio, the fingerprint will not match the content of the file in Alluxio. When this happens, the file in Alluxio stays permanently out of sync with the UFS and is never updated to the most recent version. This is because metadata sync uses the fingerprint to decide whether the file needs synchronization: the stored fingerprint already matches the UFS state, so no sync is triggered even though the content held in Alluxio is different.

This PR fixes this by computing the contentHash field of the fingerprint while the file is actually written to the UFS. For object stores, this means the hash is taken from the result of the call to PutObject. Unfortunately, HDFS does not have a similar interface, so the content hash is taken just after the output stream is closed to complete the write. There is still a small chance that someone changes the file in the window between these two operations.

pr-link: Alluxio#16597
change-id: cid-64723be309bdb14b05613864af3b6a1bb30cba6d
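To illustrate the idea of tying the content hash to the bytes that were actually written, here is a minimal sketch in Java. It is not the code from this PR: the class name `HashingUfsOutputStream` and its `getContentHash` method are hypothetical, and it swaps in a locally computed MD5 digest where the PR, for object stores, takes the hash from the PutObject result. The point it shows is only that the hash becomes available when the stream is closed, without a separate status call on the UFS afterwards.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Optional;

/**
 * Hypothetical wrapper around a UFS output stream (not an Alluxio class).
 * It hashes every byte on its way to the UFS, so the resulting hash
 * describes exactly what this writer wrote, not what a later status
 * call might observe.
 */
final class HashingUfsOutputStream extends OutputStream {
  private final DigestOutputStream mOut;
  private String mContentHash; // null until the stream is closed

  HashingUfsOutputStream(OutputStream ufsStream) {
    try {
      // Assumption: MD5 stands in for whatever hash the UFS would report.
      mOut = new DigestOutputStream(ufsStream, MessageDigest.getInstance("MD5"));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  @Override
  public void write(int b) throws IOException {
    mOut.write(b);
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    mOut.write(b, off, len);
  }

  @Override
  public void close() throws IOException {
    if (mContentHash != null) {
      return; // already closed; keep the first hash
    }
    mOut.close();
    mContentHash = toHex(mOut.getMessageDigest().digest());
  }

  /** Returns the hash of the written bytes, or empty if not yet closed. */
  Optional<String> getContentHash() {
    return Optional.ofNullable(mContentHash);
  }

  private static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }
}
```

A caller completing a file would close the stream and then store the returned hash in the fingerprint. Because the hash is derived from the write itself, a concurrent writer changing the UFS file later produces a mismatch that metadata sync can detect.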
@huanghua78 will this be helpful for metadata & data cache management?
LGTM
alluxio-bot, merge this please
merge failed:
alluxio-bot, merge this please
Merge missing commits from master-2.x to main. The commits in 2023/04/01~2023/04/30 from main...master-2.x will be included by this PR.

**Exceptions**
- 4435584 was skipped because it has been manually cherry-picked to main by f64314c
- 9a4e154 and fd19fb6 are in a wrong order (ported in the end)
- [new checkpointing for embedded journal](8cbcbcd) is not in, need manual work
- [rockdb thread safety fix](9f152c5) is not in, need manual work
- bf60aef depends on the commit above
- [fsadmin report proxy command](dbac084) is not in, need manual work
- [fsadmin report jobservice show job master/worker versions](8da5953) is not in, need manual work
- 32f923e is not included because it has been manually cherry-picked by #17361
- [graceful decommission worker](141ee0e) is not in, need manual work
- [fsadmin report capacity command show worker version](26257e6) is not in, need manual work

pr-link: #17454
change-id: cid-1e0faae7f382429b4405a02d4a036d116757070e
What changes are proposed in this pull request?
Cherry-pick the content hash changes #17207 and #16597 to Dora.
Why are the changes needed?
To add the content hash to Dora.
Does this PR introduce any user facing changes?
N/A