Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add content hash to Dora #17361

Merged
merged 4 commits into from
May 3, 2023
Merged

Add content hash to Dora #17361

merged 4 commits into from
May 3, 2023

Conversation

jja725
Copy link
Contributor

@jja725 jja725 commented May 1, 2023

What changes are proposed in this pull request?

cherry-pick content hash change #17207 & #16597 to Dora.

Why are the changes needed?

add content hash to dora

Does this PR introduce any user facing changes?

na

@alluxio-bot
Copy link
Contributor

Automated checks report:

  • Commits associated with Github account: PASS
  • PR title follows the conventions: FAIL
    • The title of the PR does not pass all the checks. Please fix the following issues:
      • First word of title ("Dora") is not an imperative verb. Please use one of the valid words

Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks.

@jja725 jja725 changed the title Dora checksum Add content hash to Dora May 1, 2023
@alluxio-bot
Copy link
Contributor

Automated checks report:

  • Commits associated with Github account: PASS
  • PR title follows the conventions: PASS

All checks passed!

@alluxio-bot alluxio-bot added the API Change Changes covering public API label May 1, 2023
@jja725 jja725 requested review from LuQQiu and beinan May 1, 2023 22:53
jja725 and others added 4 commits May 2, 2023 13:18
Currently when complete is called on a file in Alluxio, a fingerprint of
the file will be created by performing a GetStauts on the file on the
UFS. If due to a concurrent write, the state of the file is different
than what was written through Alluxio, the fingerprint will not actually
match the content of the file in Alluxio. If this happens the state of
the file in Alluxio will always be out of sync with the UFS, and the
file will never be updated to the most recent version.
This is because metadata sync uses the fingerprint to see if the file
needs synchronization, and if the fingerprint does not match the file in
Alluxio there will be inconsistencies.

This PR fixes this by having the contentHash field of the fingerprint be
computed while the file is actually written on the UFS. For object
stores, this means the hash is taken from the result of the call to
PutObject. Unfortunately HDFS does not have a similar interface, so the
content hash is taken just after the output stream is closed to complete
the write. There could be a small chance that someone changes the file
in this window between the two operations.

pr-link: Alluxio#16597
change-id: cid-64723be309bdb14b05613864af3b6a1bb30cba6d
@alluxio-bot alluxio-bot removed the API Change Changes covering public API label May 2, 2023
@LuQQiu
Copy link
Contributor

LuQQiu commented May 3, 2023

@huanghua78 will it be helpful for metadata & data cache management

Copy link
Contributor

@LuQQiu LuQQiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@LuQQiu
Copy link
Contributor

LuQQiu commented May 3, 2023

alluxio-bot, merge this please

@alluxio-bot
Copy link
Contributor

merge failed:
Merge refused because pull request does not have label start with type-

@LuQQiu LuQQiu added the type-feature This issue is a feature request label May 3, 2023
@LuQQiu
Copy link
Contributor

LuQQiu commented May 3, 2023

alluxio-bot, merge this please

@alluxio-bot alluxio-bot merged commit 99c926a into Alluxio:dora May 3, 2023
alluxio-bot added a commit that referenced this pull request May 22, 2023
Merge missing commits from master-2.x to main. The commits in 2023/04/01~2023/04/30 from main...master-2.x will be included by this PR.

**Exceptions**
4435584 was skipped because it has been manually cherry-picked to main by f64314c

9a4e154 and fd19fb6 are in a wrong order (ported in the end)

[new checkpointing for embedded journal](8cbcbcd) is not in, need manual work

[rockdb thread safety fix](9f152c5) is not in, need manual work

bf60aef depends on the commit above

[fsadmin report proxy command](dbac084) is not in, need manual work

[fsadmin report jobservice show job master/worker versions](8da5953) is not in, need manual work

32f923e is not included because it has been manually cherry-picked by #17361

[graceful decommission worker](141ee0e) is not in, need manual work

[fsadmin report capacity command show worker version](26257e6) is not in, need manual work


			pr-link: #17454
			change-id: cid-1e0faae7f382429b4405a02d4a036d116757070e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-feature This issue is a feature request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants