scrubber: perform manifest check with etag and refactors #10106
This adds the etag to `remote_storage::ListingObject` and uses it in the scrubber for the manifest freshness check added in #10007. This responds to review feedback that the clock precision is not enough to prevent races. I think the addition of the etag is useful enough as a general change, even without the motivation of wanting a cheap check for whether an object has changed.
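A minimal sketch of the idea, not the code from this PR: a listing entry carries an etag, and a manifest is only treated as fresh if the etag observed in two listings is the same. The `ListedObject` struct, its fields, the `manifest_is_fresh` helper, and the key and etag values are all hypothetical stand-ins for illustration.

```rust
/// Hypothetical, simplified stand-in for a listing entry. The real
/// `remote_storage::ListingObject` has its own set of fields.
#[derive(Debug, Clone, PartialEq)]
struct ListedObject {
    key: String,
    etag: String,
}

/// Returns true if the object looks unchanged between two observations,
/// judged purely by key and etag (no timestamps involved).
fn manifest_is_fresh(before: &ListedObject, after: &ListedObject) -> bool {
    before.key == after.key && before.etag == after.etag
}

fn main() {
    // Made-up key and etag values.
    let before = ListedObject {
        key: "manifest.json".to_string(),
        etag: "\"abc\"".to_string(),
    };
    let unchanged = before.clone();
    let rewritten = ListedObject {
        etag: "\"def\"".to_string(),
        ..before.clone()
    };

    println!("unchanged is fresh: {}", manifest_is_fresh(&before, &unchanged));
    println!("rewritten is fresh: {}", manifest_is_fresh(&before, &rewritten));
}
```

Comparing etags avoids depending on the resolution of last-modified timestamps, which was the concern raised in review.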
Note that the etag check still isn't 100% perfect, and we could also get there by just comparing the manifests, but I think the addition of the etag is useful enough in its own right. I don't know, please write if you think that it's baggage.
As for why the etag check is not perfect: the etag is fully determined by the content, at least on AWS (md5 on AWS, didn't check Azure). So if you change a file from A to B and back to A, the etag will be the original one again. Maybe something based on the version ID would be better, but we can't assume it is always available, as versioning is sometimes disabled on the bucket.
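The A to B to A case can be illustrated with any tag that depends only on the object's bytes. The sketch below uses the standard library's `DefaultHasher` as a stand-in for a content-derived etag. It is not how any storage backend computes etags, and `content_tag` is a hypothetical helper.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for a content-derived etag: the tag depends only on the bytes.
/// This is an illustration, not the hash any real backend uses.
fn content_tag(body: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    body.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let a: &[u8] = b"manifest contents A";
    let b: &[u8] = b"manifest contents B";

    let tag_first = content_tag(a); // object initially holds A
    let tag_middle = content_tag(b); // overwritten with B
    let tag_last = content_tag(a); // overwritten with A again

    // The intermediate write is visible only if we happen to observe it.
    println!("first == middle: {}", tag_first == tag_middle);
    // Comparing only the first and last tags hides both overwrites.
    println!("first == last:   {}", tag_first == tag_last);
}
```

A check that samples the tag before and after the two overwrites sees identical values and concludes nothing changed, which is the imperfection described above.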
Follow-up of #10007