BB-425 Short version IDs break ordering - use lastModified #2427
Hello nicolas2bert, My role is to assist you with the merge of this pull request. Status report is not available.
Summary:
- Short version IDs no longer preserve ordering.
- We will use the "last-modified" date to sort versions and delete markers.
Description:
- Prior to the introduction of short version IDs, a lower version ID indicated a more recent version. With short version IDs, that ordering is no longer preserved.
- To address this issue, we will use the "last-modified" date to sort the versions and delete markers from the object versions listing. This ensures that the entries are always sorted in chronological order, regardless of the length of the version ID.
- We considered decoding the version ID as a solution, but this would be quite expensive in terms of CPU time and memory usage.
@nicolas2bert Why not decode the version id as they are guaranteed to be ordered?
The "base62Decode" function (Arsenal/lib/versioning/VersionID.ts at development/7.10 · scality/Arsenal) could be quite expensive in terms of CPU time and memory usage. There are a few reasons for this:
All of these factors add up to make this function relatively expensive.
@rahulreddy Is there any advantage to using version ID over creation date? The only disadvantage, I see, of using the creation date is that two versions could be created at the same time.
Yup, my main concern is the collision. You will have two versions created in the same ms; we already see it at a lot of deployments today. I would add another layer of entropy to be unique; maybe you decode version IDs and compare them only for this case.
Yes, I agree. This condition is handled in the PR.
AFAICT, we also rely on the ordering of keys in MongoClientInterface (and possibly metadata?) to compute the last "version" to use as master... Does it mean we will need to do the same kind of fix in Arsenal: https://github.com/scality/Arsenal/blob/bdb59a0e63c20578078bd169e70c78a0c49f6e60/lib/storage/metadata/mongoclient/MongoClientInterface.js#L1055 ?
do correct me if I am wrong, short version ID is only used at the encoding/decoding step in cloudserver, and this issue would then look like an edge-case (race condition?) scenario that will also affect the long version ID and would likely cause issue to the
Internal metadata methods (server side) use decoded version IDs. For decoded version IDs, the order is preserved. However, the lifecycle "merging and sorting" (client side) method used base62 encoded version IDs to sort the listing (merging versions and delete markers), which does not seem to preserve the order (based on creation date). Cf. PR description. I am not sure I understand your concerns @francoisferrand @alexanderchan-scality.
@nicolas2bert I see. I had confused the version ID comparison with the internal version ID; I do agree with you that the base62 encoding loses the lexicographic order property needed for the comparison. In that case, finding the latest version would not be affected, since that uses the internal VID for comparison.
thanks @nicolas2bert, that is the part I missed, and it got me worried about a similar issue with Arsenal. s3 listObjectVersions is documented as: "request returns objects in the order that they were stored, returning the most recently stored object first". I do not see any mention (in the doc) that this order covers both versions & delete markers, but it seems to be consistent with the above statement and the examples there... So I wonder why we do this sorting in lifecycle, and whether we should implement it in cloudserver/arsenal instead? (This may avoid issues in the UI as well)
@francoisferrand The listObjectVersions API returns versions and delete markers in separate arrays. To determine the object's "stale date," the arrays must be merged and sorted.
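As a rough illustration of that merge step, here is a minimal sketch. It assumes both arrays hold entries for a single key, are already sorted newest first, and carry an ISO 8601 `LastModified` string; `mergeByLastModified` is a hypothetical name, not the function used in this PR. Ties on `LastModified` simply go to the version here, so the same-millisecond handling discussed above is left out.

```javascript
// Sketch only: mergeByLastModified is a hypothetical helper, not the PR's code.
// Assumptions: both arrays describe the same key, are already sorted
// newest-first, and each entry has an ISO 8601 LastModified string.
function mergeByLastModified(versions, deleteMarkers) {
    const sortedList = [];
    let vIdx = 0;
    let dmIdx = 0;

    while (vIdx < versions.length && dmIdx < deleteMarkers.length) {
        const vDate = new Date(versions[vIdx].LastModified).getTime();
        const dmDate = new Date(deleteMarkers[dmIdx].LastModified).getTime();

        // Newest entry first. On equal dates the version is taken first,
        // which is an arbitrary choice made for this sketch.
        if (vDate >= dmDate) {
            sortedList.push(versions[vIdx++]);
        } else {
            sortedList.push(deleteMarkers[dmIdx++]);
        }
    }

    // One of the two arrays is exhausted; append what is left of the other.
    return sortedList.concat(versions.slice(vIdx), deleteMarkers.slice(dmIdx));
}
```

With the merged list in hand, the entry that follows a given version is the one that superseded it, which is what a "stale date" computation needs.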
const isVersionLastModifiedNewer = versionLastModified > deleteMarkerLastModified;
const isDMLastModifiedNewer = deleteMarkerLastModified > versionLastModified;
nit: this makes the code harder to read IMO, as it "hides" the real intent here... i.e. just comparing the date, and handling the 3 possible outcomes of the comparison....
if (isVersionVidNewer) {
// If the version and the delete marker have the same last modified date
const nullVersion = (versions[vIdx].VersionId === 'null'
|| deleteMarkers[dmIdx].VersionId === 'null');
nit: missing indent
}

const isVersionVidNewer = decodedVersionId < decodedDMId;
if (isVersionVidNewer) {
These pushes to sortedList and the index increments are repeated in multiple places, while they are actually the critical step of the algorithm.
To make the code safer and easier to test, it would be better to introduce a bool compare(version, deleteMarker) function, which would only perform the comparison (making it easy to test all corner cases), and keep the algorithm simple and readable (handling only the progression and update of the list, i.e. all the sortedList.push(XXXXX[YYYYY++]) calls).
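One way to read that suggestion is sketched below. This is not the code that was merged: `isVersionNewer` is a hypothetical name, `decodeVersionId` is injected so the sketch does not assume any particular Arsenal API, and the handling of a `'null'` version ID is an arbitrary choice made for illustration. The only facts taken from the thread are that dates are compared first and that, for decoded IDs, a lower value means a more recent entry.

```javascript
// Hypothetical comparator sketch. Returns true when the version should be
// listed before the delete marker (i.e. the version is the newer entry).
// decodeVersionId is an injected function, assumed to return a value where
// "smaller" means "more recent", as stated in the PR description.
function isVersionNewer(version, deleteMarker, decodeVersionId) {
    const vDate = new Date(version.LastModified).getTime();
    const dmDate = new Date(deleteMarker.LastModified).getTime();

    // Common case: the dates differ, so no decoding is needed.
    if (vDate !== dmDate) {
        return vDate > dmDate;
    }

    // Same millisecond. A 'null' version ID cannot be decoded; putting the
    // version first here is an assumption of this sketch, not the PR's rule.
    if (version.VersionId === 'null' || deleteMarker.VersionId === 'null') {
        return true;
    }

    // Tie-break on decoded IDs: the lower decoded ID is the newer entry.
    return decodeVersionId(version.VersionId) < decodeVersionId(deleteMarker.VersionId);
}
```

Keeping the comparison in one pure function means the merge loop only has to decide which index to advance, and the corner cases (equal dates, `'null'` IDs) can be unit tested without building full listings.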
@bert-e approve
Incorrect fix version
The
Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:
Please check the
The following options are set: approve
ping
Conflict
A conflict has been raised during the creation of
I have not created the integration branch. Here are the steps to resolve this conflict:
$ git fetch
$ git checkout -B w/8.5/bugfix/BB-425/order origin/development/8.5
$ git merge origin/w/7.70/bugfix/BB-425/order
$ # <intense conflict resolution>
$ git commit
$ git push -u origin w/8.5/bugfix/BB-425/order
The following options are set: approve
Integration data created
I have created the integration data for the additional destination branches.
The following branches will NOT be impacted:
You can set option
The following options are set: approve
In the queue
The changeset has received all authorizations and has been added to the
The changeset will be merged in:
The following branches will NOT be impacted:
There is no action required on your side. You will be notified here once
IMPORTANT Please do not attempt to modify this pull request.
If you need this pull request to be removed from the queue, please contact a
The following options are set: approve
I have successfully merged the changeset of this pull request
The following branches have NOT changed:
Please check the status of the associated issue BB-425. Goodbye nicolas2bert.
Encoding algorithms history:
The "short version ID" encoding uses base62 encoding algorithms (Base62Integer, Base62String) whose encoded output does not preserve the order of the timestamps in the version ID. Previously, version IDs were hex encoded, and the encoded strings kept that order, so a lower version ID indicated a more recent version.
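To make the ordering problem concrete, here is a generic illustration of how comparing base62 strings can disagree with comparing the numbers they encode. It is not Arsenal's Base62Integer or Base62String implementation: `toBase62`, `toHex`, the digit alphabet, and the absence of padding are all assumptions chosen for the example.

```javascript
// Illustration only; this is NOT Arsenal's encoder.
// ALPHABET is an assumed digit ordering (0-9, a-z, A-Z).
const ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Encode a non-negative integer in base 62, without padding.
function toBase62(n) {
    let out = '';
    let rest = BigInt(n);
    do {
        out = ALPHABET[Number(rest % 62n)] + out;
        rest /= 62n;
    } while (rest > 0n);
    return out;
}

// Encode a non-negative integer as fixed-width, zero-padded hex.
function toHex(n, width) {
    return BigInt(n).toString(16).padStart(width, '0');
}

// 61 < 62 numerically, but the unpadded base62 strings sort the other way:
// toBase62(61) is 'Z' and toBase62(62) is '10', and '10' < 'Z' as strings.
console.log(toBase62(62) < toBase62(61)); // true: string order is flipped

// Fixed-width hex keeps string order aligned with numeric order.
console.log(toHex(61, 4) < toHex(62, 4)); // true: '003d' < '003e'
```

Sorting on `LastModified` sidesteps this class of problem, since the comparison no longer depends on how the version ID happens to be encoded.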