Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BB-425 Short version IDs break ordering - use lastModified #2427

Merged
merged 2 commits into from
Jul 5, 2023

Conversation

nicolas2bert
Copy link
Contributor

@nicolas2bert nicolas2bert commented Jun 27, 2023

Summary:

  • Short version IDs no longer preserve ordering.
  • We will use the "last-modified" date to sort version and delete markers.

Description:

  • Prior to the introduction of short version IDs, a lower version ID indicated a more recent version. However, with short version IDs, the ordering is no longer preserved.
  • To address this issue, we will use the "last-modified" date to sort version and delete markers from the object versions listing. This will ensure that the versions are always sorted in chronological order, regardless of the length of the version ID.
  • We considered decoding the version ID as a solution, but this would be quite expensive in terms of CPU time and memory usage.

Encoding algorithms history:

The "short version id" encoding uses lossy base62 encoding algorithms (Base62Integer, Base62String) that do not preserve the order of the timestamps in the version ID. Previously, encoding was done using the hex encoding algorithm, which is a lossless encoding algorithm because it does not lose any information during the encoding process.

@bert-e
Copy link
Contributor

bert-e commented Jun 27, 2023

Hello nicolas2bert,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Status report is not available.

@bert-e
Copy link
Contributor

bert-e commented Jun 27, 2023

Incorrect fix version

The Fix Version/s in issue BB-425 contains:

  • None

Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:

  • 7.10.11

  • 7.70.2

  • 8.5.5

  • 8.6.25

  • 8.7.0

Please check the Fix Version/s of BB-425, or the target
branch of this pull request.

Summary:

- Short version IDs no longer preserve ordering.
- We will use the "last-modified" date to sort version and delete markers.

Description:

- Prior to the introduction of short version IDs, a lower version ID indicated a more recent version. However, with short version IDs, the ordering is no longer preserved.
- To address this issue, we will use the "last-modified" date to sort version and delete markers from the object versions listing. This will ensure that the versions are always sorted in chronological order, regardless of the length of the version ID.
- We considered decoding the version ID as a solution, but this would be quite expensive in terms of CPU time and memory usage.
@nicolas2bert nicolas2bert force-pushed the bugfix/BB-425/order branch from 11f3178 to e347762 Compare June 27, 2023 17:01
Copy link
Collaborator

@rahulreddy rahulreddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicolas2bert Why not decode the version id as they are guaranteed to be ordered?

@nicolas2bert
Copy link
Contributor Author

The "base62Decode" function (Arsenal/lib/versioning/VersionID.ts at development/7.10 · scality/Arsenal ) could be quite expensive in terms of CPU time and memory usage. There are a few reasons for this:

  • The function makes three calls to the base62Integer.decode() and base62String.decode() functions. These functions are themselves relatively expensive, so making three calls to them adds up.

  • The function has to iterate through the string to find the start of each encoded value. This is also an expensive operation.

  • The function has to slice the string three times. This is another relatively expensive operation.

All of these factors add up to make this function relatively expensive.

@nicolas2bert
Copy link
Contributor Author

nicolas2bert commented Jun 27, 2023

@rahulreddy Is there any advantage to using version ID over creation date?

The only disadvantage, I see, of using the creation date is that two versions could be created at the same time.
In that case, we could decode the version ID to decide.

@nicolas2bert nicolas2bert force-pushed the bugfix/BB-425/order branch 2 times, most recently from 76a47e1 to f6a03aa Compare June 28, 2023 17:21
@nicolas2bert nicolas2bert force-pushed the bugfix/BB-425/order branch from f6a03aa to e396f9f Compare June 28, 2023 17:37
@rahulreddy
Copy link
Collaborator

@rahulreddy Is there any advantage to using version ID over creation date?

The only disadvantage, I see, of using the creation date is that two versions could be created at the same time. In that case, we could decode the version ID to decide.

Yup, my main concern is the collision. You will have two versions created at the same ms, we already see at at a lot of deployments today. I would add another layer of entropy to be unique, may be you decode version ids and compare only for this case.

@nicolas2bert
Copy link
Contributor Author

Yes, I agree. This condition is handled in the PR.

@francoisferrand
Copy link
Contributor

AFAICT, we also rely on ordering of keys in MongoClientInterface (and possibly metadata ?) to compute the last "version" to use as master... Does it mean we will need to do the same kind of fix in Arsenal : https://github.com/scality/Arsenal/blob/bdb59a0e63c20578078bd169e70c78a0c49f6e60/lib/storage/metadata/mongoclient/MongoClientInterface.js#L1055 ?

@alexanderchan-scality
Copy link
Contributor

alexanderchan-scality commented Jun 28, 2023

do correct me if I am wrong, short version ID is only used at the encoding/decoding step in cloudserver, and the generateVersionID (https://github.com/scality/Arsenal/blob/bdb59a0e63c20578078bd169e70c78a0c49f6e60/lib/versioning/VersionID.ts#L100) is still timestamp based, so list order of keys should still be in some chronological order; the description may be a bit misleading attributing the issue to shortVersionID.

this issue would then look like an edge-case (race condition?) scenario that will also affect the long version ID and would likely cause issue to the getLatestVersion logic pointed out by Francois

@nicolas2bert
Copy link
Contributor Author

nicolas2bert commented Jun 29, 2023

Internal medatada methods (server side) use decoded version id. For decoded version ids, the order is preserved.

However, the lifecycle "merging and sorting" (client side) method used base62 encoded version id to sort the listing (with merging versions and delete markers) that does not seem to preserve the order (based on creation date). Cf PR description.

I am not sure to understand your concerns @francoisferrand @alexanderchan-scality .

@alexanderchan-scality
Copy link
Contributor

alexanderchan-scality commented Jun 29, 2023

@nicolas2bert i see. I had confused the versionID comparison to be with the internal version id; then I do agree with you that the base62 encode does lose the lexigraphic order property needed for the comparison.

In that case, the find latest version would not be affected since that uses the internal VID for comparison.

@francoisferrand
Copy link
Contributor

francoisferrand commented Jun 29, 2023

Internal medatada methods (server side) use decoded version id. For decoded version ids, the order is preserved.

thanks @nicolas2bert , that is the part I missed, and got me worried of similar issue with arsenal.

s3 listObjectVersions is supposed to "request returns objects in the order that they were stored, returning the most recently stored object first". I do not see any mention (in the doc) that this order covers both versions & delete markers, but it seems to be consistent with the above statement and the exemples there...

So I wonder why we do this sorting in lifecycle, and if we should not implement this in cloudserver/arsenal instead? (This may avoid issues in the UI as well)

@nicolas2bert
Copy link
Contributor Author

@francoisferrand The listObjectVersions API returns versions and delete markers in separate arrays. To determine the object's "stale date," the arrays must be merged and sorted.
With the introduction of lifecycle optimization listings, this logic will be deprecated.

Comment on lines +524 to +525
const isVersionLastModifiedNewer = versionLastModified > deleteMarkerLastModified;
const isDMLastModifiedNewer = deleteMarkerLastModified > versionLastModified;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this makes the code harder to read IMO, as it "hides" the real intent here... i.e. just comparing the date, and handling the 3 possible outcomes of the comparison....

if (isVersionVidNewer) {
// If the version and the delete marker have the same last modified date
const nullVersion = (versions[vIdx].VersionId === 'null'
|| deleteMarkers[dmIdx].VersionId === 'null');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: missing indent

}

const isVersionVidNewer = decodedVersionId < decodedDMId;
if (isVersionVidNewer) {
Copy link
Contributor

@francoisferrand francoisferrand Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these push to sortedList and incrementation are repeated in multiple places, while they are actually the critical step of the algorithm.

to make core safer & easier to test, would be better to introduce a bool compare(version, deleteMarker) function, which would only perform the comparison (and be easy to test all corner cases), and keep the code simple & readable in the algorithm (handling only the progression and update of the list, ie. all sortedList.push(XXXXX[YYYYY++]))

@nicolas2bert
Copy link
Contributor Author

@bert-e approve

@bert-e
Copy link
Contributor

bert-e commented Jul 5, 2023

Incorrect fix version

The Fix Version/s in issue BB-425 contains:

  • None

Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:

  • 7.10.11

  • 7.70.2

  • 8.5.5

  • 8.6.26

  • 8.7.0

Please check the Fix Version/s of BB-425, or the target
branch of this pull request.

The following options are set: approve

@nicolas2bert
Copy link
Contributor Author

ping

@bert-e
Copy link
Contributor

bert-e commented Jul 5, 2023

Conflict

A conflict has been raised during the creation of
integration branch w/8.5/bugfix/BB-425/order with contents from w/7.70/bugfix/BB-425/order
and development/8.5.

I have not created the integration branch.

Here are the steps to resolve this conflict:

 $ git fetch
 $ git checkout -B w/8.5/bugfix/BB-425/order origin/development/8.5
 $ git merge origin/w/7.70/bugfix/BB-425/order
 $ # <intense conflict resolution>
 $ git commit
 $ git push -u origin w/8.5/bugfix/BB-425/order

The following options are set: approve

@bert-e
Copy link
Contributor

bert-e commented Jul 5, 2023

Integration data created

I have created the integration data for the additional destination branches.

The following branches will NOT be impacted:

  • development/7.4

You can set option create_pull_requests if you need me to create
integration pull requests in addition to integration branches, with:

@bert-e create_pull_requests

The following options are set: approve

@bert-e
Copy link
Contributor

bert-e commented Jul 5, 2023

In the queue

The changeset has received all authorizations and has been added to the
relevant queue(s). The queue(s) will be merged in the target development
branch(es) as soon as builds have passed.

The changeset will be merged in:

  • ✔️ development/7.10

  • ✔️ development/7.70

  • ✔️ development/8.5

  • ✔️ development/8.6

  • ✔️ development/8.7

The following branches will NOT be impacted:

  • development/7.4

There is no action required on your side. You will be notified here once
the changeset has been merged. In the unlikely event that the changeset
fails permanently on the queue, a member of the admin team will
contact you to help resolve the matter.

IMPORTANT

Please do not attempt to modify this pull request.

  • Any commit you add on the source branch will trigger a new cycle after the
    current queue is merged.
  • Any commit you add on one of the integration branches will be lost.

If you need this pull request to be removed from the queue, please contact a
member of the admin team now.

The following options are set: approve

@bert-e
Copy link
Contributor

bert-e commented Jul 5, 2023

I have successfully merged the changeset of this pull request
into targetted development branches:

  • ✔️ development/7.10

  • ✔️ development/7.70

  • ✔️ development/8.5

  • ✔️ development/8.6

  • ✔️ development/8.7

The following branches have NOT changed:

  • development/7.4

Please check the status of the associated issue BB-425.

Goodbye nicolas2bert.

@bert-e bert-e merged commit e396f9f into development/7.10 Jul 5, 2023
@bert-e bert-e deleted the bugfix/BB-425/order branch July 5, 2023 20:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants