Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sort in memory to Arrays library #4846

Merged
merged 30 commits into from
Feb 6, 2024

Conversation

Amxx
Copy link
Collaborator

@Amxx Amxx commented Jan 18, 2024

Replaces #3520 that is old. This comment remains relevant.

Fixes #3490
Fixes LIB-1189

IMO the approach should be to provide a simple/straight forward implementation that works. We can refine it/replace it down the line if we find a more efficient approach. Backward compatibility should not cause issues.

PR Checklist

  • Tests
  • Documentation
  • Changeset entry (run npx changeset add)

@Amxx Amxx added feature New contracts, functions, or helpers. Datastructures labels Jan 18, 2024
@Amxx Amxx added this to the 5.1 milestone Jan 18, 2024
Copy link

changeset-bot bot commented Jan 18, 2024

🦋 Changeset detected

Latest commit: 4336e2e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
openzeppelin-solidity Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@Amxx Amxx mentioned this pull request Jan 18, 2024
3 tasks
@Amxx Amxx requested a review from ernestognw January 22, 2024 12:31
@Amxx Amxx mentioned this pull request Jan 29, 2024
3 tasks
contracts/utils/Arrays.sol Outdated Show resolved Hide resolved
contracts/utils/Arrays.sol Outdated Show resolved Hide resolved
contracts/utils/Arrays.sol Outdated Show resolved Hide resolved
contracts/utils/Arrays.sol Outdated Show resolved Hide resolved
contracts/utils/Arrays.sol Outdated Show resolved Hide resolved
Amxx and others added 2 commits February 2, 2024 19:01
Co-authored-by: Ernesto García <ernestognw@gmail.com>
Co-authored-by: Ernesto García <ernestognw@gmail.com>
contracts/utils/Arrays.sol Outdated Show resolved Hide resolved
ernestognw
ernestognw previously approved these changes Feb 3, 2024
Copy link
Member

@ernestognw ernestognw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ernestognw ernestognw changed the title Arrays sorting (in memory) Add sort in memory to Arrays library Feb 3, 2024
@ernestognw
Copy link
Member

Before approving I want to clarify why we're picking the first element as the pivot.

I'm pending to dig more into this comment but I'd like to know if you got to a conclusion based on a previous discussion.

@Amxx
Copy link
Collaborator Author

Amxx commented Feb 5, 2024

Before approving I want to clarify why we're picking the first element as the pivot.

There are a few elements of answer, but the bottom line is: its simpler that way, and there is no reason to think its a bad choice.

Lets get into the issue:

A each step, the subroutine sort a subarray by choosing a pivot. Putting all the values that are smaller on one side, all the values that are bigger on the other, and run recursivelly on both "sides". The pivot is placed correctly for sure. The other values are "grouped" to be sorted. Mathematically, the best pivot is on that divides the array in two sub-arrays of equal size. Said otherwise, the best pivot is the median.
But if we wanted to get the median ... we basically need to sort the array, which is our goal.

We have tree options:

  1. Try go be smart, and chose the pivot based on some algorithm
  2. Chose the pivot randomnly
  3. Take a specific element (first one/last one/...)

The first option requires reading the values in a loop in order to make some decision. It means reading the array, which comes at a cost. I'm not conviced that choosing a "good" pivot would save move gas (due to the nice "split" it would produce) than the cost involved in choosing the good pivot

If the array is shuffled, there is no reason to think that point 2 or 3 would be better. You have the same probability of getting a good or bad pivot. So it comes down to practicality.

Since you don't want to have to pivot "in the way", a common approach is to set it asside, sort the rest of the array (into the 2 sections discussed earlier), and then put the pivot in the right place. The easiest way to set the pivot asside, is to put it at the beginning (or at the end), and then inverse it with the last element smaller than the pivot (or the first element bigger than the pivot). If you take your pivot randomly, putting it asside costs you one swap operation. If you choose the first (or the last) its already placed somewhere nice.

So the bottom line is:

  • using the first element is easy in terms of code
  • its statistically not a better or worst choice than any other randomly choosen pivot
  • you could do better by choosing a good pivot, but making this choice has a cost.

contracts/utils/Arrays.sol Outdated Show resolved Hide resolved
contracts/utils/Arrays.sol Outdated Show resolved Hide resolved
@Amxx Amxx requested a review from ernestognw February 6, 2024 15:23
@ernestognw
Copy link
Member

So the bottom line is:

  • using the first element is easy in terms of code
  • its statistically not a better or worst choice than any other randomly choosen pivot
  • you could do better by choosing a good pivot, but making this choice has a cost.

Great, I agree with this conclusions. I know there are other strategies to heuristically determine which element to use as the pivot.

I think it's worth keeping in mind that there may be use cases where specifying the pivot as part of the params of _quickSort. As you mentioned, it requires reading the values in the array, which is costly, but surely there may be a threshold at which it still makes sense (e.g. if guaranteed to have 20 elements, reading 20 may be better than applying the quicksort right away).

Right now it's fine since the function is private anyway.

@ernestognw ernestognw enabled auto-merge (squash) February 6, 2024 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datastructures feature New contracts, functions, or helpers.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add sorting function for memory arrays
3 participants