-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add lower_bound
, upper_bound
and equal_range
for slices where T: Ord to complement binary_search
#2184
Comments
Is is more convenient to have an |
Returning a I don't understand the question about |
Since they are looking for "the first element in the sorted slice that is ..." I would return the length of the slice. This would make it easy to use these function to find the position that would be the insertion point of an element so that the order is preserved (and the element is added as the first/last of a streak of elements that compare as equal). |
We return the length of the slice or the position of the lower bound? What happens if the element does not exist (past the end of the slice)? Do you think returning 0 for lower bound and self.len() for upper bound makes sense in that case? |
I think For example, A similar reasoning also applies to
Note that while the current definition of |
Thanks Andrea. I updated the code above with examples. Let me know if you agree.
|
Yep, the examples match my expectations :) Some nits (I think most of these are basically just typos):
Some proposals for an alternative wording of the description (very slice-oriented):
This wording is more verbose, but it shows why the "element not found" is not actually a special case for these functions and highlights in a very direct way the relationship between |
I like the new wording. Incorporated. |
I think "any empty range" is unfortunate, because there's very useful information that throws away. The other use of /// Returns the range of positions at which `x` could be inserted such
/// that the sequence would remain sorted.
fn where_to_insert_in_sorted(&self, x: &T) -> RangeInclusive<usize>; That return value would have the same Phrased differently, I think it's important that |
Isn't there any meaningful data to be returned from a "not found" lower_bound? Can't it behave similar to binary_search? Same for upper_bound. |
@scottmcm equal_range does not return "any empty range". What is returned is defined as your first suggestion. @arthurprs can you give an example? |
@alkis nevermind, that was a confusion moment. I think lower_bound should return 0 and upper_bound should return slice.len() if no range can be found. Same as you proposed on #2184 (comment) |
@arthurprs this is what it will return given the description. The examples also show these cases. |
I see, it's just that |
Fixed. |
What's the next step? Does this need to go through a PR submission to the RFC repository? |
Is there a reason these functions don't use |
It is a tradeoff. If we do the check then all users need to pay the price
of the extra branch. If we don't then the users that don't need it don't
pay for it. Since rust is going for zero cost/systems programming, I find
that this API is a better fit.
|
I published the implementation in a crate here: https://crates.io/crates/ordslice. When/if this lands the crate can be deprecated. |
Could we add the option for the bound to be inclusive or exclusive using fn lower_bound(&self, x: Bound<&T>) -> usize
where
T: Ord; This is similar to the API I used for |
Do you have an idea of the performance cost of such a change? Also do you have examples of using such API in practice that either allow use-cases that are not possible otherwise and/or more readable code? |
Can't exclusive bounds be emulated by altering the comparison fn? |
the need for something like The main thing that was awkward with that solution, and which still feels awkward in the But for I think the The only problem is that this doesn't work well with |
You can find |
I was hoping to move forward the discussion so that maybe these functions can be added to the standard library. But happy to do that elsewhere if that's preferred? |
For let vec = vec![3, 5, 6, 6, 8, 10, 12];
assert_eq!(vec.search_something_range(5..9), (1, 5));
assert_eq!(vec.search_something_range(5..10), (1, 5));
assert_eq!(vec.search_something_range(5..=10), (1, 6));
assert_eq!(vec.search_something_range((Exclusive(6), Inclusive(20)), (4, 7)); It'd also be cool if you could do let vec = vec![3, 5, 6, 6, 8, 10, 12];
assert_eq!(vec.search_something_range(6), (2, 4)); However that'd mean we couldn't use the |
FWIW, if there's interest in having this use cases discussed in this RFC directly addressed in stdlib, independent of what API is used to do it, then I suggest speaking up. The PR in rust-lang/rust#52530 got rejected because it wasn't clear that the usecase needed addressing. |
I would be really happy to have that in stdlib |
I am hitting this issue in sled, and finding myself writing a few wrappers around standard library functionality that do not feel obviously correct (similar to rust-lang/rust#52529). It's important for me to quickly locate these bounds, both exclusively and inclusively, to enable various tree search and iteration features. |
Why this would be useful? What does I rather strongly feel that the proposed signature of fn (lower|upper)_bound(&self, x: &T) -> usize
where
T: Ord; is "canonical". For a given I am not so sure about
pub fn binary_search_range<K, R>(&self, range: R) -> std::ops::Range<usize> where
K: Ord + ?Sized,
R: RangeBounds<K>,
T: Borrow<K>, |
It would always return 0 (the first element).
If the slice does not contain |
So,
If this is the case, looks like |
I haven't been following this thread, but having the "lower bound" of 92 ever be above all the 92s seems like we've either lost track of what "lower" means, or these are really bad names for whatever operations are being proposed. Assuming these are the operations the names imply they are, and we do want
|
You need to think of the bounds in terms of a range of values that you are interested in. So The definition for |
Interesting. I think that means the names are off. But "upper bound"/"upper bound (inclusive)"/"lower bound (exclusive)" apparently means "the first non-x value after an x value" which is a super weird concept I don't know of a concise name for. Hm... |
For a couple concrete "point" query examples I'm working with, consider a B+ tree where pointers to children are stored in a sorted slice as In general, I find all of the below use cases useful for working with versioned trees:
All of the above can be expressed with iterators, but that prevents their compatibility with |
I believe that this is expressible with bound-less methods like this:
The first two seem trickier than ideal, but I don't think they can be better expressed with The version of @Ixrec (#2184 (comment)) does make this easier though. On the meta level, if we can come up with two semantics for |
It's not just a simple
|
@Amanieu sorry, I wasn't clear, the -1 referred to the difference between #2184 (comment) and #2184 (comment). That is, it's the difference between semantics one might thing |
It would be nice if this could move forward somehow. I was surprised that there was a binary_search in std, but that it apparently selects an element at random if there are multiple elements with the same value. |
C++ has partition_point. It has clear definition like
I can use this when I cannot remember the definition of lower/upper_bound :) |
@VillSnow this sounds like a great idea, as this is indeed more general than binary search, and has trivial design space. Would you like to send a PR to libcore to add this? I am not on the libs team, but seems like this API can be accepted without too much deliberation: impl<T> [T] {
fn partition_point<T, P>(&self, mut pred: P) -> usize
where
P: FnMut(&T) -> bool,
} As basic check, rust also use |
@matklad my understanding is; first make a issue "Traking issue for partition_point", set issue number of unstable attribute and send PR? |
Add partition_point Add partition_point in C++. Although existing binary_search in rust does not suitable when the slice has multiple hits, this function returns exact point of partition. The definition of this function is very clear and able to accept general matter, therefore you can easily get index which you want like lower/upper_bound. rust-lang/rfcs#2184
Maybe this issue can be closed, now that this has merged? |
I think
|
Motivation
Fom discussion on rust-lang/rust#45333.
There is demand for
lower_bound
,upper_bound
andequal_range
especially for sorted sequences of non-unique elements. In such sequences the aforementioned operations are more interesting and useful thanbinary_search
. The addition of these three ops will complement the support of sorted slices in the standard library.Proposed APIs
lower_bound
upper_bound
equal_range
The text was updated successfully, but these errors were encountered: