-
Notifications
You must be signed in to change notification settings - Fork 442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simultaneous Minimum and Maximum Evaluation #90
Conversation
Add a method extending sequences that finds both the minimal and maximal elements of the receiver in a single pass, based off a given ordering predicate. Add an overload that defaults the comparison predicate to the standard less-than operator.
I just splatted my documentation next to the existing docs in that file. There should be more work to merge them together. I'm not familiar with Python. But if their containers have an equivalent method for single-pass min and max, please suggest something to add to the end of the documentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this addition, @CTMacUser! 📉📈
I’m still mulling over the method name. It looks like there’s some good input on the forums — thanks for starting that thread! I have a couple more notes below.
Sources/Algorithms/MinMax.swift
Outdated
if try areInIncreasingOrder(element, lowest) { | ||
lowest = element | ||
} else if try areInIncreasingOrder(highest, element) { | ||
highest = element | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a neat optimization here where you can take two elements from the sequence instead of just one, and then compare them against each other. The higher one can only ever be a candidate for highest
, while the lower can only be a candidate for lowest
. You end up saving a quarter of the comparisons — want to give that a go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with this optimization. Is it documented anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's described in more detail in the original bug — please let me know if you need more info!
/// argument; otherwise, `false`. | ||
/// - Returns: A tuple with the sequence's minimum element, followed by its | ||
/// maximum element. For either member, if the sequence provides multiple | ||
/// qualifying elements, the one chosen is unspecified. The same element may |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d like to take a more principled stand than this. Right now we have two different behaviors for which maximum element is chosen. First, sort()
, max(count:)
, and Swift.max
(not the sequence version) all either return the last equal value or, in the sort/min(count:) case, put it at the end.
/// All instances of `Foo` are equal to each other.
class Foo: Comparable {
static func ==(lhs: Foo, rhs: Foo) -> Bool { true }
static func < (lhs: Foo, rhs: Foo) -> Bool { false }
}
let a = Foo()
let b = Foo()
max(a, b) === b // true
[a, b].sorted().last === b // true
[a, b].max(count: 1).first === b // true
Second, the sequence version of max()
returns the first instance of the maximum value.
[a, b].max() === b // false
On the min
side, there’s only one behavior — all the equivalent methods return the first minimum element.
Even though we’ll end up where this method and the stdlib’s Sequence.max()
return different instances, let’s align this new method with the majoritarian position, and I’ll keep looking at how we can switch Sequence.max()
to come into alignment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enshrining which element is chosen to break ties seems like an over-specification; someone's code may be buggy if they're depending on a specific equivalent element.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already specify this in e.g. Swift.max
, and I'd like to make the sort()
stability a guarantee, which is similar. The dependence is exactly why I'd like to specify the behavior — rather than someone depending on the result of an implementation quirk, they can depend on documented behavior, and we can test to make sure we meet the expectation.
Change the name of the functions finding the minimum and maximum values simultaneously to something more colloquial.
Document the fact that the functions finding the minimum and maximum values simultaneously assume the source sequence has a finite number of elements.
Change the core loop of the function that finds a sequence's minimum and maximum values simultaneously such that it scans two elements at a time. Adjust the corresponding tests to cover all the new decision branches.
I conditionally added @natecook1000 's suggestion to test two elements at a time within the loop. Can someone actually measure this with both short and (very?) long sequences? It looks like the sort of slick programming you'd think improves runtimes, but actually doesn't (or does so rarely it's not worth it). |
@CTMacUser I did some benchmarking and got I think what I'd expect — when the comparison is fast, there isn't a big difference between the two implementations, but when the comparison is non-trivial, the optimized version wins, as it's only performing 3/4 the number of comparisons. Here's a comparison of running When we compare strings with matching prefixes, on the other hand, the optimized version starts to win, since it's performing fewer expensive comparisons: (edited to show a more realistic |
@CTMacUser This looks ready to go! Want to remove the conditional bits of the implementation and we can merge? |
Guides/MinMax.md
Outdated
``` | ||
|
||
### Complexity | ||
|
||
The algorithm used is based on [Soroush Khanlou's research on this matter](https://khanlou.com/2018/12/analyzing-complexity/). The total complexity is `O(k log k + nk)`, which will result in a runtime close to `O(n)` if *k* is a small amount. If *k* is a large amount (more than 10% of the collection), we fall back to sorting the entire array. Realistically, this means the worst case is actually `O(n log n)`. | ||
The algorithm used for an extrema block is based on [Soroush Khanlou's research on this matter](https://khanlou.com/2018/12/analyzing-complexity/). The total complexity is `O(k log k + nk)`, which will result in a runtime close to `O(n)` if *k* is a small amount. If *k* is a large amount (more than 10% of the collection), we fall back to sorting the entire array. Realistically, this means the worst case is actually `O(n log n)`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is an "extrema block"? Can you use other terminology that would be more generally familiar to users?
Sources/Algorithms/MinMax.swift
Outdated
/// let hues = ["Heliotrope": 296, "Coral": 16, "Aquamarine": 156] | ||
/// let minmax = hues.minAndMax { a, b in a.value < b.value } | ||
/// let leastHue = minmax?.min, greatestHue = minmax?.max | ||
/// print(leastHue, greatestHue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: can we simplify this code example by force unwrapping? The user doesn't need to read Optional(...)
printed twice to understand the concept here, so best not to clutter. Something like (typed freehand--please check it for accuracy):
print(minmax!)
// Prints '(min: (key: "Coral", value: 16), max: (key: "Heliotrope", value: 296))'
} | ||
} | ||
} | ||
#else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔
Remove the conditional code choosing between the original and optimized core loop, retaining the optimized version. Update some documentation.
In the documentation blocks for the functions that find their receiving sequence's MIN and MAX in one pass, make the examples more readable.
} | ||
} | ||
return (lowest, highest) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The while loop can be substantially simplified, since iterators are guaranteed to keep returning nil
after they’ve done so once:
while var low = iterator.next() {
var high = iterator.next() ?? low
if try areInIncreasingOrder(high, low) { swap(&low, &high) }
if try areInIncreasingOrder(low, lowest) { lowest = low }
if try !areInIncreasingOrder(high, highest) { highest = high }
}
This does technically perform one extraneous comparison on odd-length sequences (high
vs low
after the ??
was triggered), but even that can be rectified by restoring the guard
if benchmarks show a measurable difference:
while var low = iterator.next() {
guard var high = iterator.next() else {
if try areInIncreasingOrder(low, lowest) { lowest = low }
if try !areInIncreasingOrder(low, highest) { highest = low }
break
}
if try areInIncreasingOrder(high, low) { swap(&low, &high) }
if try areInIncreasingOrder(low, lowest) { lowest = low }
if try !areInIncreasingOrder(high, highest) { highest = high }
}
Has a 4-at-a-time version of It doesn’t change the total number of comparisons versus the pairwise version, but per Steve Canon’s remarks on the forums it may still run faster by halving the loop-carried dependencies. And it’s pretty easy to write: while let a = iterator.next() {
let b = iterator.next() ?? a
let c = iterator.next() ?? b
let d = iterator.next() ?? c
let (low, high) = minAndMaxOf4(a, b, c, d)
if try areInIncreasingOrder(low, lowest) { lowest = low }
if try !areInIncreasingOrder(high, highest) { highest = high }
} Using the helper function: func minAndMaxOf4<T>(
_ a: T, _ b: T, _ c: T, _ d: T,
by areInIncreasingOrder: (T, T) throws -> Bool
) rethrows -> (min: T, max: T) {
var (a, b, c, d) = (a, b, c, d)
try sort(&a, &b, by: areInIncreasingOrder)
try sort(&c, &d, by: areInIncreasingOrder)
try sort(&a, &c, by: areInIncreasingOrder)
try sort(&b, &d, by: areInIncreasingOrder)
return (a, d)
}
func sort<T>(
_ a: inout T, _ b: inout T,
by areInIncreasingOrder: (T, T) throws -> Bool
) rethrows {
if try areInIncreasingOrder(b, a) { swap(&a, &b) }
} |
For the function that finds a sequence's minimum and maximum values simultaneously, tighten the code actually used for the loop. Improve the comments.
Change the core loop of the function that finds a sequence's minimum and maximum values simultaneously such that it scans four elements at a time.
In my quick runs of the already-provided tests, using the four-at-a-time loop results in more calls to the comparison predicate than the improved two-at-a-time loop did (78 vs. 68). |
(a) There should be at most 4 extra comparisons the way I wrote it, and they can be avoided by replacing the (b) The important thing to compare is the actual time the algorithm takes to run. (c) The potential time-save comes from reducing the loop-carried dependencies, not the number of comparisons. |
/// with the highest value. | ||
/// | ||
/// let hues = ["Heliotrope": 296, "Coral": 16, "Aquamarine": 156] | ||
/// if let extremeHues = hues.minAndMax(by: {$0.value < $1.value}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Spaces between the curly braces.
/// - Complexity: O(*n*), where *n* is the length of the sequence. | ||
@inlinable | ||
public func minAndMax() -> (min: Element, max: Element)? { | ||
return minAndMax(by: <) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We don't need a return here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the return
.
As a matter of style, I think return
should only be omitted when the entire{...}
block is on a single line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish we had a style guide for these things. But you're right to be consistent with the whole codebase 😊 (I think). Looks like SE-0250 is not coming back soon, but I could raise it if it does.
@CTMacUser My benchmarking shows very small differences between find the min and max two-at-a-time vs four-at-a-time; which one wins depends on the cost of the comparison. In any case, we don't need to let that decision block merging this addition — could you choose one implementation and we'll get this merged? Thanks! |
Thanks for testing, Nate. If the performance is similar, then pairwise is clearly preferable for implementation simplicity. |
@swift-ci Please test |
@CTMacUser I can single out the implementation post-merge — thanks again for your contribution! 👏🏻👏🏻 |
Sorry for flaking out, but it seems everything got handled anyway. I've spent the time procuring an ARM-based Mac and transitioning my programming to it. |
No worries, and congrats! 🎉 |
Description
Adds a method to
Sequence
to evaluatemin(by:)
andmax(by:)
in a single pass. Adds an overload when theElement
type isComparable
to default the predicate to<
.Should fix Swift bug SR-890.
Detailed Design
Include any additional information about the design here. At minimum, show any new API:
Documentation Plan
There is documentation on the new methods. Plus the "Minmax" documentation file has been extended with descriptions of these methods.
Test Plan
The "Minmax" test file has new tests for the new methods.
Source Impact
The change should be additive.
Checklist