Simultaneous Minimum and Maximum Evaluation #90

Merged
merged 10 commits into from
May 12, 2021
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -12,7 +12,8 @@ package updates, you can specify your package dependency using

## [Unreleased]

-`adjacentPairs()` lazily iterates over tuples of adjacent elements of a sequence.
- `adjacentPairs()` lazily iterates over tuples of adjacent elements of a sequence.
- `minAndMax()` finds both the smallest and largest elements of a sequence in a single pass.

---

33 changes: 30 additions & 3 deletions Guides/MinMax.md
@@ -1,4 +1,4 @@
# Min/Max with Count
# Minima and/or Maxima

[[Source](https://github.com/apple/swift-algorithms/blob/main/Sources/Algorithms/MinMax.swift) |
[Tests](https://github.com/apple/swift-algorithms/blob/main/Tests/SwiftAlgorithmsTests/MinMaxTests.swift)]
@@ -13,6 +13,17 @@ let smallestThree = numbers.min(count: 3, sortedBy: <)
// [1, 2, 3]
```

The `minAndMax` methods return the smallest and largest elements of a sequence, as determined by a predicate or by the order defined by `Comparable` conformance.

If you need both the minimum and maximum values of a collection, calling `minAndMax` can be faster than calling the `min` method followed by the `max` method, because it finds both in a single pass. For the same reason, it also works with single-pass sequences, which cannot be iterated twice.

```swift
let numbers = [7, 1, 6, 2, 8, 3, 9]
if let (smallest, largest) = numbers.minAndMax(by: <) {
  // Work with 1 and 9....
}
```

## Detailed Design

This adds the `Collection` methods shown below:
@@ -31,6 +42,16 @@ extension Collection {
}
```

And the `Sequence` method:

```swift
extension Sequence {
  public func minAndMax(
    by areInIncreasingOrder: (Element, Element) throws -> Bool
  ) rethrows -> (min: Element, max: Element)?
}
```

Additionally, versions of these methods for `Comparable` types are also provided:

```swift
@@ -39,20 +60,26 @@ extension Collection where Element: Comparable {

  public func max(count: Int) -> [Element]
}

extension Sequence where Element: Comparable {
  public func minAndMax() -> (min: Element, max: Element)?
}
```

### Complexity

The algorithm used is based on [Soroush Khanlou's research on this matter](https://khanlou.com/2018/12/analyzing-complexity/). The total complexity is `O(k log k + nk)`, which will result in a runtime close to `O(n)` if *k* is a small amount. If *k* is a large amount (more than 10% of the collection), we fall back to sorting the entire array. Realistically, this means the worst case is actually `O(n log n)`.
The algorithm used for minimal- or maximal-ordered subsets is based on [Soroush Khanlou's research on this matter](https://khanlou.com/2018/12/analyzing-complexity/). The total complexity is `O(k log k + nk)`, which will result in a runtime close to `O(n)` if *k* is a small amount. If *k* is a large amount (more than 10% of the collection), we fall back to sorting the entire array. Realistically, this means the worst case is actually `O(n log n)`.
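
The partial-sort strategy described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's source: the function name `smallestSketch` is hypothetical, and the 10% cutoff is written in the simplest form that matches the description.

```swift
/// Hypothetical sketch: returns the `k` smallest elements in sorted order.
func smallestSketch<T>(
  _ k: Int,
  of elements: [T],
  by areInIncreasingOrder: (T, T) -> Bool
) -> [T] {
  precondition(k >= 0, "k must be non-negative")
  guard k > 0, !elements.isEmpty else { return [] }

  // Assumed fallback rule: when k exceeds 10% of the input, sort everything.
  if k * 10 > elements.count {
    return Array(elements.sorted(by: areInIncreasingOrder).prefix(k))
  }

  // Sort the first k elements: O(k log k).
  var result = elements.prefix(k).sorted(by: areInIncreasingOrder)

  // Scan the rest; each accepted element costs up to O(k) to insert.
  for element in elements.dropFirst(k) {
    guard let last = result.last, areInIncreasingOrder(element, last) else {
      continue
    }
    // Binary search for the first slot whose value is greater than `element`.
    var low = 0
    var high = result.count
    while low < high {
      let mid = (low + high) / 2
      if areInIncreasingOrder(element, result[mid]) {
        high = mid
      } else {
        low = mid + 1
      }
    }
    // Drop the current largest and insert the new element in order.
    result.removeLast()
    result.insert(element, at: low)
  }
  return result
}

let descending = Array((0..<100).reversed())
print(smallestSketch(3, of: descending, by: <))
print(smallestSketch(3, of: [7, 1, 6, 2, 8, 3, 9], by: <))
```

The first call takes the prefix-and-insert path because 3 is at most 10% of 100 elements. The second call has only seven elements, so it takes the full-sort fallback.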

Here are some benchmarks we made that demonstrate how this implementation (SmallestM) behaves when *k* increases (before implementing the fallback):

![Benchmark](Resources/SortedPrefix/FewElements.png)
![Benchmark 2](Resources/SortedPrefix/ManyElements.png)

The algorithm used for simultaneous minimum and maximum is slightly optimized. At each iteration, two elements are read, their relative order is determined, then each is compared against exactly one of the current extrema for potential replacement. When a comparison predicate has to analyze every component of both operands, the optimized algorithm isn't much faster than the straightforward approach. But when a predicate only needs to compare a small part of each instance, the optimization shines through.
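
To make the saving concrete, the sketch below counts predicate calls for a pairwise scan and for a one-at-a-time scan. Both functions, `pairwiseMinAndMax` and `naiveMinAndMax`, are hypothetical stand-ins written for this illustration. They follow the description above but are not the package's implementation, and they take a non-throwing predicate for brevity.

```swift
/// Hypothetical sketch of the pairwise scan: about three comparisons per two elements.
func pairwiseMinAndMax<S: Sequence>(
  _ elements: S,
  by areInIncreasingOrder: (S.Element, S.Element) -> Bool
) -> (min: S.Element, max: S.Element)? {
  var iterator = elements.makeIterator()
  guard var lowest = iterator.next() else { return nil }
  guard var highest = iterator.next() else { return (lowest, lowest) }
  if areInIncreasingOrder(highest, lowest) { swap(&lowest, &highest) }

  while var low = iterator.next() {
    guard var high = iterator.next() else {
      // Odd element left over: compare it against both extrema.
      if areInIncreasingOrder(low, lowest) { lowest = low }
      if !areInIncreasingOrder(low, highest) { highest = low }
      break
    }
    // Order the pair, then compare each member against one extremum only.
    if areInIncreasingOrder(high, low) { swap(&low, &high) }
    if areInIncreasingOrder(low, lowest) { lowest = low }
    if !areInIncreasingOrder(high, highest) { highest = high }
  }
  return (lowest, highest)
}

/// Hypothetical baseline: two comparisons for every element after the first.
func naiveMinAndMax<S: Sequence>(
  _ elements: S,
  by areInIncreasingOrder: (S.Element, S.Element) -> Bool
) -> (min: S.Element, max: S.Element)? {
  var iterator = elements.makeIterator()
  guard let first = iterator.next() else { return nil }
  var lowest = first
  var highest = first
  while let element = iterator.next() {
    if areInIncreasingOrder(element, lowest) { lowest = element }
    if !areInIncreasingOrder(element, highest) { highest = element }
  }
  return (lowest, highest)
}

let numbers = [7, 1, 6, 2, 8, 3, 9]

var pairwiseComparisons = 0
let pairwise = pairwiseMinAndMax(numbers) { a, b in
  pairwiseComparisons += 1
  return a < b
}

var naiveComparisons = 0
let naive = naiveMinAndMax(numbers) { a, b in
  naiveComparisons += 1
  return a < b
}

print("pairwise:", pairwise as Any, "comparisons:", pairwiseComparisons)
print("naive:", naive as Any, "comparisons:", naiveComparisons)
```

For this seven-element array, the pairwise scan calls the predicate 9 times and the one-at-a-time loop calls it 12 times. Both find 1 and 9. Whether that difference is measurable depends on how expensive the predicate is.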

### Comparison with other languages

**C++:** The `<algorithm>` library defines a `partial_sort` function where the entire array is returned using a partial heap sort.
**C++:** The `<algorithm>` library defines a `partial_sort` function where the entire array is returned using a partial heap sort. It also defines a `minmax_element` function that scans a range for its minimal and maximal elements.

**Python:** Defines a `heapq` priority queue that can be used to manually achieve the same result.

1 change: 1 addition & 0 deletions README.md
@@ -32,6 +32,7 @@ Read more about the package, and the intent behind it, in the [announcement on s
- [`suffix(while:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Suffix.md): Returns the suffix of a collection where all elements pass a given predicate.
- [`trimming(while:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Trim.md): Returns a slice by trimming elements from a collection's start and end.
- [`uniqued()`, `uniqued(on:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Unique.md): The unique elements of a collection, preserving their order.
- [`minAndMax()`, `minAndMax(by:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/MinMax.md): Returns the smallest and largest elements of a sequence.

#### Partial sorting

138 changes: 138 additions & 0 deletions Sources/Algorithms/MinMax.swift
@@ -383,3 +383,141 @@ extension Collection where Element: Comparable {
return max(count: count, sortedBy: <)
}
}

//===----------------------------------------------------------------------===//
// Simultaneous minimum and maximum evaluation
//===----------------------------------------------------------------------===//

extension Sequence {
/// Returns both the minimum and maximum elements in the sequence, using the
/// given predicate as the comparison between elements.
///
/// The predicate must be a *strict weak ordering* over the elements. That
/// is, for any elements `a`, `b`, and `c`, the following conditions must
/// hold:
///
/// - `areInIncreasingOrder(a, a)` is always `false`. (Irreflexivity)
/// - If `areInIncreasingOrder(a, b)` and `areInIncreasingOrder(b, c)` are
/// both `true`, then `areInIncreasingOrder(a, c)` is also
/// `true`. (Transitive comparability)
/// - Two elements are *incomparable* if neither is ordered before the other
/// according to the predicate. If `a` and `b` are incomparable, and `b`
/// and `c` are incomparable, then `a` and `c` are also incomparable.
/// (Transitive incomparability)
///
/// This example shows how to use the `minAndMax(by:)` method on a
/// dictionary to find the key-value pair with the lowest value and the pair
/// with the highest value.
///
/// let hues = ["Heliotrope": 296, "Coral": 16, "Aquamarine": 156]
/// if let extremeHues = hues.minAndMax(by: {$0.value < $1.value}) {
Contributor

Nit: Spaces between the curly braces.

/// print(extremeHues.min, extremeHues.max)
/// } else {
/// print("There are no hues")
/// }
/// // Prints: "(key: "Coral", value: 16) (key: "Heliotrope", value: 296)"
///
/// - Precondition: The sequence is finite.
///
/// - Parameter areInIncreasingOrder: A predicate that returns `true`
/// if its first argument should be ordered before its second
/// argument; otherwise, `false`.
/// - Returns: A tuple with the sequence's minimum element, followed by its
/// maximum element. For either member, if the sequence provides multiple
/// qualifying elements, the one chosen is unspecified. The same element may
Member

@natecook1000 natecook1000 Mar 9, 2021

I’d like to take a more principled stand than this. Right now we have two different behaviors for which maximum element is chosen. First, sort(), max(count:), and Swift.max (not the sequence version) all either return the last equal value or, in the sort/min(count:) case, put it at the end.

/// All instances of `Foo` are equal to each other.
class Foo: Comparable {
    static func ==(lhs: Foo, rhs: Foo) -> Bool { true }
    static func < (lhs: Foo, rhs: Foo) -> Bool { false }
}

let a = Foo()
let b = Foo()
max(a, b) === b                  // true
[a, b].sorted().last === b       // true
[a, b].max(count: 1).first === b // true

Second, the sequence version of max() returns the first instance of the maximum value.

[a, b].max() === b               // false

On the min side, there’s only one behavior — all the equivalent methods return the first minimum element.

Even though we’ll end up where this method and the stdlib’s Sequence.max() return different instances, let’s align this new method with the majoritarian position, and I’ll keep looking at how we can switch Sequence.max() to come into alignment.

Contributor Author

Enshrining which element is chosen to break ties seems like an over-specification; someone's code may be buggy if they're depending on a specific equivalent element.

Member

We already specify this in e.g. Swift.max, and I'd like to make the sort() stability a guarantee, which is similar. The dependence is exactly why I'd like to specify the behavior — rather than someone depending on the result of an implementation quirk, they can depend on documented behavior, and we can test to make sure we meet the expectation.

/// be used for both members if all the elements are equivalent. If the
/// sequence has no elements, returns `nil`.
///
/// - Complexity: O(*n*), where *n* is the length of the sequence.
  public func minAndMax(
    by areInIncreasingOrder: (Element, Element) throws -> Bool
  ) rethrows -> (min: Element, max: Element)? {
    // Check short sequences.
    var iterator = makeIterator()
    guard var lowest = iterator.next() else { return nil }
    guard var highest = iterator.next() else { return (lowest, lowest) }

    // Confirm the initial bounds.
    if try areInIncreasingOrder(highest, lowest) { swap(&lowest, &highest) }

#if true
    // Read the elements in pairwise. Structuring the comparisons around this
    // is actually faster than loops based on extracting and testing elements
    // one-at-a-time.
    while var low = iterator.next() {
      if var high = iterator.next() {
        // Update the upper bound with the larger new element.
        if try areInIncreasingOrder(high, low) { swap(&low, &high) }
        if try !areInIncreasingOrder(high, highest) { highest = high }
      } else {
        // Update the upper bound by reusing the last element. The next element
        // iteration will also fail, ending the loop.
        if try !areInIncreasingOrder(low, highest) { highest = low }
      }

      // Update the lower bound with the smaller new element, which may need a
      // swap first to determine.
      if try areInIncreasingOrder(low, lowest) { lowest = low }
    }
#else
Contributor

🤔

    /// Ensure the second argument has a value that is ranked at least as much as
    /// the first argument.
    func sort(_ a: inout Element, _ b: inout Element) throws {
      if try areInIncreasingOrder(b, a) { swap(&a, &b) }
    }

    /// Find the smallest and largest values out of a group of four arguments.
    func minAndMaxOf4(
      _ a: Element, _ b: Element, _ c: Element, _ d: Element
    ) throws -> (min: Element, max: Element) {
      var (a, b, c, d) = (a, b, c, d)
      try sort(&a, &b)
      try sort(&c, &d)
      try sort(&a, &c)
      try sort(&b, &d)
      return (a, d)
    }

    // Read the elements in four-at-a-time. Some say this is more effective
    // than a two-at-a-time loop.
    while let a = iterator.next() {
      let b = iterator.next() ?? a
      let c = iterator.next() ?? b
      let d = iterator.next() ?? c
      let (low, high) = try minAndMaxOf4(a, b, c, d)
      if try areInIncreasingOrder(low, lowest) { lowest = low }
      if try !areInIncreasingOrder(high, highest) { highest = high }
    }
#endif
    return (lowest, highest)
  }

The while loop can be substantially simplified, since iterators are guaranteed to keep returning nil after they’ve done so once:

while var low = iterator.next() {
  var high = iterator.next() ?? low
  if try areInIncreasingOrder(high, low) { swap(&low, &high) }
  if try areInIncreasingOrder(low, lowest) { lowest = low }
  if try !areInIncreasingOrder(high, highest) { highest = high }
}

This does technically perform one extraneous comparison on odd-length sequences (high vs low after the ?? was triggered), but even that can be rectified by restoring the guard if benchmarks show a measurable difference:

while var low = iterator.next() {
  guard var high = iterator.next() else {
    if try areInIncreasingOrder(low, lowest) { lowest = low }
    if try !areInIncreasingOrder(low, highest) { highest = low }
    break
  }
  if try areInIncreasingOrder(high, low) { swap(&low, &high) }
  if try areInIncreasingOrder(low, lowest) { lowest = low }
  if try !areInIncreasingOrder(high, highest) { highest = high }
}

}

extension Sequence where Element: Comparable {
/// Returns both the minimum and maximum elements in the sequence.
///
/// This example finds the smallest and largest values in an array of height
/// measurements.
///
/// let heights = [67.5, 65.7, 64.3, 61.1, 58.5, 60.3, 64.9]
/// if let (lowestHeight, greatestHeight) = heights.minAndMax() {
/// print(lowestHeight, greatestHeight)
/// } else {
/// print("The list of heights is empty")
/// }
/// // Prints: "58.5 67.5"
///
/// - Precondition: The sequence is finite.
///
/// - Returns: A tuple with the sequence's minimum element, followed by its
/// maximum element. For either member, if there is a tie for the extreme
/// value, the element chosen is unspecified. The same element may be used
/// for both members if all the elements are equal. If the sequence has no
/// elements, returns `nil`.
///
/// - Complexity: O(*n*), where *n* is the length of the sequence.
@inlinable
public func minAndMax() -> (min: Element, max: Element)? {
return minAndMax(by: <)
Contributor

Nit: We don't need a return here.

I like the return.

As a matter of style, I think return should only be omitted when the entire {...} block is on a single line.

Contributor

@LemonSpike LemonSpike Apr 16, 2021

I wish we had a style guide for these things. But you're right to be consistent with the whole codebase 😊 (I think). Looks like SE-0250 is not coming back soon, but I could raise it if it does.

}
}
64 changes: 64 additions & 0 deletions Tests/SwiftAlgorithmsTests/MinMaxTests.swift
@@ -185,3 +185,67 @@ final class SortedPrefixTests: XCTestCase {
}
}
}

final class MinAndMaxTests: XCTestCase {
/// Confirms that empty sequences yield no results.
func testEmpty() {
XCTAssertNil(EmptyCollection<Int>().minAndMax())
}

/// Confirms the same element is used when there is only one.
func testSingleElement() {
let result = CollectionOfOne(2).minAndMax()
XCTAssertEqual(result?.min, 2)
XCTAssertEqual(result?.max, 2)
}

/// Confirms the same value is used when all the elements have it.
func testSingleValueMultipleElements() {
let result = repeatElement(3.3, count: 5).minAndMax()
XCTAssertEqual(result?.min, 3.3)
XCTAssertEqual(result?.max, 3.3)

// Even count
let result2 = repeatElement("c" as Character, count: 6).minAndMax()
XCTAssertEqual(result2?.min, "c")
XCTAssertEqual(result2?.max, "c")
}

/// Confirms when the minimum value is constantly updated, but the maximum
/// never is.
func testRampDown() {
let result = (1...5).reversed().minAndMax()
XCTAssertEqual(result?.min, 1)
XCTAssertEqual(result?.max, 5)

// Even count
let result2 = "fedcba".minAndMax()
XCTAssertEqual(result2?.min, "a")
XCTAssertEqual(result2?.max, "f")
}

/// Confirms when the maximum value is constantly updated, but the minimum
/// never is.
func testRampUp() {
let result = (1...5).minAndMax()
XCTAssertEqual(result?.min, 1)
XCTAssertEqual(result?.max, 5)

// Even count
let result2 = "abcdef".minAndMax()
XCTAssertEqual(result2?.min, "a")
XCTAssertEqual(result2?.max, "f")
}

/// Confirms when the maximum and minimum change during a run.
func testUpsAndDowns() {
let result = [4, 3, 3, 5, 2, 0, 7, 6].minAndMax()
XCTAssertEqual(result?.min, 0)
XCTAssertEqual(result?.max, 7)

// Odd count
let result2 = "gfabdec".minAndMax()
XCTAssertEqual(result2?.min, "a")
XCTAssertEqual(result2?.max, "g")
}
}