Add "sortedPrefix(_:by)" to Collection #9
Changes from 23 commits
5429d3b
4362197
f299df1
6cd2870
63b2dd0
88216e1
afe7111
4652ae7
37d494a
83d5f1e
bf31ba1
acb3583
6227bd8
d4a2e6b
62ee6f2
f674851
5bdea96
dd15b5a
7ac3915
c68537f
e8504fd
36e9a39
1d22ef9
62096e1
d0c1ccd
23bf863
379609b
435a38c
70973a2
70a263c
1d3dcaf
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,52 @@ | ||||||
# Partial Sort (sortedPrefix) | ||||||
|
||||||
[[Source](https://github.com/apple/swift-algorithms/blob/main/Sources/Algorithms/PartialSort.swift) | | ||||||
[Tests](https://github.com/apple/swift-algorithms/blob/main/Tests/SwiftAlgorithmsTests/PartialSortTests.swift)] | ||||||
|
||||||
Returns the first k elements of this collection when it's sorted. | ||||||
|
||||||
If you need to sort a collection but only need access to a prefix of its | ||||||
elements, using this method can give you a performance boost over sorting | ||||||
the entire collection. The order of equal elements is guaranteed to be | ||||||
preserved. | ||||||
|
||||||
```swift | ||||||
let numbers = [7,1,6,2,8,3,9] | ||||||
let smallestThree = numbers.sortedPrefix(3, by: <) | ||||||
// Review comment: the original line, `numbers.sortedPrefix(<)`, isn't a correct invocation of the API provided here. |
||||||
// [1, 2, 3] | ||||||
``` | ||||||
|
||||||
## Detailed Design | ||||||
|
||||||
This adds the `Collection` method shown below: | ||||||
|
||||||
```swift | ||||||
extension Collection { | ||||||
public func sortedPrefix(_ count: Int, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows -> [Element] | ||||||
} | ||||||
``` | ||||||
|
||||||
Additionally, a version of this method for `Comparable` types is also provided: | ||||||
|
||||||
```swift | ||||||
extension Collection where Element: Comparable { | ||||||
public func sortedPrefix(_ count: Int) -> [Element] | ||||||
} | ||||||
``` | ||||||
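For reference, both overloads are expected to return the same elements as sorting the whole collection and then taking a prefix. The sketch below spells out that baseline so results can be compared against it. `referenceSortedPrefix` is a hypothetical helper introduced here for illustration and is not part of this PR.

```swift
// Hypothetical helper, not part of the PR: the "sort everything, then take
// a prefix" baseline that `sortedPrefix` is meant to match.
func referenceSortedPrefix<C: Collection>(
  _ source: C,
  _ count: Int,
  by areInIncreasingOrder: (C.Element, C.Element) throws -> Bool
) rethrows -> [C.Element] {
  // Costs O(n log n) regardless of `count`, because the whole input is sorted.
  return Array(try source.sorted(by: areInIncreasingOrder).prefix(max(count, 0)))
}

let scores = [20, 1, 4, 70, 100, 2, 3, 7, 90]
let topTwo = referenceSortedPrefix(scores, 2, by: >)
let bottomThree = referenceSortedPrefix(scores, 3, by: <)
print(topTwo, bottomThree)
```

A baseline like this is mainly useful in tests, where its output can be compared with the faster implementation on the same input.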
|
||||||
### Complexity | ||||||
|
||||||
The algorithm used is based on [Soroush Khanlou's research on this matter](https://khanlou.com/2018/12/analyzing-complexity/). The total complexity is `O(k log k + nk)`, which results in a runtime close to `O(n)` when k is small. If k is large (more than 10% of the collection), we fall back to sorting the entire array. Realistically, this means the worst case is actually `O(n log n)`. | ||||||
Review comment: I'm not sure how the last statement is arrived at. Could you explain?
Reply: If we reach a point where |
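The strategy behind the `O(k log k + nk)` figure can be sketched in isolation: sort the first k elements, then walk the rest of the collection and binary-insert any element that belongs in the prefix. The version below is a minimal sketch under stated assumptions, not the PR's implementation. `sketchSortedPrefix` is a hypothetical free function, it uses a hand-written binary search instead of the package's `partitioningIndex`, and it omits the 10% fallback.

```swift
// Hypothetical free function, not part of the PR: a stand-alone sketch of the
// "sort the first k, then binary-insert the rest" strategy.
func sketchSortedPrefix<C: Collection>(
  _ source: C,
  _ count: Int,
  by areInIncreasingOrder: (C.Element, C.Element) -> Bool
) -> [C.Element] {
  let k = min(max(count, 0), source.count)
  guard k > 0 else { return [] }

  // O(k log k): sort the first k elements to seed the result.
  var result = source.prefix(k).sorted(by: areInIncreasingOrder)

  for element in source.dropFirst(k) {
    // Skip anything that does not sort strictly before the current maximum.
    guard areInIncreasingOrder(element, result[k - 1]) else { continue }

    // Binary search for the first slot whose value sorts after `element`.
    // The guard above ensures that slot exists at index k - 1 or earlier.
    var low = 0
    var high = k - 1
    while low < high {
      let mid = low + (high - low) / 2
      if areInIncreasingOrder(element, result[mid]) {
        high = mid
      } else {
        low = mid + 1
      }
    }

    // O(k) in the worst case: drop the old maximum and shift to make room.
    result.removeLast()
    result.insert(element, at: low)
  }
  return result
}

let numbers = [7, 1, 6, 2, 8, 3, 9, 5, 4, 0, 12, 11]
print(sketchSortedPrefix(numbers, 3, by: <))
```

The `nk` term comes from the insertion step: each of the up to n remaining elements may shift up to k entries, even though the binary search itself only costs `O(log k)`.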
||||||
|
||||||
Here are some benchmarks we made that demonstrate how this implementation (SmallestM) behaves when k increases (before implementing the fallback): | ||||||
|
||||||
![Benchmark](https://i.imgur.com/F5UEQnl.png) | ||||||
Review comment: Can we embed these into this project, as opposed to arbitrary external URLs? |
||||||
![Benchmark 2](https://i.imgur.com/Bm9DKRc.png) | ||||||
|
||||||
### Comparison with other languages | ||||||
|
||||||
**C++:** The `<algorithm>` library defines a `partial_sort` function that rearranges the entire range in place so that its first k elements are sorted, typically using a partial heap sort. | ||||||
|
||||||
**Python:** Defines a `heapq` priority queue that can be used to manually | ||||||
achieve the same result. | ||||||
Review comment: Nit: it'd be nice if this document were either consistently wrapped to 80 columns, or else not wrapped. It seems there are two styles here. |
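For comparison, the heap-based approach mentioned for C++ and Python can be sketched in Swift as a bounded max-heap that retains the k smallest elements seen so far. This is an illustrative sketch only: `heapSmallest` is a hypothetical helper, it is limited to `Comparable` arrays, and unlike the method proposed in this PR it makes no attempt to preserve the order of equal elements.

```swift
// Hypothetical helper, not part of the PR: keep the k smallest elements seen
// so far in a max-heap, so the largest retained element sits at index 0.
func heapSmallest<T: Comparable>(_ source: [T], _ k: Int) -> [T] {
  guard k > 0 else { return [] }
  var heap: [T] = []
  heap.reserveCapacity(Swift.min(k, source.count))

  // Move the element at `start` up while its parent is smaller.
  func siftUp(from start: Int) {
    var child = start
    while child > 0 {
      let parent = (child - 1) / 2
      guard heap[parent] < heap[child] else { break }
      heap.swapAt(parent, child)
      child = parent
    }
  }

  // Move the root down while one of its children is larger.
  func siftDown() {
    var parent = 0
    while true {
      let left = 2 * parent + 1
      let right = left + 1
      var largest = parent
      if left < heap.count && heap[largest] < heap[left] { largest = left }
      if right < heap.count && heap[largest] < heap[right] { largest = right }
      if largest == parent { return }
      heap.swapAt(parent, largest)
      parent = largest
    }
  }

  for element in source {
    if heap.count < k {
      // Still filling the heap: O(log k) per element.
      heap.append(element)
      siftUp(from: heap.count - 1)
    } else if element < heap[0] {
      // Smaller than the largest retained element: replace it.
      heap[0] = element
      siftDown()
    }
  }

  // The heap holds the k smallest elements in heap order; sort them to finish.
  return heap.sorted()
}

print(heapSmallest([7, 1, 6, 2, 8, 3, 9], 3))
```

Under these assumptions the cost is `O(n log k)` for the scan plus `O(k log k)` for the final sort, which trades the `nk` shifting cost of the insertion approach for heap maintenance.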
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
//===----------------------------------------------------------------------===// | ||
// | ||
// This source file is part of the Swift Algorithms open source project | ||
// | ||
// Copyright (c) 2020 Apple Inc. and the Swift project authors | ||
// Licensed under Apache License v2.0 with Runtime Library Exception | ||
// | ||
// See https://swift.org/LICENSE.txt for license information | ||
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
extension Collection { | ||
/// Returns the first k elements of this collection when it's sorted using | ||
/// the given predicate as the comparison between elements. | ||
/// | ||
/// This example partially sorts an array of integers to retrieve its three | ||
/// smallest values: | ||
/// | ||
/// let numbers = [7,1,6,2,8,3,9] | ||
  ///     let smallestThree = numbers.sortedPrefix(3, by: <) | ||
/// // [1, 2, 3] | ||
/// | ||
/// If you need to sort a collection but only need access to a prefix of its | ||
/// elements, using this method can give you a performance boost over sorting | ||
/// the entire collection. The order of equal elements is guaranteed to be | ||
/// preserved. | ||
/// | ||
  /// - Parameter count: The number of elements (k) to return from the sorted collection. | ||
/// - Parameter areInIncreasingOrder: A predicate that returns true if its | ||
/// first argument should be ordered before its second argument; | ||
/// otherwise, false. | ||
/// | ||
/// - Complexity: O(k log k + nk) | ||
public func sortedPrefix( | ||
_ count: Int, | ||
by areInIncreasingOrder: (Element, Element) throws -> Bool | ||
) rethrows -> [Self.Element] { | ||
assert(count >= 0, """ | ||
Cannot prefix with a negative amount of elements! | ||
""" | ||
) | ||
|
||
// Make sure we are within bounds | ||
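    let prefixCount = Swift.min(count, self.count) | ||
    // Nothing to sort when the requested prefix is empty; the insertion loop | ||
    // further down assumes `result` holds at least one element. | ||
    guard prefixCount > 0 else { return [] } | ||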
|
||
// If we're attempting to prefix more than 10% of the collection, it's faster to sort everything. | ||
guard prefixCount < (self.count / 10) else { | ||
return Array(try sorted(by: areInIncreasingOrder).prefix(prefixCount)) | ||
} | ||
|
||
    var result = try self.prefix(prefixCount).sorted(by: areInIncreasingOrder) | ||
    for e in self.dropFirst(prefixCount) { | ||
      // Skip `e` unless it sorts strictly before the current last element. | ||
      // Skipping equal elements keeps their original order and guarantees | ||
      // that `insertionIndex` stays within bounds after `removeLast()`. | ||
      if let last = result.last, try !areInIncreasingOrder(e, last) { | ||
        continue | ||
      } | ||
      let insertionIndex = try result.partitioningIndex { try areInIncreasingOrder(e, $0) } | ||
      result.removeLast() | ||
      result.insert(e, at: insertionIndex) | ||
    } | ||
|
||
return result | ||
} | ||
} | ||
|
||
extension Collection where Element: Comparable { | ||
  /// Returns the first k elements of this collection when it's sorted in | ||
  /// ascending order. | ||
  /// | ||
  /// This example partially sorts an array of integers to retrieve its three | ||
  /// smallest values: | ||
  /// | ||
  ///     let numbers = [7,1,6,2,8,3,9] | ||
  ///     let smallestThree = numbers.sortedPrefix(3) | ||
  ///     // [1, 2, 3] | ||
/// | ||
/// If you need to sort a collection but only need access to a prefix of its | ||
/// elements, using this method can give you a performance boost over sorting | ||
/// the entire collection. The order of equal elements is guaranteed to be | ||
/// preserved. | ||
/// | ||
  /// - Parameter count: The number of elements (k) to return from the sorted collection. | ||
/// | ||
/// - Complexity: O(k log k + nk) | ||
public func sortedPrefix(_ count: Int) -> [Element] { | ||
return sortedPrefix(count, by: <) | ||
} | ||
} |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,146 @@ | ||||||
//===----------------------------------------------------------------------===// | ||||||
// | ||||||
// This source file is part of the Swift Algorithms open source project | ||||||
// | ||||||
// Copyright (c) 2020 Apple Inc. and the Swift project authors | ||||||
// Licensed under Apache License v2.0 with Runtime Library Exception | ||||||
// | ||||||
// See https://swift.org/LICENSE.txt for license information | ||||||
// | ||||||
//===----------------------------------------------------------------------===// | ||||||
|
||||||
import XCTest | ||||||
import Algorithms | ||||||
|
||||||
final class PartialSortTests: XCTestCase { | ||||||
func testEmpty() { | ||||||
let array = [Int]() | ||||||
XCTAssertEqual(array.sortedPrefix(0), []) | ||||||
} | ||||||
|
||||||
func testSortedPrefixWithOrdering() { | ||||||
let array: [Int] = [20, 1, 4, 70, 100, 2, 3, 7, 90] | ||||||
|
||||||
XCTAssertEqual(array.sortedPrefix(0, by: >), []) | ||||||
XCTAssertEqual( | ||||||
array.sortedPrefix(1, by: >), | ||||||
[100] | ||||||
) | ||||||
|
||||||
XCTAssertEqual( | ||||||
array.sortedPrefix(5, by: >), | ||||||
[100, 90, 70, 20, 7] | ||||||
) | ||||||
|
||||||
XCTAssertEqual( | ||||||
array.sortedPrefix(9, by: >), | ||||||
[100, 90, 70, 20, 7, 4, 3, 2, 1] | ||||||
) | ||||||
|
||||||
XCTAssertEqual([1].sortedPrefix(0, by: <), []) | ||||||
XCTAssertEqual([1].sortedPrefix(0, by: >), []) | ||||||
XCTAssertEqual([1].sortedPrefix(1, by: <), [1]) | ||||||
XCTAssertEqual([1].sortedPrefix(1, by: >), [1]) | ||||||
XCTAssertEqual([0, 1].sortedPrefix(1, by: <), [0]) | ||||||
XCTAssertEqual([1, 0].sortedPrefix(1, by: <), [0]) | ||||||
XCTAssertEqual([1, 0].sortedPrefix(2, by: <), [0, 1]) | ||||||
XCTAssertEqual([0, 1].sortedPrefix(1, by: >), [1]) | ||||||
XCTAssertEqual([1, 0].sortedPrefix(1, by: >), [1]) | ||||||
XCTAssertEqual([1, 0].sortedPrefix(2, by: >), [1, 0]) | ||||||
|
||||||
XCTAssertEqual( | ||||||
[1, 2, 3, 4, 7, 20, 70, 90, 100].sortedPrefix(5, by: <), | ||||||
[1, 2, 3, 4, 7] | ||||||
) | ||||||
|
||||||
XCTAssertEqual( | ||||||
[1, 2, 3, 4, 7, 20, 70, 90, 100].sortedPrefix(5, by: >), | ||||||
[100, 90, 70, 20, 7] | ||||||
) | ||||||
|
||||||
XCTAssertEqual( | ||||||
[1, 2, 3, 4, 7, 20, 70, 90, 100].sortedPrefix(5, by: >), | ||||||
[100, 90, 70, 20, 7] | ||||||
) | ||||||
|
||||||
XCTAssertEqual( | ||||||
[1, 2, 3, 4, 7, 20, 70, 90, 100].sortedPrefix(5, by: <), | ||||||
[1, 2, 3, 4, 7] | ||||||
) | ||||||
|
||||||
XCTAssertEqual( | ||||||
[4, 5, 6, 1, 2, 3].sortedPrefix(3, by: <), | ||||||
[1, 2, 3] | ||||||
) | ||||||
|
||||||
XCTAssertEqual( | ||||||
[4, 5, 9, 8, 7, 6].sortedPrefix(3, by: <), | ||||||
[4, 5, 6] | ||||||
) | ||||||
|
||||||
XCTAssertEqual( | ||||||
[4, 3, 2, 1].sortedPrefix(1, by: <), | ||||||
[1] | ||||||
) | ||||||
|
||||||
XCTAssertEqual( | ||||||
[4, 2, 1, 3].sortedPrefix(3, by: >), | ||||||
[4, 3, 2] | ||||||
) | ||||||
|
||||||
XCTAssertEqual( | ||||||
[4, 2, 1, 3].sortedPrefix(3, by: <), | ||||||
[1, 2, 3] | ||||||
) | ||||||
} | ||||||
|
||||||
func testSortedPrefixComparable() { | ||||||
let array: [Int] = [20, 1, 4, 70, 100, 2, 3, 7, 90] | ||||||
|
||||||
XCTAssertEqual(array.sortedPrefix(0), []) | ||||||
|
||||||
XCTAssertEqual( | ||||||
array.sortedPrefix(1), | ||||||
[1] | ||||||
) | ||||||
|
||||||
XCTAssertEqual( | ||||||
array.sortedPrefix(5), | ||||||
[1, 2, 3, 4, 7] | ||||||
) | ||||||
|
||||||
XCTAssertEqual( | ||||||
array.sortedPrefix(9), | ||||||
[1, 2, 3, 4, 7, 20, 70, 90, 100] | ||||||
) | ||||||
} | ||||||
|
||||||
func testSortedPrefixWithHugePrefix() { | ||||||
XCTAssertEqual( | ||||||
[4, 2, 1, 3].sortedPrefix(.max), | ||||||
[1, 2, 3, 4] | ||||||
) | ||||||
} | ||||||
|
||||||
func testStability() { | ||||||
assertStability([1,1,1,2,5,7,3,6,2,5,7,3,6], withPrefix: 3) | ||||||
assertStability([1,1,1,2,5,7,3,6,2,5,7,3,6], withPrefix: 6) | ||||||
assertStability([1,1,1,2,5,7,3,6,2,5,7,3,6], withPrefix: 20) | ||||||
assertStability([1,1,1,2,5,7,3,6,2,5,7,3,6], withPrefix: 1000) | ||||||
} | ||||||
|
||||||
func assertStability( | ||||||
_ actual: [Int], | ||||||
withPrefix prefixCount: Int, | ||||||
file: StaticString = #file, | ||||||
line: UInt = #line | ||||||
) { | ||||||
let indexed = actual.enumerated() | ||||||
    let sorted = indexed.map { $0 }.sortedPrefix(prefixCount) { $0.element < $1.element } | ||||||
|
||||||
for element in Set(actual) { | ||||||
let filtered = sorted.filter { $0.element == element }.map(\.offset) | ||||||
      XCTAssertEqual(filtered, filtered.sorted(), file: file, line: line) | ||||||
} | ||||||
} | ||||||
} |
Review comment: I think it would make sense to rename all documents with consistent terminology: