Add "sortedPrefix(_:by)" to Collection #9

Merged on Dec 4, 2020 (31 commits)

Showing changes from 23 commits

Commits
5429d3b
Add partial sort algorithm
rakaramos Oct 8, 2020
4362197
Add in place partial sorting
rockbruno Oct 9, 2020
f299df1
Guide docs
rockbruno Oct 9, 2020
6cd2870
Use Indexes
rockbruno Oct 9, 2020
63b2dd0
Merge pull request #1 from rakaramos/guide
rakaramos Oct 9, 2020
88216e1
Add partial sort tests
rakaramos Oct 9, 2020
afe7111
Indent up to 80 columns
rakaramos Oct 9, 2020
4652ae7
Fix heapify stopping before it should
rockbruno Oct 9, 2020
37d494a
Update PartialSort.md
rockbruno Oct 9, 2020
83d5f1e
Update PartialSort.md
rockbruno Oct 9, 2020
bf31ba1
Update PartialSort.swift
rockbruno Oct 9, 2020
acb3583
Cleaning up iterators logic
rockbruno Oct 9, 2020
6227bd8
Update PartialSort.swift
rockbruno Oct 9, 2020
d4a2e6b
Cleaning docs
rockbruno Oct 9, 2020
62ee6f2
Change implementation and name
rakaramos Oct 21, 2020
f674851
DocDocs
rockbruno Oct 21, 2020
5bdea96
Merge remote-tracking branch 'origin/fix-algo' into docdocs
rockbruno Oct 21, 2020
dd15b5a
Docs
rockbruno Oct 21, 2020
7ac3915
Merge pull request #3 from rakaramos/docdocs
rockbruno Oct 21, 2020
c68537f
Docs
rockbruno Oct 21, 2020
e8504fd
Optimize
rockbruno Oct 21, 2020
36e9a39
Fix header and remove assert
rakaramos Oct 28, 2020
1d22ef9
Add more tests (#4)
rakaramos Oct 31, 2020
62096e1
Update PartialSortTests.swift
rockbruno Oct 31, 2020
d0c1ccd
Merge pull request #5 from rakaramos/rockbruno-patch-1
rockbruno Oct 31, 2020
23bf863
Update Sources/Algorithms/PartialSort.swift
rockbruno Nov 1, 2020
379609b
Update Sources/Algorithms/PartialSort.swift
rockbruno Nov 1, 2020
435a38c
Update Sources/Algorithms/PartialSort.swift
rockbruno Nov 1, 2020
70973a2
Documentation fixes
rockbruno Nov 1, 2020
70a263c
Add tests for massive inputs
rockbruno Dec 2, 2020
1d3dcaf
isLastElement
rockbruno Dec 2, 2020
52 changes: 52 additions & 0 deletions Guides/PartialSort.md
@@ -0,0 +1,52 @@
# Partial Sort (sortedPrefix)
Review comment (Contributor): I think it would make sense to rename all documents with consistent terminology:

Suggested change
- # Partial Sort (sortedPrefix)
+ # Sorted Prefix

[[Source](https://github.com/apple/swift-algorithms/blob/main/Sources/Algorithms/PartialSort.swift) |
[Tests](https://github.com/apple/swift-algorithms/blob/main/Tests/SwiftAlgorithmsTests/PartialSortTests.swift)]

Returns the first k elements of this collection when it's sorted.

If you need to sort a collection but only need access to a prefix of its
elements, using this method can give you a performance boost over sorting
the entire collection. The order of equal elements is guaranteed to be
preserved.

```swift
let numbers = [7,1,6,2,8,3,9]
let smallestThree = numbers.sortedPrefix(<)
// [1, 2, 3]
```

Review comment (Contributor), on the `sortedPrefix(<)` line: This isn't a correct invocation of the API provided here.
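
As a point of reference, the behavior described above should match sorting the whole collection and then keeping its first k elements. The following is a minimal sketch of that baseline using only the standard library; `referenceSortedPrefix` is a hypothetical helper name chosen for illustration and is not part of the package.

```swift
// Hypothetical baseline, not part of swift-algorithms: sort everything,
// then keep the first `count` elements.
func referenceSortedPrefix<C: Collection>(
  _ collection: C,
  _ count: Int,
  by areInIncreasingOrder: (C.Element, C.Element) throws -> Bool
) rethrows -> [C.Element] {
  return try Array(collection.sorted(by: areInIncreasingOrder).prefix(count))
}

let numbers = [7, 1, 6, 2, 8, 3, 9]
let smallestThree = referenceSortedPrefix(numbers, 3, by: <)
let largestTwo = referenceSortedPrefix(numbers, 2, by: >)
print(smallestThree) // [1, 2, 3]
print(largestTwo)    // [9, 8]
```

The method proposed in this PR is expected to produce the same results when called with the count first and the predicate labeled `by:`, as declared under Detailed Design.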

## Detailed Design

This adds the `Collection` method shown below:

```swift
extension Collection {
public func sortedPrefix(_ count: Int, by areInIncreasingOrder: (Element, Element) throws -> Bool) rethrows -> [Element]
}
```

Additionally, a version of this method for `Comparable` types is also provided:

```swift
extension Collection where Element: Comparable {
public func sortedPrefix(_ count: Int) -> [Element]
}
```

### Complexity

The algorithm used is based on [Soroush Khanlou's research on this matter](https://khanlou.com/2018/12/analyzing-complexity/). The total complexity is `O(k log k + nk)`, which will result in a runtime close to `O(n)` if k is a small amount. If k is a large amount (more than 10% of the collection), we fallback to sorting the entire array. Realistically, this means the worst case is actually `O(n log n)`.
Review comment (Contributor):

Suggested change
- The algorithm used is based on [Soroush Khanlou's research on this matter](https://khanlou.com/2018/12/analyzing-complexity/). The total complexity is `O(k log k + nk)`, which will result in a runtime close to `O(n)` if k is a small amount. If k is a large amount (more than 10% of the collection), we fallback to sorting the entire array. Realistically, this means the worst case is actually `O(n log n)`.
+ The algorithm used is based on [Soroush Khanlou's research on this matter](https://khanlou.com/2018/12/analyzing-complexity/). The total complexity is `O(k log k + nk)`, which will result in a runtime close to `O(n)` if k is a small amount. If k is a large amount (more than 10% of the collection), we fall back to sorting the entire array. Realistically, this means the worst case is actually `O(n log n)`.

I'm not sure how the last statement is arrived at. Could you explain?

Reply: If we reach a point where O(k log k + nk) is going to be worse than sorting the full array we fall back to stdlib's O(n log n) sort, so in practice it shouldn't get much worse than that.
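
To make the fallback rule in this reply concrete, here is a small sketch of the decision as written in the `guard` in `PartialSort.swift` in this diff. `fallsBackToFullSort` is a hypothetical helper written for illustration only; it is not part of the package.

```swift
// Hypothetical helper mirroring the guard in the diff:
//   guard prefixCount < (self.count / 10) else { /* sort everything */ }
func fallsBackToFullSort(requested count: Int, collectionCount: Int) -> Bool {
  let prefixCount = min(count, collectionCount)
  // Integer division: 9 / 10 == 0, so any collection with fewer than ten
  // elements always takes the full-sort path.
  return !(prefixCount < collectionCount / 10)
}

print(fallsBackToFullSort(requested: 5, collectionCount: 9))       // true
print(fallsBackToFullSort(requested: 3, collectionCount: 13))      // true  (13 / 10 == 1)
print(fallsBackToFullSort(requested: 99, collectionCount: 1_000))  // false (99 < 100)
print(fallsBackToFullSort(requested: 100, collectionCount: 1_000)) // true  (exactly 10%)
```

Because the threshold uses integer division, a prefix of exactly 10% also takes the full-sort path, and collections with fewer than ten elements always do.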


Here are some benchmarks we made that demonstrate how this implementation (SmallestM) behaves when k increases (before implementing the fallback):

![Benchmark](https://i.imgur.com/F5UEQnl.png)
Review comment (Contributor): Can we embed these into this project, as opposed to arbitrary external URLs?

![Benchmark 2](https://i.imgur.com/Bm9DKRc.png)

### Comparison with other languages

**C++:** The `<algorithm>` library defines a `partial_sort` function that rearranges a range in place so that its first k elements are the smallest ones in sorted order, typically using a partial heap sort.

**Python:** Defines a `heapq` priority queue that can be used to manually
achieve the same result.

Review comment (Contributor): Nit: it'd be nice if this document were either consistently wrapped to 80 columns, or else not wrapped. It seems there are two styles here.

4 changes: 4 additions & 0 deletions README.md
@@ -28,6 +28,10 @@ Read more about the package, and the intent behind it, in the [announcement on s
- [`randomStableSample(count:)`, `randomStableSample(count:using:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/RandomSampling.md): Randomly selects a specific number of elements from a collection, preserving their original relative order.
- [`uniqued()`, `uniqued(on:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Unique.md): The unique elements of a collection, preserving their order.

#### Partial sorting

- [`sortedPrefix(_:by:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/PartialSort.md): Returns the first k elements of a sorted collection.

#### Other useful operations

- [`chunked(by:)`, `chunked(on:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Chunked.md): Eager and lazy operations that break a collection into chunks based on either a binary predicate or when the result of a projection changes.
87 changes: 87 additions & 0 deletions Sources/Algorithms/PartialSort.swift
@@ -0,0 +1,87 @@
//===----------------------------------------------------------------------===//
//
// This source file is part of the Swift Algorithms open source project
//
// Copyright (c) 2020 Apple Inc. and the Swift project authors
// Licensed under Apache License v2.0 with Runtime Library Exception
//
// See https://swift.org/LICENSE.txt for license information
//
//===----------------------------------------------------------------------===//

extension Collection {
/// Returns the first k elements of this collection when it's sorted using
/// the given predicate as the comparison between elements.
///
/// This example partially sorts an array of integers to retrieve its three
/// smallest values:
///
/// let numbers = [7,1,6,2,8,3,9]
/// let smallestThree = numbers.sortedPrefix(3, by: <)
/// // [1, 2, 3]
///
/// If you need to sort a collection but only need access to a prefix of its
/// elements, using this method can give you a performance boost over sorting
/// the entire collection. The order of equal elements is guaranteed to be
/// preserved.
///
/// - Parameter count: The number of elements to return (k).
/// - Parameter areInIncreasingOrder: A predicate that returns true if its
/// first argument should be ordered before its second argument;
/// otherwise, false.
///
/// - Complexity: O(k log k + nk)
public func sortedPrefix(
_ count: Int,
by areInIncreasingOrder: (Element, Element) throws -> Bool
) rethrows -> [Self.Element] {
assert(count >= 0, """
Cannot prefix with a negative amount of elements!
"""
)

// Prefixing nothing yields nothing. Returning early also keeps the
// insertion loop below from calling `removeLast()` on an empty array.
guard count > 0 else { return [] }

// Make sure we are within bounds
let prefixCount = Swift.min(count, self.count)

// If we're attempting to prefix more than 10% of the collection, it's
// faster to sort everything.
guard prefixCount < (self.count / 10) else {
return Array(try sorted(by: areInIncreasingOrder).prefix(prefixCount))
}

var result = try self.prefix(prefixCount).sorted(by: areInIncreasingOrder)
for e in self.dropFirst(prefixCount) {
// Skip `e` unless it orders strictly before the current last element.
// Ties keep the earlier element (preserving stability) and never yield
// an insertion index past the end of `result`.
if let last = result.last, try !areInIncreasingOrder(e, last) {
continue
}
let insertionIndex =
try result.partitioningIndex { try areInIncreasingOrder(e, $0) }
result.removeLast()
result.insert(e, at: insertionIndex)
}

return result
}
}

extension Collection where Element: Comparable {
/// Returns the first k elements of this collection when it's sorted.
///
/// This example partially sorts an array of integers to retrieve its three
/// smallest values:
///
/// let numbers = [7,1,6,2,8,3,9]
/// let smallestThree = numbers.sortedPrefix(3)
/// // [1, 2, 3]
///
/// If you need to sort a collection but only need access to a prefix of its
/// elements, using this method can give you a performance boost over sorting
/// the entire collection. The order of equal elements is guaranteed to be
/// preserved.
///
/// - Parameter count: The number of elements to return (k).
///
/// - Complexity: O(k log k + nk)
public func sortedPrefix(_ count: Int) -> [Element] {
return sortedPrefix(count, by: <)
}
}
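
The insertion-based path above can be sketched with the standard library alone. This is an illustrative stand-in under stated assumptions, not the package's implementation: `sketchSortedPrefix` and `insertionPoint` are hypothetical names, the hand-written binary search stands in for the package's `partitioningIndex`, the 10% fallback is omitted, and the seed prefix is built by stable insertion (O(k²)) rather than `sorted(by:)` so the example does not depend on the stability of the standard library's sort.

```swift
// Hypothetical helper: binary search for the first index whose element
// orders strictly after `e`. Inserting there places `e` after any ties.
func insertionPoint<T>(
  for e: T,
  in sorted: [T],
  by areInIncreasingOrder: (T, T) -> Bool
) -> Int {
  var low = 0
  var high = sorted.count
  while low < high {
    let mid = low + (high - low) / 2
    if areInIncreasingOrder(e, sorted[mid]) {
      high = mid
    } else {
      low = mid + 1
    }
  }
  return low
}

// Hypothetical stand-in for the insertion path in the diff (no fallback).
func sketchSortedPrefix<T>(
  _ elements: [T],
  _ count: Int,
  by areInIncreasingOrder: (T, T) -> Bool
) -> [T] {
  let k = min(count, elements.count)
  guard k > 0 else { return [] }

  // Seed with the first k elements, inserted one by one so ties keep
  // their source order.
  var result: [T] = []
  result.reserveCapacity(k)
  for e in elements.prefix(k) {
    let index = insertionPoint(for: e, in: result, by: areInIncreasingOrder)
    result.insert(e, at: index)
  }

  for e in elements.dropFirst(k) {
    // Only elements that order strictly before the current last element
    // can enter the prefix; ties favor the earlier element.
    guard areInIncreasingOrder(e, result[k - 1]) else { continue }
    let index = insertionPoint(for: e, in: result, by: areInIncreasingOrder)
    result.removeLast()
    result.insert(e, at: index)
  }
  return result
}

// Pair each value with its source offset to observe stability.
let pairs = Array([2, 1, 2, 1, 3, 1].enumerated())
let top = sketchSortedPrefix(pairs, 4) { $0.element < $1.element }
let offsets = top.map { $0.offset }
print(offsets) // [1, 3, 5, 0]
```

With the threshold in the diff, an input needs at least 10·(k + 1) elements before the insertion path runs for a prefix of k, so small test arrays only exercise the full-sort fallback.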
146 changes: 146 additions & 0 deletions Tests/SwiftAlgorithmsTests/PartialSortTests.swift
@@ -0,0 +1,146 @@
//===----------------------------------------------------------------------===//
//
// This source file is part of the Swift Algorithms open source project
//
// Copyright (c) 2020 Apple Inc. and the Swift project authors
// Licensed under Apache License v2.0 with Runtime Library Exception
//
// See https://swift.org/LICENSE.txt for license information
//
//===----------------------------------------------------------------------===//

import XCTest
import Algorithms

final class PartialSortTests: XCTestCase {
func testEmpty() {
let array = [Int]()
XCTAssertEqual(array.sortedPrefix(0), [])
}

func testSortedPrefixWithOrdering() {
let array: [Int] = [20, 1, 4, 70, 100, 2, 3, 7, 90]

XCTAssertEqual(array.sortedPrefix(0, by: >), [])
XCTAssertEqual(
array.sortedPrefix(1, by: >),
[100]
)

XCTAssertEqual(
array.sortedPrefix(5, by: >),
[100, 90, 70, 20, 7]
)

XCTAssertEqual(
array.sortedPrefix(9, by: >),
[100, 90, 70, 20, 7, 4, 3, 2, 1]
)

XCTAssertEqual([1].sortedPrefix(0, by: <), [])
XCTAssertEqual([1].sortedPrefix(0, by: >), [])
XCTAssertEqual([1].sortedPrefix(1, by: <), [1])
XCTAssertEqual([1].sortedPrefix(1, by: >), [1])
XCTAssertEqual([0, 1].sortedPrefix(1, by: <), [0])
XCTAssertEqual([1, 0].sortedPrefix(1, by: <), [0])
XCTAssertEqual([1, 0].sortedPrefix(2, by: <), [0, 1])
XCTAssertEqual([0, 1].sortedPrefix(1, by: >), [1])
XCTAssertEqual([1, 0].sortedPrefix(1, by: >), [1])
XCTAssertEqual([1, 0].sortedPrefix(2, by: >), [1, 0])

XCTAssertEqual(
[1, 2, 3, 4, 7, 20, 70, 90, 100].sortedPrefix(5, by: <),
[1, 2, 3, 4, 7]
)

XCTAssertEqual(
[1, 2, 3, 4, 7, 20, 70, 90, 100].sortedPrefix(5, by: >),
[100, 90, 70, 20, 7]
)

XCTAssertEqual(
[1, 2, 3, 4, 7, 20, 70, 90, 100].sortedPrefix(5, by: >),
[100, 90, 70, 20, 7]
)

XCTAssertEqual(
[1, 2, 3, 4, 7, 20, 70, 90, 100].sortedPrefix(5, by: <),
[1, 2, 3, 4, 7]
)

XCTAssertEqual(
[4, 5, 6, 1, 2, 3].sortedPrefix(3, by: <),
[1, 2, 3]
)

XCTAssertEqual(
[4, 5, 9, 8, 7, 6].sortedPrefix(3, by: <),
[4, 5, 6]
)

XCTAssertEqual(
[4, 3, 2, 1].sortedPrefix(1, by: <),
[1]
)

XCTAssertEqual(
[4, 2, 1, 3].sortedPrefix(3, by: >),
[4, 3, 2]
)

XCTAssertEqual(
[4, 2, 1, 3].sortedPrefix(3, by: <),
[1, 2, 3]
)
}

func testSortedPrefixComparable() {
let array: [Int] = [20, 1, 4, 70, 100, 2, 3, 7, 90]

XCTAssertEqual(array.sortedPrefix(0), [])

XCTAssertEqual(
array.sortedPrefix(1),
[1]
)

XCTAssertEqual(
array.sortedPrefix(5),
[1, 2, 3, 4, 7]
)

XCTAssertEqual(
array.sortedPrefix(9),
[1, 2, 3, 4, 7, 20, 70, 90, 100]
)
}

func testSortedPrefixWithHugePrefix() {
XCTAssertEqual(
[4, 2, 1, 3].sortedPrefix(.max),
[1, 2, 3, 4]
)
}

func testStability() {
assertStability([1,1,1,2,5,7,3,6,2,5,7,3,6], withPrefix: 3)
assertStability([1,1,1,2,5,7,3,6,2,5,7,3,6], withPrefix: 6)
assertStability([1,1,1,2,5,7,3,6,2,5,7,3,6], withPrefix: 20)
assertStability([1,1,1,2,5,7,3,6,2,5,7,3,6], withPrefix: 1000)
}

func assertStability(
_ actual: [Int],
withPrefix prefixCount: Int,
file: StaticString = #file,
line: UInt = #line
) {
let indexed = actual.enumerated()
let sorted = indexed.map { $0 } .sortedPrefix(prefixCount) { $0.element < $1.element }

for element in Set(actual) {
let filtered = sorted.filter { $0.element == element }.map(\.offset)
XCTAssertEqual(filtered, filtered.sorted())
}
}
}