#54 [Chunked] Add `chunks(of:)` variant that divides a collection into chunks of a given size

Conversation
I think this would make a great addition! What do you think about the name, though? `chunks(of: 2)` reads as if the argument could be an element to chunk on:

```swift
let x = [1, 2, 2, 3, 3, 3, 2]
print(x.chunks(of: 2))
// [[2, 2], [2]] or [[1, 2], [2, 3], [3, 3], [2]]?
```

The argument isn't an element but rather the chunk size, which a label like `ofCount` makes unambiguous:

```swift
let x = [1, 2, 2, 3, 3, 3, 2]
print(x.chunks(ofCount: 2))
// [[1, 2], [2, 3], [3, 3], [2]]
```
Hey @kylemacomber :)

I agree, that's a good idea and will remove any ambiguity in interpreting the function... I'll make the changes :)

All fixed, Kyle =]
Happy to see this PR, @LucianoPAlmeida! I’ve done an initial review pass over the implementation — let me know if you have any questions.
Sources/Algorithms/Chunked.swift (outdated)

```swift
extension ChunkedCollection: Collection {
  public struct Index {
    public let base: Base.Index
    // …
```
This should also be `internal`.

There's a bit of a decision to make here between using this single index and using the range for the chunk. For a non-random-access collection, a single index makes using the subscript O(n), while a range moves that calculation to `startIndex`. Given that `startIndex` is so frequently touched for testing boundary conditions, the current implementation is probably the right one, but I don't like that iterating over a collection this way requires traversing the base collection twice (once for each subscript access and once for each call to `index(after:)`).

Another approach would be to calculate `startIndex` at initialization and store it along with the chunk size. That way both `startIndex` and subscripting would always be O(1). What do you think?
IIUC you're suggesting we store a `Range` in the index and calculate it in `init`, which would allow both `startIndex` and `subscript` to be O(1)? I like that idea!
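For readers following the thread, here's a minimal sketch of the range-in-index idea being discussed (the names are hypothetical; the actual implementation appears later in this conversation):

```swift
// The index carries its chunk's entire range in the base collection, so
// subscripting the chunked view is a single O(1) slice of the base.
struct ChunkIndex<BaseIndex: Comparable>: Equatable, Comparable {
  let baseRange: Range<BaseIndex>

  static func < (lhs: Self, rhs: Self) -> Bool {
    lhs.baseRange.lowerBound < rhs.baseRange.lowerBound
  }
}
```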
> Another approach would be to calculate `startIndex` at initialization and store it along with the chunk size. That way both `startIndex` and subscripting would always be O(1). What do you think?
I experimented with this approach when putting up this PR, and I like the idea... but there are two points:

- Both the start index and the end index (lower bound) would have to be computed.
- If we compute the start and end indexes at initialization, then a call like `[1, 2, 3].chunks(ofCount: 2)` becomes O(n) for a non-`RandomAccessCollection` base.

Edit: In fact, the end index doesn't need to be computed, so this should be fine :)
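In code, the point of the edit is roughly this (a hypothetical free function; the PR folds the computation into the initializer):

```swift
/// Where the first chunk of `base` ends. Only this one index needs
/// precomputing: the chunked view's `endIndex` is simply the empty range
/// `base.endIndex..<base.endIndex`, which costs nothing to form.
func firstChunkEnd<Base: Collection>(of base: Base, chunkCount: Int) -> Base.Index {
  base.index(base.startIndex, offsetBy: chunkCount,
             limitedBy: base.endIndex) ?? base.endIndex
}
```

So construction costs O(chunkCount) index advances on a non-random-access base, not a full O(n) traversal of the collection.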
Just changed :)
Having implemented it, I'd say I definitely like this approach!
I think we have a point here... I was also thinking about this, and it wouldn't be a big problem if we just computed `startIndex` every time: I imagine in most cases the base will be an array, or a lazy collection over an array source with the `RandomAccessCollection` conditional conformance, so in most cases `startIndex` wouldn't be expensive to compute. Also, I still like this range-based approach because, as Kyle mentioned, it's more robust and iteration is improved.
The actual solution would be changing the language so that general notifications are made when a reference object is mutated (so in this case, the cached `startIndex` can be recomputed), but that's severe and I'm not sure it's implementable.

In C# we have the concept of `ObservableCollection`, where mutations (well defined through an API) can be observed from anywhere. This doesn't guarantee a notification for a custom collection mutated through a custom API, but I just remembered the concept, which is nice and very useful in the context of the .NET framework =]
For a reference-type collection or a collection of reference types, if there is a mutation between when a filter-like adapter is created and when it's traversed, I agree the result could be surprising, and it's worth thinking carefully about our policy.

Note that for `chunks(ofCount:)`, mutations to reference-type elements in a collection shouldn't change the behavior, and mutations to a reference-type collection itself could possibly be made to work. Though depending on our policy, we may decide we don't need to be defensive here.
More concretely, my only concern with this approach was that certain lazy collections (most notably `LazyFilterCollection`) will run closures and evaluate elements when an index is advanced, so precomputing the range of the first chunk can run code that the user may not expect to run at that stage. (Counterpoint: while lazy sequences are useful for short-circuiting, the user is very likely to at least evaluate the first element, in which case precomputing the start index doesn't have much of an impact.)

Calling `.prefix(n)` on a lazy collection has the same behavior, so we might have to conclude that such things are unavoidable/acceptable 🙂 In this case I think we should consider precomputing the first chunk of `chunked(on:)` and `chunked(by:)` as well.
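A small illustration of the concern (this assumes the precomputing initializer discussed above and the swift-algorithms `Algorithms` module):

```swift
import Algorithms

// LazyFilterCollection runs its predicate whenever an index is advanced,
// so delimiting the first chunk at construction time evaluates elements
// before the user ever touches the chunked view.
let evens = (1...6).lazy.filter { n -> Bool in
  print("evaluating \(n)")  // visible side effect
  return n.isMultiple(of: 2)
}

// This line alone triggers "evaluating ..." output: forming the view
// computes the first chunk's range, which advances base indices.
let chunked = evens.chunks(ofCount: 2)
_ = chunked
```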
> Calling `.prefix(n)` on a lazy collection has the same behavior, so we might have to conclude that such things are unavoidable/acceptable 🙂 In this case I think we should consider precomputing the first chunk of `chunked(on:)` and `chunked(by:)` as well.
I agree, and I think at the end of the day, as @kylemacomber mentioned, it all depends on whether we decide to be really strict and defensive or trade that off for performance :)

Also, I still don't think it would be a big problem to always compute the start index if we really want to be conservative. Either way, let me know what you guys think and I'll be happy to make the changes if necessary =]
> The actual solution would be changing the language so that general notifications are made when a reference object is mutated (so in this case, the cached `startIndex` can be recomputed), but that's severe and I'm not sure it's implementable.

Another solution would be limiting `Base` to be a value type. I think this was one of my inspirations for requesting an `AnyValue` poison protocol.
Thanks for the review, @natecook1000!
Looking at the current code, it should be easy to set the …
Tried doing this myself: I used a Git(Hub) client that pulled directly from this PR. I can't upload back to it, of course, so I have no way to express my changes as proper patches to your work. Worse, before I tried to upload, you pushed down one of your own. (I started an hour before you did.) Here's a copy/paste of that whole section, "Chunked.swift":

```swift
//===----------------------------------------------------------------------===//
// chunks(ofCount:)
//===----------------------------------------------------------------------===//
/// An index for an element (or past-the-end) of a `ChunkedByCount` collection.
public struct ChunkedByCountIndex<Base: Comparable> {
// Using a separate type, instead of one embedded in ChunkedByCount, allows
// ChunkedByCount<T> and ChunkedByCount<T.SubSequence> to share indices.
@usableFromInline
internal let baseRange: Range<Base>
@usableFromInline
internal init(_baseRange: Range<Base>) {
self.baseRange = _baseRange
}
}
extension ChunkedByCountIndex: Equatable {}
extension ChunkedByCountIndex: Comparable {
public static func < (lhs: Self, rhs: Self) -> Bool {
// Intervals are harder to do straight comparisons for. But any cross-
// overlap between non-identical ranges means the operands weren't from
// compatible collections to begin with, a precondition violation.
//
// (Similar reasoning applies for not bothering to do a customized definition
// for the equality operator.)
return lhs.baseRange.lowerBound < rhs.baseRange.lowerBound
}
}
extension ChunkedByCountIndex: Hashable where Base: Hashable {}
/// A collection that presents the elements of its base collection
/// in `SubSequence` chunks of any given count.
///
/// A `ChunkedByCount` is a lazy view on the base Collection, but it does not implicitly confer
/// laziness on algorithms applied to its result. In other words, for ordinary collections `c`:
///
/// * `c.chunks(ofCount: 3)` does not create new storage
/// * `c.chunks(ofCount: 3).map(f)` maps eagerly and returns a new array
/// * `c.lazy.chunks(ofCount: 3).map(f)` maps lazily and returns a `LazyMapCollection`
public struct ChunkedByCount<Base: Collection> {
public typealias Element = Base.SubSequence
@usableFromInline
internal let base: Base
@usableFromInline
internal let chunkCount: Int
@usableFromInline
internal var startUpperBound: Base.Index
/// Creates a view instance that presents the elements of the given collection
/// as `SubSequence` chunks with the given count, caching the given index for
/// where the first chunk ends.
///
/// - Complexity: O(1)
@inlinable
internal init(_base: Base, _chunkCount: Int, _firstChunkEnd: Base.Index) {
self.base = _base
self.chunkCount = _chunkCount
self.startUpperBound = _firstChunkEnd
}
/// Creates a view instance that presents the elements of `base`
/// in `SubSequence` chunks of the given count.
///
/// - Complexity: O(n)
@inlinable
internal init(_base: Base, _chunkCount: Int) {
// Compute the end of the first chunk upfront in order to make
// startIndex an O(1) lookup.
let baseEnd = _base.index(_base.startIndex, offsetBy: _chunkCount,
limitedBy: _base.endIndex) ?? _base.endIndex
self.init(_base: _base, _chunkCount: _chunkCount, _firstChunkEnd: baseEnd)
}
}
extension ChunkedByCount: Collection {
public typealias Index = ChunkedByCountIndex<Base.Index>
public typealias SubSequence = ChunkedByCount<Base.SubSequence>
/// - Complexity: O(1)
@inlinable
public var startIndex: Index {
Index(_baseRange: base.startIndex..<startUpperBound)
}
@inlinable
public var endIndex: Index {
Index(_baseRange: base.endIndex..<base.endIndex)
}
/// - Complexity: O(1)
public subscript(i: Index) -> Element {
precondition(i < endIndex, "Index out of range")
return base[i.baseRange]
}
public subscript(bounds: Range<Index>) -> SubSequence {
let lowerRange = bounds.lowerBound.baseRange,
upperRange = bounds.upperBound.baseRange
let resultBounds: Range<Base.Index>
if lowerRange.lowerBound < upperRange.lowerBound {
resultBounds = lowerRange.lowerBound ..< upperRange.upperBound
} else {
resultBounds = lowerRange.lowerBound ..< upperRange.lowerBound
assert(resultBounds.isEmpty)
}
return SubSequence(_base: base[resultBounds], _chunkCount: chunkCount,
_firstChunkEnd: lowerRange.upperBound)
}
@inlinable
public func index(after i: Index) -> Index {
precondition(i < endIndex, "Advancing past end index")
let baseIdx = base.index(
i.baseRange.upperBound, offsetBy: chunkCount,
limitedBy: base.endIndex
) ?? base.endIndex
return Index(_baseRange: i.baseRange.upperBound..<baseIdx)
}
}
extension ChunkedByCount:
BidirectionalCollection, RandomAccessCollection
where Base: RandomAccessCollection {
@inlinable
public func index(before i: Index) -> Index {
precondition(i > startIndex, "Advancing past start index")
var offset = chunkCount
if i.baseRange.lowerBound == base.endIndex {
let remainder = base.count % chunkCount
if remainder != 0 {
offset = remainder
}
}
let baseIdx = base.index(
i.baseRange.lowerBound, offsetBy: -offset,
limitedBy: base.startIndex
) ?? base.startIndex
return Index(_baseRange: baseIdx..<i.baseRange.lowerBound)
}
}
extension ChunkedByCount {
@inlinable
public func distance(from start: Index, to end: Index) -> Int {
let distance =
base.distance(from: start.baseRange.lowerBound,
to: end.baseRange.lowerBound)
let (quotient, remainder) =
distance.quotientAndRemainder(dividingBy: chunkCount)
return quotient + remainder.signum()
}
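// Note on the arithmetic in `distance(from:to:)` above and `count` below:
// a partial final chunk still counts as one chunk, so
// `quotient + remainder.signum()` rounds the division up by exactly one
// whenever there is a remainder.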
@inlinable
public var count: Int {
let (quotient, remainder) =
base.count.quotientAndRemainder(dividingBy: chunkCount)
return quotient + remainder.signum()
}
@inlinable
public func index(
_ i: Index, offsetBy offset: Int, limitedBy limit: Index
) -> Index? {
guard offset != 0 else { return i }
guard limit != i else { return nil }
if offset > 0 {
guard limit < i || distance(from: i, to: limit) >= offset else {
return nil
}
return offsetForward(i, offsetBy: offset)
} else {
guard limit > i || distance(from: i, to: limit) <= offset else {
return nil
}
return offsetBackward(i, offsetBy: offset)
}
}
@inlinable
public func index(_ i: Index, offsetBy distance: Int) -> Index {
guard distance != 0 else { return i }
return distance > 0
? offsetForward(i, offsetBy: distance)
: offsetBackward(i, offsetBy: distance)
}
@usableFromInline
internal func offsetForward(_ i: Index, offsetBy distance: Int) -> Index {
assert(distance > 0)
return makeOffsetIndex(
from: i, baseBound: base.endIndex, baseDistance: distance * chunkCount
)
}
@usableFromInline
internal func offsetBackward(_ i: Index, offsetBy distance: Int) -> Index {
assert(distance < 0)
if i.baseRange.lowerBound == base.endIndex {
let remainder = base.count % chunkCount
// We have to take the remainder into account when calculating offsets.
if remainder != 0 {
// Use one fewer full-chunk stride (distance is negative here, hence
// `distance + 1`), then subtract the remainder to account for the
// final chunk holding fewer than chunkCount elements.
let baseDistance = ((distance + 1) * chunkCount) - remainder
return makeOffsetIndex(
from: i, baseBound: base.startIndex, baseDistance: baseDistance
)
}
}
return makeOffsetIndex(
from: i, baseBound: base.startIndex, baseDistance: distance * chunkCount
)
}
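// Worked example for offsetBackward above: with base.count == 10 and
// chunkCount == 3, the last chunk holds 10 % 3 == 1 element. Stepping
// back from endIndex by -1 gives baseDistance == ((-1 + 1) * 3) - 1
// == -1, landing on that final, shorter chunk; each further step then
// spans a full chunk of 3.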
// Helper that computes the result index for index(_:offsetBy:).
@inline(__always)
private func makeOffsetIndex(
from i: Index, baseBound: Base.Index, baseDistance: Int
) -> Index {
let baseStartIdx = base.index(
i.baseRange.lowerBound, offsetBy: baseDistance,
limitedBy: baseBound
) ?? baseBound
let baseEndIdx = base.index(
baseStartIdx, offsetBy: chunkCount, limitedBy: base.endIndex
) ?? base.endIndex
return Index(_baseRange: baseStartIdx..<baseEndIdx)
}
}
extension Collection {
/// Returns a `ChunkedByCount<Self>` view presenting the elements
/// in chunks of the given count.
///
/// - Parameter count: The size of the chunks. If the count of the base
///   `Collection` is evenly divisible by `count`, all the chunks will
///   have exactly `count` elements.
///   Otherwise, the last chunk will contain the remaining elements.
///
/// let c = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
/// print(c.chunks(ofCount: 5).map(Array.init))
/// // [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
///
/// print(c.chunks(ofCount: 3).map(Array.init))
/// // [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
///
/// - Complexity: O(1)
@inlinable
public func chunks(ofCount count: Int) -> ChunkedByCount<Self> {
precondition(count > 0, "Cannot chunk with count <= 0!")
return ChunkedByCount(_base: self, _chunkCount: count)
}
}
// Conditional conformances.
extension ChunkedByCount: Equatable where Base: Equatable {}
// Since we have another stored property of type `Base.Index` on the
// collection, synthesis of `Hashable` conformance would require
// a `Base.Index: Hashable` constraint, so we implement the hasher
// only in terms of `base`. Since the computed index is derived from it,
// this should not make a difference here.
extension ChunkedByCount: Hashable where Base: Hashable {
public func hash(into hasher: inout Hasher) {
hasher.combine(base)
}
}
// Lazy conditional conformance.
extension ChunkedByCount: LazySequenceProtocol
where Base: LazySequenceProtocol {}
extension ChunkedByCount: LazyCollectionProtocol
where Base: LazyCollectionProtocol {}
```

I didn't touch the test file, so you'll have to add something that executes the subsequence subscript method.
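Something along these lines could exercise the new range subscript (a hypothetical test, assuming the package's `Algorithms` module and XCTest; it is not part of the paste above):

```swift
import XCTest
import Algorithms

final class ChunkedByCountSubSequenceTests: XCTestCase {
  // Exercises subscript(bounds:), which returns a ChunkedByCount
  // over the base collection's SubSequence.
  func testRangeSubscript() {
    let chunks = (1...10).chunks(ofCount: 3)
    let end = chunks.index(chunks.startIndex, offsetBy: 2)
    let slice = chunks[chunks.startIndex..<end]  // first two chunks
    XCTAssertEqual(slice.map(Array.init), [[1, 2, 3], [4, 5, 6]])
  }
}
```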
@CTMacUser I like the idea of this… Either way, this might be more suitable as a follow-up PR, since this PR seems to be good to go, and it seems like the …

Thanks for pointing this out; I'm making a note that …

Agreed, this is a cool change, but let's get this integrated first.
Looks good!
@natecook1000 Sorry, I've sent some updates; can you give another look? :)
Since your post, this PR has been merged and its branch withdrawn, so I have to figure out how to resubmit the changes.
It's a protocol requirement; the compiler will block attempts for …
This just adds `chunks(of:)` to chunk a collection into subsequences of a given size. As suggested by @bjhomer here, this is just moving a long-open evolution proposal to this package. So let me know what you think =]

cc @natecook1000 @kylemacomber
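For quick reference, a minimal usage sketch with the final spelling settled on in the review above (`chunks(ofCount:)`):

```swift
import Algorithms

let numbers = [1, 2, 3, 4, 5]
print(numbers.chunks(ofCount: 2).map(Array.init))
// [[1, 2], [3, 4], [5]]
```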