Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reductions(_:_:) #46

Merged
merged 45 commits into from
Mar 24, 2021
Merged

Add reductions(_:_:) #46

merged 45 commits into from
Mar 24, 2021

Conversation

danielctull
Copy link
Contributor

@danielctull danielctull commented Nov 24, 2020

Description

This addition allows the caller to get a sequence of the results of reduce applied to up to and including the element of a sequence.

This adds an implementation for reductions as discussed in issue #25.

Detailed Design

To achieve this, a new sequence type called Reductions is added to the library.

For ease of use, a new method is added to sequence to provide the reductions:

extension Sequence {
  public func reductions<Result>(
    _ initial: Result,
    _ transform: @escaping (Result, Element) -> Result
) -> Reductions<Result, Self>

The implementation of subscript(position:) calls reduce each time, so there's likely some improvements to be made around the Collection conformance.

Documentation Plan

As the reductions(_:_:) method is the main entry point, this is documented with an example of use. The Reductions sequence type itself is lightly documented, which I believe follows the trend of the current APIs in the library.

In the guide, I have also gathered some quotes to show why the name reductions was chosen over scan.

Test Plan

Arrays and sequences are used to test each code path, calling reductions(_:_:) on each. The empty sequence is also tested to ensure it yields an empty sequence.

Source Impact

This is a purely additive change.

Checklist

  • I've added at least one test that validates that my change is working, if appropriate
  • I've followed the code style of the rest of the project
  • I've read the Contribution Guidelines
  • I've updated the documentation if necessary

@CTMacUser
Copy link
Contributor

I think you should make this an eager/lazy pair of methods. The eager version returns an Array, and may use a throwing function for the combining closure. The lazy version uses a non-throwing, but escaping function instead for its combing closure. (Some of pull requests I've recently made show how you can share code between versions.)

But the important part is that this facility cannot be a collection, only a sequence, since dereferencing imposes an O(n) time penalty.

@danielctull
Copy link
Contributor Author

All similar functions in other languages that I'm aware of include the initial value in the output (except for the reactive ones). Maybe we should too?

I was under the impression that others don't include the initial value. I think C++ has the ability to do either with an inclusive and exclusive variants – but I'm not a C++ person, I was just going on what I could make of their documentation.

Most of my uses in stuff like the advent of code puzzles generally require the initial value though. In drafting this, I had another function which used chain to prepend the initial value. I would love if we had both options easily available, and am open to options with respect to name. One idea I had was reductions(including:_:) and reductions(excluding:_:), but I wasn't sure if this was making the excluding case too verbose?

But the important part is that this facility cannot be a collection, only a sequence, since dereferencing imposes an O(n) time penalty.

Subscripting indeed needs to be done in constant time, but that doesn't mean that Reductions can't be a collection. If the index stores the base index as well as the accumulated value up until that point, you can satisfy all of Collection's performance requirements. BidirectionalCollection is definitely off the table, though.

Yep, after @CTMacUser's comment I went and took a look at the implementation. I got stuck with index(_:offsetBy:) for negative offsets, it feels like this hits the same problem as BidirectionalCollection where the reductions up to the given index have to be built up again.

@timvermeulen
Copy link
Contributor

timvermeulen commented Nov 25, 2020

Edit: I accidentally deleted my previous comment, but thankfully you quoted the entire thing 🙂


I was under the impression that others don't include the initial value. I think C++ has the ability to do either with an inclusive and exclusive variants – but I'm not a C++ person, I was just going on what I could make of their documentation.

From what I can see, the exclusive variant takes an explicit initial value which is included in the result, and the inclusive variant uses the first element of the iterator as the initial value, which is then also included in the result. The initial value is included in the result either way, and the functions I checked in other languages seem to behave in the same way.

I would love if we had both options easily available

It seems reasonable to me to have a variant of reductions that takes no initial value, i.e.

[1, 2, 3].reductions(+) // [1, 3, 6]

reduce doesn't have such a variant, but that probably doesn't need to stop us here.

Speaking of reduce, should reductions(into:_:) be included too? I believe it could benefit from the same copy-on-write gains that reduce(into:_:) was added for, as long as the user doesn't keep the partial result around when the next one is computed.

I got stuck with index(_:offsetBy:) for negative offsets, it feels like this hits the same problem as BidirectionalCollection where the reductions up to the given index have to be built up again.

Without BidirectionalConformance, index(_:offsetBy:) doesn't need to be able to handle negative distances. But since the elements are always going to applied one by one, there isn't much point to override this method in the first place because you won't be able to improve on the default implementation.

@CTMacUser
Copy link
Contributor

The reactive version of scan from Apple's Combine library does not vend the initial seed, although neither the documentation nor included example make that clear.

@timvermeulen
Copy link
Contributor

Apparently ReactiveX does not output the seed either, except for RxJava 😅 But I have yet to come across a non-reactive scan that does not output the seed.

@pyrtsa
Copy link

pyrtsa commented Nov 26, 2020

For a non-reactive stream, I don't think it makes sense to discard the seed because it's so easy to skip by calling dropFirst() on the result. And the seed value is good to keep because it is often useful.

Defining reductions to include its seed value also aligns with the way sequence(first:next:) is defined in the standard library.

Off-topic: For reactive streams if the result is also a stream of events, it probably makes sense to have the result of scan to match the length, and emission times, of the stream of incoming events. For those, the instant when to emit the seed would otherwise be left a bit ambiguous. The situation would be different if instead of a stream of discrete events the resulting value would be a continuous-time defined value (sometimes known as a behaviour). In that case it would start off immediately at the seed value and change to successive values right after every incoming event of inputs. I think we should view the result of reductions like that: it's the cumulative values for which the input sequence is the number of changes (and thus one less in length).

@danielctull danielctull marked this pull request as draft November 26, 2020 12:04
@danielctull
Copy link
Contributor Author

danielctull commented Nov 26, 2020

These latest commits improve the implementation of the Collection conformance, remove BidirectionalCollection conformance and add two eager variants: one that takes no initial value and another which outputs the initial value at the start of the returned array. I realise lazy versions of these should also be created as well, but would rather we made sure what we wanted from the API before changing that part too. 🙂

In its current state the API looks like the following:

[1, 2, 3].reductions(excluding: 10, +) // [11, 13, 16]
[1, 2, 3].reductions(including: 10, +) // [10, 11, 13, 16]
[1, 2, 3].reductions(+) // [1, 3, 6]

I quite like @pyrtsa's point that if reductions included the provided initial value by default and the first API was removed, it is easy to produce the functionality of that removed function by calling dropFirst() on the result, keeping the reductions API cleaner. If this was the consensus, the eager API would be the following, with matching lazy versions.

[1, 2, 3].reductions(10, +) // [10, 11, 13, 16]
[1, 2, 3].reductions(+) // [1, 3, 6]

@pyrtsa
Copy link

pyrtsa commented Nov 26, 2020

I agree this would be ideal:

[1, 2, 3].reductions(10, +) // [10, 11, 13, 16]
[1, 2, 3].reductions(+) // [1, 3, 6]

@timvermeulen
Copy link
Contributor

And potentially:

[1, 2, 3].reductions(into: 10, +=) // [10, 11, 13, 16]

reductions(_:_:) can always just call reductions(into:_:), but the variant without a seed does probably need different internals.

@danielctull
Copy link
Contributor Author

And potentially:

[1, 2, 3].reductions(into: 10, +=) // [10, 11, 13, 16]

Yup! I was trying to summarise the comments so far and I failed, but I agree this should be added too. 🙂

@pyrtsa
Copy link

pyrtsa commented Nov 26, 2020

While it might be nice for symmetry, I don't see the eager reductions(into:_:) as having such a strong point for existence as reduce(into:_:) originally did, because its implementation is nevertheless going to make N modified copies in addition to the original seed.

The lazy version might be good though (because iterating over it could benefit from copy-on-write), and maybe the eager one indeed just for symmetry.

@timvermeulen
Copy link
Contributor

The eager one can still have ergonomic advantages! Just no performance ones 🙂

@CTMacUser
Copy link
Contributor

Eager versions can also use throwing closures.

@odmir
Copy link

odmir commented Nov 26, 2020

let a = [1, 2, 3, 4].lazy.prefix(2).map { $0 }
let b = [1, 2, 3, 4].lazy.reductions(0, +).map { $0 }

print(type(of: a))
// LazyMapSequence<Slice<LazySequence<Array<Int>>>, Int>
print(type(of: b))
// Array<Int>

The current implementation loses the "laziness" information requiring one to apply .lazy again to the result.

@danielctull
Copy link
Contributor Author

danielctull commented Nov 26, 2020

The current implementation loses the "laziness" information requiring one to apply .lazy again to the result.

Thanks @odmir, the conditional conformances to LazySequenceProtocol and LazyCollectionProtocol was missing. Using your example now yields the following. 🙂

let a = [1, 2, 3, 4].lazy.prefix(2).map { $0 }
let b = [1, 2, 3, 4].lazy.reductions(0, +).map { $0 }

print(type(of: a))
// LazyMapSequence<Slice<LazySequence<Array<Int>>>, Int>
print(type(of: b))
// LazyMapSequence<Reductions<Int, LazySequence<Array<Int>>>, Int>

Comment on lines 88 to 207
public func reductions(
_ transform: (Element, Element) throws -> Element
) rethrows -> [Element] {

guard let first = first else { return [] }
return try dropFirst().reductions(first, transform)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to add this to Sequence without having to duplicate any logic, you could take the iterator, call next on it to get the seed, and then turn the iterator back into a sequence using IteratorSequence (which hopefully optimizes well).

@danielctull
Copy link
Contributor Author

I'm writing the lazy version of reductions and hit a question with the empty case if you fine people could guide me. 🙂

Given that reductions now includes the initial seed value, does that mean should EmptyCollection<Int>().reductions(10, +) yield [10]? Or should it return an empty array?

@odmir
Copy link

odmir commented Nov 27, 2020

The current implementation loses the "laziness" information requiring one to apply .lazy again to the result.

Thanks @odmir, the conditional conformances to LazySequenceProtocol and LazyCollectionProtocol was missing. Using your example now yields the following. 🙂

let a = [1, 2, 3, 4].lazy.prefix(2).map { $0 }
let b = [1, 2, 3, 4].lazy.reductions(0, +).map { $0 }

print(type(of: a))
// LazyMapSequence<Slice<LazySequence<Array<Int>>>, Int>
print(type(of: b))
// LazyMapSequence<Reductions<Int, LazySequence<Array<Int>>>, Int>

Of course that's the correct way of doing it, sorry for the noise! 🤦

@xwu
Copy link
Contributor

xwu commented Nov 27, 2020

Given that reductions now includes the initial seed value, does that mean should EmptyCollection<Int>().reductions(10, +) yield [10]?

Yes, and this appears to be how it behaves in other languages too. The version that takes an initial value would always have one more element in the result than are in the input collection, and the version that doesn’t, wouldn’t.

@danielctull
Copy link
Contributor Author

danielctull commented Nov 27, 2020

Of course that's the correct way of doing it, sorry for the noise! 🤦

@odmir: No noise at all, I appreciate you raising the concern! 🙂

@danielctull
Copy link
Contributor Author

Thanks for confirming that @xwu!

@danielctull danielctull force-pushed the reductions branch 2 times, most recently from 7cbff5c to b8bd047 Compare December 3, 2020 08:03
@benlings
Copy link

benlings commented Dec 6, 2020

Would it be worth adding a deprecated alias scan for this (eg. like this? I have no problem with the name reductions, but would probably not have found it and assumed it didn't exist.

@danielctull
Copy link
Contributor Author

Wow, I just remembered that I had this in flight. I moved house and have been really busy with that so haven't had much time to pursue this stuff.

Does anyone have any thoughts about @benlings' thought to include scan as a deprecated function to allow for easier discovery? Personally, I think the library is small enough that reductions would stand out on its own, but I'm happy to add a deprecated scan function.

@jasdev
Copy link

jasdev commented Feb 14, 2021

Does anyone have any thoughts about @benlings’ thought to include scan as a deprecated function to allow for easier discovery?

+1 from me — I actually stumbled upon this PR by searching around for “scan.”

Thanks for driving — and following up on — this PR, @danielctull! 🙏🏽

Copy link
Contributor

@timvermeulen timvermeulen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice to see this getting close to the finish line! I've left some more feedback. It would also be great to mark public methods with @inlinable and internal methods with @usableFromInline like we do in the rest of the package.

Let's indeed also add a deprecated scan method for discoverability purposes — we do not currently have a rigorous policy in place on when we want to add deprecated methods with names from prior art, but in this case scan is such a widely known name for this operation that it's worthwhile doing this.

Guides/Reductions.md Outdated Show resolved Hide resolved
Guides/Reductions.md Outdated Show resolved Hide resolved
Guides/Reductions.md Outdated Show resolved Hide resolved
Guides/Reductions.md Outdated Show resolved Hide resolved
Sources/Algorithms/Reductions.swift Outdated Show resolved Hide resolved
Comment on lines 43 to 72
extension Sequence {

public func reductions<Result>(
_ initial: Result,
_ transform: (Result, Element) throws -> Result
) rethrows -> [Result] {

var result = initial
return try reductions(into: &result) { result, element in
result = try transform(result, element)
}
}

public func reductions<Result>(
into initial: inout Result,
_ transform: (inout Result, Element) throws -> Void
) rethrows -> [Result] {

var output = [Result]()
output.reserveCapacity(underestimatedCount + 1)
output.append(initial)

for element in self {
try transform(&initial, element)
output.append(initial)
}

return output
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These still need documentation, you can probably reiterate parts of the guide here. It's also a good idea to have a look at the reduce doc comments, since they're so related.

Sources/Algorithms/Reductions.swift Outdated Show resolved Hide resolved
Comment on lines 279 to 283
enum Representation {
case start
case base(index: BaseIndex, result: Result)
case end
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done some experiments and it doesn't seem like we need the start case at all, we can instead represent the start index by wrapping base.startIndex in the base case. Then you should be able to pretty much remove any code that currently uses .start — let me know if you run into any problems doing that.

Comment on lines 163 to 180
extension LazySequenceProtocol {

public func reductions(
_ transform: @escaping (Element, Element) -> Element
) -> InclusiveReductions<Self> {
InclusiveReductions(base: self, transform: transform)
}
}

extension Sequence {
public func reductions(
_ transform: (Element, Element) throws -> Element
) rethrows -> [Element] {
var iterator = makeIterator()
guard let initial = iterator.next() else { return [] }
return try IteratorSequence(iterator).reductions(initial, transform)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are missing documentation as well. Might also be worth pointing out the relation to the version of reductions that takes an initial result.

@danielctull
Copy link
Contributor Author

Cheers for those @timvermeulen! Will look at them over the next day. 😊

@timvermeulen
Copy link
Contributor

Hey @danielctull, this is to let you know we'll soon be landing this addition and making the minor changes necessary in order to ship this in the next release. 🚀 Thank you for all your work on this, and feel free to make any changes you still want to make in the meantime.

@danielctull
Copy link
Contributor Author

Cheers for the prompt @timvermeulen, I actually have some time tonight to get the rest of the modifications and documentation changes made that you suggested if that's alright?

@timvermeulen
Copy link
Contributor

Totally!

@danielctull
Copy link
Contributor Author

danielctull commented Mar 19, 2021

The doc comments are there, but I know they need some re-working. The guide will also needs some tweaking to align with these revised comments. I'm planning to look at these tomorrow morning British time. 🙂

@timvermeulen: I'm personally not convinced about removing .start from the indexes because it seems to make it a little harder to reason about generally, happy to discuss this though.

@kylemacomber
Copy link
Contributor

@swift-ci please test

@timvermeulen
Copy link
Contributor

@danielctull The last changes look great! The start case thing is a small tweak that we don't need to concern ourselves with right now, we can always revisit it later. Let's land this if you have no more changes you wanted to make.

@danielctull danielctull marked this pull request as ready for review March 24, 2021 15:56
@danielctull
Copy link
Contributor Author

@timvermeulen: Thanks for that. Agreed, I'm all done with the changes for now. :)

@timvermeulen timvermeulen merged commit b171b81 into apple:main Mar 24, 2021
@danielctull danielctull deleted the reductions branch March 25, 2021 09:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants