
Add grouped(by:) and keyed(by:) #197

Merged
merged 5 commits into apple:main on Nov 7, 2023

Conversation

Contributor
@amomchilov commented Jun 21, 2023

Description

Pitch: https://forums.swift.org/t/pitch-sequence-grouped-by-and-keyed-by/65511
Swift Evolution proposal: https://github.com/apple/swift-evolution/pull/2078/files?short_path=2767014

Now that this is shipped, here's a GitHub search that shows usages in public repos.

Detailed Design

The new API:

extension Sequence {
  func grouped<GroupKey>(
    by keyForValue: (Element) throws -> GroupKey
  ) rethrows -> [GroupKey: [Element]]
  
  func keyed<Key>(
    by keyForValue: (Element) throws -> Key,
    uniquingKeysWith combine: ((Key, Element, Element) throws -> Element)? = nil
  ) rethrows -> [Key: Element]
}
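For illustration, a quick usage sketch of the two methods (not part of the PR text itself; note that the conflict-handling behaviour of `keyed(by:)` evolved during review, as discussed below — the sketch shows the behaviour that ultimately merged):

import Algorithms

let fruits = ["Apricot", "Banana", "Apple", "Cherry", "Avocado", "Coconut"]

// Group all elements that share a key (dictionary entry order is unspecified):
let grouped = fruits.grouped(by: { $0.first! })
// ["A": ["Apricot", "Apple", "Avocado"], "B": ["Banana"], "C": ["Cherry", "Coconut"]]

// Key each element by a derived key; in the design that ultimately merged,
// the last element wins when two elements produce the same key:
let keyed = fruits.keyed(by: { $0.first! })
// ["A": "Avocado", "B": "Banana", "C": "Coconut"]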

Documentation Plan

I've added new Grouped.md and Keyed.md guides, and updated the README to point to them.

Test Plan

I wrote tests loosely based on the Standard Library's tests for Dictionary.init(grouping:by:). The Standard Library has a magical expectCrashLater() which lets you test fatalErrors and such. Without it, I'm not sure how to properly test the fatalError() this PR introduces to keyed(by:).

Source Impact

Purely additive.

Checklist

  • I've added at least one test that validates that my change is working, if appropriate
  • I've followed the code style of the rest of the project
  • I've read the Contribution Guidelines
  • I've updated the documentation if necessary

Comment on lines +64 to +63
2. Picking which elements end up in the groupings.
1. The default is the elements of the input sequence, but can be changed.
2. Akin to calling `.transformValues { group in group.map(someTransform) }` on the resultant dictionary, but avoiding the intermediate allocation of Arrays of each group.
Contributor Author

Side thought: If grouped(by:) was instead implemented to be lazy, keyed(by: \.key) could be replaced with grouped(by: \.key).transformValues(\.last). It's still wordy though, so I think keyed(by:) is worth keeping as-is.


It'd still be notably less efficient, due (mainly) to the construction of the intermediary arrays. I assume the Swift compiler won't be clever enough to eliminate those.

Aside from that, how would lazy execution work? Wouldn't you have to run through the complete input sequence in order to know when you have all the values for any given key?

Comment on lines 48 to 53
// This causes a second look-up for the same key. The standard library can avoid that
// by calling `mutatingFind` to get access to the bucket where the value will end up,
// and updating in place.
// Swift Algorithms doesn't have access to that API, so we make do.
// When this gets merged into the standard library, we should optimize this.
result[key] = valueToKeep

Member

I don't think it will be quite as efficient as the stdlib version, but you can eliminate the duplicated lookup and hashing by using index(forKey:) and then mutating result.values[i] instead of using the key-based subscript.

Contributor Author
@amomchilov Jun 26, 2023

I don't think that quite works. It solves the case of double-lookups for conflicting keys, but it shifts the problem to the initial assignment:

if let existingIndex = result.index(forKey: key) { // First hash
    let valueToKeep = try combine!(key, result.values[existingIndex], element)
    result.values[existingIndex] = valueToKeep // No longer needs a second hash
} else { // The item is new and unique
    result[key] = element // Second hash... the problem moved to this (more common) case
}

@inlinable
public func keyed<Key>(
  by keyForValue: (Element) throws -> Key,
  uniquingKeysWith combine: ((Key, Element, Element) throws -> Element)? = nil
Contributor Author

Is uniquingKeysWith still the right naming? After all, it's not the keys being uniqued, it's the values.


Good point. It does follow convention from the Dictionary initialiser, though.

Maybe resolvingConflictsWith?

Technically this is equivalent to first grouping by the keys then reducing the resulting dictionary values… so maybe reducingWith? In a way that's nicer because it reminds us that you don't have to return one or other of the given values - you can technically return any value of the same type, such as the sum of both values, or the largest or smallest, etc.

There's also no way, with this design, to preserve both elements but under different keys. But I'm not sure that needs to be supported… in such cases, perhaps group(by:) is sufficient?

Contributor Author

Maybe resolvingConflictsWith?

I'm digging this. "Resolving" is clear enough in context, while succinct and not overly prescriptive as to what the closure's resolution logic should be.

There's also no way, with this design, to preserve both elements but under different keys. But I'm not sure that needs to be supported… in such cases, perhaps group(by:) is sufficient?

Agree.

@amomchilov amomchilov marked this pull request as ready for review June 21, 2023 21:21
@amomchilov changed the title from "Add Grouped and Keyed" to "Add grouped(by:) and keyed(by:)" Jun 21, 2023
Member
@natecook1000 left a comment

Looks good so far! I have two questions about the design:

  1. The keyed(by:) and grouped(by:) methods seem very related — do you think it would make sense to include a method like this (naming tbd) that incorporates/enables them both? grouped(by:) is a specialization of this, as is a "frequencies" method if I'm thinking of it correctly (e.g. seq.keyed(by: \.self, transform: { _ in 1 }, merge: +))

    func keyed<Key: Hashable, NewElement>(
        by: (Element) -> Key,
        transform: (Element) throws -> NewElement,
        merge: (NewElement, NewElement) throws -> NewElement
    ) rethrows -> [Key: NewElement]
  2. I don't know that a crash is the right default behavior for keyed(by:) — for the dictionary initializer we require the unique parameter name so that an author is essentially asserting the right behavior. Do you think crashing there is okay without any other warning, or would a "last value wins" approach be acceptable? (That would match the behavior of most other implementations and replicate the trivial code that this method replaces.)
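For concreteness, here is a rough sketch (not from this PR) of how the generalized form in point 1 could be written, and how grouping and a frequency count fall out as specializations — the implementation details below are assumptions, not library code:

extension Sequence {
    // Hypothetical generalized form sketched in point 1 above.
    func keyed<Key: Hashable, NewElement>(
        by key: (Element) throws -> Key,
        transform: (Element) throws -> NewElement,
        merge: (NewElement, NewElement) throws -> NewElement
    ) rethrows -> [Key: NewElement] {
        var result = [Key: NewElement]()
        for element in self {
            let k = try key(element)
            let newValue = try transform(element)
            if let existing = result[k] {
                result[k] = try merge(existing, newValue)
            } else {
                result[k] = newValue
            }
        }
        return result
    }
}

// grouped(by:) as a specialization: wrap each element in a one-element array, merge by concatenation.
let groups = [1, 2, 3, 4].keyed(by: { $0 % 2 }, transform: { [$0] }, merge: +)
// [0: [2, 4], 1: [1, 3]]

// A "frequencies" count as a specialization: map every element to 1, merge with +.
let counts = ["a", "b", "a"].keyed(by: { $0 }, transform: { _ in 1 }, merge: +)
// ["a": 2, "b": 1]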


func testNonUniqueKeys() throws {
  throw XCTSkip("""
    TODO: What's the XCTest equivalent to `expectCrashLater()`?
Member

Can't test this with XCTest, unfortunately.

Contributor Author
@amomchilov commented Jun 26, 2023

do you think it would make sense to include a method like this (naming tbd) that incorporates/enables them both?

For the sake of flexibility, I would like to eventually also have this API, but even if it exists, I think specialized versions for these two use cases (and tally) are worthwhile, to simplify the common case.

If you have any ideas in this regard, could you chime in on the pitch thread?

I don't know that a crash is the right default behavior for keyed(by:) — for the dictionary initializer we require the unique parameter name so that an author is essentially asserting the right behavior. Do you think crashing there is okay without any other warning, or would a "last value wins" approach be acceptable? (That would match the behavior of most other implementations and replicate the trivial code that this method replaces.)

That's a very good point. Perhaps "last value wins" reduces crashes, but might let incorrectness slide by... IDK what's preferable. 🤷

Guides/Keyed.md Outdated

Duplicate keys will trigger a runtime error by default. To handle this, you can provide a closure which specifies which value to keep:


This makes the method a bit dangerous, since a naive user could fail to realise keys might collide. Even a knowledgeable user might make the simple mistake of forgetting to provide the uniquingKeysWith argument (e.g. auto-complete omits it and they don't notice in the moment).

It makes it more verbose to handle 'unsafe' data - arguably the common case in today's network-centric world - since you have to implement the second closure (as merely a throw statement) in order to avoid crashing on bad input.

These problems can be addressed by having two versions of the method:

public func keyed<Key>(
    by keyForValue: (Element) throws -> Key
) throws -> [Key: Element]

public func keyed<Key>(
    by keyForValue: (Element) throws -> Key,
    uniquingKeysWith combine: (Key, Element, Element) throws -> Element
) rethrows -> [Key: Element]

This preserves the ability to have a non-throwing version - and thus not require use of try - iff the second form is used and both closures don't throw, which is the only case where it's compile-time assured that it cannot fail.

Any user can still use try! to mimic the other behaviour (of crashing), if they wish. And the explicit presence of that ! in their code is an important warning that it is potentially unsafe.

(it also has the minor benefit of not allowing you to explicitly set uniquingKeysWith to nil, which has ambiguous and potentially misleading intent, e.g. a reader might assume that makes it not unique keys (somehow))
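To make the trade-off concrete, here's a call-site sketch of the two proposed overloads. The stand-in implementations and the DuplicateKeyError type below are placeholders just to make the example self-contained, not code from the PR — and note that the design which eventually merged replaced the throwing single-argument overload with last-value-wins behaviour, as discussed further down:

struct DuplicateKeyError<Key>: Error { let key: Key } // placeholder error type

extension Sequence {
    // Stand-in for the proposed single-closure overload: throws on duplicate keys.
    func keyed<Key: Hashable>(
        by keyForValue: (Element) throws -> Key
    ) throws -> [Key: Element] {
        try keyed(by: keyForValue) { key, _, _ in throw DuplicateKeyError(key: key) }
    }

    // Stand-in for the conflict-resolving overload: only rethrows closure errors.
    func keyed<Key: Hashable>(
        by keyForValue: (Element) throws -> Key,
        uniquingKeysWith combine: (Key, Element, Element) throws -> Element
    ) rethrows -> [Key: Element] {
        var result = [Key: Element]()
        for element in self {
            let key = try keyForValue(element)
            if let existing = result[key] {
                result[key] = try combine(key, existing, element)
            } else {
                result[key] = element
            }
        }
        return result
    }
}

// The single-closure form always requires `try`, surfacing the possible failure:
let unique = try ["Apple", "Banana"].keyed(by: { $0.first! })

// The conflict-resolving form is non-throwing when both closures are non-throwing:
let lastWins = ["Apple", "Avocado"].keyed(by: { $0.first! },
                                          uniquingKeysWith: { _, _, latest in latest })

// And `try!` explicitly opts back into crash-on-duplicates:
let mustBeUnique = try! ["Apple", "Banana"].keyed(by: { $0.first! })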

Contributor Author
@amomchilov Jul 7, 2023

These problems can be addressed by having two versions of the method:

I love this idea! I've incorporated these suggestions, would you like to take a look?

(it also has the minor benefit of not allowing you to explicitly set uniquingKeysWith to nil, which has ambiguous and potentially misleading intent, e.g. a reader might assume that makes it not unique keys (somehow))

Good call!


1. Changing the type of the groups.
1. E.g. the groups can be Sets instead of Arrays.
1. Akin to calling `.transformValues { group in Set(group) }` on the resultant dictionary, but avoiding the intermediate allocation of Arrays of each group.


Foreword

I explored all this really just for my own edification, but it might be helpful to include some summary of this in your notes here, to explain why the group(by:) API doesn't and cannot currently support this functionality. Adding all the necessary infrastructure - whether a Collector protocol or whatever else - seems like a more fundamental addition to the Swift standard library that should be tackled in its own patch & pitch.

Rabbit hole

I like the idea of supporting this, but it doesn't seem like it's possible [elegantly] in Swift today…? Swift's standard library & Foundation don't include an equivalent to Java's Collector interface, for example. And even though I'm not a big fan of that approach anyway, you fundamentally do need a way to say "any collection that can be initialised and added to", which Swift does not have.

Specifically: you can easily genericise group(by:) with some ValueCollection type, but what protocol would it have to conform to? Collection specifies only the APIs for reading from a given collection, not for constructing one or adding to it. MutableCollection (surprisingly) has the exact same limitations, and feels inappropriate anyway since it forces the resulting Dictionary to have a mutable type for its values.

There is RangeReplaceableCollection as a half measure - it does include an init method, but only a limited subset of standard collections conform to it (notably not including Set). Still, with that you can do:

extension Sequence {
    // GroupKey must be Hashable to serve as a Dictionary key.
    public func grouped<GroupKey: Hashable, ValueCollection: RangeReplaceableCollection<Element>>(
        by keyForValue: (Element) throws -> GroupKey
    ) rethrows -> [GroupKey: ValueCollection] {
        var result = [GroupKey: ValueCollection]()

        for value in self {
            result[try keyForValue(value), default: ValueCollection()].append(value)
        }

        return result
    }
}

Semantically it's a hack, of course - none of the range-replacement functionality is actually needed here; it's just used because it exposes other APIs almost as a side-effect.

And anyway, the ergonomics are poor since Swift doesn't support specifying a default value for a generic parameter, requiring boilerplate in many cases:

[1, 2, 3, 4].grouped { $0 % 2 } // ❌ Generic parameter 'ValueCollection' could not be inferred

let result: [Int: [Int]] = [1, 2, 3, 4].grouped { $0 % 2 } // Works, but awkward for many use-cases.

One could work around the initialisation problem by having the caller explicitly provide an initialiser, but you still need a protocol that all collections support which specifies some kind of append method, and that doesn't currently exist.

It is possible to support arbitrary collection types, but only by having the caller provide a reducer explicitly in order to work around the aforementioned limitations, e.g.:

extension Sequence {
    public func grouped<GroupKey: Hashable, ValueCollection: Collection>(
        by keyForValue: (Element) throws -> GroupKey,
        collectionInitialiser: () -> ValueCollection,
        collectionReducer: (inout ValueCollection, Element) -> Void
    ) rethrows -> [GroupKey: ValueCollection] where ValueCollection.Element == Element {
        var result = [GroupKey: ValueCollection]()

        for value in self {
            collectionReducer(&result[try keyForValue(value), default: collectionInitialiser()], value)
        }

        return result
    }
}

let result: [Int: [Int]] = [1, 2, 3, 4].grouped(by: { $0 % 2 },
                                                collectionInitialiser: { [] },
                                                collectionReducer: { $0.append($1) })

This is essentially now a functional superset of group(by:) and keyed(by:).

You can't provide a default reducer since there's no way to generically add a value to any type of collection (per the earlier point). So this'd have to be a special parallel version of group(by:), in addition to the 'simple' one that just hard-codes use of Array and doesn't require that a reducer be specified.

I'm not a fan of this approach, technically capable as it may be, though admittedly that's from a subjective standpoint. I think it'd be more elegant, and cleaner for library authors, to evolve Swift (or the Swift standard library) to better support generalising across all collection types.

Contributor Author
@amomchilov Jul 8, 2023

Very interesting findings, thanks for sharing this with me! How do you think it would be best to communicate these in the library? Perhaps an addendum to the new md files I'm adding in this PR?

but you still need a protocol that all collections support which specifies some kind of append method, and that doesn't currently exist.

Oh wow, I never noticed this, but you're right!

It is possible to support arbitrary collection types, but only by having the caller provide a reducer explicitly in order to work around the aforementioned limitations, e.g.:

This is what I had in mind (or use a new protocol with method requirements, instead of two separate closure params).

This is more general, because the groups might not be collections at all. The histogram example comes to mind, where you'd want something like:

[1, 2, 3, 4].grouped(
    by: { $0 % 2 },
    collectionInitialiser: { 0 }, // not a collection, but w/e
    collectionReducer: { counter, _ in counter += 1 }
)

If there were a protocol for this, then there might be some CountingCollector that could be used.
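For illustration, here's a rough sketch of what such a Collector-style protocol and a CountingCollector might look like — all of the names and requirements below are hypothetical, not existing Swift or swift-algorithms API:

// Hypothetical protocol: anything that can be created empty, fed elements one at a
// time, and asked for a final result.
protocol Collector {
    associatedtype Input
    associatedtype Output
    init()
    mutating func collect(_ input: Input)
    var finish: Output { get }
}

// Hypothetical collector that just counts how many elements it was fed.
struct CountingCollector<Input>: Collector {
    private var count = 0
    init() {}
    mutating func collect(_ input: Input) { count += 1 }
    var finish: Int { count }
}

extension Sequence {
    // Hypothetical grouped(by:into:) built on the protocol above.
    func grouped<GroupKey: Hashable, C: Collector>(
        by keyForValue: (Element) throws -> GroupKey,
        into collector: C.Type
    ) rethrows -> [GroupKey: C.Output] where C.Input == Element {
        var collectors = [GroupKey: C]()
        for element in self {
            collectors[try keyForValue(element), default: C()].collect(element)
        }
        return collectors.mapValues(\.finish)
    }
}

// Histogram of odd/even counts, with no intermediate arrays:
let histogram = [1, 2, 3, 4, 5].grouped(by: { $0 % 2 }, into: CountingCollector<Int>.self)
// [0: 2, 1: 3]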

C# and Java's APIs are particularly interesting in this area, though I think a lot of their designs were constrained by the lack of tuples (at the time of their design).


I'm not sure how best to summarise that - obviously what I wrote, as a kind of deep-dive on this tangent, is way too much for these documents. Maybe it suffices to say something like "This would require either (a) a new protocol, supported by all standard library collection types, which represents an initialisable object that can be added to, or (b) a variant method which accepts a custom 'reducer' closure". If you want to say anything about it at all. Perhaps this thread here on the pull request is sufficient documentation for posterity (which could be hyperlinked to from the markdown file).

Having used Java's version of this a few times in real-world code, my main comment is that these sorts of APIs - groupby etc - should never require explicit specification of the return types (or a custom reducer). They can allow it, through overloads or optional parameters, but it needs to work effortlessly for the vastly most common case of just returning a Dictionary of Arrays. A lot of Java APIs explicitly require you to pass a collector which is almost always Collectors.toList()… it gets very repetitive, and verbose.


Also, as a counter-argument (Devil's advocate style) against the use of this custom-reducer grouped(by:) variant, you can achieve the same thing with:

[1, 2, 3, 4].reduce(into: [Int: Int]()) { $0[$1, default: 0] += 1 }

(and the explicit typing of the Result type can be omitted in some cases where it's implied by surrounding context, making this even terser)

Personally I don't mind if a language or library provides multiple ways to achieve the same goals, but I'm not sure a fancier group(by:) adds any benefit over the existing reduce(into:_:) approach.

That said, you can implement group(by:) using reduce(into:_:) too, e.g.:

[1, 2, 3, 4].reduce(into: [Int: [Int]]()) {
    $0[$1 % 2, default: []].append($1)
}

…yet I still feel like a purpose-built group(by:) is worth having. Perhaps it's technically just a "convenience", a bit more ergonomic - but it feels like it's worth it for this simple, common case. I guess I'm just not sure where to draw the line between a more powerful group(by:) and just having people fall back to reduce(into:_:) for uncommon needs.

Counter-counter-argument: even though reduce(_:_:) and reduce(into:_:) can be used for a huge variety of tasks, I usually find them hard to interpret when reading the code. A large number of Sequence methods can be implemented using just the reducer methods - e.g. allSatisfy, min / max, and even map! - yet we still have all those purpose-built methods. And I think Swift is better for them - they make code much more readable, even if they're technically redundant.
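As a small illustration of that last point, here's map rebuilt on top of reduce(into:_:) — the mapViaReduce name is just for this sketch. It works, but it's clearly harder to read than the purpose-built method:

extension Sequence {
    // `map` expressed via `reduce(into:_:)` — possible, but less readable.
    func mapViaReduce<T>(_ transform: (Element) -> T) -> [T] {
        reduce(into: [T]()) { $0.append(transform($1)) }
    }
}

[1, 2, 3].mapViaReduce { $0 * 10 } // [10, 20, 30]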

Contributor Author
@amomchilov Jul 8, 2023

I guess I'm just not sure where to draw the line between a more powerful group(by:) and just having people fall back to reduce(into:_:) for uncommon needs.

Likewise. I have conviction about the base group(by:) and its semantics, but I get less confident the further out I get on the extensibility landscape. Java's a prime example, where the collectors are maximally flexible, but a complete chore in 99% of the cases.

Counter-counter-argument: even though reduce(_:_:) and reduce(into:_:) can be used for a huge variety of tasks, I usually find them hard to interpret when reading the code. A large number of Sequence methods can be implemented using just the reducer methods - e.g. allSatisfy, min / max, and even map! - yet we still have all those purpose-built methods. And I think Swift is better for them - they make code much more readable, even if they're technically redundant.

Heh I have an entire little blog post about this: https://github.com/amomchilov/Blog/blob/master/Don't%20abuse%20reduce.md

Contributor Author

Is this research something you'd like to commit? You deserve to have your name on your findings :)

public let previousElement: Element
public let conflictedElement: Element

public init(key: Key, previousElement: Element, conflictedElement: Element) {


I might be missing something really obvious, but… why is an explicit initialiser necessary? Doesn't Swift implicitly generate a memberwise initialiser?

Member

The implicitly generated initialiser is internal, but this one is public. Also, you probably want to mark the init @inlinable because it is generic and otherwise will not specialise outside of this module.

Contributor Author

I wasn't sure about @inlinable.

otherwise will not spezialise outside of this module.

Since this is kind of an edge case, is the perf improvement worth it relative to the code size cost of specializing it? What's the SOP for deciding on that?

Contributor

Since this is kind of an edge case, is the perf improvement worth it relative to the code size cost of specializing it?

IIUC, @inlinable just makes the body of the function available for specialization outside the module, but in the end it's the optimizer that decides, based on a cost model, whether or not to inline. E.g. if the user compiles with -Osize, it may not inline that call at all even though it's marked as inlinable. To give a hint that we always want to inline, we can use @inline(__always) together with @inlinable.

What's the SOP for deciding on that?

With that said, I think it's always worth adding @inlinable, especially in this case where the code size impact would not be significant. But I can also see why it might not be worth it.
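For reference, a sketch of what the error type might look like with the suggested annotations applied — the property names follow the snippet quoted above, while the generic parameters and Error conformance are assumptions:

public struct KeysAreNotUnique<Key, Element>: Error {
    public let key: Key
    public let previousElement: Element
    public let conflictedElement: Element

    // Public (rather than the implicit internal memberwise init), and @inlinable so it
    // can be specialized outside the module.
    @inlinable
    public init(key: Key, previousElement: Element, conflictedElement: Element) {
        self.key = key
        self.previousElement = previousElement
        self.conflictedElement = conflictedElement
    }
}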


extension Sequence {
  /// Creates a new Dictionary from the elements of `self`, keyed by the
  /// results returned by the given `keyForValue` closure. Deriving the


It might help to say which specific error type is thrown. Plus this can be moved to the formal throws DocC field, a la:

/// - Throws: `KeysAreNotUnique` if two values map to the same key (via `keyForValue`).

Contributor Author

Interesting, there are no other thrown errors in the standard library, or even swift-algorithms. Should this be a documented error that we expect users to catch, or should it be treated more like a softer version of a fatalError?


I think it should be documented, mainly because callers need to be able to reliably distinguish between errors thrown by this library method itself vs deeper errors from the closures, because they might want to handle those cases differently (e.g. distinguish to the user that the problem is duplicate keys vs an invalid value, or failure to fetch some additional data from the network, etc).

As to this being novel for this library, that might just be coincidence? Perhaps swift-algorithms just doesn't happen to contain a lot of cases where there is an error condition (that can't otherwise be handled implicitly, such as returning nil).

Looking at all the uses of fatalError in this library, they're all related to indexing errors. Those are a different breed, I think - they're more likely to represent actual code bugs, for a start.

Pragmatically, perhaps the key decision point is how much work it is to pre-emptively validate the data vs just catching an exception. For subscripting, for example, it's arguably easy & fast enough to just check if your index is within range first. A similar argument can be made for why the basic arithmetic operators in Swift will just crash on overflow / underflow / etc. rather than throw exceptions. But for more complex operations - like group(by:) and keyed(by:) - pre-emptively checking for duplicate keys means basically reimplementing those methods. I don't think that's reasonable to expect of callers.

Sidebar on crashing philosophy

There's a quiet philosophical debate within the Swift community about how to handle errors. Some folks think it's preferable to just crash (and thus save API users from having to deal with exceptions), while others (myself included) prefer to give callers control over whether they crash or handle errors in some other manner.

Since crashing can be achieved with just a suitably-placed try!, there's very little added cost for people that want their application to crash instead of having to handle the exception.

For most use-cases, crashing just isn't acceptable (e.g. many server applications, most end-user applications, etc). Bad input data should just fail the relevant, specific operation, not the entire application. e.g. even if it's the Save operation that's broken by the exception, letting the application keep running at least gives the user a chance to save their data in other ways (e.g. copy-pasting it into another application, or dragging it out to a file in the Finder, etc).

More broadly, crashing is almost always an unsafe operation - who knows what things in the system, like databases or other data stores on disk, are in an invalid state right at that moment. The stack needs to be gracefully unwound in order to ensure persistent state is not corrupted.

Comment on lines 36 to 46
var result = [Key: Element]()

for element in self {
  let key = try keyForValue(element)

  if let previousElement = result.updateValue(element, forKey: key) {
    throw KeysAreNotUnique(key: key, previousElement: previousElement, conflictedElement: element)
  }
}

return result
Contributor

Would it make sense to implement this in terms of the other overload? e.g.

Suggested change
return try self.keyed(by: keyForValue, resolvingConflictsWith: { key, previousElt, conflictedElt in
  throw KeysAreNotUnique(key: key, previousElement: previousElt, conflictedElement: conflictedElt)
})

Contributor Author

Oooo fantastic! This was a case of path-dependence: it wasn't possible before, but now that it's two functions, it's available and much nicer!

@amomchilov
Contributor Author

Thanks for the feedback everyone! I'll try to incorporate it all this weekend

@@ -86,7 +86,7 @@ call to `CombinationsSequence.Iterator.next()` is an O(_n_) operation.
### Naming

The parameter label in `combination(ofCount:)` is the best match for the
Swift API guidelines. A few other options were considered:
[Swift's API Design Guidelines](https://www.swift.org/documentation/api-design-guidelines/). A few other options were considered:
Member

I'm fine with this change and the previous change, but they should be in a separate PR.

Contributor Author

Getting one PR merged has been difficult enough; I'm apprehensive to open a second just for something so tiny and low-value 😅

Member

I am happy to rubber-stamp that one immediately =)

Contributor Author

Done! #210

Could you please review this PR next? :D

Member

Yeah, @natecook1000 and I will review this week

Member
@natecook1000 left a comment

Thanks again for your work on this, @amomchilov, and especially for your perseverance in seeing this addition land!

Let's use a slightly revised solution for the keyed(by:) overload: this single-argument version should act like the code that it would typically replace – that is, keeping just the last element if there are any duplicated keys produced by the keyForValue closure. Especially with the potential upcoming advent of typed throws, I'd prefer to keep the existing system of only rethrowing errors from user-provided closures, instead of throwing new errors directly from the algorithms library.

let fruits = ["Apricot", "Banana", "Apple", "Cherry", "Avocado", "Coconut"]
let keyed1 = fruits.keyed(by: { $0.first! })
// keyed1 == ["A": "Avocado", "B": "Banana", "C": "Coconut"]

// equivalent to:
var keyed2: [Character: String] = [:]
for fruit in fruits {
    keyed2[fruit.first!] = fruit
}

This behavior matches that of the other languages that have this feature already (except for Java, which predictably throws an exception), and seems like it will lead to less surprise than having people reckon with the potential thrown error.

It doesn't exactly match the behavior of the Dictionary(uniqueKeysWithValues:) initializer, but I think that's fine, given the presence of the key-producing transformation. For those who would like to customize the value selection (or throw an error or trap), there's always the second keyed(by:resolvingConflictsWith:) method.

@amomchilov
Contributor Author

@natecook1000 That sounds good by me! I'll try to rework this on the weekend :)

@amomchilov
Contributor Author

amomchilov commented Nov 5, 2023

Hey @natecook1000 @stephentyrone,

I rebased, and my latest commit changes from the original "throw on collision" behaviour to the new "keep latest" behaviour.

@natecook1000
Member

@swift-ci Please test

Member
@natecook1000 left a comment

🎉

@natecook1000 natecook1000 merged commit 328631d into apple:main Nov 7, 2023
2 checks passed
@stephentyrone
Member

Thanks for your persistence on this one, @amomchilov!
