Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arrow Collectors #3280

Merged
merged 20 commits into from
Jan 16, 2024
Merged

Arrow Collectors #3280

merged 20 commits into from
Jan 16, 2024

Conversation

serras
Copy link
Member

@serras serras commented Nov 3, 2023

This is an idea for a new library heavily influenced by Java's Collector and Haskell's foldl library. The goal is to build complex computations over sequences of values, guaranteeing that those values are consumed only once.

For example, if you want to compute the average of a sequence in one go, you can write:

val average = zip(Collectors.sum, Collectors.length) { s, l -> s.toDouble() / l.toDouble() }
list.collect(average)

The "trick" is that collectors have a nice interface based on map/contramap/zip (they are applicative functors and profunctors, if we were to use fancy words). Combined with Kotlin's cold Flows, they provide a nice combination.

Since this is a pattern coming from functional programming, I thought that it may be a good fit for Arrow.

@serras serras requested review from raulraja and nomisRev November 3, 2023 21:16
Copy link
Member

@raulraja raulraja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great @serras . I realize some of the stuff we did back in the days for Arrow Query language could be easier built on top of this #1079

@hoc081098
Copy link
Contributor

Great. Arrow will be more popular 😂.

Copy link
Member

@nomisRev nomisRev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome 🙌 👏 👏 The implementation of Blocking is incorrect and has some issues. If it's not clear what I meant, I would be more happy to jump on a quick call, make a commit on this branch, or clear up any doubts you have in this PR ☺️

* so it's potential faster than [Collectors.map].
*/
@Suppress("UnusedReceiverParameter")
public fun <K, V> Collectors.concurrentMap(): Collector<Pair<K, V>, ConcurrentMap<K, V>> =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice JVM specific addition! 🙌

Comment on lines 6 to 7
public fun <T, R> java.util.stream.Collector<T, *, R>.asCollector(): Collector<T, R> =
Collectors.jvm(this)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

…lectors/Collect.kt

Co-authored-by: Petrus Nguyễn Thái Học <hoc081098@gmail.com>
@serras serras mentioned this pull request Nov 7, 2023
@serras serras marked this pull request as ready for review November 7, 2023 07:43
@serras
Copy link
Member Author

serras commented Nov 7, 2023

Ready for review! If this gets merged, I’ll happily write docs for the Arrow Website. And even some sort of blog post, if you want!

Copy link
Member

@nomisRev nomisRev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me @serras!! Thank you for the amazing work 🙌 👏 🥳

Comment on lines +45 to +59
nodejs {
testTask {
useMocha {
timeout = "60s"
}
}
}
browser {
testTask {
useKarma {
useChromeHeadless()
timeout.set(Duration.ofMinutes(1))
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off-topic: I think would be good if we can put this in some common configuration at some point. Still not sure what's the best way to do that in Gradle 🙃

* Java's [`Collector`](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collector.html)
* and Haskell's [`foldl` library](https://hackage.haskell.org/package/foldl/docs/Control-Foldl.html).
*/
public interface CollectorI<InternalAccumulator, in Value, out Result> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Not a huge fan of CollectorI but not sure if something else makes sense.. I guess InternalAccumulator is State? What about StateCollector<State, in Value, out Result>, or StatefulCollector? Not sure if this makes sense though, since all Collector have State. We had a similar issue with Schedule when we originally designed it, but don't think we can take the same/similar approach to solve this issue.

This is just a nit, feel free to ignore. For me it's also fine without any changes, just wanted to leave some thoughts.

@nomisRev
Copy link
Member

Going to merge. Feel free to discuss further on my nit remark, or to ignore completely ☺️

@nomisRev nomisRev merged commit 85d2684 into main Jan 16, 2024
11 checks passed
@nomisRev nomisRev deleted the serras/collectors branch January 16, 2024 13:13
serras added a commit to arrow-kt/arrow-website that referenced this pull request Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants