Row and Column Iterator for Matrices #29749
See #14491 |
Thanks @fredrikekre, I updated the naming convention to match what was happening in that thread ( Also, what's going on with CI? Looks like we have two errors in something called |
Cool, nice, I was hoping we would implement something like this soon! This one is a little close to my heart. For this functionality, I was hoping we could be slightly more generic - by being able to slice up arrays of any dimensionality by any one (or more) dimension. Additionally, since arrays naturally support random-access, it's also possible to turn the "iterator"s into a fully-fledged Anyway, because I've been interested in this for a little while, I implemented such a nested array dimension splitting in a package called SplitApplyCombine under the functions If I may be slightly philosophical - in my view, it seems more productive and flexible to let users put together different "split", "apply" and "combine" strategies like lego bricks, rather than to rely on increasingly complex functions like Anyway, if @arnavs and the community are at all interested, I'd be happy to help expand the functionality here to work on arbitrary dimensions and return a fully-fledged |
(Github had problems earlier today... looks like they posted all my retries!) |
I'll add that I've also implemented The difference from the implementation in this PR is that OnlineStatsBase uses a buffer instead of views (I think views were a little slower, but that could be investigated further). The code is here: https://github.com/joshday/OnlineStatsBase.jl/blob/0449a586f9229f6bf43a2214b11a271318a240a4/src/OnlineStatsBase.jl#L84-L142. |
Replies to feedback (first @andyferris, then @joshday):
|
@mbauman, can you take a look at this? |
I agree with most all of what @andyferris wrote above, except that I'd suggest we start as simple as possible. I'll also note that there's another package implementation in JuliennedArrays.jl. I'd suggest simply:
`eachrow(A::AbstractArray) = (view(A, i, :) for i in axes(A, 1))`
`eachcol(A::AbstractArray) = (view(A, :, j) for j in axes(A, 2))`
For the general case, it looks like constant propagation can almost handle:
`eachslice(A, d) = (view(A, ntuple(n->n==d ? i : (:), ndims(A))...) for i in axes(A, d))`
but we're not quite there yet. It'll take a bit more to make this type-stable. Edit: oh, of course, we can just use eachslice(A, d) = (selectdim(A, |
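The generator-based definitions suggested in this comment can be tried out with a minimal sketch like the one below. The `sketch_` prefixes are hypothetical names chosen here so that nothing in `Base` is shadowed, and the row and column versions are restricted to `AbstractMatrix` for simplicity. The `selectdim`-based definition in the comment is cut off in this copy of the thread, so the `sketch_eachslice` line is an assumption about one way to use `selectdim(A, d, i)`, not a reconstruction of what was posted.

```julia
# Hypothetical names: the `sketch_` prefix avoids clashing with Base.
# Each function returns a lazy generator of views into A (no copies).
sketch_eachrow(A::AbstractMatrix) = (view(A, i, :) for i in axes(A, 1))
sketch_eachcol(A::AbstractMatrix) = (view(A, :, j) for j in axes(A, 2))

# Assumed general form: selectdim(A, d, i) is a view of A with index i
# fixed along dimension d.
sketch_eachslice(A::AbstractArray, d::Integer) =
    (selectdim(A, d, i) for i in axes(A, d))

A = [1 2 3; 4 5 6]
rows = collect(sketch_eachrow(A))
cols = collect(sketch_eachcol(A))
slices = collect(sketch_eachslice(A, 2))

# Because the elements are views, writing to one writes through to A.
first(rows)[1] = 10
```

Since every element is a view, the collected rows, columns and slices all alias the same parent array, which is the behavior the later comments in the thread weigh against a buffer-based design.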
I'd really like to see this as a more general structure, but I've added a few pedagogical comments based on the code you've written that I hope will be helpful.
I think I prefer the views behavior. Yes, buffers can be faster in many situations, but they also can have strange effects if you, e.g., |
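The example in the comment above is cut off in this copy of the thread. As an assumption about the kind of effect meant, and not a reconstruction of the missing text, the sketch below shows one well-known pitfall of a reused buffer: collecting the iterator yields the same array object many times over. `buffered_rows` is a hypothetical helper written for this illustration and is not the OnlineStatsBase implementation.

```julia
# Hypothetical buffer-based row iterator: a single buffer is overwritten
# and returned again on every step.
function buffered_rows(A::AbstractMatrix)
    buf = similar(A, size(A, 2))          # one vector, reused for each row
    return (copyto!(buf, view(A, i, :)) for i in axes(A, 1))
end

A = [1 2; 3 4]

# Every collected element is the same buffer, holding the last row.
buffered = collect(buffered_rows(A))

# Views keep each row distinct because each one refers to its own
# region of A.
viewed = collect(view(A, i, :) for i in axes(A, 1))
```

Here `buffered[1] === buffered[2]` holds, so the first row is lost once iteration finishes, while `viewed` still distinguishes the two rows.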
I agree - let's do baby steps! :) My concern is more on the bikeshedding front - that a function name such as Our typical policy is to pick a simple English verb for Julia functions (and I don't see why we would deviate from that). Ideas that come to mind include
Yes, I meant to come back and acknowledge JuliennedArrays (sorry @bramtayl). While I am talking function names - while I really appreciate the |
Honestly, I don't think adding unexported functions in one release will help people to fix their code in advance. Only people following Julia development closely will be aware of that, and these people are precisely the ones who will adapt before 1.1 is out anyway (like DataFrames). Others will only notice the breakage in 1.2 when we eventually export these symbols. Regarding DataFrames, let's not take it as an argument to delay the addition of new exports. @bkamins has already made a PR to make the API more consistent with this PR, and we can merge a fix even before this PR is merged in Julia, and tag it in a few days. |
If everyone agrees that ‘eachslice’ is good to export, perhaps we could do that now and continue the discussion on the other two? Not to put too fine a point on it, but I think we’ve reached the convergence point from this PR (the functionality we want is added, documented, tested, and won’t break things downstream). If I’m wrong, happy to iterate further, but this feature request has been open for years now and it would be nice to close. |
I would just very much like to find a conclusion here. @arnavs has been very tenacious, persistent and patient for this first PR. I leaned on the side of being conservative to help make that happen but apparently that backfired. |
AFAICT we all agree the PR can be merged. The debate is about whether we should add more exports or not. But if we don't reach an agreement soon, better to merge and continue the discussion after that. |
If some package used e.g. |
My 2¢: I agree with @nalimilan and @bkamins and think there is no need to postpone exports to 1.2 |
[ci skip]
Alright, I re-added the exports. Let's never think about this PR ever again :) |
@nalimilan or @mbauman, please merge if and when you think this is ready (with appropriate squashing). |
I'm thrilled DataFrames folks are on board here and will accommodate the breakage. Thanks everyone. Let's merge! |
Thanks! DataFrames PR is JuliaData/DataFrames.jl#1614, will be tagged shortly. |
…n `EachSlice` object (along with `EachRow`/`EachCol` aliases). The main benefit is that it will allow dispatch on the iterator to provide more efficient methods, e.g. ``` sum(A::EachRow) = vec(sum(parent(A), dims=1)) ``` This will encourage the use of `eachcol`/`eachrow` to resolve ambiguities in user-facing APIs, in particular, the "observations as rows vs columns" problem in the statistics/ML packages. This also makes `eachslice` work over multiple dimensions.
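The dispatch idea in this description can be illustrated with a small self-contained sketch. `SketchEachRow` is a hypothetical wrapper type defined here only for illustration; it is not the `EachRow` alias the description refers to, and its layout is an assumption. The point is only that once the iterator has a concrete type that remembers its parent matrix, a specialized `sum` can be written against it.

```julia
# Hypothetical wrapper type: an iterator over rows that keeps its parent.
struct SketchEachRow{M<:AbstractMatrix}
    parent::M
end

Base.parent(r::SketchEachRow) = r.parent
Base.length(r::SketchEachRow) = size(r.parent, 1)

# Iterate over row views, using the row index as the iteration state.
function Base.iterate(r::SketchEachRow, i=firstindex(r.parent, 1))
    i > lastindex(r.parent, 1) && return nothing
    return view(r.parent, i, :), i + 1
end

# Specialized method, mirroring the one-liner in the description:
# summing all rows equals reducing the parent along dimension 1.
Base.sum(r::SketchEachRow) = vec(sum(parent(r), dims=1))

A = [1 2; 3 4; 5 6]
fast = sum(SketchEachRow(A))        # one reduction over the parent matrix
slow = foldl(+, SketchEachRow(A))   # generic path: adds row views one by one
```

Both paths give the same vector; the specialized method simply avoids allocating an intermediate array for each row addition.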
Talked in Slack about creating `eachrow(m::Matrix)` and `eachcolumn(m::Matrix)`, which return an iterator over the rows and columns of `M`. Here's a stab at it.