Row and Column Iterator for Matrices #29749

arnavs · 2018-10-21T04:07:45Z

Talked in Slack about creating eachrow(m::Matrix) and eachcolumn(m::Matrix), which return iterators over the rows and columns of m. Here's a stab at it.

stdlib/LinearAlgebra/src/LinearAlgebra.jl

fredrikekre · 2018-10-21T06:41:30Z

See #14491

arnavs · 2018-10-21T17:55:03Z

Thanks @fredrikekre, I updated the naming convention to match what was happening in that thread (eachrow and eachcol).

Also, what's going on with CI? Looks like we have two errors in something called sockets for 2 of the builds?

andyferris · 2018-10-22T04:16:42Z

Cool, nice, I was hoping we would implement something like this soon!

This one is a little close to my heart. For this functionality, I was hoping we could be slightly more generic - by being able to slice up arrays of any dimensionality by any one (or more) dimension.

Additionally, since arrays naturally support random access, it's also possible to turn the "iterators" into fully-fledged AbstractArrays in their own right. That is, eachrow of a matrix could return a nested vector of vectors. This would let users interact with the result in ways beyond a simple for loop. For example, the presence of the indices makes it easier to match up multiple containers which share indices along certain dimensions. Finally, different array types may have their own implementations of map, reduce and so on, which can be quite different from a sequential for loop over all elements (consider for example doing something in parallel with the rows of a DMatrix from DistributedArrays).

Anyway, because I've been interested in this for a little while, I implemented such a nested array dimension splitting in a package called SplitApplyCombine under the functions splitdims and splitdimsview (the latter being lazy). My intention here was always to develop these in a package and hopefully move these to Base through a PR here for Julia 1.x. (It should be noted that I don't particularly like my function names, but they are at least descriptive placeholders for now).

If I may be slightly philosophical - in my view, it seems more productive and flexible to let users put together different "split", "apply" and "combine" strategies like lego bricks, rather than to rely on increasingly complex functions like mapslices to do a split-apply-combine strategy all in one step, which is why I'm advocating for this in general, and excited by this PR in particular.

Anyway, if @arnavs and the community are at all interested, I'd be happy to help expand the functionality here to work on arbitrary dimensions and return a fully-fledged AbstractArray.
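As a rough illustration of the "rows as a fully-fledged AbstractArray" idea above, here is a minimal sketch. The type name `RowsView` is hypothetical (it is not the SplitApplyCombine or Base implementation), and it assumes only the standard `size`/`getindex` array interface:

```julia
# Hypothetical wrapper: presents the rows of a matrix as an indexable vector of views.
struct RowsView{M<:AbstractMatrix} <: AbstractVector{AbstractVector}
    parent::M
end

# Minimal AbstractArray interface: one element per row of the parent.
Base.size(r::RowsView) = (size(r.parent, 1),)
Base.getindex(r::RowsView, i::Int) = view(r.parent, i, :)

A = [1 2 3; 4 5 6]
rows = RowsView(A)

second = rows[2]                 # random access to a single row (a view, no copy)
row_sums = map(sum, rows)        # generic array functions work on the wrapper

# Shared indices make it easy to line rows up with another container.
labels = ["first", "second"]
totals = Dict(l => sum(r) for (l, r) in zip(labels, rows))
```

Because the wrapper is an array rather than a bare iterator, it supports indexing, `length`, `map`, and friends without any further code.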

andyferris · 2018-10-22T13:00:01Z

(GitHub had problems earlier today... looks like they posted all my retries!)

joshday · 2018-10-22T14:21:32Z

I'll add that I've also implemented eachrow/eachcol in OnlineStatsBase.

The difference from the implementation in this PR is that OnlineStatsBase uses a buffer instead of views (I think views were a little slower, but that could be investigated further). The code is here: https://github.com/joshday/OnlineStatsBase.jl/blob/0449a586f9229f6bf43a2214b11a271318a240a4/src/OnlineStatsBase.jl#L84-L142.

arnavs · 2018-10-22T17:59:04Z

Replies to feedback (first @andyferris, then @joshday):

Fully agree, would love if the slicing API was n-dimensional.

> Additionally, since arrays naturally support random-access, it's also possible to turn the "iterator"s into a fully-fledged AbstractArrays in their own right. That is, eachrow of a matrix could return a nested vector of vectors.

For sure, we could implement rows(M) = collect(eachrow(M)) or something similar. Is that what you had in mind? (Edit: I still think it's valuable to keep the lightweight view iterator for instances that aren't conducive to reproducing all the data.)

> Anyway, because I've been interested in this for a little while, I implemented such a nested array dimension splitting in a package called SplitApplyCombine under the functions splitdims and splitdimsview

Sounds good, I'll check this out.

> Anyway, if @arnavs and the community are at all interested, I'd be happy to help expand the functionality here to work on arbitrary dimensions and return a fully-fledged AbstractArray.

For sure. Maybe if people think this is a good idea (cc: @ararslan, @StefanKarpinski, @fredrikekre), we can merge this and then implement a method eachslice(A::AbstractArray)? Or, if we need to workshop this a bit, that's OK too.

> I'll add that I've also implemented eachrow/eachcol in OnlineStatsBase...I think views were a little slower...

Quite possible, but that's above my pay grade. I'll let others weigh in.
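For reference, a small sketch of what `collect(eachrow(M))` does in practice, assuming Julia 1.1 or later (where `eachrow` is available in Base). Collecting the iterator materializes a vector of views, so the rows still alias the parent matrix; an explicit `copy` is needed for independent data:

```julia
M = [1 2; 3 4]

# A vector of row views: the container is new, the row data is not copied.
rows = collect(eachrow(M))
rows[1][1] = 100          # writes through to M

# Independent copies of each row, for when aliasing is not wanted.
independent = [copy(r) for r in eachrow(M)]
independent[2][1] = -1    # does not touch M
```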

StefanKarpinski · 2018-10-22T22:06:50Z

@mbauman, can you take a look at this?

mbauman · 2018-10-22T22:38:00Z

I agree with most all of what @andyferris wrote above, except that I'd suggest we start as simple as possible. I'll also note that there's another package implementation in JuliennedArrays.jl.

I'd suggest simply:

eachrow(A::AbstractArray) = (view(A, i, :) for i in axes(A, 1))
eachcol(A::AbstractArray) = (view(A, :, j) for j in axes(A, 2))
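A quick usage sketch of these two generator definitions. The names `rows_of` and `cols_of` are hypothetical, chosen to avoid clashing with the functions now in Base, and the signatures are narrowed to `AbstractMatrix` as an assumption, since the two-index `view` only makes sense for matrices:

```julia
# Same generator-of-views idea as above, under hypothetical names.
rows_of(A::AbstractMatrix) = (view(A, i, :) for i in axes(A, 1))
cols_of(A::AbstractMatrix) = (view(A, :, j) for j in axes(A, 2))

A = [1 2 3; 4 5 6]

row_sums = [sum(r) for r in rows_of(A)]
col_sums = [sum(c) for c in cols_of(A)]

# Each element is a view, so in-place updates modify A itself.
for c in cols_of(A)
    c .*= 10
end
```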

For the general case, it looks like constant propagation can almost handle:

eachslice(A, d) = (view(A, ntuple(n->n==d ? i : (:), ndims(A))...) for i in axes(A, d))

but we're not quite there yet. It'll take a bit more to make this type-stable.
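To make the general case concrete, here is the same `ntuple` construction applied to a three-dimensional array. The name `slices_along` is hypothetical, and this sketch does nothing to address the type-stability concern raised above:

```julia
# Index with `i` in dimension `d` and a Colon everywhere else.
slices_along(A::AbstractArray, d::Integer) =
    (view(A, ntuple(n -> n == d ? i : Colon(), ndims(A))...) for i in axes(A, d))

A = reshape(1:24, 2, 3, 4)

# Slicing along dimension 3 yields four 2×3 matrices.
s3 = collect(slices_along(A, 3))

# Slicing along dimension 2 yields three 2×4 matrices.
s2 = collect(slices_along(A, 2))
```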

Edit: oh, of course, we can just use selectdim:

eachslice(A, d) = (selectdim(A,

mbauman

I'd really like to see this as a more general structure, but I've added a few pedagogical comments based on the code you've written that I hope will be helpful.

stdlib/LinearAlgebra/src/generic.jl

mbauman · 2018-10-22T23:15:00Z

> buffer instead of views

I think I prefer the views behavior. Yes, buffers can be faster in many situations, but they also can have strange effects if you, e.g., collect the iterator. I prefer the view behaviors, and there's hope that someday they'll be faster (#14955).
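The "strange effects" of collecting a buffer-based iterator can be shown with a small sketch. `BufferedRows` is a hypothetical type written for this illustration only; it is not the OnlineStatsBase code:

```julia
# Hypothetical buffer-based row iterator: every iteration reuses one vector.
struct BufferedRows{T}
    data::Matrix{T}
    buffer::Vector{T}
end
BufferedRows(A::Matrix{T}) where {T} = BufferedRows{T}(A, Vector{T}(undef, size(A, 2)))

function Base.iterate(r::BufferedRows, i::Int = 1)
    i > size(r.data, 1) && return nothing
    r.buffer .= view(r.data, i, :)   # overwrite the shared buffer with row i
    return r.buffer, i + 1
end
Base.length(r::BufferedRows) = size(r.data, 1)
Base.eltype(::Type{BufferedRows{T}}) where {T} = Vector{T}

A = [1 2; 3 4; 5 6]

# Every collected element is the same buffer, which ends up holding the last row.
buffered = collect(BufferedRows(A))

# Views stay distinct, one per row.
viewed = [view(A, i, :) for i in axes(A, 1)]
```

Inside a plain `for` loop the buffered version behaves as expected; the surprise only appears once the elements outlive the iteration that produced them.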

andyferris · 2018-10-22T23:45:27Z

> we can merge this and then implement a method eachslice(A::AbstractArray)

> except that I'd suggest we start as simple as possible

I agree - let's do baby steps! :)

My concern is more on the bikeshedding front - that a function name such as eachrow is great for an iterator (for row in eachrow(matrix) is indeed very readable), but IMO not so great for a general function which transforms an array into another array. If we know the function will be expanded to include more functionality later, we may as well rename it now to reduce churn.

Our typical policy is to pick a simple English verb for Julia functions (and I don't see why we would deviate from that). Ideas that come to mind include slice, split, splitdims.

> I'll also note that there's another package implementation in JuliennedArrays.jl.

Yes, I meant to come back and acknowledge JuliennedArrays (sorry @bramtayl). While I am talking function names - while I really appreciate the julienne play on words (genius!), for Julia Base perhaps a simpler English verb might be preferable for new users (especially for those whose mother tongue isn't English)?

nalimilan · 2018-12-01T22:01:20Z

Honestly I don't think adding unexported functions in one release will help people to fix their code in advance. Only people following Julia development closely will be aware of that, and these people are precisely the ones who will adapt before 1.1 is out anyway (like DataFrames). Others will only notice the breakage in 1.2 when we eventually export these symbols.

Regarding DataFrames, let's not take it as an argument to delay the addition of new exports. @bkamins has already made a PR to make the API more consistent with this PR, and we can merge a fix even before this PR is merged in Julia, and tag it in a few days.

arnavs · 2018-12-01T22:11:30Z

If everyone agrees that ‘eachslice’ is good to export, perhaps we could do that now and continue the discussion on the other two?

Not to put too fine a point on it, but I think we’ve reached the convergence point from this PR (the functionality we want is added, documented, tested, and won’t break things downstream). If I’m wrong, happy to iterate further, but this feature request has been open for years now and it would be nice to close.

mbauman · 2018-12-01T23:04:28Z

I would just very much like to find a conclusion here. @arnavs has been very tenacious, persistent and patient for this first PR. I leaned on the side of being conservative to help make that happen but apparently that backfired.

nalimilan · 2018-12-02T13:33:57Z

AFAICT we all agree the PR can be merged. The debate is about whether we should add more exports or not. But if we don't reach an agreement soon, we had better merge and continue the discussion after that.

bkamins · 2018-12-02T16:07:45Z

If some package used e.g. eachcol earlier (like DataFrames.jl) the maintainers will have to introduce a conditional definition of eachcol and Base.eachcol anyway to keep supporting Julia 1.0 (or handle this duality via Compat.jl) no matter if we export it in Julia 1.1 or 1.2.
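A minimal sketch of such a conditional definition, assuming a hypothetical package `TinyTables` with a hypothetical `Table` type (this is not the DataFrames.jl code, and Compat.jl offers another route):

```julia
module TinyTables   # hypothetical package name

struct Table
    columns::Vector{Vector{Float64}}
end

@static if isdefined(Base, :eachcol)
    # Julia 1.1 and later: extend the function that Base already exports.
    import Base: eachcol
else
    # Julia 1.0: Base has no `eachcol`, so the package owns and exports its own.
    export eachcol
end

# The method definition itself is identical in both cases.
eachcol(t::Table) = (col for col in t.columns)

end # module

t = TinyTables.Table([[1.0, 2.0], [3.0]])
cols = collect(TinyTables.eachcol(t))
```

Either way, users who write `eachcol(t)` see a single function, which is why the choice of release for the export does not remove the need for this branch.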

AzamatB · 2018-12-02T19:43:18Z

My 2¢: I agree with @nalimilan and @bkamins and think there is no need to postpone exports to 1.2

arnavs · 2018-12-02T23:33:08Z

Alright, I re-added the exports. Let's never think about this PR ever again :)

StefanKarpinski · 2018-12-03T18:00:22Z

@nalimilan or @mbauman, please merge if and when you think this is ready (with appropriate squashing).

mbauman · 2018-12-03T18:02:12Z

I'm thrilled DataFrames folks are on board here and will accommodate the breakage. Thanks everyone. Let's merge!

nalimilan · 2018-12-04T08:21:56Z

Thanks! DataFrames PR is JuliaData/DataFrames.jl#1614, will be tagged shortly.

…n `EachSlice` object (along with `EachRow`/`EachCol` aliases). The main benefit is that it will allow dispatch on the iterator to provide more efficient methods, e.g.

```
sum(A::EachRow) = vec(sum(parent(A), dims=1))
```

This will encourage the use of `eachcol`/`eachrow` to resolve ambiguities in user-facing APIs, in particular, the "observations as rows vs columns" problem in the statistics/ML packages. This also makes `eachslice` work over multiple dimensions.
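The dispatch benefit can be sketched without relying on any particular Julia version by using a stand-in type. `RowSlices` below is hypothetical and only mimics the idea of a dedicated row-iterator type that keeps a reference to its parent matrix:

```julia
# Hypothetical stand-in for a dedicated row-iterator type.
struct RowSlices{M<:AbstractMatrix}
    parent::M
end

Base.iterate(r::RowSlices, i::Int = 1) =
    i > size(r.parent, 1) ? nothing : (view(r.parent, i, :), i + 1)
Base.length(r::RowSlices) = size(r.parent, 1)
Base.parent(r::RowSlices) = r.parent

# Specialized method: summing all rows is a single reduction over dimension 1.
Base.sum(r::RowSlices) = vec(sum(parent(r), dims=1))

A = [1 2 3; 4 5 6]

fast = sum(RowSlices(A))                               # uses the specialized method
generic = reduce(+, [view(A, i, :) for i in axes(A, 1)])  # adds the rows one by one
```

With a plain generator there is no type to dispatch on, so only the row-by-row path is available.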

Arnav Sood added 6 commits October 20, 2018 20:12

first pass, sans tests

540ec66

add test

e770e72

refactor a bit

230b7fb

fix eltype to show views

bd7c666

fix whitespace

6efae6b

fix typo

912fc36

ararslan reviewed Oct 21, 2018

stdlib/LinearAlgebra/src/LinearAlgebra.jl (outdated, resolved)

Arnav Sood added 3 commits October 21, 2018 10:57

rename [ci skip]

9be6483

typofix [ci skip]

a17aec6

add eachslice() [ci skip]

4aa571e

arnavs mentioned this pull request Oct 21, 2018

enumerate() like equivalents for iterating over columns/rows of a matrix #14491

Closed

eachcolumn => eachcol and a bit of error handling [ci skip]

f467553

ararslan requested review from mbauman and timholy October 22, 2018 22:18

mbauman reviewed Oct 22, 2018

stdlib/LinearAlgebra/src/generic.jl (outdated, resolved)

stdlib/LinearAlgebra/src/generic.jl (outdated, resolved)

stdlib/LinearAlgebra/src/generic.jl (outdated, resolved)

feedback

d04fbcf

mbauman approved these changes Dec 1, 2018

Add missing backticks and improve message

6959465

[ci skip]

nalimilan approved these changes Dec 2, 2018

AzamatB approved these changes Dec 2, 2018

add exports for eachcol, eachrow

b3ced1a

[ci skip]

mbauman merged commit 6b04291 into JuliaLang:master Dec 3, 2018

mbauman added a commit that referenced this pull request Dec 3, 2018

NEWS for #29749

f91157c

arnavs mentioned this pull request Dec 3, 2018

Use new eachrow and eachcol QuantEcon/lecture-source-jl#467

Open

bkamins mentioned this pull request Dec 3, 2018

make eachcol default to false JuliaData/DataFrames.jl#1613

Closed

fredrikekre pushed a commit that referenced this pull request Dec 4, 2018

NEWS and compat annotation for each(row|col|slice) #29749 (#30245)

7920a2a

FelipeLema mentioned this pull request Dec 5, 2018

Iterate over rows (or columns) of a matrix JuliaCollections/IterTools.jl#11

Closed

mcabbott mentioned this pull request Dec 20, 2018

Gradients for prod, cumsum, cumprod FluxML/Flux.jl#524

Closed

mbauman mentioned this pull request Jan 4, 2019

Equivalent of mapslices with views? #29146

Closed

oxinabox mentioned this pull request Jan 9, 2019

eachrow, eachcol, eachslice support JuliaLang/Compat.jl#639

Closed

tpapp mentioned this pull request Jan 22, 2019

Make use of mapslices consistent throughout Julia #3893

Open

simonbyrne mentioned this pull request Jun 12, 2019

add Slices array type for eachslice/eachrow/eachcol #32310

Merged

jiegillet mentioned this pull request Jul 24, 2019

Added eachrow/col/slice for v1.0 JuliaLang/Compat.jl#658

Merged

tkf mentioned this pull request Oct 16, 2019

Add ASCII alias compose of ∘ #33573

Closed

nalimilan mentioned this pull request Mar 1, 2021

Usage of eachcol seems to be a pun? JuliaData/DataFrames.jl#2636

Closed
