Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Towards array nirvana #7941

Closed
12 of 15 tasks
timholy opened this issue Aug 10, 2014 · 78 comments
Closed
12 of 15 tasks

Towards array nirvana #7941

timholy opened this issue Aug 10, 2014 · 78 comments
Milestone

Comments

@timholy
Copy link
Member

timholy commented Aug 10, 2014

For 0.4 several of us want to see significant improvements in arrays. I'm opening this meta-issue to help organize the effort. I'll be brief here and link out to issues/packages for more detailed explanations. Please add to these bullet points.

Underlying technologies

The first two are essential, the third is nearly essential.

I'm guessing the approach for implementing #7799 will also allow one to specify manual inlining (which is where the idea was originally proposed), which is the main bottleneck for #6437. So that's almost a 2-for-1 deal.

Implementation

The first two are essential.

Opportunities

ArrayViewsAPL is something I've not yet broadly announced. While the APL in the package's title is in reference to #5949 (for which it can be a test bed to see how we'd like it and how much would break), its real purpose is to exploit stagedfunctions for creating efficient and general view operations. Please see the README for explanations.

@timholy
Copy link
Member Author

timholy commented Aug 10, 2014

Another fun ingredient in my own personal array nirvana will be https://github.com/timholy/NamedAxesArrays.jl. More joy from stagedfunctions 😄. But I don't propose that for inclusion in base.

@lindahua
Copy link
Contributor

@timholy the contiguous rank of StridedView is important, consider:

v1 = view(a,:,1:2:7)
v2 = view(v1,:,2)

With the type system in ArrayViews, it can determine statically that v2 is a contiguous view.

@timholy
Copy link
Member Author

timholy commented Aug 10, 2014

I see, we probably want both then.

@lindahua
Copy link
Contributor

The StridedView part is tightly integrated with ContiguousView. It is possible that lower rank slices of a non-contiguous Strided view is contiguous. The ArrayViews type system preserves this piece of information, so that something above can be determined statically.

@timholy
Copy link
Member Author

timholy commented Aug 10, 2014

Although how one decides which one wants is an interesting question---I guess we could return your ArrayView types when the parent is an array and no slicing is desired, and the ArrayViewAPL view type otherwise?

@Jutho
Copy link
Contributor

Jutho commented Aug 10, 2014

Looking forward to 0.4 already...

@timholy
Copy link
Member Author

timholy commented Aug 10, 2014

Jutho, you've contributed a lot already, and it would be lovely to have your help implementing some of this!

@jtravs
Copy link
Contributor

jtravs commented Aug 10, 2014

@timholy ArrayViewAPL is what I've been dreaming of since discovering Julia! Hopefully sliceview can become the standard indexing semantics. I'll be happy to test and help in any way I can once this lands in master (or is usable without having to jump over lots of branch fences)!

@Jutho
Copy link
Contributor

Jutho commented Aug 10, 2014

@timholy I’d be happy to help when time permits. Let me know if you had anything specific in mind that I can contribute to.

On 10 Aug 2014, at 14:15, Tim Holy notifications@github.com wrote:

Jutho, you've contributed a lot already, and it would be lovely to have your help implementing some of this!


Reply to this email directly or view it on GitHub.

@jiahao
Copy link
Member

jiahao commented Aug 10, 2014

+1 for named axes arrays

@timholy
Copy link
Member Author

timholy commented Aug 10, 2014

@lindahua, I added "Move Colon translation out of the parser" because presumably for ArrayViews that will be a critical step for A[:, i] to work as you're intending.

@JeffBezanson
Copy link
Member

Great list!

Just to get a broader context, we also want more efficient strings (and possibly BigInts), which also require any-length, immutable homogeneous arrays. Similar enough to tuples to make one think. Instead of mutable fixed-size arrays, we might want to use the "assignable cell" model, where you have a mutable cell that can hold a single value. Then you just store different array values into it at different times. Local mutations of single elements can be optimized.

@JeffBezanson JeffBezanson modified the milestones: 0.4, 0.4-projects Aug 10, 2014
@lindahua
Copy link
Contributor

@timholy Thanks for taking the lead on this.

I don't have particular preference as to whether to use ArrayViews or ArrayViewsAPL or a hybrid of both, or something afresh, as long as it is efficient enough for common cases.

I think there are two issues here can be discussed separately.

  1. Semantics (behaviors): we have to specify clearly what certain expressions actually mean. For example:

    • should a[i, :] be a (column) vector or a matrix with one row (when i is a scalar)?
    • should we perform bounds checking for i upon construction of the view?
    • should a[[1,3,7,12]] returns a view or a copy?

    I think we should focus on making decision for such questions (and probably a few others) first.

  2. Once we have a clear specification of the behavior, we can then proceed to work on an efficient implementation, which can be based on either ArrayViews, ArrayViewsAPL, or borrow things from both. The goal would be type stability and efficiency (of compilation, construction, and getindex).

@lindahua
Copy link
Contributor

I remember this has been brought up a while ago in some issue. But it would be useful to mention it here. It can be useful to support syntax like:

a[..., i]
# so it means a[:,i] when ndims(a) == 2
# and a[:,:,i] when ndims(a) == 3, etc

@timholy
Copy link
Member Author

timholy commented Aug 10, 2014

Those are good questions. What's funny is that I am not actually sure they all need to be settled first. For example, in ArrayViewsAPL I've implemented both the current sub behavior and the current slice behavior. While they have drastically different implications for user code, the two functions differ by only a couple of lines.

Regarding a[[1,3,7,12]], at least for ArrayViewsAPL, it may just be a question of changing RangeIndex in the type declaration to AbstractVector. Whether we want this is, of course, a good question. I've gone back and forth on that. My personal thinking currently is that we probably want the option for a view of that type, but I wonder if we want that to be accessible only through view(a, [1,3,7,12]).

Bounds-checking upon construction is probably more important to settle early (this is #4044). I now suspect that we may need to have the base types check bounds upon construction. People who want to pull the tricks in #4044 might need to implement an AbstractArray type that doesn't check upon construction but has a getindex that always checks bounds, whether inside @inbounds or not (unless they really like to live dangerously, that is).

Overall, my view is that the real effort here is in the "underlying technologies" section of the list. I'd include the changes to the parser in this. The rest of the core "view" infrastructure (ignoring fixed-size arrays, etc) seems likely to be something one could finish banging out in a couple of days. Writing tests and dealing with the consequences will take rather longer.

@JeffBezanson
Copy link
Member

We can also consider whether we want more kinds of indexes, such as #1032. This is important since it may ask for a more general approach to Colon than just making it a special case.

@timholy
Copy link
Member Author

timholy commented Aug 10, 2014

Good thought. I think stagedfunctions will make it a lot easier to support a diversity of indexes. Prior to stagedfunctions, we'd have had to worry about what to do if argument 1 was a Colon, argument 2 was a NegatedIndex, and argument 3 was a Vector{Int}, and whether we can handle that type-diversity efficiently in a single function. Now we just write one function:

stagedfunction someindexingfunction(A, indexes...)
    # ... optionally do a whole bunch of "common" stuff at the beginning
    if indexes[i] <: Colon
        mungedindexexprs[i] = :(1:size(A, $i))
    elseif indexes[i] <: NegatedIndex
        mungedindexexprs[i] = :(setdiff(1:size(A, $i), indexes[$i]))
    elseif ...
    end
    # ...more common work if you need to...
    # And here's what actually runs:
    :(A($(mungedindexexprs...))
end

With a single function we can basically handle anything we want to support, and all the tricky work is done at compile time by straightforward Julia code. You couldn't write something more runtime efficient by hand if you tried. It's a total game-changer.

@JeffBezanson
Copy link
Member

I share your enthusiasm for staged functions, but not for if statements checking types :)

@timholy
Copy link
Member Author

timholy commented Aug 10, 2014

Fair enough. mungedindexexprs[i] = rewriteindex(:A, :indexes, i, indexes[i]).

@lindahua
Copy link
Contributor

I will try to rewrite ArrayViews with staged functions and see how much it may simplify codes.

@lindahua
Copy link
Contributor

Still, critical information like whether views are contiguous should be encoded by the type itself (this is essential for type stability).

@quinnj
Copy link
Member

quinnj commented Aug 11, 2014

Can we add #2488 and #3737 to the list?

@timholy
Copy link
Member Author

timholy commented Aug 11, 2014

I see that "collaborator" tag, so feel free to add them to the bullet list.

@StefanKarpinski
Copy link
Member

While we're at this array overhaul, can we try to make it so that switching between APL and Matlab style indexing is relatively simple? I.e. Just a matter of surface syntax.

@timholy
Copy link
Member Author

timholy commented Aug 11, 2014

That was my idea with ArrayViewsAPL: just choose

getindex(A, ...) = subview(A, ...)

or

getindex(A, ...) = sliceview(A, ...)

Seemed the safest way given that we don't yet know what we want.

@jakebolewski
Copy link
Member

Since these enhancements are going to break most array code anyway, what is the status of {} arrays going forward? The syntax seems like an historical vestige. I feel it would be conceptually simpler to have just one array syntax.

@ViralBShah
Copy link
Member

I have been following from a distance, but I am wondering if we are any closer now than 3 months ago. Quite a bit has happened with staged functions, sub arrays and cartesian iterators since then, which makes me hopeful that we are closer to replacing array indexing with views. Is it likely we can achieve this in 0.4?

@andreasnoack
Copy link
Member

Some and hopefully most of the work should be done in #9150. The main issue is to handle Colon within Julia instead of in the parser. I've talk with @jakebolewski about fixing that.

@timholy
Copy link
Member Author

timholy commented Feb 1, 2015

In a sense this is already done, other than syntax (meaning A[:,j] gets replaced by slice(A, :, j)).

But it's still a little scary given "deep" bugs in stagedfunctions, #8504 and #8553#8853. If those had been fixed, we surely would have had this functionality months ago.

@timholy
Copy link
Member Author

timholy commented Feb 1, 2015

Oh, and I forgot perhaps even the most important one: it would be insane to turn A[:,j] into slice(A, :, j) until #8227 (or alternate implementation) gets merged.

EDIT: Demo:

julia> A = reshape(1:15, 3, 5)
3x5 Array{Int64,2}:
 1  4  7  10  13
 2  5  8  11  14
 3  6  9  12  15

julia> b = slice(A, 1:2, 1)
2-element SubArray{Int64,1,Array{Int64,2},(UnitRange{Int64},Int64),2}:
 1
 2

julia> b[3]
3

julia> b = sub(A, 1, :)
1x5 SubArray{Int64,2,Array{Int64,2},(UnitRange{Int64},Colon),2}:
 1  4  7  10  13

julia> b[2,2]
5

@jiahao
Copy link
Member

jiahao commented Feb 1, 2015

@timholy I think you meant some other issue besides #8553

@timholy
Copy link
Member Author

timholy commented Feb 1, 2015

Indeed I did, thanks for catching. (Corrected above.)

@vtjnash vtjnash modified the milestones: 0.5, 0.4 Mar 7, 2015
@ViralBShah
Copy link
Member

It looks like much of this will make its way into 0.4. Should we refactor this issue into the bits that we want in 0.4, and a separate issue for what will be in 0.5?

@timholy
Copy link
Member Author

timholy commented Apr 23, 2015

Indeed, the large majority has been in 0.4 for some time---that's what all the checkmarks at the top are for 😄.

If you want to split out separate issues for the remaining items, I'm fine with that, but I don't really see the point.

@mschauer
Copy link
Contributor

@lindahua Regarding your remark in the beginning, the issue on ... was #5405

@timholy
Copy link
Member Author

timholy commented Apr 23, 2015

#10525 was long overdue for a spot on that list, so I added it.

@ViralBShah
Copy link
Member

I was just initially puzzled at seeing the 0.5 milestone. We can leave it as it is.

@ViralBShah
Copy link
Member

The question I really had was - from a 0.4 perspective, which of the unchecked ones we want to get done, or are we already there?

@blakejohnson
Copy link
Contributor

I think we should still aim for extensible bounds checking removal, #7799, in 0.4.

@timholy
Copy link
Member Author

timholy commented Apr 23, 2015

I was puzzled about the 0.5 milestone too. I wondered if you had changed it (I didn't). That said, I think we won't get most of the remaining items by 0.4, so bumping the milestone seems reasonable.

Realistically, I expect us to check only one more box: I think we will get "sweet custom array types" (aka #10525, hopefully supported by #10911 if I can get my act together in time). This is a really big step---one that I did not anticipate at the time I opened this issue---and it will be amazing to have. That would also open the way for fixing reshape (#10507), but I'd be a little reluctant to add that kind of functionality at a late date. I just tagged that PR as 0.5 to clarify my current thoughts about it.

Re "return views from indexing": I think the big show-stopper is #7799, since our views don't check bounds (in my opinion, we'd have a catastrophic loss of safety if we switched to returning views before addressing that issue). It's also worth reminding folks that @carlobaldassi has raised a number of concerns specifically about BitArrays that, to my knowledge, we haven't really addressed (see posts higher up in this issue). However, if we can satisfy ourselves about these two issues, then there's a lot that can happen relatively quickly (#10331, and returning views from indexing).

@pao
Copy link
Member

pao commented Apr 23, 2015

I wondered if you had changed it

In case you hadn't seen it, this is in the issue's log, squeezed between comments--it was changed on March 7 by @vtjnash.

@StefanKarpinski
Copy link
Member

@vtjnash went around and randomized all the milestones :-|

@mbauman
Copy link
Member

mbauman commented Apr 23, 2015

I think the big show-stopper is #7799, since our views don't check bounds (in my opinion, we'd have a catastrophic loss of safety if we switched to returning views before addressing that issue).

I'm looking at this right now in the context of #10525. I think I may be able to rectify this since it puts unsafe_getindex on firmer ground. It sure would be a lot nicer with #7799, though.

(SubArray almost checks bounds right now, though. It does so upon construction, and then it indexes into the parent array with bounds checks on. The trouble is that in a multidimensional indexing expression, one of the indices may be out of bounds and it may return spurious data. Edit: Ah, I see now. It doesn't check bounds in resolving its indices, either. Same solution applies, though.)

@carlobaldassi
Copy link
Member

a number of concerns specifically about BitArrays that, to my knowledge, we haven't really addressed (see posts higher up in this issue).

To get back on that, my current thinking is that a lot of methods should be specialized to work on BitArray Views efficiently. It's going to take time -- and to increase test/bitarray.jl execution time considerably -- but in the end I think it's also going to improve performance in most common cases by allowing to avoid temporaries (e.g. sum(B[a:b]) and similar).

BTW I'm assuming that in most relevant cases views of views can be flattened, otherwise this is probably not going to work.

The only issue which I don't think we can really solve is treating efficiently views with generic, non contiguous indexing. However, this is only going to make a real difference when indexing repeatedly into the view, and maybe for nested views.

@prcastro
Copy link
Contributor

prcastro commented Jun 4, 2015

I believe the last two items could be closed

@mbauman
Copy link
Member

mbauman commented Sep 15, 2015

Superseded by JuliaLang/LinearAlgebra.jl#255. If I missed anything from this issue, please add it there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests