
Introduce Buffer type and make Array an abstraction on top of it #12447

Closed
malmaud opened this issue Aug 4, 2015 · 41 comments
Labels
arrays
Comments

@malmaud
Contributor

malmaud commented Aug 4, 2015

There's been some talk of switching to a two-part model for arrays (and similarly for strings) as part of Arrrayageddon:

  1. A memory buffer type, which is just a linear set of bytes
  2. An array, which holds a reference to its backing memory buffer and stores shape and stride information.

A few advantages:

  1. The distinction between an Array and a SubArray becomes smaller (or disappears): both are just views on a memory buffer.
  2. Array logic could be totally moved from C to Julia - only the buffer type would be defined in C.
  3. It's easier to reason about when different array instances are aliasing memory - if they have the same backing buffer, they're aliased.
  4. You could dynamically change the backing buffer of an array.

This is essentially the model that Torch uses, where the memory buffer is the Storage type and the array abstraction is the Tensor type.
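A minimal sketch of the two-part model described above (the names `Buffer` and `ArrayView` are illustrative, not proposed Base API; a `Vector{T}` stands in for the raw linear allocation):

```julia
# Part 1: the memory buffer -- just a linear run of elements.
struct Buffer{T}
    data::Vector{T}   # stand-in for a raw, fixed-size allocation
end

# Part 2: the array abstraction -- a reference to its backing buffer
# plus shape/stride/offset information, as in Torch's Storage/Tensor split.
struct ArrayView{T,N}
    buffer::Buffer{T}
    dims::NTuple{N,Int}
    strides::NTuple{N,Int}
    offset::Int
end

Base.size(A::ArrayView) = A.dims

function Base.getindex(A::ArrayView{T,N}, I::Vararg{Int,N}) where {T,N}
    # column-major strided addressing into the flat buffer
    A.buffer.data[A.offset + sum((I .- 1) .* A.strides) + 1]
end
```

Under this scheme, advantage 3 above falls out directly: two `ArrayView`s can alias only if they hold the same `Buffer`.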

@quinnj
Member

quinnj commented Aug 4, 2015

+100. To expound on the conversation between @malmaud, @StefanKarpinski, @carnaval, and myself.

  • Buffer would be a fixed-size linear byte buffer tied closely to actual memory
  • Array and String, for example, would be interface layers on top of these byte Buffer types
  • We could also do a custom allocator for Buffer and perhaps specialized GC (like ref counts on individual elements instead of the entire buffer/page)
  • IOBuffer could also be reworked to use this low-level Buffer type for tighter IO control
  • We didn't really come to a final conclusion, but there could perhaps be an integration with NTuple here that would create a MutableBuffer and ImmutableBuffer types that share a common interface (related to Julep: More support for working with immutables #11902)

@ScottPJones
Contributor

+100 also, love that it would fit in perfectly with what @quinnj has been working on in https://github.com/quinnj/Strings.jl.
Would it be possible for short buffers to be "inline" instead of pointed to, i.e. like @StefanKarpinski's bytevec stuff?

@JeffBezanson
Member

We should definitely start experimenting with an Array defined mostly in julia, mixing in some SubArray features.

How should we handle arrays of references? I don't think we want to have explicit write barriers in julia code for example.

If the storage of an Array can be switched at any time, you have extra pointer loads to worry about, and aliasing issues.

Another issue is in-line allocation of small arrays. Ideally arrays and strings would not mandate 2 objects for each.

@timholy
Member

timholy commented Aug 4, 2015

The distinction between an Array and a SubArray becomes smaller (or disappears): both are just views on a memory buffer

Small correction: that's only true for SubArray{T,N,Array{T,...}}. But our SubArrays are much more flexible than that. We can have views on things that don't even exist in memory (e.g., a view on (i,j)->i^2+j^2, if we want).

That small correction aside, I am enthusiastic about this proposal, including mixing in some features of SubArrays into an Array-as-buffer-wrapper type. A big advantage is that FixedSizeArrays become much easier to implement because we'd finally have what's essentially a mutable Tuple.

However, I'm not so sure about switching the backing buffer of the array, and I think Jeff's points deserve careful thought.

@ScottPJones
Contributor

@malmaud, what use cases do you see for dynamically switching the backing buffer of the array?
I can see a lot of uses for having all sorts of backing buffers, but dynamic switching sounds scary.

@quinnj
Member

quinnj commented Aug 4, 2015

Yeah, I'm not too sure on the need to be able to switch out backing buffers. I mean I would say we just make

immutable Array
  ...
end

Let's see, what else do we currently store in Array: element size? length? is-shared? is-aligned? dimensions? Would we really need the ability to change any of those fields? I would think that since Array would now be a wrapper type, making it immutable would work, because it's fairly cheap to create a new instance pointing to the same data (for example)

@shashi closed this as completed Aug 4, 2015
@shashi reopened this Aug 4, 2015
@simonster
Member

Right now we allow changing data pointers for unshared vectors, which we use when we have to reallocate, i.e. for push!. But we don't allow changing data pointers for other ND arrays, and the compiler knows that and can hoist array pointer loads in more cases. It seems difficult to maintain these semantics without splitting Vector and Array into two separate types, since we'd need to be able to control mutability of the type based on the number of dimensions. OTOH it may turn out that with appropriate TBAA etc. we can hoist the pointer loads in the cases where it matters for performance even if the storage can be changed at any time.
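The `push!`-driven reallocation described above is easy to observe (whether the pointer actually moves depends on the allocator, so this is illustrative rather than guaranteed):

```julia
v = Int[]                  # a fresh, unshared Vector
p_before = pointer(v)
append!(v, 1:100_000)      # growth forces reallocation of the backing storage
p_after = pointer(v)
# p_before and p_after typically differ after a large grow, which is why
# the compiler cannot blindly hoist data-pointer loads for resizable Vectors.
```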

@ViralBShah
Member

This is very timely for 0.5. We used to have a Buffer in the very early days, before the Julia public release, and it was quite nice to be able to define Array itself in Julia.

@malmaud
Contributor Author

malmaud commented Aug 4, 2015

I don't personally see much use for switching out the buffer - I think @johnmyleswhite got feedback from a serious Torch user that it would be useful, but I don't really know what for.

@carnaval
Contributor

carnaval commented Aug 4, 2015

I don't think having a write-barrier builtin for Julia code is too bad. I'm actually somewhat surprised we never needed it. However, this hypothetical buffer object will definitely need a flag to tell the GC whether it's full of pointers or bits, which somewhat ruins the illusion of dumb bytes, but there is no way around that.

It would be interesting to see if we can provide inline small-buffer allocation in a first-class (and maybe more general) way. It's somewhat contrary to the language's philosophy that builtin Arrays can be variable-length but no user-defined type can be.

I would be fine if resizing were limited to a type other than Array (ArrayList?), but I feel this opinion is not so popular. That way, paying the pointer-indirection cost is a conscious decision (and one I would venture you don't need to make very often). Maybe we could even try to support nd-resizing that way, to add rows and columns.
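The resizable-by-a-separate-type idea could look roughly like this (`ArrayList` is the hypothetical name from the comment above; the doubling growth policy here is my own assumption):

```julia
# Sketch: fixed-size arrays stay flat; only ArrayList pays the extra
# indirection for resizability. A Vector{T} stands in for a Buffer.
mutable struct ArrayList{T}
    buffer::Vector{T}
    len::Int
end
ArrayList{T}() where {T} = ArrayList{T}(Vector{T}(undef, 4), 0)

Base.length(l::ArrayList) = l.len
function Base.getindex(l::ArrayList, i::Int)
    1 <= i <= l.len || throw(BoundsError(l, i))
    l.buffer[i]
end

function Base.push!(l::ArrayList{T}, x) where {T}
    if l.len == length(l.buffer)
        # the conscious reallocation cost: swap in a bigger backing buffer
        newbuf = Vector{T}(undef, 2 * length(l.buffer))
        copyto!(newbuf, 1, l.buffer, 1, l.len)
        l.buffer = newbuf
    end
    l.buffer[l.len += 1] = x
    return l
end
```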

@mbauman
Member

mbauman commented Aug 4, 2015

I've definitely thought about this before, too, and would love to see Array defined in Julia. But I think that it will definitely require much more flexible Julia type definitions to get competitive performance. Here's where I've run into these sorts of performance issues in the past:

  • Two allocations for small objects. This isn't terribly unlike my attempts to make the BitArray struct immutable with a RefValue{Int} field to allow for mutable length. Allocating that extra tiny object was a 3x penalty for small BitArrays (<64 bits). WIP: Allow constant fields in mutable types #11430 (comment)
  • Performance from data locality. This is related to the above, but just as important. The ability of the builtin Array to put its data immediately following the header is extremely cache friendly, and I think it makes a big difference. Check out this StackOverflow answer where I created an unsafe Array header that pointed to the data as a super-lightweight column view. The good old allocating getindex still beat it after the cache warmed up!
  • Mutable length vectors, constant size arrays. This can be worked around, but it's ugly: WIP: Allow constant fields in mutable types #11430 (comment). A somewhat-related benchmark here is back when I foisted the manual pointer hoist for BitArrays when I was trying to prevent a GC frame. That caused a 20% performance regression in some cases: Speed up scalar BitArray indexing by ~25% #11403 (comment). But as @simonster mentions, there are also other approaches here, and perhaps TBAA becomes easier with a separate storage type.

@argriffing

  1. A memory buffer type, which is just a linear set of bytes
  2. An array, which holds a reference to its backing memory buffer and stores shape and stride information.

A few advantages:
[...]
3) It's easier to reason about when different array instances are aliasing memory - if they have the same backing buffer, they're aliased.

What if someone puts their whole system in a tensor and only ever operates on views of its backing memory buffer? Under the reasoning above, wouldn't all of these array instances be considered to be aliasing memory with each other?

@johnmyleswhite
Member

That seems to mostly depend on whether the views are disjoint.

@argriffing

That seems to mostly depend on whether the views are disjoint.

Right, "3) It's easier to reason about when different array instances are aliasing memory - if they have the same backing buffer, they're aliased" doesn't sound like it depends on whether the views are disjoint, only on whether they share the same backing memory buffer. But it can be nontrivial to check for disjointness because of shape and stride combinations, and there's an active numpy issue related to this.
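A concrete Julia example of the subtlety: two views can share one backing buffer yet be element-disjoint, so "same buffer" is only a conservative over-approximation of aliasing:

```julia
A = collect(1:10)
odds  = view(A, 1:2:9)     # parent positions 1, 3, 5, 7, 9
evens = view(A, 2:2:10)    # parent positions 2, 4, 6, 8, 10
# Both views share A as their backing storage, so a buffer-identity
# check would flag them as aliased...
fill!(odds, 0)             # ...yet writing through one view...
# ...never touches the other: evens still holds 2, 4, 6, 8, 10.
```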

@johnmyleswhite
Member

That's all true. But how is that different from the status quo?

@mbauman
Member

mbauman commented Aug 4, 2015

I think the point is that we'll use different types that represent aliased (or possibly aliased) data. Currently reshape shares data between two Array objects, which is definitely something I want to move away from.
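The current behavior referred to here is easy to demonstrate: reshape returns an object that shares memory with its argument, so writes through either are visible through both.

```julia
A = collect(1:6)       # 6-element Vector{Int}
B = reshape(A, 2, 3)   # 2x3 array sharing A's underlying data
B[1, 1] = 99           # a write through B...
# ...is visible through A: A[1] is now 99
```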

@argriffing

how is that different from the status quo?

Honestly I don't know the status quo in Julia, I was just adding 2 cents because I saw something wrong (or at least questionable) on the internet :)

@quinnj
Member

quinnj commented Aug 12, 2015

@mbauman, maybe we need reshape and reshape!?

@KristofferC
Member

@quinnj that makes a lot of sense to me

@mbauman
Member

mbauman commented Aug 12, 2015

See #10507. I think reshape should always return a view (like indexing will) and ideally it'll be performant enough to not need reshape!. (Edit to add: allowing an operation like reshape! means that the dimensions of an array are not constant… which may have bigger performance effects everywhere.)

@quinnj
Member

quinnj commented Aug 12, 2015

Ultimately though, will the proposal there return a ReshapeArray instead of a regular Array? I feel like a general trend has been figuring out how to return the same types with all these operations, utilizing shared data (i.e. getindex returning a String or Array instead of SubString/SubArray). I've probably missed a key discussion here or there on this though, so feel free to correct me on this.

@simonster changed the title Introduce Buffer type and make Array an abstraction on top of it Aug 12, 2015
@timholy
Member

timholy commented Aug 12, 2015

There's no way to return an Array from certain reshape operations unless you copy data. So we will need a special type.

Performance is indeed my big worry with the ReshapedArrays. It seems to be a much harder problem than I expected.

@quinnj
Member

quinnj commented Aug 12, 2015

@timholy, but if Array is defined in Julia, could we have more control over always returning an Array? What are the "certain operations" where we need to copy data?

@mbauman
Member

mbauman commented Aug 12, 2015

It'd be very difficult to shoehorn all of SubArray and ReshapedArray (performantly!) into Array without slowing down the normal, non-view case. You probably could make it work with a bunch of type parameters, but then it may as well be a different type.

By exposing the Buffer type, we could hoist the Array's buffer directly into a field of the view objects, which should help with their performance (if LLVM optimizations don't beat us to the punch). It could also mitigate some of the MAX_TYPE_DEPTH and over-specialization issues that we'll probably start running into with lots of deeply nested type parameter lists flying around.

The certain operations are some combinations of reshapes of sub-views, if I remember correctly.

@quinnj
Member

quinnj commented Aug 12, 2015

I'll certainly defer to our resident array ninjas on the subtleties of reshape and sub operations. I'm going to make a probably poor attempt at least for the Buffer type soon; I figure we can start as small as possible and build from there.

@argriffing

What are the "certain operations" where we need to copy data?

Here's an example from numpy: numpy/numpy#6166 (comment), where I think the generalized transpose is like http://docs.julialang.org/en/latest/stdlib/arrays/?#Base.permutedims in Julia.

@simonster
Member

Indexing of SubArrays already has zero cost over indexing of ordinary Arrays, so I think we could shoehorn it all into Array if we wanted to, but there may be reasons not to. LLVM can also already hoist the load of the array pointer from SubArray out of loops in most/all cases where it matters because they are immutable, so I don't expect improved performance for that case. Rather, I expect that trying to implement Arrays in Julia will force us to sort out issues that currently affect other AbstractArray types, e.g. the stupid GC root-related performance issues and unhoisted array pointer loads for BitArrays.

@timholy
Member

timholy commented Aug 12, 2015

The biggest issue is that there is no (nice) view type that is closed under both reshaping and taking a cartesian subset. The only type that is closed is one that stores one memory offset per element, and that would be horrible. See #9874 (comment).
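A small example of the closure problem (checked against current Julia semantics): take a cartesian subset of a matrix, then reshape it to a vector. The resulting elements sit at parent positions 1, 3, 9, 11, with gaps of 2, 6, 2, so no single (offset, stride) pair can describe the result:

```julia
A = reshape(collect(1:16), 4, 4)
V = view(A, 1:2:3, 1:2:3)   # cartesian subset: parent positions 1, 3, 9, 11
R = reshape(V, 4)           # reshape of that view
# The element gaps in the parent are 2, 6, 2 -- not expressible as one
# stride, which is why a simple strided view type cannot represent R.
```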

@JeffBezanson
Member

Tentatively marking as 0.6 to encourage experimentation in that timeframe.

@JeffBezanson added this to the 0.6.0 milestone Aug 9, 2016
@vtjnash
Member

vtjnash commented Aug 10, 2016

Agreed, that should definitely be the plan. I think this needed the stack-allocation and inline-immutables PRs to be merged first to make it performant.

@DemiMarie

I think the current cache behavior (excellent) needs to be kept, which means that the Array data must still immediately follow the header.

@andyferris
Member

andyferris commented Oct 28, 2016

I was looking for progress on this since @StefanKarpinski announced there is a feature freeze quite soon!

In principle this doesn't seem too hard for isbits element types... I made an attempt at it here using a native-Julia Buffer, and performance didn't seem too bad at first glance (the same as Array, though I'm sure that's not true for all functions).

Unfortunately, I get crashes sometimes where my finalizers call free... (bug fixed)

@andyferris
Member

I forgot to mention my approach only works nicely for element types which are isbits. Amongst other things, some control over GC would be needed for non-isbits types. One day I'd like to learn more about Julia's GC.

@timholy
Member

timholy commented Oct 29, 2016

Neat experiment. But finalizers are yucky, would be much better if this used julia's own GC to manage the buffer.

@andyferris
Member

But finalizers are yucky

OK I admit I didn't know much about how they are implemented. I've been reading around - is the main problem here that they are slow? Would type-based finalization fix the performance problem? I can kind-of see that deferred finalization of many resources is not desirable, but our GC already does that for memory, which is what this finalizer is all about...

would be much better if this used julia's own GC to manage the buffer.

This is no doubt true, and makes more sense. On the other hand, having the ability to boss around the GC from julia code could also be pretty awesome (or potentially nightmarish). However, asking it for a buffer (and telling it whether it is full of data or full of pointers it should scan) is probably precisely the only sane thing you can boss it around to do from Julia code...

I really should read the code for the GC one day, but here's a question. If the buffer contains arrays where some fields are data and some fields are references to other Julia objects, how does the GC know which parts of the buffer to scan? (Is this only a problem in the future when non-isbits immutables get inlined?)

@yuyichao
Contributor

is the main problem here that they are slow? Would type-based finalization fix the performance problem?

No and no.

I can kind-of see that deferred finalization of many resources is not desirable, but our GC already does that for memory, which is what this finalizer is all about...

No, the GC keeps track of memory usage, but not the resources behind finalizers.

On the other hand, having the ability to boss around the GC from julia code could also be pretty awesome (or potentially nightmarish).

I'd prefer not giving people access to it unless a really convincing use case is made.

However, asking it for a buffer (and telling it whether it is full of data or full of pointers it should scan) is probably precisely the only sane thing you can boss it around to do from Julia code...

Yes.

If the buffer contains arrays where some fields are data and some fields are references to other Julia objects, how does the GC know which parts of the buffer to scan?

The buffer is always well typed.

(Is this only a problem in the future when non-isbits immutables get inlined?)

Which is a requirement for this change.

@andyferris
Member

The buffer is always well typed.

Ahh! Thanks, I was just beginning to think this is the only way it would make sense. So a future Buffer{T} would be a lot like (a non-resizeable) Vector{T} but with almost no methods defined on it (just length, eltype, load/getindex, and store!/setindex!, say)? I guess being accessible to the GC means it is more restrictive than a buffer in C++ would be (where you can store stuff in whatever memory pattern you like and cast the data to the appropriate type later - whereas in Julia a "generic" buffer (e.g. Buffer{UInt8}) could only be useful for isbits types).

(Is this only a problem in the future when non-isbits immutables get inlined?)

Which is a requirement for this change.

The circular nature didn't escape me :) It's almost cute. Though, strictly, if it is true that Vector currently uses a double indirection, then I suppose a Buffer-backed Array could be viable already, no?

@yuyichao
Contributor

Though, strictly, if it is true that Vector currently uses a double indirection, then I suppose that a Buffer backed Array could be viable already, no?

No.

@vtjnash
Member

vtjnash commented Dec 22, 2016

This'll happen in the 1.0 release. It should perhaps have had more attention in 0.6 so we could have it on master longer, but hard stop is really just the 1.0 release.

@Seelengrab
Contributor

This would be useful for buffered Channels, to remove the popfirst! ccall and resizing overhead by implementing a ring buffer on top of this. With the changes to the compiler & GC over the past few years, has the situation changed with regard to this being viable/an option before 2.0? Have new possibilities for implementing this cropped up?

@brenhinkeller added the arrays label Nov 21, 2022
@nsajko
Contributor

nsajko commented Apr 20, 2024

I think this is fixed now on the v1.11 prereleases due to #51319.
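For context: #51319 introduced the `Memory{T}` type, which fills exactly the buffer role discussed throughout this thread, with `Array` reimplemented in Julia on top of it. A minimal usage sketch (requires Julia 1.11 or later):

```julia
# Memory{T} is a fixed-size, linear, GC-managed buffer.
m = Memory{Int}(undef, 4)
for i in eachindex(m)
    m[i] = i^2
end
# m now holds 1, 4, 9, 16; unlike Vector, a Memory can never be resized.
```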

@nsajko closed this as completed Apr 20, 2024