Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support isbits Union array elements and struct fields #22441

Merged
merged 5 commits into from
Aug 18, 2017

Conversation

quinnj
Copy link
Member

@quinnj quinnj commented Jun 19, 2017

Note this is based on the jn/union-bits-layout branch.

This is an initial attempt to fix some of the issues reported here.

This successfully compiles, but I'm still feeling my way around /src code and figuring out if my code paths are even getting hit. The idea here is that when we allocate a Union{T, S} array where T and S are both isbits, we compute the "element size" for the array using jl_union_isbits, allocate the buffer for that, then add on an extra "selector" byte for each element after the data buffer. The "selector" bytes indicate which Union element is actually stored in a data buffer element.

If anyone can provide more pointers to what else needs to be updated, or simple ways to ensure I'm actually hitting codepaths, I'd appreciate it. I feel like a bit of a bull in a china shop here 😄 .

Closes #17073

@quinnj quinnj force-pushed the jq/array-union-bits branch 2 times, most recently from 5bd23b1 to 8f010bb Compare June 20, 2017 14:19
@timholy
Copy link
Member

timholy commented Jun 21, 2017

Could use a review from @vtjnash.

@timholy timholy requested a review from vtjnash June 21, 2017 12:13
@quinnj
Copy link
Member Author

quinnj commented Jun 21, 2017

Unfortunately for him, I commandeered his day yesterday to help me w/ the latest commit I pushed 😄

@quinnj quinnj force-pushed the jq/array-union-bits branch 2 times, most recently from dbc36f1 to 97d5d44 Compare June 24, 2017 14:52
@quinnj
Copy link
Member Author

quinnj commented Jun 24, 2017

Ok, an update here:

  • Most of the mutating operations on Vectors should be good to go; that code is a bit hairy, so I tried to hit all the code paths in the new core.jl tests
  • I'm seeing a segfault when doing v = Vector{Union{Int8, Void}}(10); v[1] = 10, I believe it's due to trying to setindex! w/ a Int64 instead of Int8, and that the changes in codegen aren't doing something right here? @vtjnash do we need to insert a convert(T, x)::T like the definition in Base has?
  • Still need to update the serialize/deserialize code to be aware of Union isbits arrays
  • Still want to add tests for @vtjnash's Union isbits struct field code
  • I plan on adding more comments/docs around what we're doing here; it's not too crazy, but I am slightly worried about people who may start using Arrays in non-standard ways and the surprises they might run into; that of course assumes they're actually using Union isbits arrays and doing something not very standard (calling array.c functions for example)
  • Benchmarks: not sure if we need to add any to BaseBenchmarks, but I want to do some comparisons vs. DataArrays and NullableArrays just to know what that story looks like
  • Discovered that Array{Union{}, 2}(0) throws a weird error, I'll open a separate issue about that
  • I also meant to ask @vtjnash or @JeffBezanson how Unions' types get sorted, i.e. Union{Int, Void} vs. Union{Void, Int}. Currently it always get sorted as Union{Int, Void}, but I'm wondering if that could be changed (easily?) to sort datatype singleton's first? That would have the nice properties of 1) getting an array of nulls by default in the case of Vector{Union{T, Null}}, no filling required and 2) match the bitmask implementations of pandas & apache arrow of using 0 = null in their bitmasks that accompany their column data.

Otherwise, I do think this code is at a place to be reviewed by @vtjnash & @yuyichao at least in case there's something egregious going on.

src/datatype.c Outdated

JL_DLLEXPORT int jl_islayout_inline(jl_value_t *eltype, size_t *fsz, size_t *al)
{
unsigned countbits = union_isbits(eltype, fsz, al);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent's off here

src/datatype.c Outdated
}
if (jl_layout_isbits(ty)) {
size_t sz = jl_datatype_size(ty);
size_t al = ((jl_datatype_t*)ty)->layout->alignment;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use jl_datatype_align for this

src/datatype.c Outdated
size_t al = ((jl_datatype_t*)ty)->layout->alignment;
if (*nbytes < sz)
*nbytes = sz;
if (*align < al)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC in a bitsunion array we can store elements that are eitherA or B?

I would think that the alignment needs to be a multiple of the alignment of A and B.
As an example, if A has alignment 8 and B as alignment 12 the alignment would need to be 24 (Joint alignment needs to be divisible by the individual alignments.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it has been pointed out to me alignment 12 shouldn't exist (and me encountering that is probably a bug). Nevertheless, we should make sure that the new alignment is compatible with both.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, alignments must be powers of N (usually 2), otherwise N * alignment wouldn't be aligned to the alignment. Hence picking the larger alignment is always sufficient.

@KristofferC
Copy link
Member

Added a "Closes #17073" to the top comment.

@quinnj quinnj force-pushed the jq/array-union-bits branch 2 times, most recently from e677731 to af19042 Compare July 6, 2017 05:28
@quinnj
Copy link
Member Author

quinnj commented Jul 7, 2017

Alrighty, another round of updates here. The new commits address the comments that were posted, general code cleanup, code comments and documentation, and getting tests passing (all Base tests pass for me locally). Apart from ensuring CI is happy, the remaining known issues that I'm aware of are two bugs that I suspect may be related to still-needed inference changes?

Case 1:

v = Vector{Union{Int8, Void}}(10)
v[1] = 10

this currently segfaults on an assert on the TIndex. I don't quite understand how we're getting past the call to convert in the Base.setindex! method

setindex!(A::Array{T}, x, i1::Int) where {T} = arrayset(A, convert(T,x)::T, i1)

Julia 0.6 appropriately fails with the error of "can't convert argument to Union{Int8, Void}".
My efforts to debug this haven't been very fruitful, as it seems once we're in codegen, we're already past the point where the convert should have happened? I'd really appreciate any insights here or help in understanding the interaction w/ codegen.

Case 2:
Found this while writing tests for @vtjnash's work on isbits Union struct field types.

julia> mutable struct UnionField
           x::Union{Int8, Void}
       end

julia> x = UnionField(nothing)
UnionField(nothing)

julia> x.x = Int8(1)
1

julia> x
UnionField(0)

Again, my efforts of trying to step through codegen were pretty futile and I had a hard time figuring out why exactly things weren't getting set correctly. I'll probably try to dig into this error again, but any tips again would be most appreciated.

@quinnj quinnj force-pushed the jq/array-union-bits branch 2 times, most recently from 85223f9 to 6f52a44 Compare July 11, 2017 13:58
Value *ptindex = ctx.builder.CreateGEP(T_int8, emit_bitcast(ctx, addr, T_pint8), ConstantInt::get(T_size, fsz - 1));
ctx.builder.CreateStore(tindex, ptindex);
// copy data
if (!rhs.isghost) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is the most idiomatic way to check for a singleton type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems OK. We should probably try to clean up codegen, since ghost actually probably refers to being an immutable singleton, and is probably inconsistent as to whether it refers to mutable singletons also.

src/codegen.cpp Outdated
if (jl_is_uniontype(ety)) {
Value *nbytes = ConstantInt::get(T_size, elsz);
Value *data = emit_arrayptr(ctx, ary, args[1]);
if (!(rhs.typ == jl_bottom_type)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is the right kind of check here; the issue is when you have something like

v = Vector{Union{Int8, Void}}(10);
v[1] = Int64(10)

this should result in a convert error, but it passes thru codegen with rhs being jl_bottom_type, which ends up throwing an error in compute_tindex_unboxed (since Bottom type doesn't match one of the Union types).

@quinnj
Copy link
Member Author

quinnj commented Jul 11, 2017

Alright, this now passes all tests locally and I've fixed all known bugs that I'm aware of. Let's see what CI says.

@quinnj
Copy link
Member Author

quinnj commented Jul 12, 2017

Progress! 2 green checks on travis (linux & osx 64-bit), and just pushed a fix for 32-bit. I'm not sure what happened on appveyor 64-bit, has anyone seen that error before?

@tkelman
Copy link
Contributor

tkelman commented Jul 12, 2017

gc issue, missing root maybe?

From worker 9:	WARNING: Method definition Allocations: 152865371 (Pool: 152848789; Big: 16582); GC: 390
	From worker 9:	<?#0000000008B10900::0000000000000000>
	From worker 9:	0000000027A25040: Queued root: 0000000005170C70 :: 0000000008B10DF0 (bits: 3)
	From worker 9:	        of type Array{Any, 1}

... snip

From worker 9:	This application has requested the Runtime to terminate it in an unusual way.
	From worker 9:	Please contact the application's support team for more information.
	From worker 9:	
	From worker 9:	signal (22): SIGABRT
	From worker 9:	while loading C:\projects\julia\julia-2d15e1e9d0\share\julia\test\core.jl, in expression starting on line 5130
	From worker 9:	crt_sig_handler at C:/projects/julia/src\signals-win.c:92
	From worker 9:	raise at C:\windows\system32\msvcrt.dll (unknown line)
	From worker 9:	abort at C:\windows\system32\msvcrt.dll (unknown line)
	From worker 9:	gc_assert_datatype_fail at C:/projects/julia/src\gc.c:1371
	From worker 9:	gc_mark_loop at C:/projects/julia/src\gc.c:2176
	From worker 9:	_jl_gc_collect at C:/projects/julia/src\gc.c:2457
	From worker 9:	jl_gc_collect at C:/projects/julia/src\gc.c:2622
	From worker 9:	jl_gc_pool_alloc at C:/projects/julia/src\gc.c:954 [inlined]
	From worker 9:	jl_gc_alloc_ at C:/projects/julia/src\julia_internal.h:265 [inlined]
	From worker 9:	jl_gc_alloc at C:/projects/julia/src\gc.c:2657
	From worker 9:	jl_new_struct at C:/projects/julia/src\datatype.c:711
	From worker 9:	var_gt at C:/projects/julia/src\subtype.c:498
	From worker 9:	forall_exists_equal at C:/projects/julia/src\subtype.c:938
	From worker 9:	subtype at C:/projects/julia/src\subtype.c:917

@quinnj quinnj changed the title WIP: Work to support isbits Union Array element types Support isbits Union array elements and struct fields Jul 12, 2017
@quinnj
Copy link
Member Author

quinnj commented Jul 12, 2017

Removed the "WIP" from the PR title and updated to reflect struct field type support. There have been intermittent travis/appveyor failures that are fairly cryptic, some seem unrelated to changes here while others do seem related. I'll keep trying to track down failures, but at this point, those w/ more knowledge might see obvious issues by just reviewing the code.

@quinnj quinnj mentioned this pull request Jul 13, 2017
@quinnj quinnj force-pushed the jq/array-union-bits branch from 9b3a4ed to fcc891e Compare July 14, 2017 04:36
@quinnj
Copy link
Member Author

quinnj commented Jul 15, 2017

Uh....what's w/ the windows failure? Did I change to a unsupported printf format code for windows or something?

@vtjnash
Copy link
Member

vtjnash commented Jul 15, 2017

zu bytes, eh? Yeah, that's not supported. Just cast to an int (or use an int for the alignment computation).

src/array.c Outdated
@@ -586,6 +616,10 @@ static int NOINLINE array_resize_buffer(jl_array_t *a, size_t newlen)
nbytes++;
oldnbytes++;
}
if (!a->flags.ptrarray && jl_is_uniontype(jl_tparam0(jl_typeof(a)))) {
nbytes += oldlen;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+= newlen

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

vtjnash and others added 5 commits August 15, 2017 13:33
unlike codegen, only bitstypes (!isptr) fields are permitted in the union
and the offset count starts from 0 instead of 1
but otherwise the tindex counter is compatible
@quinnj quinnj force-pushed the jq/array-union-bits branch from f9766a6 to 88341b9 Compare August 15, 2017 19:34
Value *selidx = ctx.builder.CreateMul(emit_arraylen_prim(ctx, ary), nbytes);
selidx = ctx.builder.CreateAdd(selidx, idx);
Value *ptindex = ctx.builder.CreateGEP(T_int8, data, selidx);
Value *tindex = ctx.builder.CreateNUWAdd(ConstantInt::get(T_int8, 1), tbaa_decorate(tbaa_arrayselbyte, ctx.builder.CreateLoad(T_int8, ptindex)));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vtjnash, added the tbaa_arrayselbyte tbaa.

@quinnj
Copy link
Member Author

quinnj commented Aug 15, 2017

All that green feels super good.

@vtjnash
Copy link
Member

vtjnash commented Aug 15, 2017

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@quinnj
Copy link
Member Author

quinnj commented Aug 16, 2017

None of the benchmark results look like anything to worry about; after checking things out locally, it seems like these are just some of the noisiest benchmarks we have.

@quinnj
Copy link
Member Author

quinnj commented Aug 16, 2017

I'm actually not surprised w/ the benchmarks, since I doubt many of them involve Unions at all. What we really need are some isbits-Union benchmarks; I have some lying around, but I'll try to clean them up and post some local results and plan on adding some to BaseBenchmarks.

@KristofferC
Copy link
Member

Tangentially, It would be cool to have some non-micro benchmarks where unions are a natural way of solving the problem, and make the code significantly simpler.

@vtjnash
Copy link
Member

vtjnash commented Aug 16, 2017

I think the only one I was concerned with was the allocations. Any idea what those came from?

@quinnj
Copy link
Member Author

quinnj commented Aug 17, 2017

Hmmm.....I've spent a few hours now digging thru performance/benchmarking and nothing seems really out of the ordinary. The problem benchmarks are pretty tricky to disect since they involve so much random code. I've made a PR to add some basic is bits Unions benchmarks for NanoSoldier here.

Anybody else have any ideas on benchmarking or concerns with merging?

@quinnj
Copy link
Member Author

quinnj commented Aug 18, 2017

I'll give this till the morning (12 hours or so) for any last reviewers

@quinnj quinnj merged commit d126c66 into master Aug 18, 2017
@StefanKarpinski StefanKarpinski deleted the jq/array-union-bits branch August 18, 2017 16:04
@StefanKarpinski
Copy link
Member

t3relrb

@tkelman
Copy link
Contributor

tkelman commented Aug 18, 2017

should have been squashed if the intermediate commits don't all work

@quinnj
Copy link
Member Author

quinnj commented Aug 18, 2017

"all work"; does that mean each commit should pass all tests? just compile? I tried to keep each of the commits pretty logically separated between union field support, array support, then tests, etc.

@tkelman
Copy link
Contributor

tkelman commented Aug 18, 2017

pass tests ideally, depends what future bisectors are looking for.

@quinnj
Copy link
Member Author

quinnj commented Aug 18, 2017

ok, cool. FWIW, I believe tests should have passed for all commits, except maybe after jameson's first, but I also didn't want to clobber his commit w/ the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: Alternative{T,U}
9 participants