Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove dependence on Julia internal Core.Compiler.return_type #127

Closed
wants to merge 5 commits into from

Conversation

palday
Copy link

@palday palday commented Dec 28, 2023

see also JuliaLang/julia#52385

There's also another problem that arises on Julia 1.10 when Indices are constructed from an empty tuple, the resultant Indices{Union{}} create all sorts of problems. To get around this, I've added a check for empty tuples and changes the indices eltype to be Any. See also MakieOrg/AlgebraOfGraphics.jl#472

@andyferris
Copy link
Owner

OK, thanks.

Just trying to get my head around everything. Is using code_typed ultimately less likely to invoke undefined behavior than return_type? Is there any difference in obsrvable behavior? Does this make something work that doesn't work without it? (The Julia 1.10 issues seem more related to changes to handling of Union{} than to changes in the type inference algorithm).

Currently, this method depends on `code_typed`.
"""
function return_type(f, types)
return last(only(code_typed(f, types;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is worse than just calling Core.Compiler.return_type. It still trespasses on the internals of the Julia implementation, as it's not documented that last may be used here, and I guess it's slower than Core.Compiler.return_type.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nsajko It's calling last on a Pair -- how is that undocumented behavior? It's also only calling public API (an function that appears in the manual). I guess I could also access the rettype property of the CodeInfo that's the first element of that pair, but I still need to access the pair.

It will almost definitely be slower than calling Core.Compiler.return_type, because I'm guessing that Core.Compiler.return_type is called somewhere in the implementation of code_typed.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, sorry.

@palday
Copy link
Author

palday commented Dec 28, 2023

Just trying to get my head around everything. Is using code_typed ultimately less likely to invoke undefined behavior than return_type?

Probably not, but return_type is considered an internal function and there seems to be reluctance to treat the problems with Union{} as real as long as you're accessing an internal function. So there is a real problem that this package demonstrates (well, via errors that occur in dependents' use of this package) but that problem is considered invalid by some core Julia devs because you're using internals.

That said, using internals is also risky because their behavior can change or be dropped at any point.

Is there any difference in obsrvable behavior? Does this make something work that doesn't work without it?

No(ish) and yes. There are two changes here:

  • providing a local implementation of return_type that does not depend on compiler internals (no change in observable behavior)
  • slight modification in what the index constructed from any empty tuple looks like (strictly speaking observable, if anybody looks at the type parameters of said Index, but realistically unobservable since it corresponds to the index of an empty collection). This dodges the problems with Union{} introduced in Julia 1.10.

The latter change in particular fixes a major downstream Julia 1.10 compatibility issue: MakieOrg/AlgebraOfGraphics.jl#472

(The Julia 1.10 issues seem more related to changes to handling of Union{} than to changes in the type inference algorithm).

I agree, but the IMHO legitimate problem in the new handling of Union{} is being dismissed because you're touching compiler internals.

@nsajko
Copy link

nsajko commented Dec 28, 2023

The latter change in particular fixes a major downstream Julia 1.10 compatibility issue

Shouldn't this rather be fixed in AoG? Assuming it doesn't get fixed in Julia.

@palday
Copy link
Author

palday commented Dec 28, 2023

Shouldn't this rather be fixed in AoG? Assuming it doesn't get fixed in Julia.

AoG is calling things, but the actual error occurs in Dictionaries.jl code.

@aplavin
Copy link
Contributor

aplavin commented Jan 1, 2024

Creating Indices{Union{}} is a perfectly fine thing to do, and it's not prevented by this PR (neither should it).

julia> Indices(Union{}[])
0-element Indices{Union{}}

This PR adds an explicit check to one special case, where Indices are constructed from a Tuple. Is that worth it? I'm hopeful for the Julia team to acknowledge and likely fix the breaking change in some 1.10 version...

The change of return_type -> code_typed is orthogonal to that Tuple{} thing, and doesn't really reduce any dependency on "internals". Inference results are not guaranteed to stay stable, and this influences both return_type and code_typed. Note that (according to JuliaHub search) lots of packages use return_type, but (almost?) none use code_typed for this purpose.

@palday
Copy link
Author

palday commented Jan 2, 2024

The change of return_type -> code_typed is orthogonal to that Tuple{} thing, and doesn't really reduce any dependency on "internals".

This is patently incorrect -- Core.Compiler.return_type is not documented in the manual and is therefore per definition internal.

I do agree that the Union{} / Tuple{} issue is orthogonal, but I suggested this change because the real issue was being dismissed prima facie because of the use of internals. So showing that the issue still arose when there was no use of internals was useful for that.

In terms of the workaround here for Union{}/Tuple{} -- I'm not strongly tied to it, but it seemed like a reasonable, straightforward change. If you're constructing an Index from an empty tuple, you do not know what the original element type is, so Any IMHO makes as much sense as Union{}. On the other hand if you're constructing from e.g. an empty Vector, then you still have the eltype information and so can construct an appropriate empty index. In terms of the type definition, tree, etc., it's fine to explicitly construct Index{Union{}}(), but that is an explicitly an empty index (because of both the type parameter and the lack of initialization values), which I view as a slightly different case than Index(iter) where iter just happens to be Tuple{}.

@aplavin
Copy link
Contributor

aplavin commented Jan 2, 2024

This is patently incorrect

Sorry, maybe I formulated it too strongly :) It very slightly reduces the internals dependence, because the actual return value of code_typed is still not guaranteed to be stable and can change when inference changes in Julia. Relying on inference, no matter how exactly it is done, is relying on Julia internals.

This change from return_type to code_typed increases overhead a lot:

# Core return_type()
julia> @btime Core.Compiler.return_type(sin, Tuple{Float64})
  0.791 ns (0 allocations: 0 bytes)
Float64

# return_type() from this PR
julia> @btime return_type(sin, Tuple{Float64})
  48.333 μs (1232 allocations: 69.22 KiB)
Float64

Wouldn't be great to pay this overhead on every Dictionary/Indices construction, especially given that it doesn't remove the internals dependence.
And indeed much more packages use return_type to probe inference than code_typed.

I suggested this change because the real issue was being dismissed prima facie because of the use of internals

The Tuple{Union{}} error in 1.10 is completely independent on internals, and is a breaking change anyway. It just happened that the first manifestations of this issue were found in code that relies on inference.
Luckily, now there's no shortage of internals-free examples that demonstrate this issue (in JuliaLang/julia#52385).

# is used. For an empty iterator, we actually don't know what the
# True(tm) eltype is, so the top of the type hierarchy (Any) is
# just as reasonable as the bottom (Union{})
I = typeof(iter) == Tuple{} ? Any : eltype(iter)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're constructing an Index from an empty tuple, you do not know what the original element type is, so Any IMHO makes as much sense as Union{}.

Small point: Any is technically correct (in a trivial sense), but less precise than Union{}. The empty union is the best possible (most accurate) type for the empty set of values. The possible breakage on the Julia side, however, is another thing...

Copy link

@nsajko nsajko Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small note: typeof(iter) == Tuple{} isn't very pretty/Julian. Here are two alternatives that are prettier in my opinion (although all of them seem equally performant):

f(it) = (typeof(it) == Tuple{}) ? Any : eltype(it)
g(it) = (it == ()) ? Any : eltype(it)
h(it) = (it isa Tuple{}) ? Any : eltype(it)

I vote for it == () (in case this change turns out to be necessary), seems like the nicest option.

EDIT: I think there's actually a style guide or something in Julia's manual that advises not to compare types with ==. In that light another option is typeof(it) <: Tuple{}, although the typeof is still unnecessary with that form.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it === () or it isa Tuple{} are the correct forms. (Note: the latter form might invoke a slower subtyping algorithm when the LHS isn't inferred to a concrete type and/or the RHS is an abstract type; the former is always quite straightforward).

@andyferris
Copy link
Owner

andyferris commented Jan 3, 2024

Okay, this has really opened a can of worms :)

There are few things going on here.

  1. TypedTables is pretty dumb when you have zero columns - we should improve that logic in this package.
  2. We frequently use type inference to figure out the type parameters of a typed table, to make sure the return type is concrete (and downstream operations aren't penalized).
  3. Julia has both invariant and covariant type parameters. Handling the eltype when mapping an empty array using only automated type inference, for example, becomes quite tricky.
  4. Julia 1.10 throws an error on Tuple{Union{}}, where previous versions haven't.

And yes, this package relies on Julia internals. It always has - originally this was "a feature not a bug" because it gave great performance + ergonomics. Traditionally we update the package to work well with each new version of Julia (but this hasn't been necessary in recent years).

The plan was always to use something "recommended" once Julia became more mature. I actually have an minimal example for the logic we follow:

x = map(sin, Int64[])

What is the type (concrete, runtime type) of x? Is it inferred? How does Base determine the eltype of the returned array? For many years, it has been known that invoking type inference to figure out a type which then needs to get inferred might expose undefined behavior (I'm even not sure what well-defined nested inference looks like inside recursive loops...). However, when not run as a part of a recursive loop it does in practice seem to work well. In fact, for the example above Base still uses Core.Compiler.return_type to figure out the eltype must be Float64. I think core devs like Jameson are right to raise concern with this pattern, but I still believe Base needs to lead the way here before we can copy a working pattern.

In general, I feel like swapping Core.Compiler.return_type to code_typed ultimately targets a political problem, not a technical problem. In fact it may introduce some regressions here. The discussion at JuliaLang/julia#52385 is progressing - but I think we need to get the attention of a few more devs that can do something in the space so they can design a solution amongst themselves.

We do however need to get this package working well with Julia 1.10, and any improvements to handling tables with zero rows or with zero columns are extremely valuable and welcome. Is there something more minimal we could try that will at least get us unstuck in the short term?

Copy link

codecov bot commented Jan 3, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (7b5b7b3) 79.35% compared to head (c83f300) 79.15%.

Files Patch % Lines
src/Dictionary.jl 0.00% 2 Missing ⚠️
src/map.jl 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #127      +/-   ##
==========================================
- Coverage   79.35%   79.15%   -0.20%     
==========================================
  Files          21       21              
  Lines        2349     2351       +2     
==========================================
- Hits         1864     1861       -3     
- Misses        485      490       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@aplavin
Copy link
Contributor

aplavin commented Jan 3, 2024

I actually have an minimal example for the logic we follow:
x = map(sin, Int64[])

And Base itself doesn't hesitate to return a collection with Union{} eltype when it makes sense :)

julia> map(length, Symbol[])
Union{}[]

Also when making an array out of an empty tuple - the exact same scenario as discussed here with Indices:

julia> collect(())
Union{}[]

So the consistent way, following Base, is to continue returning Indices{Union{}}. And as @nsajko points out that this is the most accurate type to return. Would be weird to change it and make the result both less consistent and less accurate.

Would be nice if those affected by this 1.10 issue chimed in the JuliaLang/julia#52385 discussion! I think popularity does affect the priority different bugs get (and that's totally sensible).

@nsajko
Copy link

nsajko commented Jan 3, 2024

I think popularity does affect the priority different bugs get (and that's totally sensible).

Note that the issue is labeled with triage, so I guess a decision might happen either way in less than 24 hours.

@palday
Copy link
Author

palday commented Jan 4, 2024

It looks like the Tuple{Union{}} change will be reverted: JuliaLang/julia#52385 (comment)

@palday palday closed this Jan 4, 2024
@andyferris
Copy link
Owner

Thanks for following up on this @palday

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants