
Updating dev to 2590e675 #77

Merged
merged 865 commits into mmtk:dev from updating-dev-2590e675 on Dec 6, 2024

Conversation


@udesou udesou commented Dec 4, 2024

Updating our dev branch to JuliaLang@2590e67

xili-h and others added 30 commits October 23, 2024 09:17
Co-authored-by: xili <xili@phas.ubc.ca>
…uliaLang#55983)

Documentation describes the correct way of extracting the element type
of a supertype:

https://docs.julialang.org/en/v1/manual/methods/#Extracting-the-type-parameter-from-a-super-type

However, one of the examples to showcase this is nonsensical since it is
a union of multiple element types.
I have replaced this example with a union over the dimension.
Now, the `eltype_wrong` function still gives a similar error, yet the
correct way returns the unambiguous answer.
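
For reference, here is a hedged sketch of the documented pattern (based on the linked manual section, not the exact doc example):

```julia
# Correct: extract the element type by dispatching on the supertype parameter.
eltype_ok(::Type{<:AbstractArray{T}}) where {T} = T

# Discouraged: reaching into .parameters directly; this breaks for Union types.
eltype_wrong(::Type{A}) where {A<:AbstractArray} = A.parameters[1]

eltype_ok(Union{Vector{Int}, Matrix{Int}})     # Int (a union over the dimension)
eltype_wrong(Union{Vector{Int}, Matrix{Int}})  # errors: a Union has no `parameters` field
```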

---------

Co-authored-by: Lilith Orion Hafner <lilithhafner@gmail.com>
…uliaLang#56300)

The pipeline-prints test currently fails when running on an
aarch64-macos device:

```
/Users/tim/Julia/src/julia/test/llvmpasses/pipeline-prints.ll:309:23: error: AFTERVECTORIZATION: expected string not found in input
; AFTERVECTORIZATION: vector.body
                      ^
<stdin>:2:40: note: scanning from here
; *** IR Dump Before AfterVectorizationMarkerPass on julia_f_199 ***
                                       ^
<stdin>:47:27: note: possible intended match here
; *** IR Dump Before AfterVectorizationMarkerPass on jfptr_f_200 ***
                          ^

Input file: <stdin>
Check file: /Users/tim/Julia/src/julia/test/llvmpasses/pipeline-prints.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             1: opt: WARNING: failed to create target machine for 'x86_64-unknown-linux-gnu': unable to get target for 'x86_64-unknown-linux-gnu', see --version and --triple.
             2: ; *** IR Dump Before AfterVectorizationMarkerPass on julia_f_199 ***
check:309'0                                            X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
             3: define i64 @julia_f_199(ptr addrspace(10) noundef nonnull align 16 dereferenceable(40) %0) #0 !dbg !4 {
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             4: top:
check:309'0     ~~~~~
             5:  %1 = call ptr @julia.get_pgcstack()
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             6:  %ptls_field = getelementptr inbounds ptr, ptr %1, i64 2
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             7:  %ptls_load45 = load ptr, ptr %ptls_field, align 8, !tbaa !8
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             .
             .
             .
            42:
check:309'0     ~
            43: L41: ; preds = %L41.loopexit, %L17, %top
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            44:  %value_phi10 = phi i64 [ 0, %top ], [ %7, %L17 ], [ %.lcssa, %L41.loopexit ]
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            45:  ret i64 %value_phi10, !dbg !52
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            46: }
check:309'0     ~~
            47: ; *** IR Dump Before AfterVectorizationMarkerPass on jfptr_f_200 ***
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:309'1                               ?                                           possible intended match
            48: ; Function Attrs: noinline optnone
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            49: define nonnull ptr addrspace(10) @jfptr_f_200(ptr addrspace(10) %0, ptr noalias nocapture noundef readonly %1, i32 %2) #1 {
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            50: top:
check:309'0     ~~~~~
            51:  %3 = call ptr @julia.get_pgcstack()
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            52:  %4 = getelementptr inbounds ptr addrspace(10), ptr %1, i32 0
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             .
             .
             .
>>>>>>

--

********************
Failed Tests (1):
  Julia :: pipeline-prints.ll
```

The problem is that these tests assume x86_64, which fails because the
target isn't available, so it presumably uses the native target which
has different vectorization characteristics:

```
❯ ./usr/tools/opt --load-pass-plugin=libjulia-codegen.dylib -passes='julia' --print-before=AfterVectorization -o /dev/null ../../test/llvmpasses/pipeline-prints.ll
./usr/tools/opt: WARNING: failed to create target machine for 'x86_64-unknown-linux-gnu': unable to get target for 'x86_64-unknown-linux-gnu', see --version and --triple.
```

There are other tests that assume this (e.g. the `fma` cpufeatures one),
but they don't fail, so I've left them as-is.
```julia
julia> using LinearAlgebra

julia> A = rand(Int,4,4); x = rand(Int,4); y = similar(x);

julia> @time mul!(y, A, x, 2, 2);
  0.330489 seconds (792.22 k allocations: 41.519 MiB, 8.75% gc time, 99.99% compilation time) # master
  0.134212 seconds (339.89 k allocations: 17.103 MiB, 15.23% gc time, 99.98% compilation time) # This PR
```
Main changes:
- `generic_matvecmul!` and `_generic_matvecmul!` now accept `alpha` and
`beta` arguments instead of `MulAddMul(alpha, beta)`. The methods that
accept a `MulAddMul(alpha, beta)` are also retained for backward
compatibility, but these now forward `alpha` and `beta`, instead of the
other way around.
- Narrow the scope of the `@stable_muladdmul` applications. We now
construct the `MulAddMul(alpha, beta)` object only where it is needed in
a function call, and we annotate the call site with `@stable_muladdmul`.
This leads to smaller branches.
- Create a new internal function with methods for the `'N'`, `'T'` and
`'C'` cases, so that firstly, there's less code duplication, and
secondly, the `_generic_matvecmul!` method is now simple enough to
enable constant propagation. This eliminates the unnecessary branches,
and only the one that is taken is compiled.

Together, this reduces the TTFX substantially.
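
A hypothetical toy (not the actual LinearAlgebra methods) illustrating the forwarding direction described above, where the `alpha`/`beta` method does the work and the `MulAddMul` method merely forwards to it:

```julia
using LinearAlgebra: MulAddMul

# Toy sketch; the real code lives in LinearAlgebra's matmul routines.
toy_matvecmul!(y, A, x, alpha::Number, beta::Number) =
    (y .= alpha .* (A * x) .+ beta .* y)
# The MulAddMul method is retained for compatibility and simply forwards.
toy_matvecmul!(y, A, x, mam::MulAddMul) =
    toy_matvecmul!(y, A, x, mam.alpha, mam.beta)
```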
Before, typing `Base.is_interactive = 7` would cause weird internal REPL
failures down the line. Now, it throws an InexactError and has no
impact.
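
A hedged illustration of the new behavior (the exact error text may differ across builds):

```julia
julia> Base.is_interactive = 7
ERROR: InexactError: Bool(7)
```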
With this PR,
```julia
julia> first(Base.OneTo(10), 4)
Base.OneTo(4)
```
Previously, this would have used indexing to return a `UnitRange`. This
is probably the only way to slice a `Base.OneTo` and obtain a
`Base.OneTo` back.
This expands on the approach taken by
JuliaLang#54552.

We pass on more type information to `generic_matmatmul_wrapper!`, which
lets us convert the branches to method dispatches. This helps spread the
latency around, so that instead of compiling all the branches in the
first call, we now compile the branches only when they are actually
taken. While this reduces the latency in individual branches, there is
no reduction in latency if all the branches are reachable.

```julia
julia> A = rand(2,2);

julia> @time A * A;
  0.479805 seconds (809.66 k allocations: 40.764 MiB, 99.93% compilation time) # 1.12.0-DEV.806
  0.346739 seconds (633.17 k allocations: 31.320 MiB, 99.90% compilation time) # This PR

julia> @time A * A';
  0.030413 seconds (101.98 k allocations: 5.359 MiB, 98.54% compilation time) # v1.12.0-DEV.806
  0.148118 seconds (219.51 k allocations: 11.652 MiB, 99.72% compilation time) # This PR
```
The latency is spread between the two calls here.

In fresh sessions:
```julia
julia> A = rand(2,2);

julia> @time A * A';
  0.473630 seconds (825.65 k allocations: 41.554 MiB, 99.91% compilation time) # v1.12.0-DEV.806
  0.490305 seconds (774.87 k allocations: 38.824 MiB, 99.90% compilation time) # This PR
```
In this case, both the `syrk` and `gemm` branches are reachable, so
there is no reduction in latency.

Analogously, there is a reduction in latency in the second set of matrix
multiplications where we call `symm!/hemm!` or `_generic_matmatmul`:

```julia
julia> using LinearAlgebra

julia> A = rand(2,2);

julia> @time Symmetric(A) * A;
  0.711178 seconds (2.06 M allocations: 103.878 MiB, 2.20% gc time, 99.98% compilation time) # v1.12.0-DEV.806
  0.540669 seconds (904.12 k allocations: 43.576 MiB, 2.60% gc time, 97.36% compilation time) # This PR
```
This improves performance in the scaling `mul!` for `StridedArray`s by
using loops instead of broadcasting.
```julia
julia> using LinearAlgebra

julia> A = zeros(200,200); C = similar(A);

julia> @btime mul!($C, $A, 1, 2, 2);
  19.180 μs (0 allocations: 0 bytes) # nightly v"1.12.0-DEV.1479"
  11.361 μs (0 allocations: 0 bytes) # This PR
```
The latency is reduced as well for the same reason.
```julia
julia> using LinearAlgebra

julia> A = zeros(2,2); C = similar(A);

julia> @time mul!(C, A, 1, 2, 2);
  0.203034 seconds (522.94 k allocations: 27.011 MiB, 14.95% gc time, 99.97% compilation time) # nightly
  0.034713 seconds (59.16 k allocations: 2.962 MiB, 99.91% compilation time) # This PR
```
Thirdly, I've replaced the `.*ₛ` calls by explicit branches. This fixes
the following:
```julia
julia> A = [zeros(2), zeros(2)]; C = similar(A);

julia> mul!(C, A, 1)
ERROR: MethodError: no method matching +(::Vector{Float64}, ::Bool)
```
After this,
```julia
julia> mul!(C, A, 1)
2-element Vector{Vector{Float64}}:
 [0.0, 0.0]
 [0.0, 0.0]
```
Also, I've added `@stable_muladdmul` annotations to the `generic_mul!`
call, but moved it within the loop to narrow its scope. This doesn't
increase the latency, while making the call type-stable.

```julia
julia> D = Diagonal(1:2); C = similar(D);

julia> @time mul!(C, D, 1, 2, 2);
  0.248385 seconds (898.18 k allocations: 47.027 MiB, 12.30% gc time, 99.96% compilation time) # nightly
  0.249940 seconds (919.80 k allocations: 49.128 MiB, 11.36% gc time, 99.99% compilation time) # This PR
```
… causes deprecation warnings (JuliaLang#56306)

The current version of `subtypes` will emit deprecation warnings even if
no one is using the deprecated bindings.

A similar bug was fixed in Aqua.jl -
https://github.com/JuliaTesting/Aqua.jl/pull/89/files

See discussion here: 

- JuliaIO/ImageMagick.jl#235 (for identifying
the problem)
- simonster/Reexport.jl#42 (for pointing to
the issue in Aqua.jl)
- https://github.com/JuliaTesting/Aqua.jl/pull/89/files (for the fix in
Aqua.jl)

This adds an `isbindingresolved` check to the `subtypes` function to
avoid emitting deprecation warnings. It also adds a test to check that
this no longer happens.
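
A hypothetical sketch of the kind of guard involved (not the actual implementation): only names whose bindings are already resolved are inspected, so walking a module's names does not force-resolve deprecated bindings.

```julia
# Hypothetical helper; the real change is inside InteractiveUtils.subtypes.
function candidate_names(m::Module)
    return [s for s in names(m; all = true, imported = true)
            if Base.isbindingresolved(m, s) && isdefined(m, s)]
end

candidate_names(Base.Math)  # returns only bindings that are already resolved
```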

---

On the current master branch (before the fix), the added test shows: 
 
```
WARNING: using deprecated binding InternalModule.MyOldType in OuterModule.
, use MyType instead.
Subtypes and deprecations: Test Failed at /home/dgleich/devextern/julia/usr/share/julia/stdlib/v1.12/Test/src/Test.jl:932
  Expression: isempty(stderr_content)
   Evaluated: isempty("WARNING: using deprecated binding InternalModule.MyOldType in OuterModule.\n, use MyType instead.\n")
Test Summary:             | Fail  Total  Time
Subtypes and deprecations |    1      1  2.8s
ERROR: LoadError: Some tests did not pass: 0 passed, 1 failed, 0 errored, 0 broken.
in expression starting at /home/dgleich/devextern/julia/stdlib/InteractiveUtils/test/runtests.jl:841
ERROR: Package InteractiveUtils errored during testing
```

---

Using the results of this pull request:

```
@test_nowarn subtypes(Integer);
```

passes without error. The other tests pass too.
This is a similar PR to JuliaIO/CRC32.jl#12

I added a generic fallback method for `AbstractVector{UInt8}` similar to
the existing generic `IO` method.
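
A hedged usage example, assuming this refers to the CRC32c stdlib (the module is not named above): the generic `AbstractVector{UInt8}` fallback also covers non-contiguous byte views.

```julia
using CRC32c

bytes = UInt8[0x61, 0x62, 0x63, 0x64]
crc32c(@view bytes[1:2:3])  # a strided (non-dense) view, handled by the generic method
```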

Co-authored-by: Steven G. Johnson <stevenj@mit.edu>
It makes a big difference when displaying strings that have width-2 or
width-0 characters.
…Lang#55587)

This simplifies the `finish_stage` rule.

Co-authored-by: Zentrik <Zentrik@users.noreply.github.com>
…g#56243)

These are safer in general, as well as easier to read.

Also, narrow the scopes of some `@inbounds` annotations.
Currently the following code snippet results in an internal error:
```julia
julia> func(x) = @atomic :monotonic x[].count += 1;

julia> let;Base.Experimental.@force_compile
           x = Ref(nothing)
           func(x)
       end
Internal error: during type inference of
...
```

This issue is caused by the incorrect use of `_fieldtype_tfunc(𝕃, o, f)`
within `modifyfield!_tfunc`, specifically because `o` should be
`widenconst`ed, but it isn't. By using `_fieldtype_tfunc` correctly, we
can avoid the error through error-catching in `abstract_modifyop!`. This
commit also includes a similar fix for `replacefield!_tfunc`.
In `InferenceState`, the lhs of an `:(=)` expression should only contain a
`GlobalRef` or `SlotNumber` and no other IR elements. Currently, when an
`SSAValue` appears in the lhs, the invalid assignment effect is somehow
ignored; this is incorrect anyway, so this commit removes that check.
Since `SSAValue` should not appear in the lhs in the first place, this is
not a significant change.
Fixes one part of JuliaLang#54636 

It was only safe to use the following if `from.data` was a dense vector
of bytes.
```julia
GC.@preserve from unsafe_copyto!(p, pointer(from.data, from.ptr), adv)
```

This PR adds a fallback suggested by @matthias314 in
https://discourse.julialang.org/t/copying-bytes-from-abstractvector-to-ptr/119408/7
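
A minimal sketch of such a fallback (hypothetical, not the actual Base code): when the source is not a dense `Vector{UInt8}`, copy the bytes one at a time instead of taking a raw pointer into it.

```julia
# Hypothetical helper illustrating the element-wise fallback.
function copybytes!(p::Ptr{UInt8}, src::AbstractVector{UInt8}, from::Integer, n::Integer)
    for i in 0:n-1
        unsafe_store!(p + i, src[from + i])
    end
    return p
end
```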
…ol` (JuliaLang#56316)

Also, as a minor backedge-reduction optimization, this commit avoids
adding backedges when `applicable` is inferred to return `::Bool`.
…liaLang#56196)

The discussion here mentions `require_one_based_indexing` being part of
the public API: JuliaLang#43263

Both functions are also documented (albeit in the dev docs): 
* `require_one_based_indexing`:
https://docs.julialang.org/en/v1/devdocs/offset-arrays/#man-custom-indices
* `has_offset_axes`:
https://docs.julialang.org/en/v1/devdocs/offset-arrays/#For-objects-that-mimic-AbstractArray-but-are-not-subtypes

Towards JuliaLang#51335.
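
A small usage example (hedged: it shows the documented behavior, not code from the PR); `Base.IdentityUnitRange` keeps its own indices, so it serves as an offset-axes object without extra packages:

```julia
r = Base.IdentityUnitRange(2:5)
Base.has_offset_axes(r)             # true: axes(r) start at 2, not 1
Base.require_one_based_indexing(r)  # throws ArgumentError for offset axes
```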

---------

Co-authored-by: Matt Bauman <mbauman@gmail.com>
…liaLang#56324)

Functions that are meant for package developers may go here, instead of
the main section that is primarily for users.
…ecompile loading (JuliaLang#56291)

Fixes `_require_search_from_serialized` to first acquire all
`start_loading` locks (using a deadlock-free batch-locking algorithm)
before doing the staleness checks and the rest. This way, all of the
global computations happen behind the `require_lock`, the rest can happen
behind module-specific locks, and then (as before) extensions can
eventually be loaded in parallel after `require` returns.
A more targeted fix of JuliaLang#54369 than JuliaLang#54372

Preserves the performance improvements added in JuliaLang#53962 by creating a new
internal `_unsafe_takestring!(v::Memory{UInt8})` function that does what
`String(::Memory{UInt8})` used to do.
jakobnissen and others added 21 commits November 27, 2024 22:13
Co-authored-by: Dilum Aluthge <dilum@aluthge.com>
Co-authored-by: Mosè Giordano <765740+giordano@users.noreply.github.com>
…uliaLang#56668)

Co-authored-by: Ian Butterworth <i.r.butterworth@gmail.com>
This makes them usable for external consumers like GPUCompiler.jl.
I'm not sure what `` `cmd` `` could refer to, but it would make sense to
refer to `` `str` `` in this case. I'm assuming it's a typo.
…56702)

These are not user-visible, so this makes the compiler faster and more
efficient with no effort on our part, and avoids duplicating the
debug_level parameter.
This helps when profiling remotely, since VTune doesn't support
setting environment variables on remote systems.

Will still respect `ENABLE_JITPROFILING=0`.
A `TAGGED_RELEASE_BANNER` with spaces such as `Official
https://julialang.org release` produces the error
`/cache/build/builder-amdci4-5/julialang/julia-master/deps/tools/jlchecksum:
66: [: Official: unexpected operator`.
…tation (JuliaLang#56727)

Fixes JuliaLang#56680. This PR updates the documentation for the `ispunct`
function in Julia to explicitly note its differing behavior from the
similarly named function in C.
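
An illustration of the difference (hedged: the exact wording added to the docs may differ). Julia's `ispunct` tests for the Unicode general category P*, whereas C's `ispunct` is true for any printable, non-alphanumeric, non-space character:

```julia
ispunct('!')  # true:  '!' is in category Po
ispunct('+')  # false: '+' is in category Sm (a symbol), so not punctuation in Julia
```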

---------

Co-authored-by: Lilith Orion Hafner <lilithhafner@gmail.com>
When we declare inner methods, e.g. the `f` in

```
function fs()
   f(lhs::Integer) = 1
   f(lhs::Integer, rhs::(local x=Integer; x)) = 2
   return f
end
```

we must hoist the definition of the (appropriately mangled) generic
function `f` to top-level, including all variables that were used in the
signature definition of `f`. This situation is a bit unique in the
language because it uses inner function scope, but gets executed in
toplevel scope. For example, you're not allowed to use a local of the
inner function in the signature definition:

```
julia> function fs()
          local x=Integer
          f(lhs::Integer, rhs::x) = 2
          return f
       end
ERROR: syntax: local variable x cannot be used in closure declaration
Stacktrace:
 [1] top-level scope
   @ REPL[3]:1
```

In particular, the restriction is signature-local:
```
julia> function fs()
          f(rhs::(local x=Integer; x)) = 1
          f(lhs::Integer, rhs::x) = 2
          return f
       end
ERROR: syntax: local variable x cannot be used in closure declaration
Stacktrace:
 [1] top-level scope
   @ REPL[4]:1
```

There's a special intermediate form `moved-local` that gets generated
for this definition. In c6c3d72, this
form stopped getting generated for certain inner methods. I suspect this
happened because of the incorrect assumption that the set of moved
locals is being computed over all signatures, rather than being a
per-signature property.

The result of all of this was that this is one of the few places where
lowering still generated a symbol as the lhs of an assignment for a
global (instead of globalref), because the code that generates the
assignment assumes it's a local, but the later pass doesn't know this.
Because we still retain the code for this from before we started using
globalref consistently, this wasn't generally causing problems, except
possibly leaking a global (or potentially assigning to a global when
this wasn't intended). However, in follow-on work, I want to make use of
knowing whether the LHS is a global or local in lowering, so this was
causing me trouble.

Fix all of this by putting back the `moved-local` where it was dropped.

Fixes JuliaLang#56711
This is an alternative mechanism to JuliaLang#56650 that largely achieves the
same result, but by hooking into `invoke` rather than a generated
function. They are orthogonal mechanisms, and it's possible we want both.
However, in JuliaLang#56650, both Jameson and Valentin were skeptical of the
generated function signature bottleneck. This PR is sort of a hybrid of
the mechanism in JuliaLang#52964 and what I proposed in
JuliaLang#56650 (comment).

In particular, this PR:

1. Extends `invoke` to support a CodeInstance in place of its usual
`types` argument.

2. Adds a new `typeinf` optimized generic. The semantics of this
optimized generic allow the compiler to instead call a companion
`typeinf_edge` function, allowing a mid-inference interpreter switch
(like JuliaLang#52964), without being forced through a concrete signature
bottleneck. However, if calling `typeinf_edge` does not work (e.g.
because the compiler version is mismatched), this still has well-defined
semantics; you just don't get inference support.

The additional benefit of the `typeinf` optimized generic is that it
lets custom cache owners tell the runtime how to "cure" code instances
that have lost their native code. Currently the runtime only knows how
to do that for `owner == nothing` CodeInstances (by re-running
inference). This extension is not implemented, but the idea is that the
runtime would be permitted to call the `typeinf` optimized generic on
the dead CodeInstance's `owner` and `def` fields to obtain a cured
CodeInstance (or a user-actionable error from the plugin).

This PR includes an implementation of `with_new_compiler` from JuliaLang#56650,
along with just enough compiler support to make the compiler optimize this
to the same code that JuliaLang#56650 produced:

```
julia> @code_typed with_new_compiler(sin, 1.0)
CodeInfo(
1 ─      $(Expr(:foreigncall, :(:jl_get_tls_world_age), UInt64, svec(), 0, :(:ccall)))::UInt64
│   %2 =   builtin Core.getfield(args, 1)::Float64
│   %3 =    invoke sin(%2::Float64)::Float64
└──      return %3
) => Float64
```

However, the implementation here is extremely incomplete. I'm putting it
up only as a directional sketch to see if people prefer it over JuliaLang#56650.
If so, I would prepare a cleaned up version of this PR that has the
optimized generics as well as the curing support, but not the full
inference integration (which needs a fair bit more work).
…6713)

This adjusts lowering to emit `setglobal!` for assignment to globals,
thus making the `=` expr head used only for slots in `CodeInfo` and
entirely absent in `IRCode`. The primary reason for this is just to
reduce the number of special cases that compiler passes have to reason
about. In IRCode, `=` was already essentially equivalent to
`setglobal!`, so there's no good reason not to canonicalize.

Finally, the `=` syntax form for globals already gets recognized
specially to insert `convert` calls to their declared binding type, so
this doesn't impose any additional requirements on lowering to
distinguish local from global assignments. In general, I'd also like to
separate syntax and intermediate forms as much as possible where their
semantics differ, which this accomplishes by just using the builtin.

This change is mostly semantically invisible, except that spliced-in
GlobalRefs now declare their binding because they are indistinguishable
from ordinary assignments at the stage where I inserted the lowering. If
we want to, we can preserve the difference, but it'd be a bit more
annoying for not much benefit, because:
1. The spliced in version was only recently made to work anyway, and
2. The semantics of when exactly bindings are declared is still messy on
master anyway and will get tweaked shortly in further binding partitions
work.
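
A hedged way to observe the change (the printed lowering varies across Julia builds):

```julia
# After this change, lowering a global assignment should show a call to the
# setglobal! builtin rather than a bare `=` expression.
Meta.lower(Main, :(global x = 1))
```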
… the REPL (JuliaLang#54800)

When a user requests a completion for a backslash shortcode, this PR
adds the glyphs for all the suggestions to the output. This makes it
much easier to find the result one is looking for, especially if the
user doesn't know all LaTeX and emoji specifiers by heart.

Before:

<img width="813" alt="image"
src="https://github.com/JuliaLang/julia/assets/22495855/bf651399-85a6-4677-abdc-c66a104e3b89">

After:

<img width="977" alt="image"
src="https://github.com/JuliaLang/julia/assets/22495855/04c53ea2-318f-4888-96eb-0215b49c10f3">

---------

Co-authored-by: Dilum Aluthge <dilum@aluthge.com>
@udesou udesou force-pushed the updating-dev-2590e675 branch from 2f7ed15 to 4df5661 on December 6, 2024 00:55
@udesou udesou force-pushed the updating-dev-2590e675 branch from 3baac3e to a9d0fe3 on December 6, 2024 01:37
@udesou udesou requested a review from qinsoon December 6, 2024 04:54
@udesou udesou merged commit e026623 into mmtk:dev Dec 6, 2024
3 of 4 checks passed
udesou added a commit to mmtk/mmtk-julia that referenced this pull request Dec 9, 2024
Updating the `dev` branch of `mmtk/julia` to
JuliaLang/julia@2590e67.

Merge with mmtk/julia#77.