Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve printing of several arguments #55754

Merged
merged 5 commits into from
Sep 20, 2024
Merged

Conversation

gdalle
Copy link
Contributor

@gdalle gdalle commented Sep 12, 2024

Following a discussion on Discourse, this PR tries to improve print (and variants) for more than one argument.
The idea is that for is type-unstable over the tuple args, while foreach unrolls. I have no idea how to properly evaluate the potential performance gains though.

@Tortar
Copy link
Contributor

Tortar commented Sep 12, 2024

To test if it works I guess you could execute https://github.com/shagrouni/langs_string_build_test/blob/main/src/jl_strbld.jl with julia jl_strbld.jl 50000000 and see if it now works faster, it should be hopefully as fast as using

    ...
    for i in 1:num
        print(buf, " J ")
        print(buf, i)
    end
    ...

@gdalle
Copy link
Contributor Author

gdalle commented Sep 12, 2024

Doesn't look half bad.

using BenchmarkTools

function test_slow(n::Integer)
    s = "hello"
    buf = IOBuffer()
    for i in 1:n
        print(buf, s, i)
    end
end

function test_fast(n::Integer)
    s = "hello"
    buf = IOBuffer()
    for i in 1:n
        print(buf, s)
        print(buf, i)
    end
end

@btime test_slow(50_000);
@btime test_fast(50_000);

Current nightly

julia> @btime test_slow(50_000);
  5.376 ms (249003 allocations: 7.35 MiB)

julia> @btime test_fast(50_000);
  2.959 ms (100025 allocations: 4.31 MiB)

This PR

julia> @btime test_slow(50_000);
  3.503 ms (100025 allocations: 4.31 MiB)

julia> @btime test_fast(50_000);
  3.018 ms (100025 allocations: 4.31 MiB)

@stevengj
Copy link
Member

stevengj commented Sep 12, 2024

Ideally, test_slow and test_fast should now be equivalent.

@tkf, shouldn't foreach on a 2-tuple now be equivalent to a fully unrolled loop thanks to #31901? I suspect that there is some compiler specialization that is still missing.

(That doesn't need to block this PR, however — it's still faster, even if there are some opportunities for additionally speeding up foreach.)

@gdalle
Copy link
Contributor Author

gdalle commented Sep 12, 2024

I think Takafumi is no longer around?

@Tortar
Copy link
Contributor

Tortar commented Sep 12, 2024

Does the technique have any downside @stevengj such as increasing compile time substantially? Because otherwise this can probably be applied in other Vararg cases right?

@JeffBezanson JeffBezanson added performance Must go faster display and printing Aesthetics and correctness of printed representations of objects. labels Sep 12, 2024
@nsajko
Copy link
Contributor

nsajko commented Sep 13, 2024

To make this PR complete, this:

julia/base/strings/io.jl

Lines 141 to 144 in 467ab85

siz::Int = 0
for x in xs
siz += _str_sizehint(x)
end

... and this:

julia/base/strings/io.jl

Lines 157 to 160 in 467ab85

siz::Int = 0
for x in xs
siz += _str_sizehint(x)
end

... should probably be deduplicated and reimplemented using mapreduce. Something like:

function _str_sizehints(xs...)
    mapreduce(_str_sizehint, +, xs; init = 0)
end

@gdalle
Copy link
Contributor Author

gdalle commented Sep 13, 2024

Done!

Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I can reproduce those benchmarks and appreciate the performance improvement.

Does the technique have any downside stevengj such as increasing compile time substantially?

Compile time of test_slow does increase from 5ms to 9ms:

julia> compile_time(version) = parse(Float64, strip(read(`julia +$version -E 'function test_slow(n::Integer)
               s = "hello"
               buf = IOBuffer()
               for i in 1:n
                   print(buf, s, i)
               end
           end; @elapsed @eval test_slow(100)'`, String)))
compile_time (generic function with 1 method)

julia> [compile_time("nightly") for _ in 1:7]
7-element Vector{Float64}:
 0.005005201
 0.004923159
 0.004996992
 0.005025118
 0.004929617
 0.004938451
 0.004956284

julia> [compile_time("pr55754") for _ in 1:7]
7-element Vector{Float64}:
 0.009146896
 0.009284273
 0.009325356
 0.009385732
 0.009336814
 0.009289814
 0.009340064

shell> julia +nightly -E 'Base.GIT_VERSION_INFO.commit'
"467ab852bb7392843ce74a17de89c221e0f64df0"

shell> julia +pr55754 -E 'Base.GIT_VERSION_INFO.commit'
"b47a4940dc130e50795254f877a17a766be7d2fc"

@nsajko
Copy link
Contributor

nsajko commented Sep 13, 2024

The _str_sizehints changes sadly break bootstrap for the assert builds.

gmp.jl
ryu/Ryu.jl
ryu/utils.jl
ryu/shortest.jl
Internal error: during type inference of
string(Expr)
Encountered unexpected error in runtime:
AssertionError(msg="irinterp is unable to handle heavy recursion")
collect_limitations! at ./compiler/ssair/irinterp.jl:4 [inlined]
from_interprocedural! at ./compiler/abstractinterpretation.jl:392 [inlined]

@gdalle
Copy link
Contributor Author

gdalle commented Sep 13, 2024

The _str_sizehints changes sadly break bootstrap for the assert builds.

@nsajko what does that mean in layperson terms?

@nsajko
Copy link
Contributor

nsajko commented Sep 13, 2024

The bootstrap (the initial compilation of Base and other code) fails because the compiler refuses to handle deep recursion. Or something like that.

In practice that probably means you have to revert the commit I suggested (I think the build was successful for the initial commit in the PR, right?). I guess we could try reintroducing the mapreduce after #55575 gets merged.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Sep 13, 2024

This adds a dynamic closure capture though and dispatch, which may not be performant

@JeffBezanson
Copy link
Sponsor Member

That error isn't supposed to happen; you may have found an interesting bug :)

@JeffBezanson
Copy link
Sponsor Member

If you update this, please keep a branch around with that failure so we can work on it.

@gdalle
Copy link
Contributor Author

gdalle commented Sep 13, 2024

@vtjnash I think it comes down to a compromise between tuple unrolling (which is possible with foreach but not with for) and the additional closure required. My PR might be slower for printing tuples with identically typed arguments for instance. I just wanted to experiment with Steven's suggestion but I don't have strong feeling either way, so I'll let the knowledgeable people hash it out.

base/strings/io.jl Outdated Show resolved Hide resolved
base/strings/io.jl Outdated Show resolved Hide resolved
base/strings/io.jl Outdated Show resolved Hide resolved
base/strings/io.jl Outdated Show resolved Hide resolved
base/strings/io.jl Outdated Show resolved Hide resolved
Co-authored-by: Steven G. Johnson <stevenj@mit.edu>
base/strings/io.jl Outdated Show resolved Hide resolved
@stevengj
Copy link
Member

What does the benchmark look like with the latest Fix1 version? Is it matching the manually unrolled 2-print-call version yet?

Co-authored-by: Steven G. Johnson <stevenj@mit.edu>
@gdalle
Copy link
Contributor Author

gdalle commented Sep 19, 2024

No, the same difference remains.

Currently nightly

julia> @btime test_slow(50_000);
  5.245 ms (249003 allocations: 7.35 MiB)

julia> @btime test_fast(50_000);
  2.801 ms (100025 allocations: 4.31 MiB)

This PR

julia> @btime test_slow(50_000);
  3.622 ms (100025 allocations: 4.31 MiB)

julia> @btime test_fast(50_000);
  2.925 ms (100025 allocations: 4.31 MiB)

@LilithHafner
Copy link
Member

Compile time regression is also gone!

julia> compile_time(version) = parse(Float64, strip(read(`julia +$version -E 'function test_slow(n::Integer)
               s = "hello"
               buf = IOBuffer()
               for i in 1:n
                   print(buf, s, i)
               end
           end; @elapsed @eval test_slow(100)'`, String)))
compile_time (generic function with 1 method)

julia> [compile_time("nightly") for _ in 1:7]
Juliaup configuration is locked by another process, waiting for it to unlock.
Juliaup configuration is locked by another process, waiting for it to unlock.
Juliaup configuration is locked by another process, waiting for it to unlock.
7-element Vector{Float64}:
 0.006195663
 0.005031322
 0.005418033
 0.005152656
 0.005105739
 0.005199073
 0.005210907

julia> [compile_time("pr55754") for _ in 1:7]
7-element Vector{Float64}:
 0.008983305
 0.005014739
 0.004937614
 0.005452533
 0.005186157
 0.005184865
 0.005381825

shell> julia +nightly -E 'Base.GIT_VERSION_INFO.commit'
"58b239c5b8eaffec3b9b99e6d7c37e8ae6129d6d"

shell> julia +pr55754 -E 'Base.GIT_VERSION_INFO.commit'
"91443de63550381d4d95d7b992e3d9de8206e8c3"

@gdalle
Copy link
Contributor Author

gdalle commented Sep 20, 2024

Anything more I need to do?

@LilithHafner LilithHafner merged commit 44bef0d into JuliaLang:master Sep 20, 2024
5 of 7 checks passed
@LilithHafner
Copy link
Member

Nope! Thanks for the contribution :)

@gdalle gdalle deleted the patch-3 branch September 20, 2024 16:08
@nsajko
Copy link
Contributor

nsajko commented Sep 20, 2024

@JeffBezanson requested that a branch with the buggy failing bootstrap be kept around, if I understood correctly. Is such a branch still available somewhere on Github?

@gdalle gdalle restored the patch-3 branch September 20, 2024 16:54
@gdalle
Copy link
Contributor Author

gdalle commented Sep 20, 2024

Yes, it's here: https://github.com/JuliaLang/julia/tree/b59efec2ab52690729e84bc2939310be499db4ad

vtjnash added a commit that referenced this pull request Sep 26, 2024
vtjnash added a commit that referenced this pull request Sep 27, 2024
Reverts #55754 as it overrode some performance heuristics
which appeared to be giving a significant gain/loss in performance:
Closes #55893
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
display and printing Aesthetics and correctness of printed representations of objects. performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants