
Major slowdown on Linux due to Gtk.jl + @threads for #35552

Closed · timholy opened this issue Apr 22, 2020 · 15 comments · Fixed by JuliaGraphics/Gtk.jl#630
Labels: multithreading (Base.Threads and related functionality) · performance (Must go faster) · regression (Regression in behavior compared to a previous version)

Comments

@timholy (Member) commented Apr 22, 2020

xref JuliaGraphics/Gtk.jl#503, which seems likely to be a Julia issue. Note that I observe it on both Julia 1.4 and 1.2, but Julia 1.0.5 does not share this problem.
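
For reference, the pattern that triggers it is roughly the following (a minimal sketch; work() is an illustrative stand-in for the imfilter call in the linked issue, not the actual test_imfilter):

using Gtk            # at the time of this issue, loading Gtk started its event loop
using Base.Threads

work() = sum(abs2, rand(Float32, 10^6))   # stand-in for the imfilter call

function timed_threads_loop(n = 9)
    for i in 1:n
        t = @elapsed @threads for k in 1:nthreads()
            work()
        end
        println(i, "  ", round(t, digits = 6), " sec")
    end
end

timed_threads_loop()   # on affected setups, some iterations stall for ~5 s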

timholy added the multithreading and regression labels on Apr 22, 2020
@JeffBezanson (Member)

Could you grab a profile?
Some of the new threading and task code was added in 1.2, so that makes sense.

@timholy (Member, Author) commented Apr 22, 2020

Bizarrely, on a relatively recent master (8d8f46d) it's always fast if I use @profile:

julia> @profile test_imfilter(opt=2)
julia_num_threads=4
1 imfilter 0.019901 sec
2 imfilter 0.029619 sec
3 imfilter 0.021065 sec
4 imfilter 0.01536 sec
5 imfilter 0.013495 sec
6 imfilter 0.009861 sec
7 imfilter 0.010458 sec
8 imfilter 0.009338 sec
9 imfilter 0.020186 sec

julia> test_imfilter(opt=2)
julia_num_threads=4
1 imfilter 5.022134 sec
2 imfilter 0.034472 sec
3 imfilter 5.020989 sec
4 imfilter 5.032051 sec
5 imfilter 0.036996 sec
6 imfilter 0.017211 sec
7 imfilter 0.018399 sec
8 imfilter 0.01314 sec
9 imfilter 0.013678 sec

julia> @profile test_imfilter(opt=2)
julia_num_threads=4
1 imfilter 0.029225 sec
2 imfilter 0.010284 sec
3 imfilter 0.008963 sec
4 imfilter 0.009983 sec
5 imfilter 0.007118 sec
6 imfilter 0.012708 sec
7 imfilter 0.009843 sec
8 imfilter 0.009938 sec
9 imfilter 0.010068 sec

julia> test_imfilter(opt=2)
julia_num_threads=4
1 imfilter 0.027196 sec
2 imfilter 5.015825 sec
3 imfilter 0.032341 sec
4 imfilter 5.017424 sec
5 imfilter 0.02489 sec
6 imfilter 5.017529 sec
7 imfilter 5.018003 sec
8 imfilter 0.019033 sec
9 imfilter 5.024461 sec

If I use a low-tech sampling profiler (hit Ctrl-C and see where the break occurs), it always seems to be here:

ERROR: InterruptException:
Stacktrace:
 [1] try_yieldto(::typeof(Base.ensure_rescheduled), ::Base.RefValue{Task}) at ./task.jl:645
 [2] wait at ./task.jl:710 [inlined]
 [3] wait(::Base.GenericCondition{Base.Threads.SpinLock}) at ./condition.jl:106
 [4] _wait(::Task) at ./task.jl:238
 [5] wait(::Task) at ./task.jl:265
 [6] macro expansion at ./threadingconstructs.jl:69 [inlined]
 [7] _imfilter_tiled_threads!(::CPU1{ImageFiltering.Algorithm.FIRTiled{2}}, ::Array{Float32,2}, ::OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}, ::OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}, ::OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}, ::NoPad{Pad{2}}, ::Array{Tuple{UnitRange{Int64},UnitRange{Int64}},1}, ::Array{Array{Float32,2},1}) at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:858
 [8] _imfilter_tiled!(::CPUThreads{ImageFiltering.Algorithm.FIRTiled{2}}, ::Array{Float32,2}, ::OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}, ::Tuple{OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}},OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}}, ::NoPad{Pad{2}}, ::Array{Array{Float32,2},1}, ::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}) at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:854
 [9] imfilter!(::CPUThreads{ImageFiltering.Algorithm.FIRTiled{2}}, ::Array{Float32,2}, ::OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}, ::Tuple{OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}},OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}}, ::NoPad{Pad{2}}, ::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}) at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:773
 [10] imfilter! at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:770 [inlined]
 [11] imfilter! at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:715 [inlined]
 [12] imfilter!(::CPUThreads{ImageFiltering.Algorithm.FIRTiled{2}}, ::Array{Float32,2}, ::Array{Float32,2}, ::Tuple{OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}},OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}}, ::Pad{0}) at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:702
 [13] imfilter!(::Array{Float32,2}, ::Array{Float32,2}, ::Tuple{OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}},OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}}, ::Pad{0}, ::ImageFiltering.Algorithm.FIRTiled{2}) at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:612
 [14] imfilter!(::Array{Float32,2}, ::Array{Float32,2}, ::Tuple{OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}},OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}}, ::Pad{0}) at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:606
 [15] imfilter at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:27 [inlined]
 [16] imfilter at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:22 [inlined]
 [17] imfilter at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:18 [inlined]
 [18] imfilter at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:10 [inlined]
 [19] imfilter at /home/tim/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:5 [inlined]
 [20] test_imfilter(; opt::Int64) at ./REPL[4]:11
 [21] top-level scope at REPL[23]:1

The threading is just old-school @threads; see here.

@JeffBezanson (Member)

> If I use a low-tech sampling profiler (hit Ctrl-C and see where the break occurs),

One of my favorites! 😄

@JeffBezanson (Member)

The key difference in v1.2 must be that @threads loops join at the end by calling the normal wait on tasks, which allows other scheduled tasks to run as well. Does GTK have a task scheduled that might block for 5 seconds?
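
Roughly, since v1.2 @threads expands to something like the following sketch (modeled on Base.Threads.threading_run, with the thread-pinning ccall omitted; not the exact lowering in threadingconstructs.jl). The final wait calls are the join, and each one yields to the scheduler:

function threads_for(body)
    n = Threads.nthreads()
    tasks = Vector{Task}(undef, n)
    for tid in 1:n
        t = Task(() -> body(tid))   # each task runs one chunk of the loop
        t.sticky = true             # @threads pins each task to a thread
        tasks[tid] = t
        schedule(t)
    end
    foreach(wait, tasks)   # the join: each wait yields, letting other
end                        # scheduled tasks (e.g. Gtk's event loop) run

# usage: threads_for(tid -> println("chunk ", tid, " on thread ", Threads.threadid()))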

@lobingera

What do you mean?

a) Is it possible that GTK spends 5 s on internal processing, i.e. redrawing something?
b) Is it possible that GTK calls something that takes 5 s?

I mentioned this in JuliaGraphics/Gtk.jl#325 (comment): just putting the GTK main loop into a Task might not be enough. There seems to be a reason for this page: https://developer.gnome.org/gtk3/stable/gtk3-General.html

@timholy (Member, Author) commented Apr 23, 2020

I am no Gtk expert, but I don't think there's anything in Gtk that should take that long. Assuming those times are measured in milliseconds, there is something that appears to be a 5 s timeout. See also here. I confess that I have not dug into libuv or Julia's scheduling enough to make much sense of what's going on there.

@roflmaostc (Contributor) commented Oct 23, 2020

Hey,

I was wondering whether any efforts are underway to fix this?
I encountered (probably?) the same issue in mcabbott/Tullio.jl#45 on Julia 1.5.2.
But for me the 5 s delay doesn't appear; it's just a slowdown.

Thanks,

Felix

@rafaqz (Contributor) commented Nov 28, 2020

I just got bitten by this as well. It's always roughly a 5 second timeout in @threads after using Gtk.

It seems like it's literally a timeout: there is nothing significant happening on any CPU for the duration, and the timing seems too similar across multiple machines to be an actual task.

@wavecast

Just confirming what @timholy mentioned above. I tried changing 5000 to 1 in this line of Gtk.jl/src/GLib/signals.jl:

tmout_min::Cint = (uv_pollfd::_GPollFD).fd == -1 ? 10 : 5000

After doing that, a multithreaded test program (JuliaImages/ImageView.jl#243 with 'using ImageView' changed to 'using Gtk') runs much faster. Changing 5000 to 0 makes it even faster, but CPU usage goes to 100%.
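
For context, a sketch of why that constant plausibly matters (the loop shape, the gtk_running flag, and the library handle name below are assumptions for illustration, not Gtk.jl's actual code):

# Hypothetical sketch: Gtk.jl runs its event loop as a Julia task, and
# each iteration can block inside a ccall for up to the timeout returned
# by the prepare callback (5000 ms when uv_pollfd.fd != -1, per the line
# quoted above). A task blocked in a ccall cannot be preempted, so work
# scheduled behind it, e.g. a sticky @threads task pinned to the same
# thread, may have to wait out the full timeout before the loop can join.
@async while gtk_running                       # gtk_running: hypothetical flag
    ccall((:g_main_context_iteration, Gtk.GLib.libglib), Cint,
          (Ptr{Cvoid}, Cint), C_NULL, true)    # may_block = true
end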

@IanButterworth (Member)

This gets in the way of the new profiling-thread support, given that ProfileView uses Gtk.

So if a user does

julia> using ProfileView

julia> @profview multithreaded_foo()

they're going to have a bad time.

It can be worked around once, by doing

julia> using Profile

julia> @profile multithreaded_foo()

julia> using ProfileView

julia> ProfileView.view()

but that's awkward and only works once per session.

Therefore, as a suggestion, I'm putting it on the 1.8 milestone.

@IanButterworth (Member)

JuliaGraphics/Gtk.jl#607 is merged and released, which makes this a bit more manageable by stopping the event loop when no Gtk windows are shown.

tkf changed the title from "Major slowdown on Linux due to Gtk.jl + multithreading" to "Major slowdown on Linux due to Gtk.jl + @threads for" on Jan 16, 2022
@tkf (Member) commented Jan 16, 2022

I tried to refine the issue title as it looks like this is specific to @threads for and has nothing/little to do with our multi-threaded runtime itself, IIUC. But please feel free to roll it back if that's inaccurate.
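
One way to probe that distinction (a hypothetical check, not run in this thread): time the same work joined via @threads versus via Threads.@spawn/fetch after loading Gtk and showing a window:

using Gtk, Base.Threads

work() = sum(abs2, rand(10^6))
win = GtkWindow("test")   # showing a window keeps the event loop running

# joined via @threads (the reportedly affected path)
t_threads = @elapsed @threads for k in 1:nthreads()
    work()
end

# joined via @spawn/fetch (per the retitle, expected to be unaffected)
t_spawn = @elapsed foreach(fetch, [Threads.@spawn work() for _ in 1:nthreads()])

@show t_threads t_spawn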

@roflmaostc (Contributor) commented Dec 12, 2022

Hm.

I have kind of bad news.
In my case I observe the major slowdown again.

I don't have an MWE yet, but after using ImageView and imshow(x) I observe a significant performance drop in my iterative FFT algorithm. CPU usage drops from 4 fully used threads to 50% on one core.

@roflmaostc (Contributor) commented Dec 13, 2022

OK, found one; the difference is roughly a factor of 30:

julia> using FFTW

julia> FFTW.set_num_threads(4)

julia> x = rand(300, 300);

julia> using BenchmarkTools

julia> @btime fft($x);
  349.117 μs (111 allocations: 2.75 MiB)

julia> using ImageView

julia> @btime fft($x);
  361.026 μs (111 allocations: 2.75 MiB)

julia> imshow(x)
Dict{String, Any} with 4 entries:
  "gui"         => Dict{String, Any}("window"=>GtkWindowLeaf(name="", parent, width-request=-1, height-request=-1, visible=TRUE, sensitive=TRUE, app-paintable=FALSE, can-focus=FALS
  "roi"         => Dict{String, Any}("redraw"=>Observables.ObserverFunction[ObserverFunction defined at /home/fxw/.julia/packages/Observables/PHGQ8/src/Observables.jl:414 operating
  "annotations" => Observable(Dict{UInt64, Any}())
  "clim"        => Observable(CLim{Float64}(7.34195e-6, 0.999971))

julia> @btime fft($x);
  10.860 ms (112 allocations: 2.75 MiB)

julia> versioninfo()
Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × AMD Ryzen 5 5600X 6-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver3)
  Threads: 12 on 12 virtual cores
Environment:
  JULIA_NUM_THREADS = 12

(jl_Qyr6ui) pkg> st
Status `/tmp/jl_Qyr6ui/Project.toml`
  [6e4b80f9] BenchmarkTools v1.3.2
  [7a1cc6ca] FFTW v1.5.0
  [86fae568] ImageView v0.11.1

For larger arrays, the difference seems to be smaller.

julia> using BenchmarkTools, FFTW

julia> x = rand(512, 512, 128);

julia> @btime fft($x);
  1.152 s (31 allocations: 1.00 GiB)

julia> FFTW.set_num_threads(4)

julia> @btime fft($x);
  408.174 ms (113 allocations: 1.00 GiB)

julia> using ImageView

julia> @btime fft($x);
  408.752 ms (112 allocations: 1.00 GiB)

julia> imshow(x)
Dict{String, Any} with 4 entries:
  "gui"         => Dict{String, Any}("window"=>GtkWindowLeaf(name="", parent, width-reque
  "roi"         => Dict{String, Any}("redraw"=>Observables.ObserverFunction[ObserverFunct
  "annotations" => Observable(Dict{UInt64, Any}())
  "clim"        => Observable(CLim{Float64}(6.91704e-6, 0.999992))

julia> @btime fft($x);
  422.323 ms (116 allocations: 1.00 GiB)

KristofferC reopened this on Dec 13, 2022
JeffBezanson removed this from the 1.8 milestone on Aug 2, 2023
@oscardssmith (Member)

what closed this?
