use SnoopPrecompile #1234

Draft
wants to merge 8 commits into
base: main

ranocha
Member

@ranocha ranocha commented Oct 7, 2022

This uses some new tools for precompilation. I get the following results with Julia v1.8.2 for the workload

@time @eval(using OrdinaryDiffEq); @time @eval(using Trixi); @time trixi_include(joinpath(examples_dir(), "tree_3d_dgsem", "elixir_euler_taylor_green_vortex.jl"), maxiters=200)

The timings are in seconds on my workstation (for the first commit in this PR).

| | main | this PR |
|---|---|---|
| using OrdinaryDiffEq | ca. 6 | ca. 6 |
| using Trixi | ca. 16 | ca. 18 |
| trixi_include(...) | ca. 40 | ca. 35 |
| precompiling Trixi | ca. 27 | ca. 43 |

Second commit (with updated dependency versions, after merge of main):

| | main | this PR |
|---|---|---|
| using OrdinaryDiffEq | ca. 7 | ca. 7 |
| using Trixi | ca. 16 | ca. 20 |
| trixi_include(...) | ca. 40 | ca. 25 |
| precompiling Trixi | ? | ca. 52 |

TODO

Before we merge this, we should

  • check the influence on something like MHD with a different number of variables
  • run our benchmark suite
  • decide whether we want to keep the old precompilation setup for Julia < 1.8 and enable the new setup only for Julia >= 1.8
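
The last item could be handled with a version check that is evaluated once when the package is loaded. The following is only a minimal sketch of that idea, not the setup used in this PR: `precompile_strategy`, `ACTIVE_PRECOMPILE_SETUP`, and the symbols `:snoopprecompile` and `:legacy` are hypothetical names chosen for illustration.

```julia
# Hypothetical helper: choose a precompilation setup from the Julia version.
# The returned symbols are placeholder labels, not names used by Trixi.jl.
function precompile_strategy(version::VersionNumber = VERSION)
    return version >= v"1.8" ? :snoopprecompile : :legacy
end

# Inside a package, the same check can be resolved at load time with `@static`,
# so only the selected branch is evaluated.
@static if VERSION >= v"1.8"
    const ACTIVE_PRECOMPILE_SETUP = :snoopprecompile
else
    const ACTIVE_PRECOMPILE_SETUP = :legacy
end
```

Keeping both branches would mean maintaining two precompilation setups, which is the trade-off this TODO item asks about.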

@codecov

codecov bot commented Oct 31, 2022

Codecov Report

Merging #1234 (ca6b007) into main (514ef46) will increase coverage by 9.98%.
The diff coverage is n/a.

❗ Current head ca6b007 differs from pull request most recent head 4e83d0c. Consider uploading reports for the commit 4e83d0c to get more accurate results

@@            Coverage Diff             @@
##             main    #1234      +/-   ##
==========================================
+ Coverage   87.50%   97.48%   +9.98%     
==========================================
  Files         329      325       -4     
  Lines       25186    24921     -265     
==========================================
+ Hits        22037    24292    +2255     
+ Misses       3149      629    -2520     
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 97.48% <ø> (+9.98%) | ⬆️ |


| Impacted Files | Coverage Δ | |
|---|---|---|
| src/Trixi.jl | 70.00% <ø> (ø) | |
| src/equations/compressible_euler_2d.jl | 97.43% <0.00%> (-1.58%) | ⬇️ |
| src/visualization/recipes_makie.jl | 96.36% <0.00%> (-0.16%) | ⬇️ |
| src/equations/shallow_water_2d.jl | 99.38% <0.00%> (-0.01%) | ⬇️ |
| src/solvers/dgsem_tree/dg_2d.jl | 100.00% <0.00%> (ø) | |
| src/solvers/dgsem_tree/indicators.jl | 91.15% <0.00%> (ø) | |
| src/equations/linear_scalar_advection_2d.jl | 100.00% <0.00%> (ø) | |
| examples/tree_2d_dgsem/elixir_euler_vortex.jl | 100.00% <0.00%> (ø) | |
| src/callbacks_step/euler_acoustics_coupling.jl | 84.38% <0.00%> (ø) | |
| examples/tree_2d_dgsem/elixir_euler_vortex_amr.jl | 100.00% <0.00%> (ø) | |
| ... and 60 more | | |


@ranocha
Member Author

ranocha commented Nov 2, 2022

Results from Rocinante:

1 Thread:

Job Properties

  • Time of benchmarks:
    • Target: 2 Nov 2022 - 09:05
    • Baseline: 2 Nov 2022 - 09:31
  • Package commits:
    • Target: ca6b00
    • Baseline: 3ea50d
  • Julia commits:
    • Target: 36034a
    • Baseline: 36034a
  • Julia command flags:
    • Target: -Cnative,-J/mnt/hd1/opt/julia/1.8.2/lib/julia/sys.so,-g1,--check-bounds=no,--threads=1
    • Baseline: -Cnative,-J/mnt/hd1/opt/julia/1.8.2/lib/julia/sys.so,-g1,--check-bounds=no,--threads=1
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|---|---|---|
| `["tree_3d_dgsem/elixir_euler_mortar.jl", "p3_rhs!"]` | 1.02 (5%) | 4.58 (1%) ❌ |
| `["tree_3d_dgsem/elixir_euler_mortar.jl", "p7_rhs!"]` | 1.10 (5%) ❌ | 1.00 (1%) |
| `["unstructured_2d_dgsem/elixir_euler_wall_bc.jl", "p3_analysis"]` | 0.85 (5%) ✅ | 1.00 (1%) |
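
The rule stated above the table can be written down in a few lines. This is only an illustrative sketch and not the benchmark tool's own code: `classify_ratio` is a hypothetical name, and it assumes that the percentage in parentheses is the relative tolerance within which a ratio counts as unchanged.

```julia
# Classify a benchmark ratio (target / baseline) for a given relative tolerance.
# Assumption: "(5%)" and "(1%)" in the table are tolerances around a ratio of 1.0.
function classify_ratio(ratio::Real, tolerance::Real)
    if ratio > 1 + tolerance
        return :regression
    elseif ratio < 1 - tolerance
        return :improvement
    else
        return :invariant
    end
end

classify_ratio(4.58, 0.01)  # memory ratio of p3_rhs! in elixir_euler_mortar.jl: :regression
classify_ratio(1.02, 0.05)  # time ratio of the same benchmark: :invariant
classify_ratio(0.85, 0.05)  # time ratio of p3_analysis in elixir_euler_wall_bc.jl: :improvement
```

Under this reading, the first row is listed only because of its memory ratio; its time ratio of 1.02 lies within the 5% tolerance.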

2 Threads:

Job Properties

  • Time of benchmarks:
    • Target: 2 Nov 2022 - 10:00
    • Baseline: 2 Nov 2022 - 10:29
  • Package commits:
    • Target: ca6b00
    • Baseline: 3ea50d
  • Julia commits:
    • Target: 36034a
    • Baseline: 36034a
  • Julia command flags:
    • Target: -Cnative,-J/mnt/hd1/opt/julia/1.8.2/lib/julia/sys.so,-g1,--check-bounds=no,--threads=2
    • Baseline: -Cnative,-J/mnt/hd1/opt/julia/1.8.2/lib/julia/sys.so,-g1,--check-bounds=no,--threads=2
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|---|---|---|
| `["p4est_2d_dgsem/elixir_advection_extended.jl", "p3_analysis"]` | 0.94 (5%) ✅ | 1.00 (1%) |
| `["tree_2d_dgsem/elixir_advection_amr_nonperiodic.jl", "p3_rhs!"]` | 1.10 (5%) ❌ | 1.00 (1%) |
| `["tree_2d_dgsem/elixir_euler_vortex_mortar_shockcapturing.jl", "p7_rhs!"]` | 1.05 (5%) ❌ | 1.00 (1%) |
| `["tree_3d_dgsem/elixir_euler_mortar.jl", "p3_rhs!"]` | 1.03 (5%) | 1.39 (1%) ❌ |
| `["unstructured_2d_dgsem/elixir_euler_wall_bc.jl", "p3_analysis"]` | 1.06 (5%) ❌ | 1.00 (1%) |

@ranocha
Member Author

ranocha commented Dec 2, 2022

New timings (in seconds) obtained from

julia --project=. --threads=1 -e '
  @time @eval(using OrdinaryDiffEq); 
  @time @eval(using Trixi); 
  @time @eval trixi_include(joinpath(examples_dir(), "tree_3d_dgsem", "elixir_euler_taylor_green_vortex.jl"), maxiters=200)'

and

julia --project=. --threads=1 -e '
  @time @eval(using OrdinaryDiffEq); 
  @time @eval(using Trixi); 
  @time @eval trixi_include(joinpath(examples_dir(), "tree_3d_dgsem", "elixir_mhd_alfven_wave.jl"), maxiters=200)'

(and faking a modification to Trixi to trigger precompilation in a separate Julia session, recording the time reported for ]precompile). main is at 94703d9.

Julia v1.8.3

| | main | this PR |
|---|---|---|
| precompiling Trixi | ca. 28 | ca. 52 |
| using OrdinaryDiffEq | ca. 8 | ca. 8 |
| using Trixi | ca. 17 | ca. 20.5 |
| trixi_include(...elixir_euler_taylor_green_vortex...) | ca. 41 | ca. 26 |
| trixi_include(...elixir_mhd_alfven_wave...) | ca. 43 | ca. 39 |

So it looks like it pays off if you are mainly interested in compressible Euler and do not need to precompile Trixi again every time you start a new session.
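
To make this trade-off concrete, the approximate timings from the table above can be turned into a break-even estimate. This is a rough sketch under simplifying assumptions: each session loads Trixi once and runs one elixir once, the "ca." values are taken at face value, and the helper names `saving` and `break_even` are hypothetical.

```julia
# Timings in seconds, copied from the table above (main vs. this PR).
precompile_time = (main = 28.0, pr = 52.0)
using_trixi     = (main = 17.0, pr = 20.5)
euler_tgv       = (main = 41.0, pr = 26.0)
mhd_alfven      = (main = 43.0, pr = 39.0)

# Extra one-time cost of precompiling Trixi with this PR.
extra_precompile = precompile_time.pr - precompile_time.main

# Net time saved per fresh session (loading Trixi plus the first run of an elixir).
saving(run) = (using_trixi.main + run.main) - (using_trixi.pr + run.pr)

# Number of fresh sessions until the extra precompilation time is recovered.
break_even(run) = ceil(Int, extra_precompile / saving(run))

break_even(euler_tgv)   # 3
break_even(mhd_alfven)  # 48
```

With these numbers, the compressible Euler workload recovers the additional 24 seconds of precompilation after about three fresh sessions, while the MHD workload saves only about half a second per session.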

@ranocha
Member Author

ranocha commented Dec 2, 2022

@sloede Benchmarks on Rocinante are running. Do we want to wait for them and decide how to proceed with this PR based on the results?

@ranocha
Member Author

ranocha commented Dec 2, 2022

For reference, here is the flamegraph of

tinf = @snoopi_deep trixi_include(joinpath(examples_dir(), "tree_3d_dgsem", "elixir_euler_taylor_green_vortex.jl"), maxiters=200)

in this PR

[Screenshot: flamegraph of tinf]

  • The show stuff etc. is a bit hard to precompile unless we use @nospecialize on all arguments (since we do not want to clutter the console with output during precompilation).
  • We might shave off some of the solve stuff with better integration and precompilation of OrdinaryDiffEq.jl, see #1255 (wrap RHS to reduce latency with OrdinaryDiffEq.jl).
  • We just need the numbers of nodes and variables in create_cache_analysis. Maybe this could be used to insert another call in the chain, avoiding specialization on all of dg.
  • Quite a lot of the time is just not inference, which is the only work that can currently be saved during precompilation.
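
As an illustration of the first point, a `show` method can opt out of per-type specialization with `@nospecialize`. This is a minimal sketch and not code from Trixi.jl: `DemoSolver` and its fields are hypothetical stand-ins for a solver type with many type parameters.

```julia
# Hypothetical container type standing in for a heavily parameterized solver.
struct DemoSolver{Basis, Flux}
    basis::Basis
    surface_flux::Flux
end

# `@nospecialize` asks the compiler not to generate a separate specialization
# of this method for every concrete `DemoSolver{...}` type.
function Base.show(io::IO, @nospecialize(solver::DemoSolver))
    print(io, "DemoSolver(", nameof(typeof(solver.basis)), ", ",
          nameof(typeof(solver.surface_flux)), ")")
end

sprint(show, DemoSolver(1.0, "lax"))  # "DemoSolver(Float64, String)"
```

Printing is rarely performance-critical, so giving up specialization here mainly trades a small runtime cost for less compilation work.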

@ranocha
Member Author

ranocha commented Dec 2, 2022

Results from Rocinante:

1 Thread:

Job Properties

  • Time of benchmarks:
    • Target: 2 Dec 2022 - 17:38
    • Baseline: 2 Dec 2022 - 18:08
  • Package commits:
    • Target: 4e83d0
    • Baseline: 94703d
  • Julia commits:
    • Target: 0434de
    • Baseline: 0434de
  • Julia command flags:
    • Target: -Cnative,-J/mnt/hd1/opt/julia/1.8.3/lib/julia/sys.so,-g1,--check-bounds=no,--threads=1
    • Baseline: -Cnative,-J/mnt/hd1/opt/julia/1.8.3/lib/julia/sys.so,-g1,--check-bounds=no,--threads=1
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|---|---|---|
| `["latency", "polydeg_7"]` | 1.05 (5%) ❌ | 1.00 (1%) |
| `["tree_3d_dgsem/elixir_euler_mortar.jl", "p3_rhs!"]` | 1.00 (5%) | 4.58 (1%) ❌ |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["benchmark/elixir_2d_euler_vortex_p4est.jl"]
  • ["benchmark/elixir_2d_euler_vortex_structured.jl"]
  • ["benchmark/elixir_2d_euler_vortex_tree.jl"]
  • ["benchmark/elixir_2d_euler_vortex_unstructured.jl"]
  • ["latency"]
  • ["p4est_2d_dgsem/elixir_advection_extended.jl"]
  • ["p4est_3d_dgsem/elixir_advection_basic.jl"]
  • ["structured_2d_dgsem/elixir_advection_extended.jl"]
  • ["structured_2d_dgsem/elixir_advection_nonperiodic.jl"]
  • ["structured_2d_dgsem/elixir_euler_ec.jl"]
  • ["structured_2d_dgsem/elixir_euler_source_terms_nonperiodic.jl"]
  • ["structured_2d_dgsem/elixir_mhd_ec.jl"]
  • ["structured_3d_dgsem/elixir_advection_nonperiodic_curved.jl"]
  • ["structured_3d_dgsem/elixir_euler_ec.jl"]
  • ["structured_3d_dgsem/elixir_euler_source_terms_nonperiodic_curved.jl"]
  • ["structured_3d_dgsem/elixir_mhd_ec.jl"]
  • ["tree_2d_dgsem/elixir_advection_amr_nonperiodic.jl"]
  • ["tree_2d_dgsem/elixir_advection_extended.jl"]
  • ["tree_2d_dgsem/elixir_euler_ec.jl"]
  • ["tree_2d_dgsem/elixir_euler_vortex_mortar.jl"]
  • ["tree_2d_dgsem/elixir_euler_vortex_mortar_shockcapturing.jl"]
  • ["tree_2d_dgsem/elixir_mhd_ec.jl"]
  • ["tree_3d_dgsem/elixir_advection_extended.jl"]
  • ["tree_3d_dgsem/elixir_euler_ec.jl"]
  • ["tree_3d_dgsem/elixir_euler_mortar.jl"]
  • ["tree_3d_dgsem/elixir_euler_shockcapturing.jl"]
  • ["tree_3d_dgsem/elixir_mhd_ec.jl"]
  • ["unstructured_2d_dgsem/elixir_euler_wall_bc.jl"]

Julia versioninfo

Target

Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      Ubuntu 20.04.2 LTS
  uname: Linux 5.4.0-132-generic #148-Ubuntu SMP Mon Oct 17 16:02:06 UTC 2022 x86_64 x86_64
  CPU: AMD Ryzen Threadripper 3990X 64-Core Processor: 
                  speed         user         nice          sys         idle          irq
       #1-128  2195 MHz     830037 s       1284 s      31237 s  126067450 s          0 s
  Memory: 251.6267318725586 GB (242185.2109375 MB free)
  Uptime: 99173.72 sec
  Load Avg:  1.29  2.11  1.73
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
  Threads: 1 on 128 virtual cores

Baseline

Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      Ubuntu 20.04.2 LTS
  uname: Linux 5.4.0-132-generic #148-Ubuntu SMP Mon Oct 17 16:02:06 UTC 2022 x86_64 x86_64
  CPU: AMD Ryzen Threadripper 3990X 64-Core Processor: 
                  speed         user         nice          sys         idle          irq
       #1-128  2170 MHz     847937 s       1284 s      32567 s  128343469 s          0 s
  Memory: 251.6267318725586 GB (242468.421875 MB free)
  Uptime: 100966.98 sec
  Load Avg:  1.0  1.0  1.08
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
  Threads: 1 on 128 virtual cores

2 Threads:

Job Properties

  • Time of benchmarks:
    • Target: 2 Dec 2022 - 18:41
    • Baseline: 2 Dec 2022 - 19:13
  • Package commits:
    • Target: 4e83d0
    • Baseline: 94703d
  • Julia commits:
    • Target: 0434de
    • Baseline: 0434de
  • Julia command flags:
    • Target: -Cnative,-J/mnt/hd1/opt/julia/1.8.3/lib/julia/sys.so,-g1,--check-bounds=no,--threads=2
    • Baseline: -Cnative,-J/mnt/hd1/opt/julia/1.8.3/lib/julia/sys.so,-g1,--check-bounds=no,--threads=2
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|---|---|---|
| `["latency", "polydeg_7"]` | 1.05 (5%) ❌ | 1.00 (1%) |
| `["structured_3d_dgsem/elixir_mhd_ec.jl", "p3_rhs!"]` | 0.87 (5%) ✅ | 1.00 (1%) |
| `["tree_2d_dgsem/elixir_advection_amr_nonperiodic.jl", "p3_rhs!"]` | 1.16 (5%) ❌ | 1.00 (1%) |
| `["tree_3d_dgsem/elixir_euler_mortar.jl", "p3_rhs!"]` | 0.98 (5%) | 1.39 (1%) ❌ |
| `["unstructured_2d_dgsem/elixir_euler_wall_bc.jl", "p3_analysis"]` | 0.92 (5%) ✅ | 1.00 (1%) |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["benchmark/elixir_2d_euler_vortex_p4est.jl"]
  • ["benchmark/elixir_2d_euler_vortex_structured.jl"]
  • ["benchmark/elixir_2d_euler_vortex_tree.jl"]
  • ["benchmark/elixir_2d_euler_vortex_unstructured.jl"]
  • ["latency"]
  • ["p4est_2d_dgsem/elixir_advection_extended.jl"]
  • ["p4est_3d_dgsem/elixir_advection_basic.jl"]
  • ["structured_2d_dgsem/elixir_advection_extended.jl"]
  • ["structured_2d_dgsem/elixir_advection_nonperiodic.jl"]
  • ["structured_2d_dgsem/elixir_euler_ec.jl"]
  • ["structured_2d_dgsem/elixir_euler_source_terms_nonperiodic.jl"]
  • ["structured_2d_dgsem/elixir_mhd_ec.jl"]
  • ["structured_3d_dgsem/elixir_advection_nonperiodic_curved.jl"]
  • ["structured_3d_dgsem/elixir_euler_ec.jl"]
  • ["structured_3d_dgsem/elixir_euler_source_terms_nonperiodic_curved.jl"]
  • ["structured_3d_dgsem/elixir_mhd_ec.jl"]
  • ["tree_2d_dgsem/elixir_advection_amr_nonperiodic.jl"]
  • ["tree_2d_dgsem/elixir_advection_extended.jl"]
  • ["tree_2d_dgsem/elixir_euler_ec.jl"]
  • ["tree_2d_dgsem/elixir_euler_vortex_mortar.jl"]
  • ["tree_2d_dgsem/elixir_euler_vortex_mortar_shockcapturing.jl"]
  • ["tree_2d_dgsem/elixir_mhd_ec.jl"]
  • ["tree_3d_dgsem/elixir_advection_extended.jl"]
  • ["tree_3d_dgsem/elixir_euler_ec.jl"]
  • ["tree_3d_dgsem/elixir_euler_mortar.jl"]
  • ["tree_3d_dgsem/elixir_euler_shockcapturing.jl"]
  • ["tree_3d_dgsem/elixir_mhd_ec.jl"]
  • ["unstructured_2d_dgsem/elixir_euler_wall_bc.jl"]

Julia versioninfo

Target

Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      Ubuntu 20.04.2 LTS
  uname: Linux 5.4.0-132-generic #148-Ubuntu SMP Mon Oct 17 16:02:06 UTC 2022 x86_64 x86_64
  CPU: AMD Ryzen Threadripper 3990X 64-Core Processor: 
                  speed         user         nice          sys         idle          irq
       #1-128  2175 MHz     869765 s       1284 s      33956 s  130855580 s          0 s
  Memory: 251.6267318725586 GB (241254.19140625 MB free)
  Uptime: 102947.8 sec
  Load Avg:  1.25  1.21  1.12
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
  Threads: 2 on 128 virtual cores

Baseline

Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      Ubuntu 20.04.2 LTS
  uname: Linux 5.4.0-132-generic #148-Ubuntu SMP Mon Oct 17 16:02:06 UTC 2022 x86_64 x86_64
  CPU: AMD Ryzen Threadripper 3990X 64-Core Processor: 
                  speed         user         nice          sys         idle          irq
       #1-128  2172 MHz     891390 s       1284 s      35311 s  133332872 s          0 s
  Memory: 251.6267318725586 GB (241550.40625 MB free)
  Uptime: 104901.23 sec
  Load Avg:  1.2  1.2  1.12
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
  Threads: 2 on 128 virtual cores

There seems to be a memory regression for tree_3d_dgsem/elixir_euler_mortar.jl (p3_rhs!: memory ratio 4.58 with 1 thread and 1.39 with 2 threads)...

@ranocha
Member Author

ranocha commented Dec 2, 2022

@sloede It would be great if you could also perform some benchmarks.

@sloede
Member

sloede commented Dec 3, 2022

> @sloede It would be great if you could also perform some benchmarks.

Sure. Anything in particular you think would be useful?

@ranocha
Member Author

ranocha commented Dec 3, 2022

I guess your "standard benchmark setups" you have used elsewhere are already a good start. In addition, I would like to check

  • that it has no impact on runtime performance
  • that it gives reasonable latency improvements for 2D and 3D compressible Euler
  • that it has no net negative effects on other equations
  • whether the increased precompilation times are okay with you
  • all this also on other systems besides the ones I have access to

@ranocha
Member Author

ranocha commented Dec 6, 2022

Results from Rocinante (with up-to-date packages) with timings in seconds obtained from

julia --project=. --threads=1 -e '
  @time @eval(using OrdinaryDiffEq);
  @time @eval(using Trixi);
  @time @eval trixi_include(joinpath(examples_dir(), "tree_3d_dgsem", "elixir_euler_taylor_green_vortex.jl"), maxiters=200)'
julia --project=. --threads=1 -e '
  @time @eval(using OrdinaryDiffEq);
  @time @eval(using Trixi);
  @time @eval trixi_include(joinpath(examples_dir(), "tree_3d_dgsem", "elixir_euler_mortar.jl"), maxiters=200)'
julia --project=. --threads=1 -e '
  @time @eval(using OrdinaryDiffEq);
  @time @eval(using Trixi);
  @time @eval trixi_include(joinpath(examples_dir(), "tree_3d_dgsem", "elixir_mhd_alfven_wave.jl"), maxiters=200)'

(and faking a modification to Trixi to trigger precompilation in a separate Julia session, recording the time reported for ]precompile). main is at 0a24621b74b39438d5f37abab44dc2855fb9fa68.

Julia v1.8.3

| | main | this PR | difference |
|---|---|---|---|
| precompiling Trixi | ca. 30 | ca. 54 | ca. +24 |
| using OrdinaryDiffEq | ca. 7.5 | ca. 7.5 | +/- 0 |
| using Trixi | ca. 18 | ca. 21.5 | ca. +3.5 |
| trixi_include(...elixir_euler_taylor_green_vortex...) | ca. 43 | ca. 27.5 | ca. -15.5 |
| trixi_include(...elixir_euler_mortar...) | ca. 40 | ca. 21 | ca. -19 |
| trixi_include(...elixir_mhd_alfven_wave...) | ca. 44.5 | ca. 40 | ca. -4.5 |

The gain for elixir_euler_mortar is bigger since it uses the weak form volume terms we also use for precompilation here (since they are cheaper and lead to less precompilation time).

main:

julia> using OrdinaryDiffEq, Trixi

julia> trixi_include(joinpath(examples_dir(), "tree_3d_dgsem", "elixir_euler_mortar.jl"), maxiters=200) # 2nd time
[...]
 ────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              465ms /  97.2%           3.34MiB /  90.3%

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                         636    437ms   96.6%   687μs    248KiB    8.0%     399B
   interface flux             636    104ms   23.1%   164μs     0.00B    0.0%    0.00B
   volume integral            636   82.7ms   18.3%   130μs     0.00B    0.0%    0.00B
   prolong2interfaces         636   76.3ms   16.9%   120μs     0.00B    0.0%    0.00B
   source terms               636   75.1ms   16.6%   118μs    238KiB    7.7%     384B
   surface integral           636   38.2ms    8.4%  60.0μs     0.00B    0.0%    0.00B
   mortar flux                636   32.1ms    7.1%  50.5μs     0.00B    0.0%    0.00B
   prolong2mortars            636   22.2ms    4.9%  34.9μs     0.00B    0.0%    0.00B
   reset ∂u/∂t                636   2.56ms    0.6%  4.03μs     0.00B    0.0%    0.00B
   Jacobian                   636   2.46ms    0.5%  3.87μs     0.00B    0.0%    0.00B
   ~rhs!~                     636    692μs    0.2%  1.09μs   9.33KiB    0.3%    15.0B
   prolong2boundaries         636   34.3μs    0.0%  53.9ns     0.00B    0.0%    0.00B
   boundary flux              636   13.8μs    0.0%  21.7ns     0.00B    0.0%    0.00B
 calculate dt                 128   5.87ms    1.3%  45.9μs     0.00B    0.0%    0.00B
 analyze solution               3   5.29ms    1.2%  1.76ms   53.2KiB    1.7%  17.7KiB
 I/O                            4   4.36ms    1.0%  1.09ms   2.72MiB   90.2%   696KiB
   save solution                3   3.44ms    0.8%  1.15ms   2.69MiB   89.2%   918KiB
   ~I/O~                        4    914μs    0.2%   228μs   24.2KiB    0.8%  6.06KiB
   get element variables        3   10.7μs    0.0%  3.57μs   6.42KiB    0.2%  2.14KiB
   save mesh                    3    120ns    0.0%  40.0ns     0.00B    0.0%    0.00B
 ────────────────────────────────────────────────────────────────────────────────────

This PR:

julia> using OrdinaryDiffEq, Trixi

julia> trixi_include(joinpath(examples_dir(), "tree_3d_dgsem", "elixir_euler_mortar.jl"), maxiters=200) # 2nd time
[...]
 ────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              459ms /  97.2%           4.17MiB /  92.3%

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                         636    431ms   96.5%   678μs   1.08MiB   28.0%  1.73KiB
   interface flux             636    103ms   23.1%   162μs     0.00B    0.0%    0.00B
   volume integral            636   81.8ms   18.3%   129μs    229KiB    5.8%     368B
   prolong2interfaces         636   75.4ms   16.9%   118μs     0.00B    0.0%    0.00B
   source terms               636   72.3ms   16.2%   114μs    238KiB    6.0%     384B
   surface integral           636   38.3ms    8.6%  60.2μs    229KiB    5.8%     368B
   mortar flux                636   31.8ms    7.1%  50.1μs     0.00B    0.0%    0.00B
   prolong2mortars            636   22.1ms    5.0%  34.8μs     0.00B    0.0%    0.00B
   reset ∂u/∂t                636   2.61ms    0.6%  4.11μs    189KiB    4.8%     304B
   Jacobian                   636   2.50ms    0.6%  3.93μs    209KiB    5.3%     336B
   ~rhs!~                     636    717μs    0.2%  1.13μs   9.33KiB    0.2%    15.0B
   prolong2boundaries         636   31.6μs    0.0%  49.7ns     0.00B    0.0%    0.00B
   boundary flux              636   14.0μs    0.0%  22.0ns     0.00B    0.0%    0.00B
 calculate dt                 128   5.83ms    1.3%  45.5μs     0.00B    0.0%    0.00B
 analyze solution               3   5.28ms    1.2%  1.76ms   57.2KiB    1.5%  19.1KiB
 I/O                            4   4.39ms    1.0%  1.10ms   2.72MiB   70.6%   696KiB
   save solution                3   3.50ms    0.8%  1.17ms   2.69MiB   69.8%   918KiB
   ~I/O~                        4    877μs    0.2%   219μs   24.2KiB    0.6%  6.06KiB
   get element variables        3   12.9μs    0.0%  4.29μs   6.42KiB    0.2%  2.14KiB
   save mesh                    3    130ns    0.0%  43.3ns     0.00B    0.0%    0.00B
 ────────────────────────────────────────────────────────────────────────────────────

Julia 1.10.0-DEV.98 (2022-12-01), vc/external_functions/c244b01f148 from JuliaLang/julia#47184

| | main | this PR | difference |
|---|---|---|---|
| precompiling Trixi | ca. 32 | ca. 54 | ca. +22 |
| using OrdinaryDiffEq | ca. 5 | ca. 5 | +/- 0 |
| using Trixi | ca. 11 | ca. 12 | ca. +1 |
| trixi_include(...elixir_euler_taylor_green_vortex...) | ca. 32 | ca. 22 | ca. -10 |
| trixi_include(...elixir_euler_mortar...) | ca. 29 | ca. 20.5 | ca. -8.5 |
| trixi_include(...elixir_mhd_alfven_wave...) | ca. 33 | ca. 28.5 | ca. -4.5 |

main:

julia> using OrdinaryDiffEq, Trixi

julia> trixi_include(joinpath(examples_dir(), "tree_3d_dgsem", "elixir_euler_mortar.jl"), maxiters=200) # 2nd time
[...]
 ────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              491ms /  97.3%           3.33MiB /  90.3%

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                         636    462ms   96.5%   726μs    248KiB    8.0%     399B
   interface flux             636    106ms   22.2%   167μs     0.00B    0.0%    0.00B
   source terms               636   92.9ms   19.4%   146μs    238KiB    7.7%     384B
   volume integral            636   87.5ms   18.3%   138μs     0.00B    0.0%    0.00B
   prolong2interfaces         636   80.6ms   16.8%   127μs     0.00B    0.0%    0.00B
   surface integral           636   36.9ms    7.7%  58.0μs     0.00B    0.0%    0.00B
   mortar flux                636   29.6ms    6.2%  46.6μs     0.00B    0.0%    0.00B
   prolong2mortars            636   22.1ms    4.6%  34.7μs     0.00B    0.0%    0.00B
   Jacobian                   636   2.50ms    0.5%  3.93μs     0.00B    0.0%    0.00B
   reset ∂u/∂t                636   2.48ms    0.5%  3.90μs     0.00B    0.0%    0.00B
   ~rhs!~                     636    726μs    0.2%  1.14μs   9.33KiB    0.3%    15.0B
   prolong2boundaries         636   32.7μs    0.0%  51.4ns     0.00B    0.0%    0.00B
   boundary flux              636   15.0μs    0.0%  23.6ns     0.00B    0.0%    0.00B
 analyze solution               3   6.44ms    1.3%  2.15ms   53.2KiB    1.7%  17.7KiB
 calculate dt                 128   5.88ms    1.2%  46.0μs     0.00B    0.0%    0.00B
 I/O                            4   4.50ms    0.9%  1.12ms   2.72MiB   90.2%   695KiB
   save solution                3   3.65ms    0.8%  1.22ms   2.69MiB   89.2%   917KiB
   ~I/O~                        4    828μs    0.2%   207μs   24.1KiB    0.8%  6.02KiB
   get element variables        3   23.7μs    0.0%  7.90μs   7.12KiB    0.2%  2.38KiB
   save mesh                    3    200ns    0.0%  66.7ns     0.00B    0.0%    0.00B
 ────────────────────────────────────────────────────────────────────────────────────

This PR:

julia> using OrdinaryDiffEq, Trixi

julia> trixi_include(joinpath(examples_dir(), "tree_3d_dgsem", "elixir_euler_mortar.jl"), maxiters=200) # 2nd time
[...]
 ────────────────────────────────────────────────────────────────────────────────────
              Trixi.jl                      Time                    Allocations
                                   ───────────────────────   ────────────────────────
         Tot / % measured:              5.89s /  99.7%           1.03GiB / 100.0%

 Section                   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                         636    5.83s   99.3%  9.17ms   1.02GiB   99.3%  1.64MiB
   prolong2mortars            636    2.73s   46.4%  4.29ms    523MiB   49.6%   842KiB
   mortar flux                636    2.71s   46.1%  4.26ms    523MiB   49.6%   842KiB
   interface flux             636    107ms    1.8%   168μs     0.00B    0.0%    0.00B
   volume integral            636   88.5ms    1.5%   139μs     0.00B    0.0%    0.00B
   prolong2interfaces         636   80.5ms    1.4%   127μs     0.00B    0.0%    0.00B
   source terms               636   74.0ms    1.3%   116μs    238KiB    0.0%     384B
   surface integral           636   39.9ms    0.7%  62.7μs     0.00B    0.0%    0.00B
   reset ∂u/∂t                636   2.51ms    0.0%  3.95μs     0.00B    0.0%    0.00B
   Jacobian                   636   2.50ms    0.0%  3.94μs     0.00B    0.0%    0.00B
   ~rhs!~                     636   1.31ms    0.0%  2.06μs   9.33KiB    0.0%    15.0B
   prolong2boundaries         636   49.3μs    0.0%  77.6ns     0.00B    0.0%    0.00B
   boundary flux              636   14.2μs    0.0%  22.3ns     0.00B    0.0%    0.00B
 analyze solution               3   32.0ms    0.5%  10.7ms   4.98MiB    0.5%  1.66MiB
 calculate dt                 128   5.92ms    0.1%  46.2μs     0.00B    0.0%    0.00B
 I/O                            4   5.04ms    0.1%  1.26ms   2.72MiB    0.3%   695KiB
   save solution                3   4.16ms    0.1%  1.39ms   2.69MiB    0.3%   917KiB
   ~I/O~                        4    854μs    0.0%   213μs   24.0KiB    0.0%  6.00KiB
   get element variables        3   30.8μs    0.0%  10.3μs   7.12KiB    0.0%  2.38KiB
   save mesh                    3    300ns    0.0%   100ns     0.00B    0.0%    0.00B
 ────────────────────────────────────────────────────────────────────────────────────

So there seems to be a type instability in the mortar code on this version of Julia with the increased precompilation (prolong2mortars and mortar flux now allocate 523MiB each and dominate the run time). I think it may be related to JuliaLang/julia#35800, JuliaLang/julia#32552, JuliaLang/julia#41740. It looks like the workaround

Trixi.jl/src/Trixi.jl

Lines 264 to 282 in 0a24621

# FIXME upstream. This is a hacky workaround for
# https://github.com/trixi-framework/Trixi.jl/issues/628
# https://github.com/trixi-framework/Trixi.jl/issues/1185
# The related upstream issues appear to be
# https://github.com/JuliaLang/julia/issues/35800
# https://github.com/JuliaLang/julia/issues/32552
# https://github.com/JuliaLang/julia/issues/41740
# See also https://discourse.julialang.org/t/performance-depends-dramatically-on-compilation-order/58425
let
    for T in (Float32, Float64)
        u_mortars_2d = zeros(T, 2, 2, 2, 2, 2)
        u_view_2d = view(u_mortars_2d, 1, :, 1, :, 1)
        LoopVectorization.axes(u_view_2d)
        u_mortars_3d = zeros(T, 2, 2, 2, 2, 2, 2)
        u_view_3d = view(u_mortars_3d, 1, :, 1, :, :, 1)
        LoopVectorization.axes(u_view_3d)
    end
end

does not work anymore with this Julia update and increased precompilation from this PR...

  • Is __init__ called before precompiling stuff? If not, do we get different results if we also include this workaround in precompilation?
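
When testing such a change, it may help to check inference results programmatically instead of reading timer output. The following is a generic sketch with toy functions and is not related to Trixi.jl's mortar code: `unstable_scale`, `stable_scale`, and `infers_concretely` are hypothetical names, and `Base.return_types` is an internal function whose behavior may differ between Julia versions.

```julia
# Toy functions (hypothetical, unrelated to the mortar code discussed above).
unstable_scale(x) = x > 0 ? 1 : 1.0     # returns Int or Float64 depending on the value
stable_scale(x)   = x > 0 ? 1.0 : -1.0  # always returns Float64

# True if inference finds exactly one concrete return type for the argument types.
function infers_concretely(f, argtypes::Tuple)
    return_types = Base.return_types(f, argtypes)
    return length(return_types) == 1 && isconcretetype(first(return_types))
end

infers_concretely(stable_scale, (Int,))    # true
infers_concretely(unstable_scale, (Int,))  # false
```

A check like this only reports what inference concludes in the current session, so it would have to be run in the same loading and compilation order that triggers the problem.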

@ranocha
Member Author

ranocha commented Dec 6, 2022

@sloede I added a TODO note above. I will not change anything here before we get your results (to make them comparable). Afterwards, I will try the idea outlined above to see whether this fixes the problem with the upcoming Julia update and increased precompilation from this PR.

@ranocha
Member Author

ranocha commented Dec 7, 2022

Let's do it like this: We can decide how to proceed when we have your results. If you want to merge this, I will try to fix the type stability regression. Otherwise, I will not touch this anymore but leave it open in case we decide to change something in the future.

@sloede
Member

sloede commented Dec 7, 2022

On a Macbook M1:

Julia v1.8.3

| | main | this PR | difference |
|---|---|---|---|
| precompiling Trixi | ca. 23 | ca. 49 | ca. +26 |
| using OrdinaryDiffEq | ca. 5.4 | ca. 5.3 | +/- 0 |
| using Trixi | ca. 12.1 | ca. 14.6 | ca. +3.5 |
| trixi_include(...elixir_euler_taylor_green_vortex...) | ca. 32.9 | ca. 21.6 | ca. -8.7 |
| trixi_include(...elixir_euler_mortar...) | ca. 31.1 | ca. 17.5 | ca. -13.6 |
| trixi_include(...elixir_mhd_alfven_wave...) | ca. 33.1 | ca. 29.2 | ca. -2.9 |

Julia 1.10.0-DEV.98 (2022-12-01), vc/external_functions/c244b01f148 from JuliaLang/julia#47184

No numbers because of Apple's horrific view of its developers/users, which does not allow running unsigned/self-signed binaries from the internet 🤬:

dyld[88240]: Library not loaded: '@rpath/libjulia.dylib'
  Referenced from: '/Users/hpcschlo/.pool/julia-c244b01f14/bin/julia'
  Reason: tried: '/Users/hpcschlo/.pool/julia-c244b01f14/bin/../lib/libjulia.dylib' (code signature in <A2C9239D-67DE-3407-B166-3C7C33AA36DF> '/Users/hpcschlo/.pool/julia-c244b01f14/lib/libjulia.1.10.dylib' not valid for use in process: mapped file has no Team ID and is not a platform binary (signed with custom identity or adhoc?)), '/Users/hpcschlo/.pool/julia-c244b01f14/bin/../lib/libjulia.dylib' (code signature in <A2C9239D-67DE-3407-B166-3C7C33AA36DF> '/Users/hpcschlo/.pool/julia-c244b01f14/lib/libjulia.1.10.dylib' not valid for use in process: mapped file has no Team ID and is not a platform binary (signed with custom identity or adhoc?)), '/usr/lib/libjulia.dylib' (no such file)
Abort trap: 6

@ranocha
Member Author

ranocha commented Dec 7, 2022

Thanks, @sloede! It does look like it's less interesting on your Mac system, so I will not work on this further at the moment.
