Use recursion to fix inference failure #78

charleskawczynski · 2024-03-02T17:23:17Z

This PR inlines, and uses recursion to fix cases where Adapt.adapt performs runtime dispatch due to either recursion limits, or inability to unroll map on Tuples. I have and will post a reproducer shortly.

Closes #79.

charleskawczynski · 2024-03-03T03:59:01Z

I'm verifying (here) that this branch is solely responsible, but this seems to reduce gpu allocations for us by a factor of 15:
From build https://buildkite.com/clima/climaatmos-ci/builds/17118#018e020d-d9a0-4f26-b8c7-04013be0b169

[ Info: `allocs (flame_gpu_implicit_barowave_moist)`: 3637320

to build https://buildkite.com/clima/climaatmos-ci/builds/17105#018e004b-661b-404c-96e9-fafa28d875ed:

[ Info: `allocs (flame_gpu_implicit_barowave_moist)`: 241464

This basically fixes runtime dispatch issues that we're seeing (mentioned here, but in a different context).

charleskawczynski · 2024-03-03T04:01:41Z

Here is a reproducer:

git clone https://github.com/CliMA/ClimaCore.jl
cd ClimaCore.jl/
julia --project=examples

import ClimaCore;
import ClimaComms;
using Test
import ClimaCore.Fields
import ClimaCore.Operators
import ClimaCore.Spaces
import ClimaCore.Geometry
using BenchmarkTools;
@isdefined(TU) || include(joinpath(pkgdir(ClimaCore), "test", "TestUtilities", "TestUtilities.jl"));
import .TestUtilities as TU;
FT = Float64;
zelem=25
cspace = TU.CenterExtrudedFiniteDifferenceSpace(FT;zelem, helem=10);
fspace = Spaces.FaceExtrudedFiniteDifferenceSpace(cspace);
const CT3 = Geometry.Contravariant3Vector
const C12 = ClimaCore.Geometry.Covariant12Vector
@show ClimaComms.device(cspace)
ᶜx = fill((;uₕ=zero(C12{FT}) ,ρ=FT(0)), cspace);
ᶠx = fill((;ᶠuₕ³=zero(CT3{FT})), fspace);

const ᶠwinterp = Operators.WeightedInterpolateC2F(
    bottom = Operators.Extrapolate(),
    top = Operators.Extrapolate(),
)

function set_ᶠuₕ³!(ᶜx, ᶠx)
    ᶜJ = Fields.local_geometry_field(ᶜx).J
    @. ᶠx.ᶠuₕ³ = ᶠwinterp(ᶜx.ρ * ᶜJ, CT3(ᶜx.uₕ))
    return nothing
end

set_ᶠuₕ³!(ᶜx, ᶠx) # compile
p_allocated = @allocated set_ᶠuₕ³!(ᶜx, ᶠx)
@show p_allocated

using CUDA
using JET
using Adapt
# @test_opt ignored_modules = (CUDA,) set_ᶠuₕ³!(ᶜx, ᶠx)
import Base.Broadcast: broadcasted
ᶜJ = Fields.local_geometry_field(ᶜx).J
bc = broadcasted(ᶠwinterp, broadcasted(*, ᶜx.ρ, ᶜJ), broadcasted(CT3, ᶜx.uₕ));
Adapt.adapt(CUDA.KernelAdaptor(), bc.args);
# Call chain:
@test_opt ignored_modules = (CUDA,) Adapt.adapt(CUDA.KernelAdaptor(), bc.args);
@test_opt ignored_modules = (CUDA,) Adapt.adapt_structure(CUDA.KernelAdaptor(), bc.args);
@test_opt map(Adapt.adapt(CUDA.KernelAdaptor()), bc.args);

. You may need to have JET in your local environment for this reproducer to work. The JET tests fail on main, and pass in this branch.

Any chance we can get this merged? cc @maleadt

charleskawczynski · 2024-03-04T15:10:44Z

Ok, an update on this: this PR is about half responsible for the allocations mentioned above. So, I'd really appreciate if we can merge this.

maleadt · 2024-03-05T10:21:36Z

Bluntly applying @inline everywhere is not particularly satisfying; most of the time the Julia's inliner is smart enough to do the right thing. Do you have an idea which methods specifically are problematic?

charleskawczynski · 2024-03-05T12:17:16Z

Bluntly applying @inline everywhere is not particularly satisfying; most of the time the Julia's inliner is smart enough to do the right thing. Do you have an idea which methods specifically are problematic?

It’s the call to map that is problematic. Should I try to only apply it to that method?

maleadt · 2024-03-05T12:35:35Z

Should I try to only apply it to that method?

If that resolves the problem, I think that would be a cleaner solution. The other methods should normally all be inlined already.

charleskawczynski · 2024-03-05T14:04:57Z

Ok, I've updated the PR. It turns out that @inline is not even necessary. So, I've only replaced map with recursion, which still fixes the issue. I've also updated the reproducer, which I think should "work" better (I tried it just now and ran into version incompatibilities).

src/base.jl

Co-authored-by: Tim Besard <tim.besard@gmail.com>

charleskawczynski · 2024-03-05T14:49:09Z

I don't understand why, but it seems that the compiler is suddenly having issues with adapt_structure(to, xs::NamedTuple) = map(x->adapt(to,x), xs) once adapt_structure(to, xs::Tuple) = map(x->adapt(to,x), xs) is replaced with recursion. Changing adapt_structure(to, xs::NamedTuple) = map(x->adapt(to,x), xs) to

adapt_structure(to, xs::NamedTuple{names}) where {names} =
    NamedTuple{names}(adapt_structure(to, Tuple(xs)))

fixes things (and it still infers in my original example). I've pushed this fix. Is this alright @maleadt?

codecov · 2024-03-05T14:57:51Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.02%. Comparing base (83fc6c6) to head (9b91675).

Additional details and impacted files

@@            Coverage Diff             @@
##           master      #78      +/-   ##
==========================================
+ Coverage   93.65%   94.02%   +0.37%     
==========================================
  Files           6        6              
  Lines          63       67       +4     
==========================================
+ Hits           59       63       +4     
  Misses          4        4

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

maleadt · 2024-03-05T15:01:12Z

You used map(x->adapt(to,x), xs) while there used to be adapt(to) aka. Base.Fix1(adapt, to). Not sure why that makes a difference, but it fixes tests here.

charleskawczynski · 2024-03-05T15:11:18Z

~~Interesting! This begs the question: does adapt_structure(to, xs::Union{Tuple,NamedTuple}) = map(x->adapt(to, x), xs) work more generally, and fix my original issue? I'll try that now.~~ I misread that (I had it reversed). Ok, changes look good.

This reverts commit e1ef21b.

This was referenced Mar 2, 2024

Try adapt branch CliMA/ClimaAtmos.jl#2748

Closed

Try adapt main CliMA/ClimaAtmos.jl#2749

Closed

charleskawczynski marked this pull request as ready for review March 3, 2024 04:01

charleskawczynski mentioned this pull request Mar 4, 2024

Investigate causes of poor scaling in multi-GPU runs CliMA/ClimaAtmos.jl#2222

Open

Use recursion to fix inference failure

73d96dd

charleskawczynski force-pushed the ck/inline_recurse branch from a55877e to 73d96dd Compare March 5, 2024 14:02

charleskawczynski changed the title ~~Inline and use recursion~~ Use recursion to fix inference failure Mar 5, 2024

maleadt reviewed Mar 5, 2024

View reviewed changes

src/base.jl Outdated Show resolved Hide resolved

charleskawczynski and others added 2 commits March 5, 2024 06:17

Update src/base.jl

94f809a

Co-authored-by: Tim Besard <tim.besard@gmail.com>

Fix inference for adapt_structure with NamedTuple

682635f

Simplify.

9b91675

maleadt force-pushed the ck/inline_recurse branch from 190fc80 to 9b91675 Compare March 5, 2024 15:01

maleadt merged commit e1ef21b into JuliaGPU:master Mar 5, 2024
18 checks passed

maleadt added a commit that referenced this pull request Mar 8, 2024

Revert "Use recursion to fix inference failure with tuples (#78)"

79a98f2

This reverts commit e1ef21b.

maleadt mentioned this pull request Mar 8, 2024

Revert "Use recursion to fix inference failure" #80

Merged

maleadt added a commit that referenced this pull request Mar 8, 2024

Revert "Use recursion to fix inference failure with tuples (#78)" (#80)

e99bc55

This reverts commit e1ef21b.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use recursion to fix inference failure #78

Use recursion to fix inference failure #78

charleskawczynski commented Mar 2, 2024 •

edited

Loading

charleskawczynski commented Mar 3, 2024

charleskawczynski commented Mar 3, 2024 •

edited

Loading

charleskawczynski commented Mar 4, 2024

maleadt commented Mar 5, 2024

charleskawczynski commented Mar 5, 2024

maleadt commented Mar 5, 2024

charleskawczynski commented Mar 5, 2024

charleskawczynski commented Mar 5, 2024 •

edited

Loading

codecov bot commented Mar 5, 2024 •

edited

Loading

maleadt commented Mar 5, 2024

charleskawczynski commented Mar 5, 2024 •

edited

Loading

Use recursion to fix inference failure #78

Use recursion to fix inference failure #78

Conversation

charleskawczynski commented Mar 2, 2024 • edited Loading

charleskawczynski commented Mar 3, 2024

charleskawczynski commented Mar 3, 2024 • edited Loading

charleskawczynski commented Mar 4, 2024

maleadt commented Mar 5, 2024

charleskawczynski commented Mar 5, 2024

maleadt commented Mar 5, 2024

charleskawczynski commented Mar 5, 2024

charleskawczynski commented Mar 5, 2024 • edited Loading

codecov bot commented Mar 5, 2024 • edited Loading

Codecov Report

maleadt commented Mar 5, 2024

charleskawczynski commented Mar 5, 2024 • edited Loading

charleskawczynski commented Mar 2, 2024 •

edited

Loading

charleskawczynski commented Mar 3, 2024 •

edited

Loading

charleskawczynski commented Mar 5, 2024 •

edited

Loading

codecov bot commented Mar 5, 2024 •

edited

Loading

charleskawczynski commented Mar 5, 2024 •

edited

Loading