Regression when using AMDGPU and CUDA in same session #47331

Closed
jpsamaroo opened this issue Oct 26, 2022 · 2 comments · Fixed by #47334
Labels: gpu (Affects running Julia on a GPU), regression (Regression in behavior compared to a previous version), types and dispatch (Types, subtyping and method dispatch)
Milestone: 1.9

Comments

@jpsamaroo (Member) commented Oct 26, 2022

Loading AMDGPU.jl and CUDA.jl (in that order) on Julia master breaks usage of AMDGPU. When AMDGPU calls into GPUCompiler's cached_compilation function and the various GPUCompiler interface functions are invoked (which have overloads defined in AMDGPU), those interface calls fall back to their non-specialized base-case methods defined in GPUCompiler. Specifically, the call:

GPUCompiler.runtime_module(job)

where job is a:

GPUCompiler.CompilerJob{GPUCompiler.GCNCompilerTarget, AMDGPU.Compiler.ROCCompilerParams}

does not call the method https://github.com/JuliaGPU/AMDGPU.jl/blob/bf66efcc5541319ae44d3ff45669abebb133d421/src/compiler.jl#L59, but instead calls https://github.com/JuliaGPU/GPUCompiler.jl/blob/18163a561a169934844d493e4fcd3238439c3965/src/interface.jl#L188.

I can use reflection within GPUCompiler to see that AMDGPU's method does in fact exist and matches the signature for runtime_module called with job.
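
For reference, a minimal reflection sketch along these lines (assuming a job value obtained from AMDGPU's compilation pipeline; this is an illustrative reconstruction, not the exact code used):

using GPUCompiler, AMDGPU

# List every method defined for the interface function; AMDGPU's overload for
# CompilerJob{GCNCompilerTarget, ROCCompilerParams} appears in this list.
for m in methods(GPUCompiler.runtime_module)
    println(m)
end

# Ask dispatch which method matches this job's type: reflection reports
# AMDGPU's specialized method, yet the actual runtime call falls back to
# GPUCompiler's base-case method in interface.jl.
println(which(GPUCompiler.runtime_module, Tuple{typeof(job)}))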

If CUDA is never loaded, or the calls happen before CUDA is loaded, the GPUCompiler overloads in AMDGPU are dispatched to correctly. Loading CUDA before AMDGPU also avoids the problem.

git bisect points to:

fbd5a72b0a13223c0f67b9d2baa4428e7db199a5 is the first bad commit
commit fbd5a72b0a13223c0f67b9d2baa4428e7db199a5
Author: Jameson Nash <vtjnash@gmail.com>
Date:   Wed Oct 5 12:00:56 2022 -0400

    precompile: serialize the full edges graph (#46920)

    Previously, we would flatten the edges graph during serialization, to
    simplify the deserialization codes, but that now was adding complexity
    and confusion and uncertainty to the code paths. Clean that all up, so
    that we do not do that. Validation is performed while they are represented
    as forward edges, so avoids needing to interact with backedges at all.

The commit immediately prior does not exhibit the problematic behavior.

Currently the failing AMDGPU code paths can't be exercised without access to a supported AMD GPU; I am going to implement a way to call the desired methods without hardware access, as well as an MWE, to make it easier to reproduce.
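
For context, here is a GPU-free toy sketch of the dispatch pattern involved (hypothetical module names; it does not capture the precompilation and load-order interaction that actually triggers the regression):

# A "base" package defines a generic fallback and a downstream package adds a
# specialized method that is expected to win dispatch.
module ToyGPUCompiler
    abstract type AbstractCompilerParams end
    struct CompilerJob{P<:AbstractCompilerParams}
        params::P
    end
    # base-case fallback, analogous to GPUCompiler.runtime_module in interface.jl
    runtime_module(::CompilerJob) = :generic_runtime
end

module ToyAMDGPU
    import ..ToyGPUCompiler
    struct ROCCompilerParams <: ToyGPUCompiler.AbstractCompilerParams end
    # specialized overload, analogous to AMDGPU's runtime_module method
    ToyGPUCompiler.runtime_module(::ToyGPUCompiler.CompilerJob{ROCCompilerParams}) = :roc_runtime
end

job = ToyGPUCompiler.CompilerJob(ToyAMDGPU.ROCCompilerParams())
# Expected behavior: the specialized ToyAMDGPU method is selected; the
# regression makes the analogous real call hit the generic fallback instead.
@assert ToyGPUCompiler.runtime_module(job) == :roc_runtime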

@vtjnash

Ref #46920

@jpsamaroo added the regression, types and dispatch, and gpu labels on Oct 26, 2022
@jpsamaroo added this to the 1.9 milestone on Oct 26, 2022
@PallHaraldsson (Contributor) commented

Intriguing. Isn't CUDA.jl Nvidia-only? That is, while you should be able to load both, wouldn't that in practice only happen on a machine with both Nvidia and AMD GPUs, which is very rare (I've never heard of one)?

@luraess commented Oct 26, 2022

@PallHaraldsson one current use case for loading AMDGPU and CUDA simultaneously is packages like ParallelStencil and ImplicitGlobalGrid, which implement backends for both AMD and Nvidia hardware. Lazy loading enables a concise code design, but it is currently broken.
