Regression when using AMDGPU and CUDA in same session #47331

Closed
jpsamaroo opened this issue Oct 26, 2022 · 2 comments · Fixed by #47334
Labels: gpu (Affects running Julia on a GPU), regression (Regression in behavior compared to a previous version), types and dispatch (Types, subtyping and method dispatch)
Milestone: 1.9

Comments

@jpsamaroo (Member) commented Oct 26, 2022

Loading AMDGPU.jl and CUDA.jl (in that order) on Julia master breaks usage of AMDGPU. When AMDGPU calls into GPUCompiler's cached_compilation function and the various GPUCompiler interface functions are invoked (which have overloads defined in AMDGPU), those interface calls fall back to their non-specialized base-case methods defined in GPUCompiler. Specifically, the call:

GPUCompiler.runtime_module(job)

where job is a:

GPUCompiler.CompilerJob{GPUCompiler.GCNCompilerTarget, AMDGPU.Compiler.ROCCompilerParams}

does not call the method https://github.com/JuliaGPU/AMDGPU.jl/blob/bf66efcc5541319ae44d3ff45669abebb133d421/src/compiler.jl#L59, but instead calls https://github.com/JuliaGPU/GPUCompiler.jl/blob/18163a561a169934844d493e4fcd3238439c3965/src/interface.jl#L188.

I can use reflection within GPUCompiler to see that AMDGPU's method does in fact exist and matches the signature for runtime_module called with job.
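
For reference, a minimal reflection sketch along these lines (assuming a job value obtained from AMDGPU's compilation pipeline; this is an illustrative reconstruction, not the exact code used):

using GPUCompiler, AMDGPU

# List every method defined for the interface function; AMDGPU's overload for
# CompilerJob{GCNCompilerTarget, ROCCompilerParams} appears in this list.
for m in methods(GPUCompiler.runtime_module)
    println(m)
end

# Ask dispatch which method matches this job's type: reflection reports
# AMDGPU's specialized method, yet the actual runtime call falls back to
# GPUCompiler's base-case method in interface.jl.
println(which(GPUCompiler.runtime_module, Tuple{typeof(job)}))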

If CUDA is never loaded, or the calls happen before CUDA is loaded, the GPUCompiler overloads in AMDGPU are dispatched to correctly. Loading CUDA before AMDGPU also avoids the problem.

git bisect points to:

fbd5a72b0a13223c0f67b9d2baa4428e7db199a5 is the first bad commit
commit fbd5a72b0a13223c0f67b9d2baa4428e7db199a5
Author: Jameson Nash <vtjnash@gmail.com>
Date:   Wed Oct 5 12:00:56 2022 -0400

    precompile: serialize the full edges graph (#46920)

    Previously, we would flatten the edges graph during serialization, to
    simplify the deserialization codes, but that now was adding complexity
    and confusion and uncertainty to the code paths. Clean that all up, so
    that we do not do that. Validation is performed while they are represented
    as forward edges, so avoids needing to interact with backedges at all.

The commit immediately prior does not exhibit the problematic behavior.

Currently the failing AMDGPU code paths can't be exercised without access to a supported AMD GPU; I am going to implement a way to call the desired methods without hardware access, as well as an MWE, to make it easier to reproduce.
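
For context, here is a GPU-free toy sketch of the dispatch pattern involved (hypothetical module names; it does not capture the precompilation and load-order interaction that actually triggers the regression):

# A "base" package defines a generic fallback and a downstream package adds a
# specialized method that is expected to win dispatch.
module ToyGPUCompiler
    abstract type AbstractCompilerParams end
    struct CompilerJob{P<:AbstractCompilerParams}
        params::P
    end
    # base-case fallback, analogous to GPUCompiler.runtime_module in interface.jl
    runtime_module(::CompilerJob) = :generic_runtime
end

module ToyAMDGPU
    import ..ToyGPUCompiler
    struct ROCCompilerParams <: ToyGPUCompiler.AbstractCompilerParams end
    # specialized overload, analogous to AMDGPU's runtime_module method
    ToyGPUCompiler.runtime_module(::ToyGPUCompiler.CompilerJob{ROCCompilerParams}) = :roc_runtime
end

job = ToyGPUCompiler.CompilerJob(ToyAMDGPU.ROCCompilerParams())
# Expected behavior: the specialized ToyAMDGPU method is selected; the
# regression makes the analogous real call hit the generic fallback instead.
@assert ToyGPUCompiler.runtime_module(job) == :roc_runtime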

@vtjnash

Ref #46920

@jpsamaroo added the regression, types and dispatch, and gpu labels on Oct 26, 2022
@jpsamaroo added this to the 1.9 milestone on Oct 26, 2022
@PallHaraldsson (Contributor) commented

Intriguing. Isn't CUDA.jl Nvidia-only? That is, while you should be able to load both, wouldn't that in practice only happen on a machine with both Nvidia and AMD GPUs, which is very rare (I've never heard of one)?

@luraess commented Oct 26, 2022

@PallHaraldsson one current use case for loading AMDGPU and CUDA simultaneously is packages like ParallelStencil and ImplicitGlobalGrid, which implement backends for both AMD and Nvidia hardware. Lazy loading enables a concise code design, but it is currently broken.
