Complex AbstractOperations cannot be computed on GPU #1241
Ah, you may be running into some hidden limitations of AbstractOperations --- they seem to fail when the operations are too complex. It's a shifty problem, because it appears to depend on the Julia compiler. It's good that you opened this issue because it would be nice to document our efforts to solve this tricky problem. I don't think this is a problem of … Over at LESbrary, we are circumventing this issue by hand-writing particularly important complicated kernels; see for example https://github.com/CliMA/LESbrary.jl/blob/master/src/TurbulenceStatistics/viscous_dissipation.jl
Thanks! Yeah, that's what I was trying to do. I see you already have some SGS stuff there as well. Do you have an example of how to use LESbrary? I'm not that familiar with Julia and couldn't quite figure it out just from reading the code. Thanks!
There are some LESbrary examples but the package definitely needs some love (wink wink). To get ViscousDissipation:

using Pkg
Pkg.add(url="https://github.com/CliMA/LESbrary.jl.git")
using LESbrary.TurbulenceStatistics: ViscousDissipation
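I haven't double-checked the call signature, so take this as a sketch rather than a working recipe --- the ViscousDissipation(model) constructor and the explicit compute! call below are assumptions on my part:

using Oceananigans
using Oceananigans.Fields: compute!
using LESbrary.TurbulenceStatistics: ViscousDissipation

grid = RegularCartesianGrid(size=(32, 32, 32), extent=(1, 1, 1))
model = IncompressibleModel(architecture=GPU(), grid=grid)

# Hypothetical usage: build the dissipation diagnostic from the model,
# then fill its field so it can be written out or inspected.
ϵ = ViscousDissipation(model)
compute!(ϵ)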
One strategy that might help with complex AbstractOperations …
This is probably related to some of the struggles in #870
I've tried a version of that already and it didn't work, but there are probably more things I could try. For now it's best not to reinvent the wheel and use LESbrary! But I'll keep this in mind when doing stuff like this in the future.
Ultimately, @tomchor, we would definitely rather use AbstractOperations for this than hand-written kernels.
Another issue is that I believe one cannot embed certain fields (AveragedFields, for example) in AbstractOperations. It might be good to come up with a list of what currently fails on the GPU.
I have not used GPUs and don't appreciate the difficulty here at all, but would be happy to discuss this sometime if people wanted to have a brainstorming session. Certainly starting simple is what I would recommend.
Yeah, I think it's just an unknown GPU compilation problem/failure. Unclear to me whether it's the fault of Oceananigans or of the GPU compilation stack (KernelAbstractions / Cassette / CUDA.jl).
Maybe it makes sense to try and condense some abstract operations into an isolated minimal working example which we can use to open an issue on CUDA.jl, if that turns out to be the problem? Might help isolate the problem, and would certainly be much easier to debug. I recall we encountered a limitation of the GPU compiler when trying to construct a GPU model with too many arbitrary tracers. I think in that case the type information was too large to even fit into the argument of a CUDA kernel or something, so maybe this was a hard GPU limitation? Would be unfortunate if something similar is happening for complex abstract operations.
Here's a starting list:
@ali-ramadhan not sure if this is what you mean, but when trying to adapt …
Here's the specific error we got when we tried to get this working, dredged up from #746. Some workarounds were suggested there, but I think our solution is actually better / simpler (adapt fields by unwrapping the underlying data and throwing away boundary conditions, rather than wrestling to get all the field info onto the poor GPU).
Does someone have a minimal example that reproduces this error? I'm just curious to learn more about the problem.
using Oceananigans
using Oceananigans.AbstractOperations
using Oceananigans.Fields
grid = RegularCartesianGrid(size=(1, 1, 1), extent=(1, 1, 1))
model = IncompressibleModel(architecture=GPU(), grid=grid)
u, v, w = model.velocities
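# Components of the rate-of-strain tensor Σᵢⱼ = (∂ᵢuⱼ + ∂ⱼuᵢ) / 2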
Σˣˣ = ∂x(u)
Σʸʸ = ∂y(v)
Σᶻᶻ = ∂z(w)
Σˣʸ = (∂y(u) + ∂x(v)) / 2
Σˣᶻ = (∂z(u) + ∂x(w)) / 2
Σʸᶻ = (∂z(v) + ∂y(w)) / 2
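# Viscous dissipation rate ϵ = 2 ν Σᵢⱼ Σᵢⱼ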
ϵ = model.closure.ν * 2 * (Σˣˣ^2 + Σʸʸ^2 + Σᶻᶻ^2 + 2 * (Σˣʸ^2 + Σˣᶻ^2 + Σʸᶻ^2))
ϵ_field = ComputedField(ϵ)
compute!(ϵ_field)

produces

ERROR: InvalidIRError: compiling kernel gpu__compute!...
Hmmm, looks like a Cassette issue. Maybe it's just being overly sensitive to the contents of the kernel (or kernel arguments), as was the case with #828?
A "dynamic function invocation" means that the compiler thinks a function is being called whose scope can change "dynamically" (I think). This is the error one gets when a function depends on a global variable that is not
are not correctly inferred. The way getindex comes into play is in the kernel function Oceananigans.jl/src/Fields/computed_field.jl Lines 84 to 87 in c3b688f
calling Perhaps there are tricks we might use to help the compiler parse this kind of operation, like putting some type annotations / hints into |
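As a standalone illustration of that recursion (simplified stand-in types, not our actual structs): indexing a nested operation indexes its operands, which may themselves be operations, and the GPU compiler has to infer and inline the whole chain.

# Toy stand-ins for a field and a binary operation.
struct FieldLike{A}
    data::A
end

struct BinOp{O, L, R}
    op::O
    left::L
    right::R
end

Base.getindex(f::FieldLike, i, j, k) = f.data[i, j, k]

# Indexing an operation recursively indexes its operands --- the "levels of
# indirection" that show up in the compiler failure.
Base.getindex(b::BinOp, i, j, k) = b.op(b.left[i, j, k], b.right[i, j, k])

a = FieldLike(rand(4, 4, 4))
b = FieldLike(rand(4, 4, 4))

nested = BinOp(+, BinOp(*, a, b), BinOp(*, a, a))
nested[2, 3, 1]  # evaluates a * b + a * a at the point (2, 3, 1)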
Hmmm, if it's indeed incapable of compiling the entire operation, it might be helpful to find out exactly at which size it fails. I'm still thinking that it might be something to be fixed/improved in CUDA.jl. I can try to do some tests as I update #870.
I tried the code above and do get the same error, so I can confirm that. I tried to trim it down and found that the following, slightly more minimal example produces the same error. It seems that squaring and multiplying together is too much, but if you remove one or the other, it seems to work fine.
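Presumably it was something along these lines (reconstructed from the follow-up below, using the Σˣˣ defined in the MWE above):

compute!(ComputedField(Σˣˣ^2))      # works
compute!(ComputedField(2 * Σˣˣ))    # works
compute!(ComputedField(2 * Σˣˣ^2))  # fails with the same InvalidIRError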
Hmm, that's good to know the limit of complexity. I can confirm that behavior, e.g.:

julia> compute!(ComputedField(2 * Σˣˣ))
julia> compute!(ComputedField(Σˣˣ^2))
julia> compute!(ComputedField(2 * Σˣˣ^2))
ERROR: InvalidIRError: compiling kernel gpu__compute!...

Here's the tree for 2 * Σˣˣ^2:

julia> 2 * Σˣˣ^2
BinaryOperation at (Cell, Cell, Cell)
├── grid: RegularCartesianGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
│ └── domain: x ∈ [0.0, 1.0], y ∈ [0.0, 1.0], z ∈ [-1.0, 0.0]
└── tree:
* at (Cell, Cell, Cell) via identity
├── 2
└── ^ at (Cell, Cell, Cell) via identity
├── ∂xᶜᵃᵃ at (Cell, Cell, Cell) via identity
│ └── Field located at (Face, Cell, Cell)
└── 2
Also:

julia> compute!(ComputedField(Σˣˣ^2 + Σʸʸ^2))
ERROR: InvalidIRError: compiling kernel gpu__compute!...
It's not strictly a nesting issue, since

julia> compute!(ComputedField(∂x(∂x(u))))
julia> compute!(ComputedField(∂x(∂x(∂x(u)))))
julia> compute!(ComputedField(∂x(∂x(∂x(∂x(u))))))

are all fine.
Whoa. Here's a hint.

julia> compute!(ComputedField(u^2 + v^2 + w^2))
julia> compute!(ComputedField(u^2 + v^2))
ERROR: InvalidIRError: compiling kernel gpu__compute! !!

Note:

julia> u^2 + v^2
BinaryOperation at (Face, Cell, Cell)
├── grid: RegularCartesianGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
│ └── domain: x ∈ [0.0, 1.0], y ∈ [0.0, 1.0], z ∈ [-1.0, 0.0]
└── tree:
+ at (Face, Cell, Cell) via identity
├── ^ at (Face, Cell, Cell) via identity
│ ├── Field located at (Face, Cell, Cell)
│ └── 2
└── ^ at (Cell, Face, Cell) via identity
├── Field located at (Cell, Face, Cell)
└── 2
julia> u^2 + v^2 + w^2
MultiaryOperation at (Face, Cell, Cell)
├── grid: RegularCartesianGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
│ └── domain: x ∈ [0.0, 1.0], y ∈ [0.0, 1.0], z ∈ [-1.0, 0.0]
└── tree:
+ at (Face, Cell, Cell)
├── ^ at (Face, Cell, Cell) via identity
│ ├── Field located at (Face, Cell, Cell)
│ └── 2
├── ^ at (Cell, Face, Cell) via identity
│ ├── Field located at (Cell, Face, Cell)
│ └── 2
└── ^ at (Cell, Cell, Face) via identity
├── Field located at (Cell, Cell, Face)
└── 2

Surprisingly, MultiaryOperations are better behaved.
I can confirm that I was able to get away with …
While the getindex for MultiaryOperation (Oceananigans.jl/src/AbstractOperations/multiary_operations.jl, lines 14 to 15 at c3b688f) is similar, the MultiaryOperation object is simpler than a BinaryOperation because it stores fewer interpolation functions (we only support the case that every member of a multiary operation is first interpolated to a common location before being op'd on): see Oceananigans.jl/src/AbstractOperations/multiary_operations.jl, lines 3 to 12 at c3b688f, and compare with Oceananigans.jl/src/AbstractOperations/binary_operations.jl, lines 8 to 32 at c3b688f.
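A schematic of the difference (simplified stand-ins for illustration, not the real structs):

# A binary operation carries the operator plus an interpolation function for
# each operand, and its operands may themselves be binary operations, so the
# type information grows quickly with nesting.
struct BinarySketch{O, A, B, IA, IB}
    op::O
    a::A
    b::B
    interp_a::IA   # moves a to the operation's location
    interp_b::IB   # moves b to the operation's location
end

# A multiary operation stores one flat tuple of arguments and one flat tuple
# of interpolations (every argument is moved to a single common location),
# which keeps the overall type comparatively simple.
struct MultiarySketch{O, A, I}
    op::O
    args::A        # tuple of arguments
    interps::I     # tuple of interpolation functions, one per argument
end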
On problem 1: meeting with @vchuravy we think there is a "recursive call cycle with levels of indirection". In other words, calling getindex on a BinaryOperation calls its interpolation and operator functions, which call the getindex defined in Oceananigans.jl/src/AbstractOperations/binary_operations.jl (lines 59 to 60 at c3b688f), which may invoke another call to either operand's getindex, and so on down the operation tree. A possible solution, which is also a hilarious hack, is to define multiple getindex-like functions so that calls at different levels of the tree resolve to distinct functions.
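As a rough sketch of that hack (the index_levelN names and the depth-unrolling scheme are made up for illustration, not anything in Oceananigans):

struct BinOp{O, L, R}
    op::O
    left::L
    right::R
end

# Leaf case at every level: plain arrays index directly.
index_level1(a::AbstractArray, i, j, k) = a[i, j, k]
index_level2(a::AbstractArray, i, j, k) = a[i, j, k]
index_level3(a::AbstractArray, i, j, k) = a[i, j, k]

# Each level only ever calls the next level down, never itself, so there is
# no recursive call cycle for the compiler to choke on.
index_level1(b::BinOp, i, j, k) = b.op(index_level2(b.left, i, j, k), index_level2(b.right, i, j, k))
index_level2(b::BinOp, i, j, k) = b.op(index_level3(b.left, i, j, k), index_level3(b.right, i, j, k))
# ... and so on, up to some maximum supported nesting depth.

a = rand(4, 4, 4); b = rand(4, 4, 4)
nested = BinOp(+, BinOp(*, a, b), a)
index_level1(nested, 1, 2, 3)  # a[1, 2, 3] * b[1, 2, 3] + a[1, 2, 3]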
Alternatively we could think about linearizing the recursion before we go off and compile it.
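A toy sketch of what that could look like (purely illustrative, not Oceananigans code): flatten the operation tree into a post-order list of primitive steps, then evaluate the steps in a simple loop instead of through nested calls.

struct BinOp{O, L, R}
    op::O
    left::L
    right::R
end

# Post-order flattening: each step is (op, left_slot, right_slot), where a slot
# is either a leaf array or the integer index of an earlier step's result.
function linearize!(steps, node)
    node isa BinOp || return node        # leaves pass through unchanged
    l = linearize!(steps, node.left)
    r = linearize!(steps, node.right)
    push!(steps, (node.op, l, r))
    return length(steps)                 # later steps refer to this result
end

fetch_slot(results, slot, i, j, k) = slot isa Int ? results[slot] : slot[i, j, k]

function evaluate(node, i, j, k)
    steps = Any[]
    root = linearize!(steps, node)
    results = Vector{Float64}(undef, length(steps))
    for (n, (op, l, r)) in enumerate(steps)
        results[n] = op(fetch_slot(results, l, i, j, k), fetch_slot(results, r, i, j, k))
    end
    return root isa Int ? results[root] : root[i, j, k]
end

a = rand(2, 2, 2); b = rand(2, 2, 2)
expr = BinOp(+, BinOp(*, a, b), a)
evaluate(expr, 1, 2, 1) ≈ a[1, 2, 1] * b[1, 2, 1] + a[1, 2, 1]  # true

(The flattening itself is still recursive, but it runs once on the host; only the flat loop would need to live inside a kernel.)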
I'm not sure if it matters too much, but I'm going over some examples posted here and I can't reproduce some of the examples by @glwagner. I can compute some stuff that he can't, while the opposite is true for others:

julia> compute!(ComputedField(∂x(∂x(u)))) # works as expected
julia> compute!(ComputedField(∂x(∂x(∂x(u))))) # works as expected
julia> compute!(ComputedField(∂x(∂x(∂x(∂x(u)))))) # doesn't work (not expected)
julia> compute!(ComputedField(u^2 + v^2)) # works (unexpected)
julia> compute!(ComputedField(u^2 + v^2 + w^2)) # works (expected)

Hopefully that'll give you guys more hints on what's going on. I'm running this on a Tesla V100 at NCAR's Casper cluster btw.
EDIT: The results above were obtained by initializing a minimal model as posted above. I tried the same examples I just posted initializing the model differently, and many more things are failing now (for example, …).
Can you please post the errors that you obtain in each case?
Hmm, I can't reproduce the same results exactly. All I did before was honestly open a Julia session and just paste the examples you guys posted one by one. Here's a pastebin with my whole session testing the commands I got in the previous post. (The comments of course don't reflect the outcome anymore.)
Awesome, thank you, that's helpful. Stochastic errors are troubling.
I'm not sure if people are still thinking about this, but I may have some relevant information (good news!) that I'd appreciate some feedback on. Consider the following MWE:

using Oceananigans
using Oceananigans.Utils
using Oceananigans.Fields
Lx = 150; Ly = 6000; Lz = 80
topology = (Periodic, Bounded, Bounded)
grid = RegularRectilinearGrid(size=(1, 512, 8), x=(0, Lx), y=(0, Ly), z=(-Lz, 0),
topology=(Periodic, Bounded, Bounded))
model = IncompressibleModel(architecture = GPU(),
grid = grid,
)
w_ic(x, y, z) = 0.01*y
v_ic(x, y, z) = 0.01*x
set!(model, w=w_ic, v=v_ic)
import Oceananigans.Fields: ComputedField, KernelComputedField
using Oceananigans.AbstractOperations: @at, ∂x, ∂y, ∂z
using Oceananigans.Grids: Center, Face
u, v, w = model.velocities
function naive_calc()
p = sum(model.pressures)
wp = @at (Center, Center, Face) w*p
dwpdz = (1/1024) * ∂z(wp)
println(dwpdz)
return ComputedField(dwpdz)
end
function nested_calc()
p = ComputedField(sum(model.pressures))
wp = ComputedField(@at (Center, Center, Face) w*p)
dwpdz = (1/1024) * ∂z(wp)
println(dwpdz)
return ComputedField(dwpdz)
end

I can include this script in the REPL, after which I get the following results. First, when trying to compute the naive calculation on a GPU I get an error, which is expected at this point:

julia> dwpdz_naive = naive_calc()
BinaryOperation at (Center, Center, Center)
├── grid: RegularRectilinearGrid{Float64, Periodic, Bounded, Bounded}(Nx=1, Ny=512, Nz=8)
│ └── domain: x ∈ [0.0, 150.0], y ∈ [0.0, 6000.0], z ∈ [-80.0, 0.0]
└── tree:
* at (Center, Center, Center) via identity
├── 0.0009765625
└── ∂zᵃᵃᶜ at (Center, Center, Center) via identity
└── * at (Center, Center, Face) via identity
├── Field located at (Center, Center, Face)
└── + at (Center, Center, Center) via identity
├── Field located at (Center, Center, Center)
└── Field located at (Center, Center, Center)
ComputedField located at (Center, Center, Center) of BinaryOperation at (Center, Center, Center)
├── data: OffsetArrays.OffsetArray{Float64,3,CUDA.CuArray{Float64,3}}, size: (3, 514, 10)
├── grid: RegularRectilinearGrid{Float64, Periodic, Bounded, Bounded}(Nx=1, Ny=512, Nz=8)
├── operand: BinaryOperation at (Center, Center, Center)
└── status: time=0.0
julia> compute!(dwpdz_naive)
ERROR: InvalidIRError: compiling kernel gpu__compute!(Cassette.Context{nametype(CUDACtx),KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.StaticSize{(1, 512, 8)},KernelAbstractions.NDIteration.DynamicCheck,Nothing,Nothing,KernelAbstractions.NDIteration.NDRange{3,KernelAbstractions.NDIteration.StaticSize{(1, 2, 8)},KernelAbstractions.NDIteration.StaticSize{(1, 256, 1)},Nothing,Nothing}},Nothing,KernelAbstractions.var"##PassType#253",Nothing,Cassette.DisableHooks}, typeof(Oceananigans.Fields.gpu__compute!), OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}}, Oceananigans.AbstractOperations.BinaryOperation{Center,Center,Center,typeof(*),Float64,Oceananigans.AbstractOperations.Derivative{Center,Center,Center,typeof(Oceananigans.Operators.∂zᵃᵃᶜ),Oceananigans.AbstractOperations.BinaryOperation{Center,Center,Face,typeof(*),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},Oceananigans.AbstractOperations.BinaryOperation{Center,Center,Center,typeof(+),OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},OffsetArrays.OffsetArray{Float64,3,CUDA.CuDeviceArray{Float64,3,1}},typeof(identity),typeof(identity),typeof(identity),RegularRectilinearGrid{Float64,Periodic,Bounded,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},typeof(identity),typeof(Oceananigans.Operators.ℑzᵃᵃᶠ),typeof(identity),RegularRectilinearGrid{Float64,Periodic,Bounded,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},typeof(identity),RegularRectilinearGrid{Float64,Periodic,Bounded,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}},typeof(identity),typeof(identity),typeof(identity),RegularRectilinearGrid{Float64,Periodic,Bounded,Bounded,OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}}) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to overdub(overdub_context::Cassette.Context, overdub_arguments...) in Cassette at /glade/u/home/tomasc/.julia/packages/Cassette/Wjztv/src/overdub.jl:595)
Stacktrace:
[1] getindex at /glade/u/home/tomasc/.julia/packages/Oceananigans/WSSHu/src/AbstractOperations/binary_operations.jl:34
[2] macro expansion at /glade/u/home/tomasc/.julia/packages/Oceananigans/WSSHu/src/Fields/computed_field.jl:114
[3] gpu__compute! at /glade/u/home/tomasc/.julia/packages/KernelAbstractions/mKsXc/src/macros.jl:80
[4] overdub at /glade/u/home/tomasc/.julia/packages/Cassette/Wjztv/src/overdub.jl:0
# I truncated the huge error message here However, the nested calculation appears to work!: julia> dwpdz_nested = nested_calc()
BinaryOperation at (Center, Center, Center)
├── grid: RegularRectilinearGrid{Float64, Periodic, Bounded, Bounded}(Nx=1, Ny=512, Nz=8)
│ └── domain: x ∈ [0.0, 150.0], y ∈ [0.0, 6000.0], z ∈ [-80.0, 0.0]
└── tree:
* at (Center, Center, Center) via identity
├── 0.0009765625
└── ∂zᵃᵃᶜ at (Center, Center, Center) via identity
└── ComputedField located at (Center, Center, Face) of BinaryOperation at (Center, Center, Face)
ComputedField located at (Center, Center, Center) of BinaryOperation at (Center, Center, Center)
├── data: OffsetArrays.OffsetArray{Float64,3,CUDA.CuArray{Float64,3}}, size: (3, 514, 10)
├── grid: RegularRectilinearGrid{Float64, Periodic, Bounded, Bounded}(Nx=1, Ny=512, Nz=8)
├── operand: BinaryOperation at (Center, Center, Center)
└── status: time=0.0
julia> compute!(dwpdz_nested)
julia> using Adapt
julia> adapt(Array, interior(dwpdz_nested))
1×512×8 view(OffsetArray(::Array{Float64,3}, 0:2, 0:513, 0:9), [1], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 503, 504, 505, 506, 507, 508, 509, 510, 511, 512], [1, 2, 3, 4, 5, 6, 7, 8]) with eltype Float64:
[:, :, 1] =
0.0262775 0.014948 0.00902569 0.00559831 0.00351427 0.00221792 0.00140318 0.000888719 … 0.000263472 0.000420913 0.000674128 0.00108572 0.00177041 0.00296775 0.00528696
[:, :, 2] =
0.0156731 0.0110235 0.00720451 0.00461103 0.00293129 0.00185902 0.00117792 0.00074609 … 0.000915116 0.00144732 0.00229007 0.00362638 0.0057484 0.00911035 0.0143023
[:, :, 3] =
0.00844536 0.00652392 0.00451369 0.00297425 0.00191676 0.00122263 0.000776137 0.000491594 … 0.00126331 0.00199147 0.00313408 0.00491472 0.00764863 0.0117137 0.0173591
[:, :, 4] =
0.00263363 0.0021138 0.00150181 0.00100316 0.000649705 0.000414395 0.000262323 0.00016544 … 0.00117766 0.00185426 0.00291374 0.00455993 0.00707827 0.0108175 0.0160812
[:, :, 5] =
-0.00276553 -0.00220933 -0.00157462 -0.00105926 -0.000692536 -0.000446557 -0.00028606 … 0.00062928 0.000993307 0.0015715 0.00249751 0.00400184 0.0064944 0.0106821
[:, :, 6] =
-0.00852329 -0.00657626 -0.00455259 -0.00300401 -0.00193944 -0.00123966 -0.000788713 … -0.000470821 -0.000722127 -0.00106354 -0.00141766 -0.00138651 0.000390487
[:, :, 7] =
-0.0156445 -0.0109965 -0.0071831 -0.00459454 -0.00291877 -0.00184967 -0.00117105 … -0.00143385 -0.00226136 -0.00355998 -0.00557919 -0.00863921 -0.0129097 -0.0170153
[:, :, 8] =
-0.0260963 -0.0148272 -0.00893539 -0.00552894 -0.00346128 -0.00217808 -0.00137374 … -0.00251345 -0.00397508 -0.00630142 -0.0100415 -0.0161907 -0.0268075 -0.0470868
(Btw, the example above obviously works fine on CPUs.)

I even ran some other tests with even more complexity, and they all appear to work on GPUs. For example this one:

function crazy_calc()
p = ComputedField(sum(model.pressures))
wp = ComputedField(@at (Center, Center, Face) w*p)
dwpdz = (1/1024) * ∂z(wp)
println(dwpdz)
dwpdz = ComputedField(dwpdz)
dwpdz2 = ComputedField(dwpdz^2)
return ComputedField(dwpdz2+dwpdz)
end I'd appreciate if some of you could try to reproduce this result on other machines. I ran this in one of NCAR's Tesla V100s. If you can reproduce this behavior, then this kinda makes |
I think there's still a use for KernelComputedField. For some applications, the "optimization" of avoiding intermediate kernel launches / calculations may be unimportant (for example, if plenty of memory is available and computations are made very rarely). I think nesting ComputedFields is a good strategy in those cases. There are also some other nice applications of … I think it would be fun to try the compiler hack-around that I suggested in my comment above (defining multiple getindex functions).
@glwagner Should we keep this open though, since we still haven't solved all the issues? (I know it's unlikely we'll ever solve all of them, but we might be able to make more progress.)
I'm fine with that. It makes sense if we think there's Oceananigans.jl development we might do to solve the issue. I'm not entirely sure that's true right now. But here's where we stand on master currently:

julia> using Oceananigans
[ Info: Precompiling Oceananigans [9e8cae18-63c1-5223-a75c-80ca9d6e9a09]
[ Info: Oceananigans will use 24 threads
julia> grid = RegularRectilinearGrid(size=(1, 1, 1), extent=(1, 1, 1)); model = IncompressibleModel(architecture=GPU(), grid=grid)
IncompressibleModel{GPU, Float64}(time = 0 seconds, iteration = 0)
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
├── tracers: (:T, :S)
├── closure: IsotropicDiffusivity{Float64,NamedTuple{(:T, :S),Tuple{Float64,Float64}}}
├── buoyancy: SeawaterBuoyancy{Float64,LinearEquationOfState{Float64},Nothing,Nothing}
└── coriolis: Nothing
julia> u, v, w = model.velocities
(u = Field located at (Face, Center, Center)
├── data: OffsetArrays.OffsetArray{Float64,3,CUDA.CuArray{Float64,3}}, size: (3, 3, 3)
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
└── boundary conditions: x=(west=Periodic, east=Periodic), y=(south=Periodic, north=Periodic), z=(bottom=ZeroFlux, top=ZeroFlux), v = Field located at (Center, Face, Center)
├── data: OffsetArrays.OffsetArray{Float64,3,CUDA.CuArray{Float64,3}}, size: (3, 3, 3)
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
└── boundary conditions: x=(west=Periodic, east=Periodic), y=(south=Periodic, north=Periodic), z=(bottom=ZeroFlux, top=ZeroFlux), w = Field located at (Center, Center, Face)
├── data: OffsetArrays.OffsetArray{Float64,3,CUDA.CuArray{Float64,3}}, size: (3, 3, 4)
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
└── boundary conditions: x=(west=Periodic, east=Periodic), y=(south=Periodic, north=Periodic), z=(bottom=NormalFlow, top=NormalFlow))
julia> compute!(ComputedField(u + v - w)) # Now this works
julia> compute!(ComputedField(∂x(u)^2 + ∂y(v)^2 + ∂z(w)^2 + ∂x(w)^2)) # still doesnt work
ERROR: CUDA error: device kernel image is invalid (code 200, ERROR_INVALID_IMAGE)
julia> compute!(ComputedField(∂x(u)^2 + ∂y(v)^2 + ∂z(w)^2 + ∂x(w)^2 + ∂y(w)^2)) # still doesn't work
ERROR: CUDA error: a PTX JIT compilation failed (code 218, ERROR_INVALID_PTX)
ptxas application ptx input, line 802; error : Entry function '_Z19julia_gpu__compute_7ContextI14__CUDACtx_Name16CompilerMetadataI10StaticSizeI9_1__1__1_E12DynamicCheckvv7NDRangeILi3ES2_I9_1__1__1_ES2_I9_1__1__1_EvvEEv14__PassType_253v12DisableHooksE14_gpu__compute_11OffsetArrayI7Float64Li3E13CuDeviceArrayIS9_Li3ELi1EEE17MultiaryOperationI6CenterS12_S12_Li5E2__5TupleI15BinaryOperationIS12_S12_S12_S13_10DerivativeIS12_S12_S12_6__x___S8_IS9_Li3ES10_IS9_Li3ELi1EEE10_identity222RegularRectilinearGridIS9_8PeriodicS20_7BoundedS8_IS9_Li1E12StepRangeLenIS9_14TwicePrecisionIS9_ES23_IS9_EEEEE5Int6410_identity310_identity410_identity5S19_IS9_S20_S20_S21_S8_IS9_Li1ES22_IS9_S23_IS9_ES23_IS9_EEEEES15_IS12_S12_S12_S13_S16_IS12_S12_S12_6__y___S8_IS9_Li3ES10_IS9_Li3ELi1EEE10_identity1S19_IS9_S20_S20_S21_S8_IS9_Li1ES22_IS9_S23_IS9_ES23_IS9_EEEEES24_S18_S25_S26_S19_IS9_S20_S20_S21_S8_IS9_Li1ES22_IS9_S23_IS9_ES23_IS9_EEEEES15_IS12_S12_S12_S13_S16_IS12_S12_S12_6__z___S8_IS9_Li3ES10_IS9_Li3ELi1EEES27_S19_IS9_S20_S20_S21_S8_IS9_Li1ES22_IS9_S23_IS9_ES23_IS9_EEEEES24_S29_S18_S25_S19_IS9_S20_S20_S21_S8_IS9_Li1ES22_IS9_S23_IS9_ES23_IS9_EEEEES15_I4FaceS12_S31_S13_S16_IS31_S12_S31_S17_S8_IS9_Li3ES10_IS9_Li3ELi1EEES26_S19_IS9_S20_S20_S21_S8_IS9_Li1ES22_IS9_S23_IS9_ES23_IS9_EEEEES24_S27_S29_S18_S19_IS9_S20_S20_S21_S8_IS9_Li1ES22_IS9_S23_IS9_ES23_IS9_EEEEES15_IS12_S31_S31_S13_S16_IS12_S31_S31_S28_S8_IS9_Li3ES10_IS9_Li3ELi1EEES25_S19_IS9_S20_S20_S21_S8_IS9_Li1ES22_IS9_S23_IS9_ES23_IS9_EEEEES24_S26_S27_S29_S19_IS9_S20_S20_S21_S8_IS9_Li1ES22_IS9_S23_IS9_ES23_IS9_EEEEEES14_IS18_S25_S26_7__xz___7__yz___ES19_IS9_S20_S20_S21_S8_IS9_Li1ES22_IS9_S23_IS9_ES23_IS9_EEEEE' uses too much parameter space (0x1408 bytes, 0x1100 max).
ptxas fatal : Ptx assembly aborted due to errors

We haven't discussed the problem with AveragedField yet:

julia> U = AveragedField(u, dims=(1, 2))
AveragedField over dims=(1, 2) located at (⋅, ⋅, Center) of Field located at (Face, Center, Center)
├── data: OffsetArrays.OffsetArray{Float64,3,CUDA.CuArray{Float64,3}}, size: (1, 1, 3)
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
├── dims: (1, 2)
├── operand: Field located at (Face, Center, Center)
└── status: time=0.0
julia> compute!(ComputedField(u - U))
julia> compute!(ComputedField((u - U)^2))
julia> V = AveragedField(v, dims=(1, 2))
AveragedField over dims=(1, 2) located at (⋅, ⋅, Center) of Field located at (Center, Face, Center)
├── data: OffsetArrays.OffsetArray{Float64,3,CUDA.CuArray{Float64,3}}, size: (1, 1, 3)
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
├── dims: (1, 2)
├── operand: Field located at (Center, Face, Center)
└── status: time=0.0
julia> tke = 1/2 * ((u - U)^2 + (v - V)^2 + w^2)
BinaryOperation at (Face, Center, Center)
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
│ └── domain: x ∈ [0.0, 1.0], y ∈ [0.0, 1.0], z ∈ [-1.0, 0.0]
└── tree:
* at (Face, Center, Center) via identity
├── 0.5
└── + at (Face, Center, Center)
├── ^ at (Face, Center, Center) via identity
│ ├── - at (Face, Center, Center) via identity
│ │ ├── Field located at (Face, Center, Center)
│ │ └── AveragedField over dims=(1, 2) located at (⋅, ⋅, Center) of Field located at (Face, Center, Center)
│ └── 2
├── ^ at (Center, Face, Center) via identity
│ ├── - at (Center, Face, Center) via identity
│ │ ├── Field located at (Center, Face, Center)
│ │ └── AveragedField over dims=(1, 2) located at (⋅, ⋅, Center) of Field located at (Center, Face, Center)
│ └── 2
└── ^ at (Center, Center, Face) via identity
├── Field located at (Center, Center, Face)
└── 2
julia> compute!(ComputedField(tke))
ERROR: InvalidIRError: compiling kernel gpu__compute!

Interestingly, this works:

julia> tke = ((u - U)^2 + (v - V)^2 + w^2) / 2
BinaryOperation at (Face, Center, Center)
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
│ └── domain: x ∈ [0.0, 1.0], y ∈ [0.0, 1.0], z ∈ [-1.0, 0.0]
└── tree:
/ at (Face, Center, Center) via identity
├── + at (Face, Center, Center)
│ ├── ^ at (Face, Center, Center) via identity
│ │ ├── - at (Face, Center, Center) via identity
│ │ │ ├── Field located at (Face, Center, Center)
│ │ │ └── AveragedField over dims=(1, 2) located at (⋅, ⋅, Center) of Field located at (Face, Center, Center)
│ │ └── 2
│ ├── ^ at (Center, Face, Center) via identity
│ │ ├── - at (Center, Face, Center) via identity
│ │ │ ├── Field located at (Center, Face, Center)
│ │ │ └── AveragedField over dims=(1, 2) located at (⋅, ⋅, Center) of Field located at (Center, Face, Center)
│ │ └── 2
│ └── ^ at (Center, Center, Face) via identity
│ ├── Field located at (Center, Center, Face)
│ └── 2
└── 2
julia> compute!(ComputedField(tke))

So I guess we are almost there with AveragedFields in AbstractOperations. We might update the test for computations with AveragedFields accordingly.
This doesn't work, sadly:

julia> tke = @at (Center, Center, Center) ((u - U)^2 + (v - V)^2 + w^2) / 2
BinaryOperation at (Center, Center, Center)
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
│ └── domain: x ∈ [0.0, 1.0], y ∈ [0.0, 1.0], z ∈ [-1.0, 0.0]
└── tree:
/ at (Center, Center, Center) via identity
├── + at (Center, Center, Center)
│ ├── ^ at (Center, Center, Center) via identity
│ │ ├── - at (Center, Center, Center) via identity
│ │ │ ├── Field located at (Face, Center, Center)
│ │ │ └── AveragedField over dims=(1, 2) located at (⋅, ⋅, Center) of Field located at (Face, Center, Center)
│ │ └── 2
│ ├── ^ at (Center, Center, Center) via identity
│ │ ├── - at (Center, Center, Center) via identity
│ │ │ ├── Field located at (Center, Face, Center)
│ │ │ └── AveragedField over dims=(1, 2) located at (⋅, ⋅, Center) of Field located at (Center, Face, Center)
│ │ └── 2
│ └── ^ at (Center, Center, Center) via ℑzᵃᵃᶜ
│ ├── Field located at (Center, Center, Face)
│ └── 2
└── 2
julia> compute!(ComputedField(tke))
ERROR: InvalidIRError: compiling kernel gpu__compute!

There's something fishy there, worth looking into. We can change the topic of this issue to focus on that.

EDIT: I see the issue. While we ensure that binary operations always occur at the common location for fields, we haven't ensured that operations between fields and ReducedFields occur at the location of the 3D field. We can fix this by defining a few more functions.
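A sketch of the kind of rule that seems to be missing (all names below are hypothetical stand-ins, not the actual Oceananigans functions): when one operand has a reduced location, the operation should simply adopt the full 3D field's location rather than trying to interpolate in the reduced direction.

abstract type AbstractLoc end
struct CenterLoc  <: AbstractLoc end
struct FaceLoc    <: AbstractLoc end
struct ReducedLoc <: AbstractLoc end   # stands in for the "⋅" of an averaged dimension

# Placeholder for the existing common-location logic between two full fields.
common_location(a::AbstractLoc, b::AbstractLoc) = a

# Extra methods for reduced locations: the reduced operand never "wins".
common_location(a::AbstractLoc, ::ReducedLoc) = a
common_location(::ReducedLoc, b::AbstractLoc) = b
common_location(::ReducedLoc, ::ReducedLoc)   = ReducedLoc()

common_location(FaceLoc(), ReducedLoc())   # FaceLoc(): the 3D field's location is kept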
Thanks for all the work! I think I'll reopen this thread then since the issue is not resolved and we may still be able to make progress. Also there's a chance that someone will see this thread and have some good ideas now that we specifically link this thread in the GPU docs.
That's awesome! I vote for another separate issue for the averaged fields in the name of organization. But I'll let you decide that one since you already have insights into how to solve it.
@glwagner Just FYI, some of the things that did not work for you actually worked for me. Most notably:

julia> tke = @at (Center, Center, Center) ((u - U)^2 + (v - V)^2 + w^2) / 2
BinaryOperation at (Center, Center, Center)
├── grid: RegularRectilinearGrid{Float64, Periodic, Periodic, Bounded}(Nx=1, Ny=1, Nz=1)
│ └── domain: x ∈ [0.0, 1.0], y ∈ [0.0, 1.0], z ∈ [-1.0, 0.0]
└── tree:
/ at (Center, Center, Center) via identity
├── + at (Center, Center, Center)
│ ├── ^ at (Center, Center, Center) via identity
│ │ ├── - at (Center, Center, Center) via identity
│ │ │ ├── Field located at (Face, Center, Center)
│ │ │ └── AveragedField over dims=(1, 2) located at (⋅, ⋅, Center) of Field located at (Face, Center, Center)
│ │ └── 2
│ ├── ^ at (Center, Center, Center) via identity
│ │ ├── - at (Center, Center, Center) via identity
│ │ │ ├── Field located at (Center, Face, Center)
│ │ │ └── AveragedField over dims=(1, 2) located at (⋅, ⋅, Center) of Field located at (Center, Face, Center)
│ │ └── 2
│ └── ^ at (Center, Center, Center) via ℑzᵃᵃᶜ
│ ├── Field located at (Center, Center, Face)
│ └── 2
└── 2
julia> compute!(tke)
julia>

So it appears to be machine-dependent, at least to some extent.
The last lingering issue mentioned in #1241 (comment) was closed by #1599!
It did, but the overall issue still remains, no? I'd vote for us to keep this issue open until we can compile at least dissipation rate calculations with plain ComputedFields on the GPU.
Interesting. I will open a new issue, specific to the remaining problem, with a concise summary. It's hard to parse the conversation in this long issue. I guess we will also have to update the docs for this.
Sounds good. I'll change the docs after you open the new issue.
I have the following output writer set-up in my simulation:
This works successfully on CPUs, but running on GPUs I get a huge number of error lines.
Running the simulation without that output works for both GPUs and CPUs.
Am I doing something wrong here?