Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[spirv] Inout semantics in presence of early thread termination do no match DXIL codegen #5158

Closed
alelenv opened this issue Apr 19, 2023 · 12 comments
Assignees
Labels
bug Bug, regression, crash spirv Work related to SPIR-V

Comments

@alelenv
Copy link
Contributor

alelenv commented Apr 19, 2023

For the following any-hit shader

struct Payload 
{
	float4 data;
};
struct HitAttr 
{
    float3 bary;
};
void userFunc(inout Payload localPayload) {
     localPayload.data.x += 1.0f;
     AcceptHitAndEndSearch();
}

[shader("anyhit")] void SphereAnyHit(inout Payload p, in HitAttr h) {
     userFunc(p);
}

DXC generates the following DXIL

; Function Attrs: noreturn nounwind`
define void @"\01?SphereAnyHit@@YAXUPayload@@UHitAttr@@@Z"(%struct.Payload* noalias nocapture %p, %struct.HitAttr* nocapture readnone %h) #0 {
  %1 = getelementptr inbounds %struct.Payload, %struct.Payload* %p, i32 0, i32 0
  %2 = load <4 x float>, <4 x float>* %1, align 4, !alias.scope !19
  %3 = extractelement <4 x float> %2, i32 0
  %4 = fadd fast float %3, 1.000000e+00
  %5 = getelementptr inbounds %struct.Payload, %struct.Payload* %p, i32 0, i32 0, i32 0
  store float %4, float* %5, align 4, !alias.scope !19
  call void @dx.op.acceptHitAndEndSearch(i32 156)  ; AcceptHitAndEndSearch()
  unreachable
}
`

Here the payload passed in to 'userFunc' as an 'inout' argument is passed in by reference and hence mutated
Note HLSL spec states
https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-function-parameters
That values are copied in and copied out. (I guess pass by reference is a valid implementation of the semantics)

The code generated for SPIRV is

               OpCapability RayTracingKHR
               OpExtension "SPV_KHR_ray_tracing"
               OpMemoryModel Logical GLSL450
               OpEntryPoint AnyHitNV %SphereAnyHit "SphereAnyHit"
               OpSource HLSL 660
               OpName %SphereAnyHit "SphereAnyHit"
       %void = OpTypeVoid
          %3 = OpTypeFunction %void
%SphereAnyHit = OpFunction %void None %3
          %4 = OpLabel
               OpTerminateRayKHR
               OpFunctionEnd

Note modification of payload is completely lost.
This happens because for SPIRV codegen, 'inout' parameter semantics are implemented as pass by value-result, i.e they are copied in and then copied out (This can be seen by generating code at O0)

However since 'AcceptHitAndEndSearch' is a thread terminating instruction. We never perform the copy-out
This is technically a legal implementation of 'inout'

Probably couple of questions arise out of this

  1. HLSL folks, I guess this was specifically done for payloads, otherwise you can never write to a payload in any non-entry function
    Are there any other scenarios where DXC does something special?

  2. DXC folks, if we were to make this work, we have to implement some form of unwind/cleanup for this special case?
    (Ugly way is to track using a boolean and unwind all the way to main and then write out to payload before exiting?)

@s-perron s-perron added bug Bug, regression, crash spirv Work related to SPIR-V labels May 4, 2023
@s-perron
Copy link
Collaborator

s-perron commented May 4, 2023

I did a test that uses discard, and it seems like DXIL does the copy-in/copy-out as well.

The pixel shader I tried was

struct Input
{
    float4 position : SV_POSITION;
    float4 color : COLOR;
};

void foo(inout float4 p) {
  p = 1.0;
  discard;
}

float4 main(Input input) : SV_Target
{
    foo(input.color);
    return input.color;
}

The DXIL is

define void @main() {
  call void @dx.op.discard(i32 82, i1 true)  ; Discard(condition)
  call void @dx.op.storeOutput.f32(i32 5, i32 0, i32 0, i8 0, float 1.000000e+00)  ; StoreOutput(outputSigId,rowIndex,colIndex,value)
  call void @dx.op.storeOutput.f32(i32 5, i32 0, i32 0, i8 1, float 1.000000e+00)  ; StoreOutput(outputSigId,rowIndex,colIndex,value)
  call void @dx.op.storeOutput.f32(i32 5, i32 0, i32 0, i8 2, float 1.000000e+00)  ; StoreOutput(outputSigId,rowIndex,colIndex,value)
  call void @dx.op.storeOutput.f32(i32 5, i32 0, i32 0, i8 3, float 1.000000e+00)  ; StoreOutput(outputSigId,rowIndex,colIndex,value)
  ret void
}

Note that the stores are after the discard. I'll have to find out what is happening for the ray-tracing example.

@s-perron
Copy link
Collaborator

s-perron commented May 4, 2023

I looked at it a bit more. It seems like the DXIL path "optimizes" the case when an inout parameter is passed as a function to another inout parameter. The idea is that the first inout parameter is suppose to imply that they already have a "local" copy of the variable, so they don't need another one. However, that is not true for the anyhit shader parameters. We might need some input from @llvm-beanz to know what the correct behaviour it. At this point I think the SPIR-V is correct, even if it is annoying.

This is snippet where the DXIL path does not do the copy-in and copy-out:

struct Input
{
    float4 position : SV_POSITION;
    float4 color : COLOR;
};

void foo(inout float4 p) {
  p = 1.0;
}

void bar(inout float4 p) {
  // No copy-in here
  foo(p);
  // No copy-out here
  discard;
  p = 2.0;
}

float4 main(Input input) : SV_Target
{
    // There is a copy-in here
    bar(input.color);
    // There is a copy-out here
    return input.color;
}

@llvm-beanz
Copy link
Collaborator

Thank you @s-perron for pointing this issue out to me. I'm working on a big chunk of changes in this area that impact both DXIL and SPIR-V, and I am trying to get the DXIL and SPIR-V code generation to match.

SPIR-V doesn't seem to correctly implement copy-in/copy-out argument passing at all. I'm also actually not sure DXIL's current behavior is correct in the presence of thread termination, so I'll need to think that through.

This branch has my current work in progress changes. There are still some remaining bugs on the DXIL side relating to matrix orientation annotations, but otherwise the DXIL implementation is functional. I just started the SPIR-V implementation, and it only works for extremely trivial cases.

In general the goal of those changes is to try and more accurately capture the parameter passing semantics in the AST for out, inout and array parameters.

To give a concrete illustration on one of the extremely simple deviations between DXIL and SPIR-V take this simple example:

RWBuffer<int> Buf;

void fn(inout int X, inout int Y) {
  Y = 2;
  X = 1;
}

[numthreads(1,1,1)]
void main(int GI : SV_GroupIndex) {
  int Val = 3;
  fn(Val, Val);
  Buf[GI] = Val;
}

Compiler output:

If you look at the examples, DXIL stores 2, and SPIR-V stores 1. This is because SPIR-V more or less passes Val in by reference to fn so the writes inside the function write back to Val's storage and the order in the function is preserved.

In DXIL (and in theory by HLSL's definition), function parameters are copy-in/copy-out so they are always guaranteed to reference unique memory inside fn. On returning the parameters are copied back left-to-right to the argument variables, so Y's value at the end of the function is written to Val second, regardless of the order in which Y is set inside fn.

@s-perron
Copy link
Collaborator

s-perron commented May 4, 2023

This is because SPIR-V more or less passes Val in by reference to fn so the writes inside the function write back to Val's storage and the order in the function is preserved.

Interesting. I looked at the SPIR-V code generated by DXC pre-optimization. It looks like the SPIR-V code is doing a copy-in and copy-out, but it reuses the same "local" copy for both parameters because they are the same input value.

I won't do anything, and I'll wait for your AST changes to be ready for a more detailed review.

@alelenv
Copy link
Contributor Author

alelenv commented May 4, 2023

FYI, for reference this was cut down test from a customer reported issue
https://forums.developer.nvidia.com/t/anyhit-payload-lost-when-calling-accepthitandendsearch-or-ignorehit/242880

If I follow HLSLs current language semantics for 'inout' being copy-in and copy-out, it makes it impossible to write to payload in user function in presence of thread terminating instructions.
Would be nice to have some HLSL language for this special behavior

@llvm-beanz
Copy link
Collaborator

We are working on writing a formal specification for HLSL. The current behavior right now in DXC is that we try to eliminate copies where we can, so it is actually more like the value of an out or inout parameter is undefined if the function doesn't return.

That may actually be what the language in the spec will need to say.

@s-perron
Copy link
Collaborator

s-perron commented May 4, 2023

Would be nice to have some HLSL language for this special behavior

There is a proposal to add reference parameter to HLSL . That will be what you want.

https://github.com/microsoft/hlsl-specs/blob/main/proposals/0006-reference-types.md

@s-perron
Copy link
Collaborator

I will close this bug. SPIR-V is not breaking the spec. It is the intended behaviour. As Chris mentioned, the spec will probably state the behaviour is undefined. You will need to wait for the references proposal to be implemented or find some other workaround.

@s-perron s-perron self-assigned this Sep 29, 2023
@natevm
Copy link
Contributor

natevm commented Oct 16, 2023

@s-perron Because proposals take quite a while to get through (I'd imagine especially for something as broad as adding references to HLSL), would it be possible to issue at least a compiler warning that tells the developer that the code they've written will result in undefined behavior? It's an incredibly difficult issue to track down otherwise, and vendors seem to be taking the blame for what's really a DXC issue.

For what it's worth, I myself and others are running into this problem: https://forums.developer.nvidia.com/t/anyhit-payload-lost-when-calling-accepthitandendsearch-or-ignorehit/242880/9

@natevm
Copy link
Contributor

natevm commented Oct 16, 2023

I mentioned on the other thread, but I have a kludgy workaround for the issue.

The idea is to relax the behavior of "AcceptHitAndEndSearch" to not terminate the thread early. I create two static booleans above my entrypoint:

static bool _ignoreHit = false;
static bool _acceptHitAndEndSearch = false;
namespace gprt {
// See DXC issue #5158
  void
  ignoreHit() {
    _ignoreHit = true;
  }

  void
  acceptHitAndEndSearch() {
    _acceptHitAndEndSearch = true;
  }
};   // namespace gprt

Then, I add the following code to the end of my anyhit entrypoint:

if (_ignoreHit)
    IgnoreHit();
if (_acceptHitAndEndSearch)
    AcceptHitAndEndSearch();

This allows for payload values to be written to before the thread is terminated.

Is it even correct behavior to terminate threads early when these intrinsics are called? Doesn't that violate the RT spec? I believe that NVIDIA's OptiX platform continues execution to the end of the function body.

@s-perron
Copy link
Collaborator

@s-perron Because proposals take quite a while to get through (I'd imagine especially for something as broad as adding references to HLSL), would it be possible to issue at least a compiler warning that tells the developer that the code they've written will result in undefined behavior? It's an incredibly difficult issue to track down otherwise, and vendors seem to be taking the blame for what's really a DXC issue.

For what it's worth, I myself and others are running into this problem: https://forums.developer.nvidia.com/t/anyhit-payload-lost-when-calling-accepthitandendsearch-or-ignorehit/242880/9

I'll see what we can do. Once the code has been translated to spir-v, we can no longer identify the potential problem. I'm not very familiar with the clang part of dxc to know what we can do in that part.

@natevm
Copy link
Contributor

natevm commented Oct 17, 2023

I suppose my bigger question is why these intrinsics terminate the thread in HLSL in the first place. They modify the traversal state to stop traversing, but I don’t see why the thread couldn’t continue onto the end of the end of the kernel.

I’m not even sure if it’s part of the DXR/RTX specification to terminate the threads in this way. Perhaps this is just a misunderstanding of the spec during implementation?

Just removing that requirement that threads terminate on these calls would seemingly fix the issue, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug, regression, crash spirv Work related to SPIR-V
Projects
None yet
Development

No branches or pull requests

5 participants