[spirv] Inout semantics in presence of early thread termination do no match DXIL codegen #5158

alelenv · 2023-04-19T03:38:18Z

For the following any-hit shader

struct Payload 
{
	float4 data;
};
struct HitAttr 
{
    float3 bary;
};
void userFunc(inout Payload localPayload) {
     localPayload.data.x += 1.0f;
     AcceptHitAndEndSearch();
}

[shader("anyhit")] void SphereAnyHit(inout Payload p, in HitAttr h) {
     userFunc(p);
}

DXC generates the following DXIL

; Function Attrs: noreturn nounwind`
define void @"\01?SphereAnyHit@@YAXUPayload@@UHitAttr@@@Z"(%struct.Payload* noalias nocapture %p, %struct.HitAttr* nocapture readnone %h) #0 {
  %1 = getelementptr inbounds %struct.Payload, %struct.Payload* %p, i32 0, i32 0
  %2 = load <4 x float>, <4 x float>* %1, align 4, !alias.scope !19
  %3 = extractelement <4 x float> %2, i32 0
  %4 = fadd fast float %3, 1.000000e+00
  %5 = getelementptr inbounds %struct.Payload, %struct.Payload* %p, i32 0, i32 0, i32 0
  store float %4, float* %5, align 4, !alias.scope !19
  call void @dx.op.acceptHitAndEndSearch(i32 156)  ; AcceptHitAndEndSearch()
  unreachable
}
`

Here the payload passed in to 'userFunc' as an 'inout' argument is passed in by reference and hence mutated
Note HLSL spec states
https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-function-parameters
That values are copied in and copied out. (I guess pass by reference is a valid implementation of the semantics)

The code generated for SPIRV is

               OpCapability RayTracingKHR
               OpExtension "SPV_KHR_ray_tracing"
               OpMemoryModel Logical GLSL450
               OpEntryPoint AnyHitNV %SphereAnyHit "SphereAnyHit"
               OpSource HLSL 660
               OpName %SphereAnyHit "SphereAnyHit"
       %void = OpTypeVoid
          %3 = OpTypeFunction %void
%SphereAnyHit = OpFunction %void None %3
          %4 = OpLabel
               OpTerminateRayKHR
               OpFunctionEnd

Note modification of payload is completely lost.
This happens because for SPIRV codegen, 'inout' parameter semantics are implemented as pass by value-result, i.e they are copied in and then copied out (This can be seen by generating code at O0)

However since 'AcceptHitAndEndSearch' is a thread terminating instruction. We never perform the copy-out
This is technically a legal implementation of 'inout'

Probably couple of questions arise out of this

HLSL folks, I guess this was specifically done for payloads, otherwise you can never write to a payload in any non-entry function
Are there any other scenarios where DXC does something special?
DXC folks, if we were to make this work, we have to implement some form of unwind/cleanup for this special case?
(Ugly way is to track using a boolean and unwind all the way to main and then write out to payload before exiting?)

The text was updated successfully, but these errors were encountered:

s-perron · 2023-05-04T13:47:12Z

I did a test that uses discard, and it seems like DXIL does the copy-in/copy-out as well.

The pixel shader I tried was

struct Input
{
    float4 position : SV_POSITION;
    float4 color : COLOR;
};

void foo(inout float4 p) {
  p = 1.0;
  discard;
}

float4 main(Input input) : SV_Target
{
    foo(input.color);
    return input.color;
}

The DXIL is

define void @main() {
  call void @dx.op.discard(i32 82, i1 true)  ; Discard(condition)
  call void @dx.op.storeOutput.f32(i32 5, i32 0, i32 0, i8 0, float 1.000000e+00)  ; StoreOutput(outputSigId,rowIndex,colIndex,value)
  call void @dx.op.storeOutput.f32(i32 5, i32 0, i32 0, i8 1, float 1.000000e+00)  ; StoreOutput(outputSigId,rowIndex,colIndex,value)
  call void @dx.op.storeOutput.f32(i32 5, i32 0, i32 0, i8 2, float 1.000000e+00)  ; StoreOutput(outputSigId,rowIndex,colIndex,value)
  call void @dx.op.storeOutput.f32(i32 5, i32 0, i32 0, i8 3, float 1.000000e+00)  ; StoreOutput(outputSigId,rowIndex,colIndex,value)
  ret void
}

Note that the stores are after the discard. I'll have to find out what is happening for the ray-tracing example.

s-perron · 2023-05-04T14:59:23Z

I looked at it a bit more. It seems like the DXIL path "optimizes" the case when an inout parameter is passed as a function to another inout parameter. The idea is that the first inout parameter is suppose to imply that they already have a "local" copy of the variable, so they don't need another one. However, that is not true for the anyhit shader parameters. We might need some input from @llvm-beanz to know what the correct behaviour it. At this point I think the SPIR-V is correct, even if it is annoying.

This is snippet where the DXIL path does not do the copy-in and copy-out:

struct Input
{
    float4 position : SV_POSITION;
    float4 color : COLOR;
};

void foo(inout float4 p) {
  p = 1.0;
}

void bar(inout float4 p) {
  // No copy-in here
  foo(p);
  // No copy-out here
  discard;
  p = 2.0;
}

float4 main(Input input) : SV_Target
{
    // There is a copy-in here
    bar(input.color);
    // There is a copy-out here
    return input.color;
}

llvm-beanz · 2023-05-04T15:46:45Z

Thank you @s-perron for pointing this issue out to me. I'm working on a big chunk of changes in this area that impact both DXIL and SPIR-V, and I am trying to get the DXIL and SPIR-V code generation to match.

SPIR-V doesn't seem to correctly implement copy-in/copy-out argument passing at all. I'm also actually not sure DXIL's current behavior is correct in the presence of thread termination, so I'll need to think that through.

This branch has my current work in progress changes. There are still some remaining bugs on the DXIL side relating to matrix orientation annotations, but otherwise the DXIL implementation is functional. I just started the SPIR-V implementation, and it only works for extremely trivial cases.

In general the goal of those changes is to try and more accurately capture the parameter passing semantics in the AST for out, inout and array parameters.

To give a concrete illustration on one of the extremely simple deviations between DXIL and SPIR-V take this simple example:

RWBuffer<int> Buf;

void fn(inout int X, inout int Y) {
  Y = 2;
  X = 1;
}

[numthreads(1,1,1)]
void main(int GI : SV_GroupIndex) {
  int Val = 3;
  fn(Val, Val);
  Buf[GI] = Val;
}

Compiler output:

DXIL
SPIR-V

If you look at the examples, DXIL stores 2, and SPIR-V stores 1. This is because SPIR-V more or less passes Val in by reference to fn so the writes inside the function write back to Val's storage and the order in the function is preserved.

In DXIL (and in theory by HLSL's definition), function parameters are copy-in/copy-out so they are always guaranteed to reference unique memory inside fn. On returning the parameters are copied back left-to-right to the argument variables, so Y's value at the end of the function is written to Val second, regardless of the order in which Y is set inside fn.

s-perron · 2023-05-04T16:19:03Z

This is because SPIR-V more or less passes Val in by reference to fn so the writes inside the function write back to Val's storage and the order in the function is preserved.

Interesting. I looked at the SPIR-V code generated by DXC pre-optimization. It looks like the SPIR-V code is doing a copy-in and copy-out, but it reuses the same "local" copy for both parameters because they are the same input value.

I won't do anything, and I'll wait for your AST changes to be ready for a more detailed review.

alelenv · 2023-05-04T19:07:19Z

FYI, for reference this was cut down test from a customer reported issue
https://forums.developer.nvidia.com/t/anyhit-payload-lost-when-calling-accepthitandendsearch-or-ignorehit/242880

If I follow HLSLs current language semantics for 'inout' being copy-in and copy-out, it makes it impossible to write to payload in user function in presence of thread terminating instructions.
Would be nice to have some HLSL language for this special behavior

llvm-beanz · 2023-05-04T19:31:20Z

We are working on writing a formal specification for HLSL. The current behavior right now in DXC is that we try to eliminate copies where we can, so it is actually more like the value of an out or inout parameter is undefined if the function doesn't return.

That may actually be what the language in the spec will need to say.

s-perron · 2023-05-04T19:34:31Z

Would be nice to have some HLSL language for this special behavior

There is a proposal to add reference parameter to HLSL . That will be what you want.

https://github.com/microsoft/hlsl-specs/blob/main/proposals/0006-reference-types.md

s-perron · 2023-09-29T17:28:52Z

I will close this bug. SPIR-V is not breaking the spec. It is the intended behaviour. As Chris mentioned, the spec will probably state the behaviour is undefined. You will need to wait for the references proposal to be implemented or find some other workaround.

natevm · 2023-10-16T16:49:15Z

@s-perron Because proposals take quite a while to get through (I'd imagine especially for something as broad as adding references to HLSL), would it be possible to issue at least a compiler warning that tells the developer that the code they've written will result in undefined behavior? It's an incredibly difficult issue to track down otherwise, and vendors seem to be taking the blame for what's really a DXC issue.

For what it's worth, I myself and others are running into this problem: https://forums.developer.nvidia.com/t/anyhit-payload-lost-when-calling-accepthitandendsearch-or-ignorehit/242880/9

natevm · 2023-10-16T16:59:38Z

I mentioned on the other thread, but I have a kludgy workaround for the issue.

The idea is to relax the behavior of "AcceptHitAndEndSearch" to not terminate the thread early. I create two static booleans above my entrypoint:

static bool _ignoreHit = false;
static bool _acceptHitAndEndSearch = false;
namespace gprt {
// See DXC issue #5158
  void
  ignoreHit() {
    _ignoreHit = true;
  }

  void
  acceptHitAndEndSearch() {
    _acceptHitAndEndSearch = true;
  }
};   // namespace gprt

Then, I add the following code to the end of my anyhit entrypoint:

if (_ignoreHit)
    IgnoreHit();
if (_acceptHitAndEndSearch)
    AcceptHitAndEndSearch();

This allows for payload values to be written to before the thread is terminated.

Is it even correct behavior to terminate threads early when these intrinsics are called? Doesn't that violate the RT spec? I believe that NVIDIA's OptiX platform continues execution to the end of the function body.

s-perron · 2023-10-17T14:20:36Z

@s-perron Because proposals take quite a while to get through (I'd imagine especially for something as broad as adding references to HLSL), would it be possible to issue at least a compiler warning that tells the developer that the code they've written will result in undefined behavior? It's an incredibly difficult issue to track down otherwise, and vendors seem to be taking the blame for what's really a DXC issue.

For what it's worth, I myself and others are running into this problem: https://forums.developer.nvidia.com/t/anyhit-payload-lost-when-calling-accepthitandendsearch-or-ignorehit/242880/9

I'll see what we can do. Once the code has been translated to spir-v, we can no longer identify the potential problem. I'm not very familiar with the clang part of dxc to know what we can do in that part.

natevm · 2023-10-17T19:36:12Z

I suppose my bigger question is why these intrinsics terminate the thread in HLSL in the first place. They modify the traversal state to stop traversing, but I don’t see why the thread couldn’t continue onto the end of the end of the kernel.

I’m not even sure if it’s part of the DXR/RTX specification to terminate the threads in this way. Perhaps this is just a misunderstanding of the spec during implementation?

Just removing that requirement that threads terminate on these calls would seemingly fix the issue, right?

s-perron added bug Bug, regression, crash spirv Work related to SPIR-V labels May 4, 2023

s-perron mentioned this issue Jun 19, 2023

WIP: Workaround AccessChain function arguments for DebugDeclare KhronosGroup/SPIRV-Tools#5275

Closed

sudonatalie assigned sudonatalie and unassigned sudonatalie Aug 10, 2023

s-perron closed this as completed Sep 29, 2023

s-perron self-assigned this Sep 29, 2023

BattleAxeVR mentioned this issue Nov 18, 2023

HLSL support in Vulkan Video samples nvpro-samples/vk_video_samples#60

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[spirv] Inout semantics in presence of early thread termination do no match DXIL codegen #5158

[spirv] Inout semantics in presence of early thread termination do no match DXIL codegen #5158

alelenv commented Apr 19, 2023

s-perron commented May 4, 2023 •

edited

Loading

s-perron commented May 4, 2023

llvm-beanz commented May 4, 2023

s-perron commented May 4, 2023

alelenv commented May 4, 2023 •

edited

Loading

llvm-beanz commented May 4, 2023

s-perron commented May 4, 2023

s-perron commented Sep 29, 2023

natevm commented Oct 16, 2023 •

edited

Loading

natevm commented Oct 16, 2023 •

edited

Loading

s-perron commented Oct 17, 2023

natevm commented Oct 17, 2023 •

edited

Loading

[spirv] Inout semantics in presence of early thread termination do no match DXIL codegen #5158

[spirv] Inout semantics in presence of early thread termination do no match DXIL codegen #5158

Comments

alelenv commented Apr 19, 2023

s-perron commented May 4, 2023 • edited Loading

s-perron commented May 4, 2023

llvm-beanz commented May 4, 2023

s-perron commented May 4, 2023

alelenv commented May 4, 2023 • edited Loading

llvm-beanz commented May 4, 2023

s-perron commented May 4, 2023

s-perron commented Sep 29, 2023

natevm commented Oct 16, 2023 • edited Loading

natevm commented Oct 16, 2023 • edited Loading

s-perron commented Oct 17, 2023

natevm commented Oct 17, 2023 • edited Loading

s-perron commented May 4, 2023 •

edited

Loading

alelenv commented May 4, 2023 •

edited

Loading

natevm commented Oct 16, 2023 •

edited

Loading

natevm commented Oct 16, 2023 •

edited

Loading

natevm commented Oct 17, 2023 •

edited

Loading