Name
NV_shader_subgroup_partitioned
Name Strings
GL_NV_shader_subgroup_partitioned
Contact
Jeff Bolz (jbolz 'at' nvidia.com), NVIDIA
Contributors
Notice
Status
Complete.
Version
Last Modified Date: 16-Mar-2018
Revision: 1
Number
TBD.
Dependencies
This extension can be applied to OpenGL GLSL versions 1.40
(#version 140) and higher.
This extension can be applied to OpenGL ES ESSL versions 3.10
(#version 310) and higher.
This extension is written against revision 6 of the OpenGL Shading Language
version 4.50, dated April 14, 2016.
This extension interacts with revision 36 of the GL_KHR_vulkan_glsl
extension, dated February 13, 2017.
This extension requires GL_KHR_shader_subgroup_basic, and is written assuming
the GL_KHR_shader_subgroup.txt extension is incorporated.
Overview
This extension adds a builtin function that "partitions" a subgroup into
sets of invocations that have the same value of a variable, returning a
ballot value indicating which invocations are in the same subset of the
partition. It also adds a set of subgroup builtin functions that accept a
ballot value and compute the scan/reduce operation across the values in
the same subset of the partition, with each subset of the partition
computing an independent result. This can be thought of as a
generalization of clustering: rather than fixed-size, fixed-offset
clusters, the subsets can be arbitrary.
Mapping to SPIR-V
-----------------
For informational purposes (non-specification), the following is an
expected way for an implementation to map GLSL constructs to SPIR-V
constructs.
All subgroupPartitioned<op>NV, subgroupPartitionedInclusive<op>NV
and subgroupPartitionedExclusive<op>NV functions map to
OpGroupNonUniform<op> with the following group operation:
    (none)/*Reduce*/  ->  GroupOperationPartitionedReduceNV
    Inclusive         ->  GroupOperationPartitionedInclusiveScanNV
    Exclusive         ->  GroupOperationPartitionedExclusiveScanNV
When using GroupOperationPartitioned*, the <ballot> parameter is the last
operand to OpGroupNonUniform<op>, similar to ClusterSize.
subgroupPartitionNV -> OpGroupNonUniformPartitionNV
Modifications to the OpenGL Shading Language Specification, Version 4.50
Including the following line in a shader can be used to control the
language features described in this extension:
#extension GL_NV_shader_subgroup_partitioned : <behavior>
where <behavior> is as specified in section 3.3. If the
GL_NV_shader_subgroup_partitioned extension is enabled, the
GL_KHR_shader_subgroup_basic extension is also implicitly enabled.
A new preprocessor #define is added:
#define GL_NV_shader_subgroup_partitioned 1
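As an illustration of how the #define can be used, the following is a minimal sketch of a compute shader that enables the extension only when the compiler advertises it. It assumes, as is usual for GLSL extensions, that the macro is predefined by implementations that support the extension. HAVE_SUBGROUP_PARTITIONED, the Result buffer and its binding are hypothetical names chosen for this sketch, not part of the extension.

```glsl
#version 450

// Assumption: GL_NV_shader_subgroup_partitioned is predefined to 1 by
// compilers that support the extension, so it can guard the directive.
#ifdef GL_NV_shader_subgroup_partitioned
#extension GL_NV_shader_subgroup_partitioned : enable
#define HAVE_SUBGROUP_PARTITIONED 1   // hypothetical helper macro
#else
#define HAVE_SUBGROUP_PARTITIONED 0
#endif

layout(local_size_x = 32) in;

// Hypothetical output buffer used only to report which path was compiled.
layout(std430, binding = 0) buffer Result { uint partitionedPathUsed; };

void main() {
    if (gl_GlobalInvocationID.x == 0u) {
#if HAVE_SUBGROUP_PARTITIONED
        partitionedPathUsed = 1u;   // partitioned built-ins may be called here
#else
        partitionedPathUsed = 0u;   // a fallback path would go here
#endif
    }
}
```

A shader that cannot run without the extension can instead use the <behavior> "require", which makes compilation fail when the extension is unsupported.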
Additions to Chapter 3 of the OpenGL Shading Language Specification
(Basics)
Modify Section 3.8, Definitions
(Add to the end of the Subgroup section)
There are three classes of subgroup built-in functions that take a
<ballot> parameter: subgroupPartitionedInclusive<op>NV(),
subgroupPartitionedExclusive<op>NV(), and subgroupPartitioned<op>NV(),
where <op> is one of Add, Mul, Min, Max, And, Or, or Xor.
The <ballot> parameter to these functions must form a valid partition
of the active invocations in the subgroup. The values of <ballot> are
a valid partition if:
* for each active invocation <i>, the bit corresponding to <i> is
set in <i>'s value of <ballot>, and
* for any two active invocations <i> and <j>, if the bit
corresponding to invocation <j> is set in invocation <i>'s value
of <ballot>, then invocation <j>'s value of <ballot> must equal
invocation <i>'s value of <ballot>, and
* bits not corresponding to any invocation in the subgroup are
ignored.
If two active invocations <i> and <j> have the same value of <ballot>,
they are said to be "in the same subset of the partition".
subgroupPartitionedInclusive<op>NV(),
subgroupPartitionedExclusive<op>NV(), and
subgroupPartitioned<op>NV() perform an inclusive scan, exclusive scan,
and reduction, respectively, across the values for invocations in the
same subset of the partition. The scan/reduce is computed independently
for each subset of the partition. A reduction returns the same result
value to all invocations in a subset; a scan returns to each invocation
the result accumulated over the invocations of its subset that precede
it, including its own value for an inclusive scan and excluding it for
an exclusive scan. As with the subgroupInclusive<op>() and
subgroupExclusive<op>() functions, the scans treat the invocations as
ordered according to their values of <gl_SubgroupInvocationID>.
For example, assume a shader in which gl_SubgroupSize is 8 and which
uses the following GLSL:
    float value = ...; // unique for each subgroup invocation
    uvec4 ballot;
    if ((gl_SubgroupInvocationID & 1u) == 0u) {
        ballot = uvec4(0x55,0,0,0); // even invocations
    } else {
        ballot = uvec4(0xAA,0,0,0); // odd invocations
    }
    float result = subgroupPartitionedAddNV(value, ballot);
where the ballot partitions invocations according to even/odd values
of <gl_SubgroupInvocationID>, and each of our 8 invocations is active
within the subgroup.
If the subgroup invocations
[x(0), x(1), x(2), x(3), x(4), x(5), x(6), x(7)] hold the float <value>s
[42.0, 13.0, -56.0, 0.0, 128.0, -1.0, 7.0, 3.5], the
subgroupPartitionedAddNV() operation produces the float <result>s
[121.0, 15.5, 121.0, 15.5, 121.0, 15.5, 121.0, 15.5].
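The same even/odd partition can be used to contrast the reduction with the two scans. The sketch below is illustrative only: it assumes a single workgroup of 8 invocations that maps onto one subgroup with gl_SubgroupSize equal to 8, and the buffer names and bindings are hypothetical. The expected values in the comments follow from the input data above and assume that the first invocation of each subset receives the additive identity 0 in the exclusive scan, as for the KHR exclusive scans.

```glsl
#version 450
#extension GL_NV_shader_subgroup_partitioned : require

// Assumption: one 8-invocation workgroup forming a single 8-wide subgroup.
layout(local_size_x = 8) in;

// Hypothetical buffers: one input value per invocation, and one vec4 of
// (reduce, inclusive, exclusive, unused) per invocation as output.
layout(std430, binding = 0) readonly buffer In   { float inValue[]; };
layout(std430, binding = 1) writeonly buffer Out { vec4 outResult[]; };

void main() {
    uint id = gl_SubgroupInvocationID;
    float value = inValue[id];

    // Same partition as in the example: even and odd invocations.
    uvec4 ballot = ((id & 1u) == 0u) ? uvec4(0x55u, 0u, 0u, 0u)
                                     : uvec4(0xAAu, 0u, 0u, 0u);

    float reduce    = subgroupPartitionedAddNV(value, ballot);
    float inclusive = subgroupPartitionedInclusiveAddNV(value, ballot);
    float exclusive = subgroupPartitionedExclusiveAddNV(value, ballot);

    // With value = [42.0, 13.0, -56.0, 0.0, 128.0, -1.0, 7.0, 3.5]:
    //   reduce    = [121.0, 15.5, 121.0, 15.5, 121.0, 15.5, 121.0, 15.5]
    //   inclusive = [ 42.0, 13.0, -14.0, 13.0, 114.0, 12.0, 121.0, 15.5]
    //   exclusive = [  0.0,  0.0,  42.0, 13.0, -14.0, 13.0, 114.0, 12.0]
    outResult[id] = vec4(reduce, inclusive, exclusive, 0.0);
}
```

Each scan walks its subset in order of gl_SubgroupInvocationID, so invocation 4 sees the even-subset values of invocations 0 and 2 but never those of the odd invocations.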
If the <ballot> parameter to any partitioned subgroup operation is
not a valid partition, then the result is undefined.
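A ballot can also be built by hand, provided it satisfies the rules above. The following sketch constructs a ballot for aligned groups of four consecutive invocations, which illustrates how partitioning generalizes fixed-size clustering. clusterOfFourBallot(), the Data buffer and its binding are hypothetical names. The sketch additionally enables GL_KHR_shader_subgroup_ballot for subgroupBallot(), and it masks off inactive invocations as a conservative choice rather than something this extension is known to require.

```glsl
#version 450
#extension GL_NV_shader_subgroup_partitioned : require
#extension GL_KHR_shader_subgroup_ballot : require

layout(local_size_x = 64) in;

// Hypothetical buffer, updated in place.
layout(std430, binding = 0) buffer Data { float data[]; };

// Hypothetical helper: ballot for the aligned group of four invocations
// that contains the caller. A group of four starting at a multiple of four
// never straddles a 32-bit component of the ballot.
uvec4 clusterOfFourBallot() {
    uint base = gl_SubgroupInvocationID & ~3u;   // first invocation of the group
    uvec4 ballot = uvec4(0u);
    ballot[base >> 5u] = 0xFu << (base & 31u);
    // Conservative choice: keep only bits that name active invocations.
    return ballot & subgroupBallot(true);
}

void main() {
    uint i = gl_GlobalInvocationID.x;
    if (i >= uint(data.length())) {
        return;
    }

    // Each invocation's own bit is set in its ballot, and every active
    // member of a group computes an identical ballot, so the values form
    // a valid partition of the active invocations.
    data[i] = subgroupPartitionedAddNV(data[i], clusterOfFourBallot());
}
```

When every invocation in the subgroup is active, this should produce the same sums as subgroupClusteredAdd(value, 4) from GL_KHR_shader_subgroup_clustered. Unlike the clustered form, the partitioned form also accepts groups that are neither contiguous nor of equal size.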
Additions to Chapter 7 of the OpenGL Shading Language Specification
(Built-in Variables)
Additions to Chapter 8 of the OpenGL Shading Language Specification
(Built-in Functions)
Add to Section 8.18, Shader Invocation Group Functions
Syntax:
genType subgroupPartitionedAddNV(genType value, uvec4 ballot);
genIType subgroupPartitionedAddNV(genIType value, uvec4 ballot);
genUType subgroupPartitionedAddNV(genUType value, uvec4 ballot);
genDType subgroupPartitionedAddNV(genDType value, uvec4 ballot);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
The function subgroupPartitionedAddNV() returns the summation of all active
invocation-provided <value>s in the invocation's subset of the partition. The method that is used to perform the
operation on each active invocation's <value> is implementation defined.
Syntax:
genType subgroupPartitionedMulNV(genType value, uvec4 ballot);
genIType subgroupPartitionedMulNV(genIType value, uvec4 ballot);
genUType subgroupPartitionedMulNV(genUType value, uvec4 ballot);
genDType subgroupPartitionedMulNV(genDType value, uvec4 ballot);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
The function subgroupPartitionedMulNV() returns the multiplication of all active
invocation-provided <value>s in the invocation's subset of the partition. The method that is used to perform the
operation on each active invocation's <value> is implementation defined.
Syntax:
genType subgroupPartitionedMinNV(genType value, uvec4 ballot);
genIType subgroupPartitionedMinNV(genIType value, uvec4 ballot);
genUType subgroupPartitionedMinNV(genUType value, uvec4 ballot);
genDType subgroupPartitionedMinNV(genDType value, uvec4 ballot);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
The function subgroupPartitionedMinNV() returns the minimum <value> of all active
invocation-provided <value>s in the invocation's subset of the partition.
Syntax:
genType subgroupPartitionedMaxNV(genType value, uvec4 ballot);
genIType subgroupPartitionedMaxNV(genIType value, uvec4 ballot);
genUType subgroupPartitionedMaxNV(genUType value, uvec4 ballot);
genDType subgroupPartitionedMaxNV(genDType value, uvec4 ballot);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
The function subgroupPartitionedMaxNV() returns the maximum <value> of all active
invocation-provided <value>s in the invocation's subset of the partition.
Syntax:
genIType subgroupPartitionedAndNV(genIType value, uvec4 ballot);
genUType subgroupPartitionedAndNV(genUType value, uvec4 ballot);
genBType subgroupPartitionedAndNV(genBType value, uvec4 ballot);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
For genIType and genUType, the function subgroupPartitionedAndNV() returns the bitwise
AND of all active invocation-provided <value>s in the invocation's subset of the partition. For genBType, the function
subgroupPartitionedAndNV() returns the logical AND of all active invocation-provided
<value>s in the invocation's subset of the partition.
Syntax:
genIType subgroupPartitionedOrNV(genIType value, uvec4 ballot);
genUType subgroupPartitionedOrNV(genUType value, uvec4 ballot);
genBType subgroupPartitionedOrNV(genBType value, uvec4 ballot);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
For genIType and genUType, the function subgroupPartitionedOrNV() returns the bitwise
OR of all active invocation-provided <value>s in the invocation's subset of the partition. For genBType, the function
subgroupPartitionedOrNV() returns the logical inclusive OR of all active
invocation-provided <value>s in the invocation's subset of the partition.
Syntax:
genIType subgroupPartitionedXorNV(genIType value, uvec4 ballot);
genUType subgroupPartitionedXorNV(genUType value, uvec4 ballot);
genBType subgroupPartitionedXorNV(genBType value, uvec4 ballot);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
For genIType and genUType, the function subgroupPartitionedXorNV() returns the bitwise
XOR of all active invocation-provided <value>s in the invocation's subset of the partition. For genBType, the function
subgroupPartitionedXorNV() returns the logical exclusive OR of all active
invocation-provided <value>s in the invocation's subset of the partition.
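The genBType overloads of the And, Or and Xor reductions can act as per-subset votes. The sketch below is a hedged illustration: the buffer names, bindings and the packing of the three results into one uint are assumptions made for the example. The ballot comes from subgroupPartitionNV(), so invocations are grouped by key, and all results are scoped to a single subgroup rather than the whole dispatch.

```glsl
#version 450
#extension GL_NV_shader_subgroup_partitioned : require

layout(local_size_x = 64) in;

// Hypothetical buffers: a grouping key and a 0/1 flag per element, and a
// packed vote result per element.
layout(std430, binding = 0) readonly buffer Keys   { uint keys[]; };
layout(std430, binding = 1) readonly buffer Flags  { uint flags[]; };
layout(std430, binding = 2) writeonly buffer Votes { uint votes[]; };

void main() {
    uint i = gl_GlobalInvocationID.x;
    if (i >= uint(keys.length())) {
        return;
    }

    bool flag = flags[i] != 0u;

    // Group the active invocations of this subgroup by key.
    uvec4 ballot = subgroupPartitionNV(keys[i]);

    bool allSet = subgroupPartitionedAndNV(flag, ballot);  // every flag in the subset is set
    bool anySet = subgroupPartitionedOrNV(flag, ballot);   // at least one flag is set
    bool oddSet = subgroupPartitionedXorNV(flag, ballot);  // an odd number of flags are set

    // Pack the three votes into bits 0..2 (layout chosen for this sketch).
    votes[i] = (allSet ? 1u : 0u) | (anySet ? 2u : 0u) | (oddSet ? 4u : 0u);
}
```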
Syntax:
genType subgroupPartitionedInclusiveAddNV(genType value, uvec4 ballot);
genIType subgroupPartitionedInclusiveAddNV(genIType value, uvec4 ballot);
genUType subgroupPartitionedInclusiveAddNV(genUType value, uvec4 ballot);
genDType subgroupPartitionedInclusiveAddNV(genDType value, uvec4 ballot);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
The function subgroupPartitionedInclusiveAddNV() returns an inclusive scan operation
that is the summation of all active invocation-provided <value>s in the invocation's subset of the partition. The
method used to perform the operation on each active invocation's <value>
is implementation defined.
Syntax:
genType subgroupPartitionedInclusiveMulNV(genType value, uvec4 ballot);
genIType subgroupPartitionedInclusiveMulNV(genIType value, uvec4 ballot);
genUType subgroupPartitionedInclusiveMulNV(genUType value, uvec4 ballot);
genDType subgroupPartitionedInclusiveMulNV(genDType value, uvec4 ballot);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
The function subgroupPartitionedInclusiveMulNV() returns an inclusive scan operation
that is the multiplication of all active invocation-provided <value>s in the invocation's subset of the partition.
The method used to perform the operation on each active invocation's <value>
is implementation defined.
Syntax:
genType subgroupPartitionedInclusiveMinNV(genType value, uvec4 ballot);
genIType subgroupPartitionedInclusiveMinNV(genIType value, uvec4 ballot);
genUType subgroupPartitionedInclusiveMinNV(genUType value, uvec4 ballot);
genDType subgroupPartitionedInclusiveMinNV(genDType value, uvec4 ballot);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
The function subgroupPartitionedInclusiveMinNV() returns an inclusive scan operation
that is the minimum <value> of all active invocation-provided <value>s in the invocation's subset of the partition.
Syntax:
genType subgroupPartitionedInclusiveMaxNV(genType value, uvec4 ballot);
genIType subgroupPartitionedInclusiveMaxNV(genIType value, uvec4 ballot);
genUType subgroupPartitionedInclusiveMaxNV(genUType value, uvec4 ballot);
genDType subgroupPartitionedInclusiveMaxNV(genDType value, uvec4 ballot);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
The function subgroupPartitionedInclusiveMaxNV() returns an inclusive scan operation
that is the maximum <value> of all active invocation-provided <value>s in the invocation's subset of the partition.
Syntax:
genIType subgroupPartitionedInclusiveAndNV(genIType value, uvec4 ballot);
genUType subgroupPartitionedInclusiveAndNV(genUType value, uvec4 ballot);
genBType subgroupPartitionedInclusiveAndNV(genBType value, uvec4 ballot);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
For genIType and genUType, the function subgroupPartitionedInclusiveAndNV() returns an
inclusive scan operation that is the bitwise AND of all active
invocation-provided <value>s in the invocation's subset of the partition. For genBType, the function
subgroupPartitionedInclusiveAndNV() returns an inclusive scan operation that is the
logical AND of all active invocation-provided <value>s in the invocation's subset of the partition.
Syntax:
genIType subgroupPartitionedInclusiveOrNV(genIType value, uvec4 ballot);
genUType subgroupPartitionedInclusiveOrNV(genUType value, uvec4 ballot);
genBType subgroupPartitionedInclusiveOrNV(genBType value, uvec4 ballot);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
For genIType and genUType, the function subgroupPartitionedInclusiveOrNV() returns an
inclusive scan operation that is the bitwise OR of all active
invocation-provided <value>s in the invocation's subset of the partition. For genBType, the function
subgroupPartitionedInclusiveOrNV() returns an inclusive scan operation that is the
logical inclusive OR of all active invocation-provided <value>s in the invocation's subset of the partition.
Syntax:
genIType subgroupPartitionedInclusiveXorNV(genIType value, uvec4 ballot);
genUType subgroupPartitionedInclusiveXorNV(genUType value, uvec4 ballot);
genBType subgroupPartitionedInclusiveXorNV(genBType value, uvec4 ballot);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
For genIType and genUType, the function subgroupPartitionedInclusiveXorNV() returns an
inclusive scan operation that is the bitwise XOR of all active
invocation-provided <value>s in the invocation's subset of the partition. For genBType, the function
subgroupPartitionedInclusiveXorNV() returns an inclusive scan operation that is the
logical exclusive OR of all active invocation-provided <value>s in the invocation's subset of the partition.
Syntax:
genType subgroupPartitionedExclusiveAddNV(genType value, uvec4 ballot);
genIType subgroupPartitionedExclusiveAddNV(genIType value, uvec4 ballot);
genUType subgroupPartitionedExclusiveAddNV(genUType value, uvec4 ballot);
genDType subgroupPartitionedExclusiveAddNV(genDType value, uvec4 ballot);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
The function subgroupPartitionedExclusiveAddNV() returns an exclusive scan operation
that is the summation of all active invocation-provided <value>s in the invocation's subset of the partition.
The method used to perform the operation on each active invocation's <value>
is implementation defined.
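One common use of an exclusive add scan is to give each invocation its rank within its subset. The following sketch assumes hypothetical Keys and Ranks buffers and groups invocations by key with subgroupPartitionNV(). Scanning the constant 1 yields the number of lower-numbered invocations that share the key, and reducing the constant 1 yields the subset size.

```glsl
#version 450
#extension GL_NV_shader_subgroup_partitioned : require

layout(local_size_x = 64) in;

// Hypothetical buffers: one key per element in, (rank, count) per element out.
layout(std430, binding = 0) readonly buffer Keys   { uint keys[]; };
layout(std430, binding = 1) writeonly buffer Ranks { uvec2 ranks[]; };

void main() {
    uint i = gl_GlobalInvocationID.x;
    if (i >= uint(keys.length())) {
        return;
    }

    // Invocations of this subgroup that hold equal keys share a subset.
    uvec4 ballot = subgroupPartitionNV(keys[i]);

    // Exclusive scan of 1s: how many invocations with a lower
    // gl_SubgroupInvocationID are in the same subset.
    uint rank = subgroupPartitionedExclusiveAddNV(1u, ballot);

    // Reduction of 1s: how many invocations are in the subset.
    uint count = subgroupPartitionedAddNV(1u, ballot);

    ranks[i] = uvec2(rank, count);
}
```

The rank and count are local to one subgroup. Combining them across subgroups, for example to compute output offsets for a compaction pass, needs additional coordination that is outside the scope of this sketch.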
Syntax:
genType subgroupPartitionedExclusiveMulNV(genType value, uvec4 ballot);
genIType subgroupPartitionedExclusiveMulNV(genIType value, uvec4 ballot);
genUType subgroupPartitionedExclusiveMulNV(genUType value, uvec4 ballot);
genDType subgroupPartitionedExclusiveMulNV(genDType value, uvec4 ballot);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
The function subgroupPartitionedExclusiveMulNV() returns an exclusive scan operation
that is the multiplication of all active invocation-provided <value>s in the invocation's subset of the partition.
The method used to perform the operation on each active invocation's <value>
is implementation defined.
Syntax:
genType subgroupPartitionedExclusiveMinNV(genType value, uvec4 ballot);
genIType subgroupPartitionedExclusiveMinNV(genIType value, uvec4 ballot);
genUType subgroupPartitionedExclusiveMinNV(genUType value, uvec4 ballot);
genDType subgroupPartitionedExclusiveMinNV(genDType value, uvec4 ballot);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
The function subgroupPartitionedExclusiveMinNV() returns an exclusive scan operation
that is the minimum <value> of all active invocation-provided <value>s in the invocation's subset of the partition.
Syntax:
genType subgroupPartitionedExclusiveMaxNV(genType value, uvec4 ballot);
genIType subgroupPartitionedExclusiveMaxNV(genIType value, uvec4 ballot);
genUType subgroupPartitionedExclusiveMaxNV(genUType value, uvec4 ballot);
genDType subgroupPartitionedExclusiveMaxNV(genDType value, uvec4 ballot);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
The function subgroupPartitionedExclusiveMaxNV() returns an exclusive scan operation
that is the maximum <value> of all active invocation-provided <value>s in the invocation's subset of the partition.
Syntax:
genIType subgroupPartitionedExclusiveAndNV(genIType value, uvec4 ballot);
genUType subgroupPartitionedExclusiveAndNV(genUType value, uvec4 ballot);
genBType subgroupPartitionedExclusiveAndNV(genBType value, uvec4 ballot);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
For genIType and genUType, the function subgroupPartitionedExclusiveAndNV() returns an
exclusive scan operation that is the bitwise AND of all active
invocation-provided <value>s in the invocation's subset of the partition. For genBType, the function
subgroupPartitionedExclusiveAndNV() returns an exclusive scan operation that is the
logical AND of all active invocation-provided <value>s in the invocation's subset of the partition.
Syntax:
genIType subgroupPartitionedExclusiveOrNV(genIType value, uvec4 ballot);
genUType subgroupPartitionedExclusiveOrNV(genUType value, uvec4 ballot);
genBType subgroupPartitionedExclusiveOrNV(genBType value, uvec4 ballot);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
For genIType and genUType, the function subgroupPartitionedExclusiveOrNV() returns an
exclusive scan operation that is the bitwise OR of all active
invocation-provided <value>s in the invocation's subset of the partition. For genBType, the function
subgroupPartitionedExclusiveOrNV() returns an exclusive scan operation that is the
logical inclusive OR of all active invocation-provided <value>s in the invocation's subset of the partition.
Syntax:
genIType subgroupPartitionedExclusiveXorNV(genIType value, uvec4 ballot);
genUType subgroupPartitionedExclusiveXorNV(genUType value, uvec4 ballot);
genBType subgroupPartitionedExclusiveXorNV(genBType value, uvec4 ballot);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
For genIType and genUType, the function subgroupPartitionedExclusiveXorNV() returns an
exclusive scan operation that is the bitwise XOR of all active
invocation-provided <value>s in the invocation's subset of the partition. For genBType, the function
subgroupPartitionedExclusiveXorNV() returns an exclusive scan operation that is the
logical exclusive OR of all active invocation-provided <value>s in the invocation's subset of the partition.
Syntax:
uvec4 subgroupPartitionNV(genType value);
uvec4 subgroupPartitionNV(genIType value);
uvec4 subgroupPartitionNV(genUType value);
uvec4 subgroupPartitionNV(genBType value);
uvec4 subgroupPartitionNV(genDType value);
Only usable if the extension GL_NV_shader_subgroup_partitioned is enabled.
The function subgroupPartitionNV() returns a ballot that is a valid
partition of the active invocations such that all invocations in each
subset of the partition have the same value of <value>. For any two
invocations in different subsets of the partition, either their values of
<value> must not be equal or one must be a floating point NaN.
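As a usage illustration, subgroupPartitionNV() combines naturally with the partitioned reductions to accumulate per-key totals with one atomic operation per subset instead of one per invocation. This is a sketch under stated assumptions: the buffer names and bindings are hypothetical, every key is assumed to be a valid index into bins[], and GL_KHR_shader_subgroup_ballot is enabled only to pick one invocation per subset with subgroupBallotFindLSB().

```glsl
#version 450
#extension GL_NV_shader_subgroup_partitioned : require
#extension GL_KHR_shader_subgroup_ballot : require

layout(local_size_x = 64) in;

// Hypothetical buffers: a bin index and a weight per element, and the bins.
layout(std430, binding = 0) readonly buffer Keys    { uint keys[]; };
layout(std430, binding = 1) readonly buffer Weights { uint weights[]; };
layout(std430, binding = 2) buffer Bins             { uint bins[]; };

void main() {
    uint i = gl_GlobalInvocationID.x;
    if (i >= uint(keys.length())) {
        return;
    }

    uint key = keys[i];   // assumed to be a valid index into bins[]

    // Invocations of this subgroup with equal keys share a subset.
    uvec4 ballot = subgroupPartitionNV(key);

    // Sum of the weights of all invocations in this invocation's subset.
    uint total = subgroupPartitionedAddNV(weights[i], ballot);

    // The lowest-numbered invocation of each subset publishes the total.
    if (subgroupBallotFindLSB(ballot) == gl_SubgroupInvocationID) {
        atomicAdd(bins[key], total);
    }
}
```

With floating-point keys, note the NaN wording above: invocations holding NaN may be placed in different subsets even though their values look alike.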
Issues
None.
Revision History
Rev.  Date         Author    Changes
----  -----------  --------  -------------------------------------------
1     26-Dec-2017  jbolz     Initial revision.