-
Notifications
You must be signed in to change notification settings - Fork 745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Extension spec for queue index #7520
Conversation
Add a proposed extension specification that allows a queue to be tied to a "queue index".
``` | ||
namespace sycl::ext::intel::property::queue { | ||
|
||
class compute_index { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Level Zero and OpenCL drivers have 2 levels of exposing those indexes.
The first level is command queue groups, there can be multiple of those and they can have different charecteristics, i.e. you can have Compute command queue group and Copy Command Queue group.
Then inside of those command queue groups we can have multiple indexes representing engines with those capabilieis.
Here I see that everything if flattened to one level, how would this map to what Level Zero and OpenCL drivers offers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we are not trying to "flatten" the 2 levels that the Level Zero and OpenCL drivers expose. Instead, we are assuming that a SYCL queue
always corresponds to a Level Zero / OpenCL "compute command queue group". Thus, compute_index
corresponds to the second level -- i.e. an index within the compute command queue group.
In the future, we may decide to add another property like copy_index
, which allows the application to select a particular queue index within the Level Zero / OpenCL "copy command group". However, we have no customer request for this today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarification, what would happen if we would have multiple compute groups within single device?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it would depend on what the semantic difference was between the various compute groups. If this happened, my first concern would be the behavior of a SYCL queue
when there is no property. Would this choose one of the various compute groups arbitrarily?
Can you give me an example of why Level Zero might expose multiple compute groups? What would be the reason for doing this vs. exposing multiple indices all within the same compute group?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One reason would be special command queue group for cooperative dispatch kernels.
Another example would be exposing RCS/CCS separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gmlueck RCS and CCS may have different advantages. Or going in a more generic approach: there could be different types of compute engines, in the same say we have different types of copy engines (main copy engine vs link engines). So we need to have something generic and flexible enough that could accommodate those kind of cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I see @MichalMrozek concerns here. Here we are saying that there's a given number of indexes available
namespace sycl::ext::intel::info::device {
struct max_compute_queue_indices;
} /
however, not all are the same. From what L0 defines here https://spec.oneapi.io/level-zero/latest/core/PROG.html#discovery
The number and properties of command queue groups is queried by using zeDeviceGetCommandQueueGroupProperties.
The number of physical engines within a group is queried from ze_command_queue_group_properties_t.numQueues.
so there's an amount of indices per queue group, while here we are presenting an aggregated count of those indices, which might not be completely correct. if we want to make this more generic, we should present also the notion of queue groups, so there's an amount of indices per group.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I see it, the property name compute_index
corresponds to the (first) "compute queue group". If we want to expose other queue groups to SYCL in the future, we will add additional queue properties. For example, if Level Zero decides to expose an RCS compute queue group, we can add a new SYCL queue property like rcs_compute_index
.
I would rather not change compute_index
to take both a group number and a queue index. If we did that, we'd need to add documentation about what the group number means because it seems the different queue groups all have different semantics. Since they have different semantics, I'd rather tie each to its own property, so that it's easy to document the semantics of each one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @gmlueck .
considering then that we are open to future additions or changes, then I have no immediate concerns.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same from me +1
I would like to get approval on this since the implementation is nearly finished (#7599). @MichalMrozek: Did I address your concern above? |
) The feature needs to pass extra data to `piQueueCreate` which is impossible with the current interface. As such, and because of the current ABI freeze, a new `piQueueCreateEx` interface has been added accepting `pi_queue_properties *Properties` (similarly to other interfaces allowing optional/additional data) with the plan to retire the old one at the next ABI break window. Extension spec: #7520
Add a proposed extension specification that allows a queue to be tied to a "queue index".