Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Extension spec for queue index #7520

Merged
merged 3 commits into from
Dec 9, 2022

Conversation

gmlueck
Copy link
Contributor

@gmlueck gmlueck commented Nov 23, 2022

Add a proposed extension specification that allows a queue to be tied to a "queue index".

Add a proposed extension specification that allows a queue to be tied
to a "queue index".
```
namespace sycl::ext::intel::property::queue {

class compute_index {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Level Zero and OpenCL drivers have 2 levels of exposing those indexes.
The first level is command queue groups, there can be multiple of those and they can have different charecteristics, i.e. you can have Compute command queue group and Copy Command Queue group.
Then inside of those command queue groups we can have multiple indexes representing engines with those capabilieis.

Here I see that everything if flattened to one level, how would this map to what Level Zero and OpenCL drivers offers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we are not trying to "flatten" the 2 levels that the Level Zero and OpenCL drivers expose. Instead, we are assuming that a SYCL queue always corresponds to a Level Zero / OpenCL "compute command queue group". Thus, compute_index corresponds to the second level -- i.e. an index within the compute command queue group.

In the future, we may decide to add another property like copy_index, which allows the application to select a particular queue index within the Level Zero / OpenCL "copy command group". However, we have no customer request for this today.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarification, what would happen if we would have multiple compute groups within single device?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it would depend on what the semantic difference was between the various compute groups. If this happened, my first concern would be the behavior of a SYCL queue when there is no property. Would this choose one of the various compute groups arbitrarily?

Can you give me an example of why Level Zero might expose multiple compute groups? What would be the reason for doing this vs. exposing multiple indices all within the same compute group?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One reason would be special command queue group for cooperative dispatch kernels.
Another example would be exposing RCS/CCS separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmlueck RCS and CCS may have different advantages. Or going in a more generic approach: there could be different types of compute engines, in the same say we have different types of copy engines (main copy engine vs link engines). So we need to have something generic and flexible enough that could accommodate those kind of cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see @MichalMrozek concerns here. Here we are saying that there's a given number of indexes available

namespace sycl::ext::intel::info::device {

struct max_compute_queue_indices;

} /

however, not all are the same. From what L0 defines here https://spec.oneapi.io/level-zero/latest/core/PROG.html#discovery

The number and properties of command queue groups is queried by using zeDeviceGetCommandQueueGroupProperties.

The number of physical engines within a group is queried from ze_command_queue_group_properties_t.numQueues.

so there's an amount of indices per queue group, while here we are presenting an aggregated count of those indices, which might not be completely correct. if we want to make this more generic, we should present also the notion of queue groups, so there's an amount of indices per group.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I see it, the property name compute_index corresponds to the (first) "compute queue group". If we want to expose other queue groups to SYCL in the future, we will add additional queue properties. For example, if Level Zero decides to expose an RCS compute queue group, we can add a new SYCL queue property like rcs_compute_index.

I would rather not change compute_index to take both a group number and a queue index. If we did that, we'd need to add documentation about what the group number means because it seems the different queue groups all have different semantics. Since they have different semantics, I'd rather tie each to its own property, so that it's easy to document the semantics of each one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @gmlueck .

considering then that we are open to future additions or changes, then I have no immediate concerns.

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same from me +1

@gmlueck
Copy link
Contributor Author

gmlueck commented Dec 1, 2022

I would like to get approval on this since the implementation is nearly finished (#7599).

@MichalMrozek: Did I address your concern above?
@intel/dpcpp-specification-reviewers: Can someone please review?

@bader bader requested a review from MichalMrozek December 6, 2022 03:04
@jbrodman jbrodman requested review from jbrodman and removed request for MichalMrozek December 8, 2022 22:28
@steffenlarsen steffenlarsen merged commit f5fb759 into intel:sycl Dec 9, 2022
steffenlarsen pushed a commit that referenced this pull request Dec 13, 2022
)

The feature needs to pass extra data to `piQueueCreate` which is
impossible with the current interface. As such, and because of the
current ABI freeze, a new `piQueueCreateEx` interface has been added
accepting `pi_queue_properties *Properties` (similarly to other
interfaces allowing optional/additional data) with the plan to retire
the old one at the next ABI break window.

Extension spec: #7520
@gmlueck gmlueck deleted the gmlueck/queue-index branch December 19, 2022 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants