Add asynchronous concurrent execution #3687
base: docs/develop
1484d67 to f81588d
fd5af51 to 6a139c6
left comments. Looks good overall.
or from the GPU concurrently with kernel execution. Applications can query this
capability by checking the ``asyncEngineCount`` device property. Devices with
an ``asyncEngineCount`` greater than zero support concurrent data transfers.
Additionally, if host memory is involved in the copy, it should be page-locked
Is there a reference we can provide such as Memory Management or something?
I didn't find one. The only thing I found is:
https://rocm.docs.amd.com/projects/HIP/en/latest/doxygen/html/structhip_device_prop__t.html#a24ee882099e1b02e95f6c245ff1ea0cf
I think my comment was related to the "if host memory is involved in the copy, it should be page-locked to ensure optimal performance" text. This seems like it could use a reference to some additional details, or maybe just indicate how page-locked memory ensures optimal performance?
Sorry, I misunderstood you.
Yes, you are right. I tried to extend the content, and I added a link to host memory.
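For readers following this thread, here is a minimal sketch of the pattern the quoted paragraph describes: check ``asyncEngineCount``, then issue an asynchronous host-to-device copy from page-locked memory. It is an illustration under assumptions, not the text added in the PR. The names `supports_concurrent_copy`, `HIP_CHECK`, and `async_upload_from_pinned` are hypothetical, and the HIP part is guarded by `__HIPCC__` so the decision helper can also be compiled on a machine without ROCm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not a HIP API): per the quoted text, a device supports
// concurrent data transfers when asyncEngineCount is greater than zero.
bool supports_concurrent_copy(int async_engine_count) {
    return async_engine_count > 0;
}

#if defined(__HIPCC__)  // HIP-specific part; compiled only by hipcc
#include <hip/hip_runtime.h>

// Hypothetical error-check macro: report the first HIP error and return false.
// Resources allocated before the failure are not released in this sketch.
#define HIP_CHECK(expr)                                                  \
    do {                                                                 \
        hipError_t err_ = (expr);                                        \
        if (err_ != hipSuccess) {                                        \
            std::fprintf(stderr, "HIP error: %s\n",                      \
                         hipGetErrorString(err_));                       \
            return false;                                                \
        }                                                                \
    } while (0)

// Copies `bytes` bytes from page-locked host memory to device `device_id`
// on a dedicated stream. Returns false on any HIP error.
bool async_upload_from_pinned(int device_id, std::size_t bytes) {
    assert(bytes > 0);
    HIP_CHECK(hipSetDevice(device_id));

    hipDeviceProp_t prop;
    HIP_CHECK(hipGetDeviceProperties(&prop, device_id));
    if (!supports_concurrent_copy(prop.asyncEngineCount)) {
        std::printf("device %d reports asyncEngineCount == 0\n", device_id);
    }

    void* host = nullptr;
    void* dev = nullptr;
    hipStream_t stream;
    HIP_CHECK(hipHostMalloc(&host, bytes, hipHostMallocDefault));  // page-locked
    HIP_CHECK(hipMalloc(&dev, bytes));
    HIP_CHECK(hipStreamCreate(&stream));
    std::memset(host, 0, bytes);  // give the source buffer defined contents

    // The call returns without waiting for the transfer; work submitted to
    // other streams may overlap with it on devices that support this.
    HIP_CHECK(hipMemcpyAsync(dev, host, bytes, hipMemcpyHostToDevice, stream));
    HIP_CHECK(hipStreamSynchronize(stream));

    HIP_CHECK(hipStreamDestroy(stream));
    HIP_CHECK(hipFree(dev));
    HIP_CHECK(hipHostFree(host));
    return true;
}
#endif  // __HIPCC__
```

When built with `hipcc`, a caller would invoke something like `async_upload_from_pinned(0, 1 << 20)`. Whether the copy actually overlaps with kernel execution depends on the device and on what else is queued.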
It is also possible to perform intra-device copies simultaneously with kernel
execution on devices that support the ``concurrentKernels`` device property
and/or with copies to or from the device (for devices that support the
and/or with copies to or from the device (for devices that support the
Are copies to or from the device intra-device copies?
No, intra-device copying is data transfer within the same device.
Thank you for drawing my attention to it. The text was misleading, so I rephrased it.
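To make the distinction in this thread concrete, the sketch below issues an intra-device copy (source and destination both on the same device) on one stream while a kernel runs on another. It is only an illustration under assumptions: `can_overlap_intra_device_copy`, `HIP_CHECK`, `scale`, and `overlap_intra_device_copy` are hypothetical names, and the HIP part is guarded by `__HIPCC__` so the helper can be compiled without ROCm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper (not a HIP API): per the quoted text, intra-device
// copies can run alongside kernels on devices reporting concurrentKernels.
bool can_overlap_intra_device_copy(int concurrent_kernels) {
    return concurrent_kernels != 0;
}

#if defined(__HIPCC__)  // HIP-specific part; compiled only by hipcc
#include <hip/hip_runtime.h>

// Hypothetical error-check macro: report the first HIP error and return false.
// Resources allocated before the failure are not released in this sketch.
#define HIP_CHECK(expr)                                                  \
    do {                                                                 \
        hipError_t err_ = (expr);                                        \
        if (err_ != hipSuccess) {                                        \
            std::fprintf(stderr, "HIP error: %s\n",                      \
                         hipGetErrorString(err_));                       \
            return false;                                                \
        }                                                                \
    } while (0)

// Illustrative kernel: multiplies each element by `factor`.
__global__ void scale(float* data, std::size_t n, float factor) {
    std::size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Launches `scale` on one stream and a device-to-device copy on another.
bool overlap_intra_device_copy(int device_id, std::size_t n) {
    assert(n > 0);
    HIP_CHECK(hipSetDevice(device_id));

    hipDeviceProp_t prop;
    HIP_CHECK(hipGetDeviceProperties(&prop, device_id));
    if (!can_overlap_intra_device_copy(prop.concurrentKernels)) {
        std::printf("device %d reports concurrentKernels == 0\n", device_id);
    }

    const std::size_t bytes = n * sizeof(float);
    float *work = nullptr, *src = nullptr, *dst = nullptr;
    HIP_CHECK(hipMalloc(reinterpret_cast<void**>(&work), bytes));
    HIP_CHECK(hipMalloc(reinterpret_cast<void**>(&src), bytes));
    HIP_CHECK(hipMalloc(reinterpret_cast<void**>(&dst), bytes));
    HIP_CHECK(hipMemset(work, 0, bytes));
    HIP_CHECK(hipMemset(src, 0, bytes));

    hipStream_t kernel_stream, copy_stream;
    HIP_CHECK(hipStreamCreate(&kernel_stream));
    HIP_CHECK(hipStreamCreate(&copy_stream));

    const unsigned threads = 256;
    const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
    scale<<<blocks, threads, 0, kernel_stream>>>(work, n, 2.0f);
    HIP_CHECK(hipGetLastError());

    // Intra-device copy: both pointers refer to memory on the same device.
    HIP_CHECK(hipMemcpyAsync(dst, src, bytes, hipMemcpyDeviceToDevice,
                             copy_stream));
    HIP_CHECK(hipDeviceSynchronize());

    HIP_CHECK(hipStreamDestroy(kernel_stream));
    HIP_CHECK(hipStreamDestroy(copy_stream));
    HIP_CHECK(hipFree(work));
    HIP_CHECK(hipFree(src));
    HIP_CHECK(hipFree(dst));
    return true;
}
#endif  // __HIPCC__
```

The two operations touch disjoint buffers and sit on separate streams, so the runtime is free to overlap them. Whether it does is up to the device and the scheduler.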
495e166 to 9835194
6b15a56 to ed7e05f