
[src] use CL_PROFILING_COMMAND_END as latency time #67

Open: wants to merge 2 commits into base branch master


@alohali commented Apr 16, 2020

CL_PROFILING_COMMAND_END - CL_PROFILING_COMMAND_QUEUED is the real kernel latency.

@alohali (Author) commented Apr 16, 2020

Would it be more accurate to measure kernel latency as CL_PROFILING_COMMAND_END - CL_PROFILING_COMMAND_QUEUED while running an extremely small kernel?
I see a difference of more than 20 µs on several ARM Mali GPU devices.
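The following is a rough sketch of the proposed measurement. It assumes the QUEUED, START and END timestamps of each launch of a tiny kernel have already been read with clGetEventProfilingInfo, which reports them as cl_ulong nanosecond counters; plain uint64_t is used here so the sketch builds without OpenCL headers. The launch_times struct and both helper functions are hypothetical names, not code from this repository.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-launch record of OpenCL profiling timestamps (ns). */
typedef struct {
    uint64_t queued; /* CL_PROFILING_COMMAND_QUEUED */
    uint64_t start;  /* CL_PROFILING_COMMAND_START */
    uint64_t end;    /* CL_PROFILING_COMMAND_END */
} launch_times;

/* Mean of END - QUEUED over n launches, in microseconds.
 * For an extremely small kernel this is dominated by launch overhead. */
double mean_end_minus_queued_us(const launch_times *t, size_t n)
{
    assert(n > 0);
    double sum_ns = 0.0;
    for (size_t i = 0; i < n; i++)
        sum_ns += (double)(t[i].end - t[i].queued);
    return sum_ns / (double)n / 1000.0;
}

/* Mean of START - QUEUED over the same launches, in microseconds,
 * for comparison: this covers only the part before execution starts. */
double mean_start_minus_queued_us(const launch_times *t, size_t n)
{
    assert(n > 0);
    double sum_ns = 0.0;
    for (size_t i = 0; i < n; i++)
        sum_ns += (double)(t[i].start - t[i].queued);
    return sum_ns / (double)n / 1000.0;
}
```

Averaging over many launches smooths out run-to-run jitter, and computing both means from the same launches shows how much latency START - QUEUED leaves out.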

@krrishnarraj (Owner) commented

Thanks.
I agree with the small kernel part.
I am seeing higher latency on CPU platforms such as pocl. How can CL_PROFILING_COMMAND_END - CL_PROFILING_COMMAND_QUEUED be more accurate than using CL_PROFILING_COMMAND_START?

@alohali (Author) commented Apr 23, 2020


It is more accurate because kernel launch latency consists of pre-launch latency, post-launch latency, and other execution latency. CL_PROFILING_COMMAND_START - CL_PROFILING_COMMAND_QUEUED only captures the pre-launch part, not the post-launch part, while CL_PROFILING_COMMAND_END - CL_PROFILING_COMMAND_QUEUED includes both. With an extremely small kernel the actual execution time is almost zero, so END - QUEUED is almost entirely launch overhead.
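To make the argument concrete, here is a minimal sketch that splits one launch's QUEUED, START and END timestamps (nanoseconds) into the pieces described above. The latency_split struct and split_latency function are hypothetical names, and the assert encodes an assumption that the timestamps of a single command are non-decreasing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical breakdown of one kernel launch, in nanoseconds. */
typedef struct {
    uint64_t pre_launch;  /* START - QUEUED: pre-launch part only */
    uint64_t after_start; /* END - START: execution plus post-launch overhead */
    uint64_t total;       /* END - QUEUED: both parts together */
} latency_split;

/* Splits QUEUED/START/END timestamps (ns) into pre-launch and
 * after-start portions. Assumes QUEUED <= START <= END. */
latency_split split_latency(uint64_t queued, uint64_t start, uint64_t end)
{
    assert(queued <= start && start <= end);
    latency_split s;
    s.pre_launch = start - queued;
    s.after_start = end - start;
    s.total = end - queued; /* always pre_launch + after_start */
    return s;
}
```

For a near-empty kernel, most of after_start is overhead rather than useful work, which is exactly the part a START - QUEUED metric drops.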

@nchristensen (Contributor) commented Oct 7, 2022

From https://stackoverflow.com/questions/39924433/opencl-events-ambiguity it seems to me that CL_PROFILING_COMMAND_START - CL_PROFILING_COMMAND_SUBMIT is the pre-execution latency. CL_PROFILING_COMMAND_COMPLETE was added in OpenCL 2.0. I'm guessing CL_PROFILING_COMMAND_COMPLETE - CL_PROFILING_COMMAND_END is the post-execution latency.

There may also be a lower bound on CL_PROFILING_COMMAND_END - CL_PROFILING_COMMAND_START, which might be another form of latency.

So CL_PROFILING_COMMAND_COMPLETE - CL_PROFILING_COMMAND_SUBMIT on a very small kernel may be a way to measure the latency.
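A minimal C sketch of the breakdown suggested in this comment, assuming all five timestamps were read with clGetEventProfilingInfo from a queue created with CL_QUEUE_PROFILING_ENABLE (CL_PROFILING_COMMAND_COMPLETE needs an OpenCL 2.0 device). The event_times struct and the helper functions are hypothetical. Each interval is computed as later minus earlier timestamp so it is non-negative, and min_exec_ns takes the smallest END - START over many launches as an empirical estimate of the lower bound mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one launch's profiling timestamps, in ns. */
typedef struct {
    uint64_t queued;   /* CL_PROFILING_COMMAND_QUEUED */
    uint64_t submit;   /* CL_PROFILING_COMMAND_SUBMIT */
    uint64_t start;    /* CL_PROFILING_COMMAND_START */
    uint64_t end;      /* CL_PROFILING_COMMAND_END */
    uint64_t complete; /* CL_PROFILING_COMMAND_COMPLETE (OpenCL 2.0+) */
} event_times;

/* Pre-execution latency: START - SUBMIT. */
uint64_t pre_exec_ns(const event_times *e)
{
    assert(e->start >= e->submit);
    return e->start - e->submit;
}

/* Guessed post-execution latency: COMPLETE - END. */
uint64_t post_exec_ns(const event_times *e)
{
    assert(e->complete >= e->end);
    return e->complete - e->end;
}

/* Proposed overall latency for a very small kernel: COMPLETE - SUBMIT. */
uint64_t submit_to_complete_ns(const event_times *e)
{
    assert(e->complete >= e->submit);
    return e->complete - e->submit;
}

/* Smallest END - START over n launches: an empirical estimate of the
 * lower bound on execution time. */
uint64_t min_exec_ns(const event_times *e, size_t n)
{
    assert(n > 0);
    uint64_t best = UINT64_MAX;
    for (size_t i = 0; i < n; i++) {
        assert(e[i].end >= e[i].start);
        uint64_t d = e[i].end - e[i].start;
        if (d < best)
            best = d;
    }
    return best;
}
```

By construction, COMPLETE - SUBMIT equals the sum of the pre-execution, execution (END - START) and post-execution intervals for the same launch.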
