oneprof -q fails with error "ZE_RESULT_SUCCESS' failed" #40

Open
mumar-intel opened this issue May 25, 2023 · 2 comments

Comments

@mumar-intel

I am using oneprof on an HPC+AI application with a large number of kernels (~30). When I run:
oneprof -q -o test.txt $APP_EXE
it fails with the following error:
oneprof/metric_query_collector.h:307: void MetricQueryCollector::ProcessQuery(const ZeQueryInfo&): Assertion `status == ZE_RESULT_SUCCESS' failed

It generates the output files (result.*, data.*, and test.txt), but test.txt contains only the application's total runtime and no information about the individual kernels.

I have tested it on one tile and on one GPU. The application does not use MPI; it is Python-based code.

@jfedorov
Contributor

jfedorov commented Dec 5, 2023

@mumar-intel, sorry for the delay in responding.
There have recently been several fixes in oneprof. Could you please try the collection with the latest oneprof and let us know whether it still reproduces? Thank you.

jfedorov pushed a commit that referenced this issue Dec 15, 2023
* adding kernel handlng on synchronize fence

* adding kernel handling on synchronize fence
@Wanzizhu

Hi @jfedorov, I also ran into this issue. I updated to the latest commit (9ee0e46), and below is the error output. Is this expected?

pti-gpu/tools/oneprof/metric_query_cache.h:69: _zet_metric_query_handle_t* MetricQueryCache::GetQuery(ze_context_handle_t): Assertion `status == ZE_RESULT_SUCCESS' failed.

LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
LIBXSMM_TARGET: spr [Genuine Intel(R) CPU 0000%@]
Registry and code: 13 MB
Command: python test_linear.py
Uptime: 7.938176 s
Aborted (core dumped)
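Both reports end in the same kind of check: a status value is asserted to equal `ZE_RESULT_SUCCESS`, so the abort message only says that the comparison failed, not which error code was actually returned. When reproducing the problem, it may help to log the real status value. The following is a minimal C++ sketch under stated assumptions. `ZeStatus`, `kZeSuccess`, `DescribeZeFailure`, and `ZE_CHECK` are hypothetical names, not part of oneprof or the Level Zero API, and the sketch deliberately does not include `ze_api.h` so that it compiles on its own.

```cpp
#include <cstdio>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical stand-in for Level Zero's ze_result_t so this sketch compiles
// without ze_api.h. kZeSuccess mirrors ZE_RESULT_SUCCESS, which is 0.
using ZeStatus = unsigned int;
constexpr ZeStatus kZeSuccess = 0;

// Build a message naming the failing call and the status it returned,
// printed in hex as Level Zero error codes are usually written.
std::string DescribeZeFailure(const char* call, ZeStatus status,
                              const char* file, int line) {
  std::ostringstream out;
  out << file << ":" << line << ": " << call
      << " returned 0x" << std::hex << status;
  return out.str();
}

// Hypothetical replacement for assert(status == ZE_RESULT_SUCCESS):
// on failure it prints the actual status value, then aborts as assert does.
#define ZE_CHECK(call)                                               \
  do {                                                               \
    ZeStatus ze_check_status_ = (call);                              \
    if (ze_check_status_ != kZeSuccess) {                            \
      std::fprintf(stderr, "%s\n",                                   \
                   DescribeZeFailure(#call, ze_check_status_,        \
                                     __FILE__, __LINE__).c_str());   \
      std::abort();                                                  \
    }                                                                \
  } while (0)
```

Wrapping a call as `ZE_CHECK(SomeCall(...))` would print the file, line, call text, and hex status before aborting. That is the detail the assertion messages above leave out.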
