Summarizing device timing regardless of kernel shapes by default (#37)
* Initial version of unitrace

* Initial commits of unitrace

* Unhide Symbols Required By XPTI

* Summarizing device timing regardless of kernel shapes by default

* Summarizing device timing without kernel shapes by default

---------

Co-authored-by: Schilling, Matthew <matthew.schilling@intel.com>
zma2 and mschilling0 authored Dec 11, 2023
1 parent 93f66e7 commit 6a85c1d
Showing 8 changed files with 46 additions and 67 deletions.
12 changes: 10 additions & 2 deletions tools/unitrace/README.md
@@ -131,18 +131,26 @@ To trace/profile device and kernel activities, one can use one or more of the fo

The **--device-timing [-d]** option outputs a timing summary of kernels and commands executed on the device:

![Device Timing!](/tools/unitrace/doc/images/device-timing.png)

![Device Timing With No Shape!](/tools/unitrace/doc/images/device-timing-with-no-shape.png)

It also outputs kernel information that helps identify occupancy-related kernel performance issues caused by shared local memory usage and register spilling.

![Kernel Info!](/tools/unitrace/doc/images/kernel-info.png)
![Kernel Info With No Shape!](/tools/unitrace/doc/images/kernel-info-with-no-shape.png)

Here, the **"SLM Per Work Group"** shows the amount of shared local memory needed for each work group in bytes. This size can potentially affect occupancy.

The **"Private Memory Per Thread"** is the private memory allocated for each thread in bytes. A non-zero value indicates that one or more thread private variables are not in registers.

The **"Spill Memory Per Thread"** is the memory used for register spills for each thread in bytes. A non-zero value indicates that one or more thread private variables are allocated in registers but are later spilled to memory.

By default, the kernel timing is summarized regardless of shapes. If a kernel is launched with different shapes, using **-v** along with **-d** is strongly recommended:

![Device Timing!](/tools/unitrace/doc/images/device-timing.png)

![Kernel Info!](/tools/unitrace/doc/images/kernel-info.png)
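
The effect of combining **-v** with **-d** can be pictured as a change in the aggregation key. The following is a minimal sketch only, not unitrace's implementation: the `KernelRun` struct, the `SummarizeDeviceTiming` function and the sample shape strings are hypothetical names introduced for illustration.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical record of one kernel execution; not a unitrace type.
struct KernelRun {
  std::string name;      // demangled kernel name
  std::string shape;     // e.g. "[SIMD32 {4; 1; 1} {256; 1; 1}]"
  uint64_t duration_ns;  // device execution time in nanoseconds
};

// Sums execution time per row of the summary.
// verbose == false: all shapes of a kernel fold into a single row.
// verbose == true:  each distinct shape of a kernel gets its own row.
std::map<std::string, uint64_t> SummarizeDeviceTiming(
    const std::vector<KernelRun>& runs, bool verbose) {
  std::map<std::string, uint64_t> totals;
  for (const auto& run : runs) {
    const std::string key = verbose ? run.name + run.shape : run.name;
    totals[key] += run.duration_ns;
  }
  return totals;
}
```

With two runs of the same kernel under different group counts, the non-verbose summary holds one row with the combined time, while the verbose summary holds two rows, which is why shapes matter when a kernel's cost depends on its launch configuration.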


The **--kernel-submission [-s]** option outputs a summary of the time kernels spend in queuing, submission and execution:
![Kernel Submissions!](/tools/unitrace/doc/images/kernel-submissions.png)

2 changes: 1 addition & 1 deletion tools/unitrace/src/chromelogger.h
@@ -39,7 +39,7 @@ static uint32_t mpi_rank = std::atoi(rank.c_str());
static std::string process_start_time = std::to_string(UniTimer::GetEpochTimeInUs(UniTimer::GetHostTimestamp()));
static std::string pmi_hostname = GetHostName();

std::string GetZeKernelCommandName(uint64_t id, ze_group_count_t& group_count, size_t size);
std::string GetZeKernelCommandName(uint64_t id, ze_group_count_t& group_count, size_t size, bool detailed);
ze_pci_ext_properties_t *GetZeDevicePciPropertiesAndId(ze_device_handle_t device, int32_t *parent_device_id, int32_t *device_id, int32_t *subdevice_id);

static Logger* logger_ = nullptr;
81 changes: 26 additions & 55 deletions tools/unitrace/src/levelzero/ze_collector.h
@@ -942,41 +942,43 @@ typedef void (*OnZeKernelFinishCallback)(uint64_t kid, uint64_t tid, uint64_t st

ze_result_t (*zexKernelGetBaseAddress)(ze_kernel_handle_t hKernel, uint64_t *baseAddress) = nullptr;

inline std::string GetZeKernelCommandName(uint64_t id, const ze_group_count_t& group_count, size_t size) {
inline std::string GetZeKernelCommandName(uint64_t id, const ze_group_count_t& group_count, size_t size, bool detailed = true) {
std::stringstream s;
kernel_command_properties_mutex_.lock_shared();
auto it = kernel_command_properties_->find(id);
if (it != kernel_command_properties_->end()) {
s << utils::Demangle(it->second.name_.c_str());
if (it->second.type_ == KERNEL_COMMAND_TYPE_COMPUTE) {
if (it->second.simd_width_ > 0) {
s << "[SIMD";
if (it->second.simd_width_ == 1) {
s << "_ANY";
} else {
s << it->second.simd_width_;
if (detailed) {
if (it->second.type_ == KERNEL_COMMAND_TYPE_COMPUTE) {
if (it->second.simd_width_ > 0) {
s << "[SIMD";
if (it->second.simd_width_ == 1) {
s << "_ANY";
} else {
s << it->second.simd_width_;
}
}
s << " {" <<
group_count.groupCountX << "; " <<
group_count.groupCountY << "; " <<
group_count.groupCountZ << "} {" <<
it->second.group_size_.x << "; " <<
it->second.group_size_.y << "; " <<
it->second.group_size_.z << "}]";
}
else if ((it->second.type_ == KERNEL_COMMAND_TYPE_MEMORY) && (size > 0)) {
s << "[" << size << "]";
}
s << " {" <<
group_count.groupCountX << "; " <<
group_count.groupCountY << "; " <<
group_count.groupCountZ << "} {" <<
it->second.group_size_.x << "; " <<
it->second.group_size_.y << "; " <<
it->second.group_size_.z << "}]";
}
else if ((it->second.type_ == KERNEL_COMMAND_TYPE_MEMORY) && (size > 0)) {
s << "[" << size << "]";
}
}

kernel_command_properties_mutex_.unlock_shared();
return s.str();
}
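
The function above pairs `lock_shared()` with a manual `unlock_shared()`. As a hedged aside, the same read-side protection could be written with a scoped lock so the mutex is released on every return path. This sketch assumes the mutex is a `std::shared_mutex` (the source does not show its type) and uses hypothetical stand-ins `names_mutex`, `names`, `LookupName` and `RegisterName` rather than the collector's real members.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical stand-ins for the collector's property table and its mutex.
static std::shared_mutex names_mutex;
static std::map<uint64_t, std::string> names;

// Reader: takes a shared lock that is released automatically when the
// function returns, including on early returns or exceptions.
std::string LookupName(uint64_t id) {
  std::shared_lock<std::shared_mutex> lock(names_mutex);
  auto it = names.find(id);
  return it != names.end() ? it->second : std::string();
}

// Writer: takes an exclusive lock for the duration of the update.
void RegisterName(uint64_t id, const std::string& name) {
  std::unique_lock<std::shared_mutex> lock(names_mutex);
  names[id] = name;
}
```

This requires C++17 for `std::shared_mutex`; whether it fits the project's toolchain and style is not something the diff settles.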

inline std::string GetZeKernelCommandName(uint64_t id, ze_group_count_t& group_count, size_t size) {
inline std::string GetZeKernelCommandName(uint64_t id, ze_group_count_t& group_count, size_t size, bool detailed = true) {
const ze_group_count_t& gcount = group_count;
return GetZeKernelCommandName(id, gcount, size);
return GetZeKernelCommandName(id, gcount, size, detailed);
}
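
To make the role of the new `detailed` parameter concrete, here is a simplified, self-contained sketch of the compute-kernel naming path. It is an approximation for illustration: `Dim3` and `FormatKernelName` are hypothetical names, the property lookup and the memory-command branch are omitted, and the exact brace nesting of the real function is not recoverable from this view.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-in for a three-dimensional group count or group size.
struct Dim3 {
  uint32_t x;
  uint32_t y;
  uint32_t z;
};

// Builds a label for a compute kernel. The shape suffix
// "[SIMD<w> {count} {size}]" is appended only when detailed is true.
std::string FormatKernelName(const std::string& name, uint32_t simd_width,
                             const Dim3& group_count, const Dim3& group_size,
                             bool detailed) {
  std::stringstream s;
  s << name;
  if (detailed && simd_width > 0) {
    s << "[SIMD";
    if (simd_width == 1) {
      s << "_ANY";  // a width of 1 is reported as "any" SIMD width
    } else {
      s << simd_width;
    }
    s << " {" << group_count.x << "; " << group_count.y << "; "
      << group_count.z << "} {" << group_size.x << "; " << group_size.y
      << "; " << group_size.z << "}]";
  }
  return s.str();
}
```

Under these assumptions, a kernel named `gemm` with SIMD width 32, group count 4x1x1 and group size 256x1x1 is labeled `gemm[SIMD32 {4; 1; 1} {256; 1; 1}]` when `detailed` is true and plain `gemm` otherwise.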

inline ze_pci_ext_properties_t *GetZeDevicePciPropertiesAndId(ze_device_handle_t device, int32_t *parent_device_id, int32_t *device_id, int32_t *subdevice_id){
@@ -1115,10 +1117,10 @@ class ZeCollector {
total_time += it.second.execute_time_;
std::string kname;
if (it.first.tile_ >= 0) {
kname = "Tile #" + std::to_string(it.first.tile_) + ": " + GetZeKernelCommandName(it.first.kernel_command_id_, it.first.group_count_, it.first.mem_size_);
kname = "Tile #" + std::to_string(it.first.tile_) + ": " + GetZeKernelCommandName(it.first.kernel_command_id_, it.first.group_count_, it.first.mem_size_, options_.verbose);
}
else {
kname = GetZeKernelCommandName(it.first.kernel_command_id_, it.first.group_count_, it.first.mem_size_);
kname = GetZeKernelCommandName(it.first.kernel_command_id_, it.first.group_count_, it.first.mem_size_, options_.verbose);
}
if (kname.size() > max_name_size) {
max_name_size = kname.size();
@@ -1204,10 +1206,10 @@ class ZeCollector {
total_submit_time += it.second.submit_time_;
std::string kname;
if (it.first.tile_ >= 0) {
kname = "Tile #" + std::to_string(it.first.tile_) + ": " + GetZeKernelCommandName(it.first.kernel_command_id_, it.first.group_count_, it.first.mem_size_);
kname = "Tile #" + std::to_string(it.first.tile_) + ": " + GetZeKernelCommandName(it.first.kernel_command_id_, it.first.group_count_, it.first.mem_size_, options_.verbose);
}
else {
kname = GetZeKernelCommandName(it.first.kernel_command_id_, it.first.group_count_, it.first.mem_size_);
kname = GetZeKernelCommandName(it.first.kernel_command_id_, it.first.group_count_, it.first.mem_size_, options_.verbose);
}
if (kname.size() > max_name_size) {
max_name_size = kname.size();
@@ -1615,37 +1617,6 @@ class ZeCollector {

sub_desc.driver_ = driver;
sub_desc.context_ = context;
#if 0
if (options_.metric_query) {
zet_metric_group_handle_t group = nullptr;
uint32_t num_groups = 0;
status = zetMetricGroupGet(sub_devices[j], &num_groups, nullptr);
PTI_ASSERT(status == ZE_RESULT_SUCCESS);
if (num_groups > 0) {
std::vector<zet_metric_group_handle_t> groups(num_groups, nullptr);
status = zetMetricGroupGet(sub_devices[j], &num_groups, groups.data());
PTI_ASSERT(status == ZE_RESULT_SUCCESS);

for (uint32_t k = 0; k < num_groups; ++k) {
zet_metric_group_properties_t group_props{};
group_props.stype = ZET_STRUCTURE_TYPE_METRIC_GROUP_PROPERTIES;
status = zetMetricGroupGetProperties(groups[k], &group_props);
PTI_ASSERT(status == ZE_RESULT_SUCCESS);


if ((strcmp(group_props.name, utils::GetEnv("UNITRACE_MetricGroup").c_str()) == 0) && (group_props.samplingType & ZET_METRIC_GROUP_SAMPLING_TYPE_FLAG_EVENT_BASED)) {
group = groups[k];
break;
}
}
}
status = zetContextActivateMetricGroups(context, sub_devices[j], 1, &group);
PTI_ASSERT(status == ZE_RESULT_SUCCESS);
metric_activations_.insert({context, sub_devices[j]});

sub_desc.metric_group_ = group;
}
#endif /* 0 */

sub_desc.metric_group_ = nullptr;

4 changes: 2 additions & 2 deletions tools/unitrace/src/tracer.cc
@@ -155,7 +155,7 @@ static TraceOptions ReadArgs() {
}

std::string get_version() {
return std::string(VERSION) + " ("+ std::string(COMMIT_HASH) + ")";
return std::string(UNITRACE_VERSION) + " ("+ std::string(COMMIT_HASH) + ")";
}

void __attribute__((constructor)) Init(void) {
@@ -168,7 +168,7 @@ void __attribute__((constructor)) Init(void) {
if (unitrace_version.size() > 0) {
auto libunitrace_version = get_version();
if (unitrace_version.compare(libunitrace_version) != 0) {
std::cerr << "[ERROR] Versions of Unitrace and libUnitrace_tool.so do not match." << std::endl;
std::cerr << "[ERROR] Versions of unitrace and libunitrace_tool.so do not match." << std::endl;
exit(-1);
}
}
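
The check in this hunk compares the version string exported by the launcher with the one compiled into the library. A minimal sketch of that comparison follows, with hypothetical helper names `MakeVersionString` and `VersionsCompatible`; the commit hash used in the usage note is a placeholder, not a real value.

```cpp
#include <string>

// Formats the version the same way on both sides: "<version> (<commit>)".
std::string MakeVersionString(const std::string& version,
                              const std::string& commit) {
  return version + " (" + commit + ")";
}

// Mirrors the logic shown in the diff: an empty launcher string skips the
// check; otherwise the two strings must be identical.
bool VersionsCompatible(const std::string& from_launcher,
                        const std::string& from_library) {
  return from_launcher.empty() || from_launcher == from_library;
}
```

For example, a launcher reporting `2.0.0 (abc1234)` against a library built as `2.0.1 (abc1234)` would fail the check, which is the situation the error message in the hunk reports.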
12 changes: 6 additions & 6 deletions tools/unitrace/src/unitrace.cc
@@ -119,8 +119,8 @@ void Usage(char * progname) {
std::endl;
std::cout <<
"--verbose [-v] " <<
"Enable verbose mode to show more kernel information. For OpenCL backend only." << std::endl <<
" Verbose is always enabled for Level Zero backend" <<
"Enable verbose mode to show kernel shapes" << std::endl <<
" Kernel shapes are always enabled in timelines for Level Zero backend" <<
std::endl;
std::cout <<
"--demangle " <<
@@ -384,7 +384,7 @@ int ParseArgs(int argc, char* argv[]) {
show_metric_list = true;
++app_index;
} else if (strcmp(argv[i], "--version") == 0) {
std::cout << VERSION << " (" << COMMIT_HASH << ")" << std::endl;
std::cout << UNITRACE_VERSION << " (" << COMMIT_HASH << ")" << std::endl;
return 0;
} else {
break;
@@ -555,7 +555,7 @@ int main(int argc, char *argv[]) {
#endif

// Set unitrace version
auto unitrace_version = std::string(VERSION) + " (" + std::string(COMMIT_HASH) + ")";
auto unitrace_version = std::string(UNITRACE_VERSION) + " (" + std::string(COMMIT_HASH) + ")";
utils::SetEnv("UNITRACE_VERSION", unitrace_version.c_str());

SetProfilingEnvironment();
@@ -589,11 +589,11 @@ int main(int argc, char *argv[]) {
if (utils::GetEnv("UNITRACE_ChromeMpiLogging") == "1") {
preload = preload + ":" + mpi_interceptor_path;
// For tracing MPI calls from oneCCL, we need to set CCL_MPI_LIBRARY_PATH
// with Unitrace's MPI intercepter path, because oneCCL directly picks up
// with unitrace's MPI intercepter path, because oneCCL directly picks up
// MPI functions with dlopen/dlsym, not through the dynamic linker. Thus,
// LD_PRELOAD would not work.
// TODO: We have to consider a case where CCL_MPI_LIBRARY_PATH is already
// set. Unitrace will need to call the MPIs in the specified libs
// set. In this case, unitrace needs to call MPIs in the specified libs
// before/after ITT annotation.
utils::SetEnv("CCL_MPI_LIBRARY_PATH", mpi_interceptor_path.c_str());
}
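
The preload handling in this hunk builds a colon-separated library list, and the TODO concerns the case where `CCL_MPI_LIBRARY_PATH` is already set. A small sketch of both pieces is given below, using hypothetical helpers `GetEnvOrEmpty`, `IsEnvSet` and `AppendToPathList`; it only detects the pre-set case and does not attempt the forwarding the TODO describes.

```cpp
#include <cstdlib>
#include <string>

// Returns the value of an environment variable, or "" when it is unset.
std::string GetEnvOrEmpty(const char* name) {
  const char* value = std::getenv(name);
  return value ? std::string(value) : std::string();
}

// True when the variable exists and is non-empty, e.g. to detect a
// user-provided CCL_MPI_LIBRARY_PATH before overwriting it.
bool IsEnvSet(const char* name) {
  return !GetEnvOrEmpty(name).empty();
}

// Appends lib to a colon-separated list such as the one LD_PRELOAD expects.
std::string AppendToPathList(const std::string& list, const std::string& lib) {
  return list.empty() ? lib : list + ":" + lib;
}
```

A launcher could call `IsEnvSet("CCL_MPI_LIBRARY_PATH")` before the `SetEnv` call to at least warn that a user setting is about to be replaced; how the interceptor should then chain to the user's library is left open, as in the original comment.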
2 changes: 1 addition & 1 deletion tools/unitrace/src/version.h
@@ -1,7 +1,7 @@
#ifndef PTI_TOOLS_UNITRACE_VERSION_H_
#define PTI_TOOLS_UNITRACE_VERSION_H_

#define VERSION "2.0.0"
#define UNITRACE_VERSION "2.0.1"

std::string get_version();
