Replacing --driver= with --device= and adding list flags. #9443

Merged (4 commits) on Jun 10, 2022

@benvanik (Collaborator) commented Jun 10, 2022

This gives us consistent behavior across the various command line tools and prepares for multiple devices and device sets (#5724).

Adds the following flags:

  • --list_drivers: list all available drivers compiled into the tool
  • --list_devices: list all devices from all drivers
  • --list_devices={driver}: list all devices from a single driver
  • --dump_devices: dump detailed information about all devices from all drivers
  • --dump_devices={driver}: dump detailed information about all devices from a single driver
  • --device={uri}: specify one or more devices to use (today only one is used)
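To make the flag semantics above concrete, here is a minimal Python `argparse` model of them. It is only an illustration under the behaviors listed: the real tools are C programs with their own flag handling, and `build_parser`, `primary_device`, and the `example-tool` program name are hypothetical. Which of several `--device` values gets used is also an assumption (the sketch picks the first).

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical model of the new flags; not IREE's actual flag parser."""
    parser = argparse.ArgumentParser(prog="example-tool")
    parser.add_argument("--list_drivers", action="store_true",
                        help="list all drivers compiled into the tool")
    # nargs="?" accepts both the bare flag (which yields const) and
    # --list_devices={driver}. In this sketch "" stands for "all drivers".
    parser.add_argument("--list_devices", nargs="?", const="", default=None,
                        metavar="DRIVER", help="list devices")
    parser.add_argument("--dump_devices", nargs="?", const="", default=None,
                        metavar="DRIVER", help="dump detailed device info")
    # action="append" keeps every --device={uri} occurrence, in order.
    parser.add_argument("--device", action="append", default=None,
                        metavar="URI", help="device(s) to use")
    return parser


def primary_device(args: argparse.Namespace) -> Optional[str]:
    """Several devices may be given; assume only the first one is used today."""
    return args.device[0] if args.device else None


args = build_parser().parse_args(
    ["--list_devices=cuda", "--device=cuda://0", "--device=vulkan://0"])
print(args.list_devices, args.device, primary_device(args))
```

Modeling `--device` as a repeatable flag keeps the door open for device sets without changing how single-device invocations are written.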

Example usage: https://gist.github.com/benvanik/059c5773068b114ea393bf5b95d791c2
Currently none of the HAL drivers put anything interesting in their dump output, but we can iterate on what goes there. The goal is not a full vulkaninfo/nvidia-smi replacement, but rather showing information relevant to our usage.

Most of the implementation is hidden in device_utils.c, but if we ever want programmatic access to list/dump we can add an API for them.

NOTE: we are moving towards multiple devices, but several bits of infra still assume a single device and will not be compatible with multi-device usage.

Progress on #9343.

@benvanik added the runtime/tools (IREE's runtime tooling: iree-run-module, iree-benchmark-module, etc.), quality of life 😊, and cleanup 🧹 labels Jun 10, 2022

These allow querying the registered drivers, all devices or the devices
for a particular driver, and detailed per-device information provided
by the driver implementations. The intent is that drivers can dump
relevant information like supported device features, limits, etc., à la
nvidia-smi or vulkaninfo.
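The three kinds of query described above (drivers, devices optionally filtered by driver, and per-device details) can be pictured as a small registry. This is a hypothetical Python sketch, not the C implementation in device_utils.c: `DriverRegistry`, `DeviceInfo`, and the `details` field are illustrative names, and the `{driver}://{path}` form follows the `--device` examples in this PR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DeviceInfo:
    path: str  # driver-specific path, e.g. an ordinal such as "0"
    name: str  # human-readable device name
    # Free-form details a driver may choose to report (features, limits, ...).
    details: Dict[str, str] = field(default_factory=dict)


@dataclass
class DriverRegistry:
    drivers: Dict[str, List[DeviceInfo]] = field(default_factory=dict)

    def register(self, driver: str, devices: List[DeviceInfo]) -> None:
        self.drivers[driver] = list(devices)

    def list_drivers(self) -> List[str]:
        return sorted(self.drivers)

    def _selected(self, driver: Optional[str]) -> List[str]:
        # None selects every driver; an unknown driver name is an error.
        if driver is None:
            return self.list_drivers()
        if driver not in self.drivers:
            raise KeyError(f"unknown driver: {driver}")
        return [driver]

    def list_devices(self, driver: Optional[str] = None) -> List[str]:
        """One URI per device, for one driver or for all of them."""
        return [f"{d}://{dev.path}"
                for d in self._selected(driver)
                for dev in self.drivers[d]]

    def dump_devices(self, driver: Optional[str] = None) -> List[str]:
        """Like list_devices, plus whatever details each driver reports."""
        lines: List[str] = []
        for d in self._selected(driver):
            for dev in self.drivers[d]:
                lines.append(f"{d}://{dev.path}  {dev.name}")
                lines.extend(f"  {k}: {v}" for k, v in sorted(dev.details.items()))
        return lines
```

Keeping the dump output free-form matches the intent that each driver decides what is worth reporting.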

CUDA and Vulkan drivers have gained the ability to parse some device
paths. CUDA can now take either a UUID/MIG identifier or a device
ordinal, while Vulkan currently only takes a device ordinal. With the
URI scheme, all of these reference the same device:
```
--device=cuda://GPU-754d9ae2-8df5-f8e3-3502-182434a12876
--device=cuda://0
--device=cuda:0
```
(the UUID and ordinal are the same as printed by `nvidia-smi -L`)
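A rough sketch of how the three spellings above could reduce to the same `(driver, path)` pair. The actual parsing lives in each driver; `parse_device_uri`, `classify_path`, and the treatment of a bare driver name (empty path) are assumptions for illustration.

```python
import re
from typing import NamedTuple


class DeviceRef(NamedTuple):
    driver: str  # e.g. "cuda"
    path: str    # driver-specific path; "" means no path was given (assumed: driver default)


def parse_device_uri(uri: str) -> DeviceRef:
    """Split --device values like 'cuda://0', 'cuda:0', or a bare 'cuda'."""
    driver, sep, rest = uri.partition(":")
    if not driver:
        raise ValueError(f"missing driver name in {uri!r}")
    # Accept both 'driver://path' and the shorter 'driver:path'.
    if sep and rest.startswith("//"):
        rest = rest[2:]
    return DeviceRef(driver, rest)


def classify_path(path: str) -> str:
    """Hypothetical classification of a CUDA-style device path."""
    if not path:
        return "default"
    if re.fullmatch(r"[0-9]+", path):
        return "ordinal"
    return "identifier"  # e.g. a UUID such as GPU-... or a MIG identifier


for uri in ["cuda://GPU-754d9ae2-8df5-f8e3-3502-182434a12876", "cuda://0", "cuda:0"]:
    ref = parse_device_uri(uri)
    print(uri, "->", ref.driver, classify_path(ref.path))
```

Resolving a UUID and an ordinal to the same physical device still requires asking the driver; the parser only decides which kind of lookup to perform.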

Drivers can have arbitrarily complex paths, and we can continue to add
support over time. The new listing commands act as the source of truth
for which devices are supported and how to reference them.

Progress on #9343.
This is a superset of the existing --driver flag: it also allows
specifying the full device URI and multiple devices (even if no tool
can currently use more than one).

Progress on #9343.
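Because `--device` is a superset of `--driver`, and assuming a bare driver name is itself a valid device value, migrating an existing invocation is a one-way string rewrite. A minimal sketch; `migrate_driver_flags` is a hypothetical helper and only the `--flag=value` spelling is handled.

```python
from typing import List

LEGACY = "--driver="
NEW = "--device="


def migrate_driver_flags(argv: List[str]) -> List[str]:
    """Rewrite legacy '--driver=NAME' args to '--device=NAME' (hypothetical).

    Other arguments, including existing --device flags, pass through unchanged.
    """
    return [NEW + arg[len(LEGACY):] if arg.startswith(LEGACY) else arg
            for arg in argv]


print(migrate_driver_flags(["--driver=vulkan", "--device=cuda://0"]))
```

Going the other way, recovering the driver from an arbitrary device URI, needs per-driver parsing, which is why build rules keep the driver as their primary input.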
@iree-github-actions-bot (Contributor) commented:

Abbreviated x86_64 Benchmark Summary (experimental)

@ commit bba52aeb5d5ac552a017a1ef13a3e16a0cd7e121 (vs. base 33a7caaddbd8e26dee20e0df6cf7b338789581ad)

Improved Benchmarks 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileNetV3Small [fp32,imagenet] (TFLite) full-inference,default-flags with IREE-Dylib-Sync @ GCP-c2-standard-16 (CPU-x86_64-CascadeLake) | 4 (vs. 5, 20.00%↓) | 4 | 0 |

For more information:

    -    "--driver=%s" % driver,
    +    "--device=%s" % driver,
Member commented:

We should also update the arg name from driver to device, including the comment updates:

      driver: driver to run the module with. This can be omitted to test only
          compilation, but consider omitting the driver as a hacky abuse of the
          rule since compilation on its own does not use iree-check-module.

Looks like a few files under build_tools/ need similar updates.

@benvanik (Collaborator, author) replied:

I think this use of `driver` is fine, as it specifies the driver and not a device: it maps to the `driver=` test filter flag and to which driver module gets linked in, and we then use it to derive the --device flag. There are probably still a few hiding in here, but I tried not to touch anything that ended up in driver-related code, since going driver->device is safe but device->driver isn't (we don't want people to set a benchmark suite rule for a particular device path and then have to reverse engineer the driver from it with CMake string manipulation).

Member replied:

Still a little confusing IMO, but fine to proceed and continue cleaning up later.

@benvanik requested a review from ScottTodd June 10, 2022 19:09
@iree-github-actions-bot (Contributor) commented:

Abbreviated Benchmark Summary

@ commit bba52aeb5d5ac552a017a1ef13a3e16a0cd7e121 (vs. base 33a7caaddbd8e26dee20e0df6cf7b338789581ad)

Regressed Benchmarks 🚩

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileBertSquad [fp32] (TFLite) 4-thread,big-core,full-inference,experimental-flags with IREE-Dylib @ Pixel-6-Pro (CPU-ARMv8.2-A) | 478 (vs. 411, 16.30%↑) | 477 | 5 |
| MobileBertSquad [fp16] (TFLite) full-inference,experimental-flags with IREE-Vulkan @ Pixel-6-Pro (GPU-Mali-G78) | 145 (vs. 133, 9.02%↑) | 144 | 8 |
| DeepLabV3 [fp32] (TFLite) little-core,full-inference,experimental-flags with IREE-Dylib-Sync @ Pixel-6-Pro (CPU-ARMv8.2-A) | 372 (vs. 348, 6.90%↑) | 373 | 4 |

[Top 3 out of 5 benchmark results shown]

Improved Benchmarks 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileBertSquad [int8] (TFLite) 4-thread,big-core,full-inference,experimental-flags with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 201 (vs. 243, 17.28%↓) | 201 | 0 |
| MobileNetV3Small [fp32,imagenet] (TFLite) 1-thread,big-core,full-inference,default-flags with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 11 (vs. 12, 8.33%↓) | 11 | 0 |

For more information:
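The percentages in these benchmark summaries match the relative change of the average latency against the base commit, (current − base) / base. A small check, assuming that formula; the bot's own code is not part of this PR and may compute from unrounded latencies, and `percent_change` is a hypothetical name.

```python
def percent_change(current_ms: float, base_ms: float) -> float:
    """Relative change of current vs. base latency, in percent.

    Positive values mean slower (regressed, shown as ↑);
    negative values mean faster (improved, shown as ↓).
    """
    if base_ms == 0:
        raise ValueError("base latency must be non-zero")
    return (current_ms - base_ms) / base_ms * 100.0


# Average latencies (ms) taken from the summaries: (current, base).
for current, base in [(4, 5), (478, 411), (201, 243)]:
    print(f"{current} vs. {base}: {percent_change(current, base):+.2f}%")
```

Because the change is relative to the base, a 1 ms swing on a 5 ms benchmark shows up as 20%, so small absolute latencies deserve some skepticism before reading them as real regressions or improvements.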
