Replacing --driver= with --device= and adding list flags. #9443
These allow for querying of registered drivers, all devices or devices for a particular driver, and detailed per-device information provided by the driver implementations. The intent is that drivers can dump relevant information like supported device features, limits, etc., à la nvidia-smi or vulkaninfo.

CUDA and Vulkan drivers have gained the ability to parse some device paths. CUDA can now take either the UUID/MIG or the device ordinal, while Vulkan currently can only take a device ordinal. With the URI scheme this means these all work to reference the same device:

```
--device=cuda://GPU-754d9ae2-8df5-f8e3-3502-182434a12876
--device=cuda://0
--device=cuda:0
```

(the UUID and ordinal are the same as printed by `nvidia-smi -L`)

Drivers can have arbitrarily complex paths and we can continue to add support over time. The new listing commands act as the source of truth for what devices are supported and how to reference them.

Progress on #9343.
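As a rough illustration of the URI forms listed above, here is a minimal Python sketch that splits a device URI into a driver name and a driver-specific path. The helper name `split_device_uri` is hypothetical and this is not the parsing code in the runtime; it only assumes that `driver://path` and `driver:path` are treated as equivalent, as the commit message describes.

```python
from typing import Tuple


def split_device_uri(uri: str) -> Tuple[str, str]:
    """Split a device URI into (driver, path).

    Hypothetical helper for illustration only; the real parsing lives in
    the runtime and in each driver.
    """
    driver, _, rest = uri.partition(":")
    if not driver:
        raise ValueError("device URI has no driver name: %r" % uri)
    # Accept both `driver://path` and the shorter `driver:path`.
    path = rest[2:] if rest.startswith("//") else rest
    return driver, path


def is_ordinal(path: str) -> bool:
    """True if the path looks like a device ordinal rather than a UUID."""
    return path.isdigit()
```

With this sketch, `cuda://0` and `cuda:0` both yield `("cuda", "0")`, while the UUID form yields a path that `is_ordinal` rejects, leaving it to the driver to interpret.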
This is a superset of the existing driver flag and now allows for specifying the full device URI and multiple devices (even if no tool currently can use them). Progress on #9343.
ca89e6a to bba52ae
Abbreviated x86_64 Benchmark Summary (experimental) @ commit bba52aeb5d5ac552a017a1ef13a3e16a0cd7e121 (vs. base 33a7caaddbd8e26dee20e0df6cf7b338789581ad)
Improved Benchmarks 🎉
"--driver=%s" % driver,
"--device=%s" % driver,
We should also update the arg name from `driver` to `device`, including comment updates:
driver: driver to run the module with. This can be omitted to test only
compilation, but consider omiting the driver as a hacky abuse of the
rule since compilation on its own not use iree-check-module.
Looks like a few files under `build_tools/` need similar updates.
I think this use of driver is fine as it's specifying the driver and not a device - it maps to the driver= test filter flag and which driver module to link in, which we then use to derive the --device flag. There's still probably a few hiding in here but I tried not to touch anything that ended up in driver-related stuff as going driver->device is safe but device->driver isn't (don't want people to set a benchmark suite rule for a particular device path and then have to reverse engineer the driver from that with cmake string manipulation).
Still a little confusing IMO, but fine to proceed and continue cleaning up later.
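The one-way relationship discussed in this thread (a driver name can always produce a device flag, but a device path should not be reverse engineered into a driver) might be sketched as follows. The helper `device_flag` and its optional `device_uri` override are hypothetical and not part of the build rules in this PR; the only thing taken from the diff is the `"--device=%s" % driver` formatting.

```python
def device_flag(driver, device_uri=None):
    """Build the runtime --device flag for a rule keyed on a driver name.

    Hypothetical helper: the rule keeps `driver` as its source of truth
    (test filter, which driver module to link) and only derives the flag.
    """
    if device_uri is None:
        # Driver -> device is always safe: the bare driver name is a valid URI.
        return "--device=%s" % driver
    # An explicit device path must agree with the driver; we check the scheme
    # instead of trying to infer the driver from the path.
    scheme = device_uri.partition(":")[0]
    if scheme != driver:
        raise ValueError(
            "device %r does not belong to driver %r" % (device_uri, driver))
    return "--device=%s" % device_uri
```

Keeping `driver` as the required argument means a rule never has to parse a device path with string manipulation to find out which driver to link.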
This gives us consistent behavior across the various command line tools and prepares for multiple devices and device sets (#5724).
Adds the following flags:

- `--list_drivers`: list all available drivers compiled into the tool
- `--list_devices`: list all devices from all drivers
- `--list_devices={driver}`: list all devices from a single driver
- `--dump_devices`: dump detailed information about all devices from all drivers
- `--dump_devices={driver}`: dump detailed information about all devices from a single driver
- `--device={uri}`: specify one or more devices to use (today only 1 is used)

Example usage: https://gist.github.com/benvanik/059c5773068b114ea393bf5b95d791c2
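To make the shape of these flags concrete, here is a small model of them using Python's `argparse`. This is purely illustrative: the actual tools are written in C and parse flags with their own machinery, and the assumption that multiple devices are given by repeating `--device=` is mine, not something the PR states.

```python
import argparse


def build_parser():
    """Model of the new flags; not the tools' real flag parser."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--list_drivers", action="store_true")
    # Bare flag -> "" (all drivers); `--list_devices=cuda` -> "cuda".
    parser.add_argument("--list_devices", nargs="?", const="", default=None,
                        metavar="driver")
    parser.add_argument("--dump_devices", nargs="?", const="", default=None,
                        metavar="driver")
    # Assumption: repeating the flag accumulates device URIs in order.
    parser.add_argument("--device", action="append", default=[],
                        metavar="uri")
    return parser
```

In this model an omitted `--list_devices` parses to `None`, a bare `--list_devices` to the empty string, and `--list_devices=cuda` to `"cuda"`, which mirrors the all-drivers versus single-driver split in the list above.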
Currently none of the HAL drivers are putting anything interesting in their dump output but we can iterate on what we put there (not intended as a full vulkaninfo/nvidia-smi replacement, but showing relevant information to our usage).
Most of the implementation is hidden in `device_utils.c`, but if we ever want programmatic access to list/dump we can add an API for them.

NOTE: we are moving towards multiple devices: there are several bits of infra that are always assuming single devices and those will not be compatible with multi-device usage.
Progress on #9343.