Rework Python driver/device creation. #9330

Merged
merged 4 commits into iree-org:main from pydeviceinit on Jul 29, 2022

Conversation

@stellaraccident (Collaborator) commented Jun 4, 2022

APIs removed:

  • HalDriver.create() (use iree.runtime.get_driver(driver_name) to get a
    cached instance).
  • Environment variable IREE_DEFAULT_DRIVER renamed to
    IREE_DEFAULT_DEVICE to better reflect the new syntax.
  • Config.driver attribute (no longer captured by this class)

APIs added:

  • iree.runtime.query_available_drivers() (alias of HalDriver.query())
  • iree.runtime.get_driver(device_uri)
  • iree.runtime.get_device(device_uri)
  • iree.runtime.get_first_device(device_uris)
  • iree.runtime.Config(*, device: HalDevice) (to configure with an
    explicit device)
  • HalDriver.create_device(device_id: Union[int, tuple])
  • HalDriver.query_available_devices()
  • HalDriver.create_device_by_uri(device_uri: str)

Both driver and device lookup are done by a device URI, as defined by the runtime (when creating a driver, only the 'scheme' is used). Driver instances are cached by name in the native code, which should avoid various bad behaviors around driver lifetimes and carelessness with process state. Devices are optionally (default True) cached at the Python level.
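
A minimal sketch of the resulting flow (the driver names here are illustrative, and which URI forms a given driver accepts is driver-dependent):

```python
import iree.runtime

# List registered driver names (alias of HalDriver.query()).
print(iree.runtime.query_available_drivers())

# Drivers are cached by name in native code; repeated calls return the
# same instance.
driver = iree.runtime.get_driver("vulkan")

# Devices are looked up by device URI; only the scheme selects the driver.
device = iree.runtime.get_device("vulkan")

# Try several URIs in order and use the first that yields a device.
device = iree.runtime.get_first_device(["cuda:0", "vulkan:0", "vmvx"])
```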

Fixes #9277
Expected to fix #9936

@ScottTodd added the runtime (Relating to the IREE runtime library), hal/vulkan (Runtime Vulkan GPU HAL backend), and bindings/python (Python wrapping IREE's C API) labels Jun 6, 2022
@benvanik (Collaborator) left a comment

Let me noodle on this for a minute; my goal this week is to fix up driver/device stuff. The particular issue today is that the way drivers are set up makes them extremely expensive to create (it involves loading vulkan/cuda, creating thread pools, etc.). As a result the device-query logic is not cheap - the cost of querying whether there's a vulkan device is nearly as expensive as creating the device, and we want to be positive it's never on a code path beyond a --list-devices flag. Loading cuda/vulkan/moltenvk/metal/etc. into a process is quite destructive.

Basically, a user needs to treat drivers and devices as a tiered step and not as a flattened space: iree.runtime.get_device_by_name is no good unless there were some spooky global iree.runtime.make_devices_available(driver_name) or something that populated the list. I think it'd be better to just have them be tiered (list drivers, then get a driver and list devices in that driver). It's too big a footgun to have a "get me a device from some driver" API unless it's explicitly "get me any device from a particular driver", as that doesn't require creating and querying first.

The other big thing I need to fix up related to this is that vmvx/dylib are not drivers - they are just feature capabilities of a single driver (local w/ threading, and vmvx-sync/dylib-sync are local w/o threading). I'm still trying to figure out how to invert that properly and select it. In non-test cases things are easy (choose whether you want dylib/vmvx with iree-compile flags and then at runtime choose whether you want threaded/non-threaded) but in a module compiled with multiple formats it's useful to be able to force one or the other - that may need an additional configuration mechanism (+vmvx,+embedded-elf-x64-avx512 etc).

I had some notes from awhile back and will see if I can find them and then start a sketch or two.
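
A sketch of that tiered flow, expressed with the names this PR ultimately settled on (per the description above; the layout of the entries returned by query_available_devices() is an assumption):

```python
import iree.runtime

# Tier 1: cheap - just the registered driver names; nothing is loaded.
for name in iree.runtime.query_available_drivers():
    print(name)

# Tier 2: expensive - instantiating a driver may load vulkan/cuda,
# create thread pools, etc., so do it once, for one chosen driver.
driver = iree.runtime.get_driver("vulkan")

# Tier 3: enumerate and create devices only within that driver.
infos = driver.query_available_devices()
device = driver.create_device(infos[0]["device_id"])  # entry layout is an assumption
```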

(Review thread on runtime/bindings/python/hal.cc - resolved)
@stellaraccident (Collaborator, Author) commented

> Let me noodle on this for a minute; my goal this week is to fix up driver/device stuff. […]
>
> I had some notes from awhile back and will see if I can find them and then start a sketch or two.

I'd be happy if the lower level API evolved as you say. Let me know. I can sit on this patch for a little while, but the features here are reasonably important. Lmk if you end up with any C API sketches to adapt this towards.

@benvanik (Collaborator) commented Jun 6, 2022

👍 my goal is to get to #9343 this week, so I don't think it needs to sit long, and once I have a draft of that C API we can chat about how it lines up - I think it's not too disruptive to what you have here

@benvanik (Collaborator) commented

Ok, I think the bulk of the changes landed; #9443 shows what all the native tools now do, with example output in https://gist.github.com/benvanik/059c5773068b114ea393bf5b95d791c2.

The new device URI stuff should replace the need for duplicated python string manipulation, and the goal is that URIs are opaque from the perspective of API layers - you can pass in cuda or cuda:0 or cuda://GPU-MASSIVE-UUID-FROM-NVIDIA-SMI etc. and the devices will be created by the drivers as appropriate. Each driver will support its own representation, and the only ones that can be crafted by framework layers are the ones returned from enumeration. iree_hal_device_info_t now has a path field and you can form {driver}://{path}, but whether {driver}://{ordinal} works is implementation-dependent and not something we want to surface implicitly - having an "iree.runtime.cuda.make_devices(0..4)" that knew that it made sense for the cuda driver would be fine, but passing an ordinal to the local-sync driver (or remote-tcp etc.) won't work.
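
For illustration, those URI forms as seen through the Python helper this PR adds (the UUID is the placeholder from above; which forms work is driver-dependent):

```python
import iree.runtime

# Bare driver name: the driver picks a default device.
d0 = iree.runtime.get_device("cuda")

# Ordinal form: only where the driver supports ordinals.
d1 = iree.runtime.get_device("cuda:0")

# {driver}://{path} using the opaque path returned from enumeration.
d2 = iree.runtime.get_device("cuda://GPU-MASSIVE-UUID-FROM-NVIDIA-SMI")
```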

The APIs didn't change much, but the driver registry/driver interface grew some new functions and helpers. iree_hal_create_device for one-shots and iree_hal_driver_create_device_by_path/iree_hal_driver_create_device_by_uri are probably the most relevant. There's a caveat on iree_hal_create_device: you're going to have a bad time if you create multiple devices without reusing the same driver - e.g. it creates a cuda driver, creates a device from it, and releases the driver, after which you won't be able to (safely) create another cuda device. If the python layer wants to create multiple devices, or repeatedly create devices inside a test loop/etc., it'll need to cache the drivers. Unfortunate, but someone with scenario knowledge has to hold on to those, and the binding layer is the lowest level where it's possible to know that stuff. Maybe some pybind weak-reference keep-alive stuff could help.
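
The driver cache this PR adds at the Python level covers that caveat; a sketch, assuming a machine with two CUDA devices:

```python
import iree.runtime

# Holding the cached driver keeps cuda alive across device creations,
# instead of each create/release cycle tearing the driver down.
driver = iree.runtime.get_driver("cuda")
dev_a = driver.create_device_by_uri("cuda:0")  # accepted URI forms are driver-dependent
dev_b = driver.create_device_by_uri("cuda:1")
```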

There are some new flag helpers under iree/tooling/device_utils.c that you could probably reference; nothing very complex, but they show off the API usage. In particular there's a new driver API for dumping detailed device information that may be useful to surface - it just writes to a string builder so you can return it as a python string when queried/etc., though it's TBD to add actual output to it - that's where we could put nvidia-smi/vulkaninfo-like stuff for diagnostics.

Everything is set up now for multiple devices, though I need to work on #5724 to get the right APIs in place for passing that down to the HAL. For now, anywhere you'd take one device (like the Config constructor), taking in a list of devices would be better, even if for now it just asserts there's only 1 in the list - that'll make it easier for me to come in and plumb that through.
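
For reference, the single-device Config form this PR exposes (a list-of-devices form, as suggested here, would extend the same signature):

```python
import iree.runtime

device = iree.runtime.get_device("vulkan")
config = iree.runtime.Config(device=device)

# The config can then back a SystemContext (the segfault in the linked
# issue involved creating several of these).
context = iree.runtime.SystemContext(config=config)
```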

I did the bare minimum to get things rearranged and working for command line tools/C API - happy to add more/expose things/etc that make python better - just LMK!

stellaraccident added a commit to stellaraccident/iree that referenced this pull request Jun 25, 2022
…pipe hidden.

This updates enumeration and selection-by-index to respect whether a device is hidden, based on some characteristics. Since we've had continuous problems with lavapipe as a compute device (and since it is often installed by default and, when accidentally used, spews stderr warnings about being only for testing), I opted to make this the first heuristic for hiding a device.

Broken out of iree-org#9330.
stellaraccident added a commit that referenced this pull request Jun 25, 2022
…pipe hidden. (#9621)


Broken out of #9330.
APIs removed:
  * HalDriver.create() (use iree.runtime.get_driver(driver_name) to get a
    cached instance).
  * Environment variable IREE_DEFAULT_DRIVER renamed to
    IREE_DEFAULT_DEVICE to better reflect the new syntax.
  * Config.driver attribute (no longer captured by this class)

APIs added:
  * iree.runtime.query_available_drivers() (alias of HalDriver.query())
  * iree.runtime.get_driver(driver_name)
  * iree.runtime.get_device_by_name(name_spec)
  * iree.runtime.get_first_device_by_name(name_specs)
  * iree.runtime.Config(*, device: HalDevice) (to configure with an
    explicit device)
  * HalDriver.create_device(device_id: Union[int, tuple])
  * HalDriver.query_available_devices()

Devices can now be queried/constructed explicitly using
HalDriver.query_available_devices() and passing the found device_ids to
HalDriver.create_device. Default configuration is extended to take "name
specs" instead of a driver name. A name spec can be a raw driver name
(e.g. "vmvx") or driver:index (e.g. "vulkan:3"). Some logging is added
to make it clearer what was selected. Devices created in this way are
now cached, since this facility is meant to be used for trivial/default
configuration. If explicitly creating devices, the user is on their own
to cache as desired.
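
A sketch of this interim name-spec lookup (these helpers were later renamed to the URI-based get_device/get_first_device shown in the final description above):

```python
import iree.runtime

# Raw driver name: any device from that driver.
dev = iree.runtime.get_device_by_name("vmvx")

# driver:index form.
dev = iree.runtime.get_device_by_name("vulkan:3")

# First spec that resolves to a device wins.
dev = iree.runtime.get_first_device_by_name(["cuda:0", "vulkan:0", "vmvx"])
```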

Fixes iree-org#9277
(Review threads on runtime/bindings/python/hal.cc and runtime/bindings/python/iree/runtime/system_setup.py - outdated, resolved)
@stellaraccident stellaraccident enabled auto-merge (squash) July 29, 2022 03:31
@stellaraccident stellaraccident merged commit 3813758 into iree-org:main Jul 29, 2022
@stellaraccident stellaraccident deleted the pydeviceinit branch July 29, 2022 05:00
Successfully merging this pull request may close these issues:

  • TF integrations Python + Vulkan failures
  • Segfault when creating multiple IREE SystemContexts