Fix Torch tensor locality with autoray-registered coerce method (#5438)

### Before submitting

Please complete the following checklist when submitting a PR:

- [x] All new features must include a unit test. If you've fixed a bug or added
      code that should be tested, add a test to the test directory!

- [x] All new functions and code must be clearly commented and documented. If
      you do make documentation changes, make sure that the docs build and
      render correctly by running `make docs`.

- [x] Ensure that the test suite passes, by running `make test`.

- [x] Add a new entry to the `doc/releases/changelog-dev.md` file, summarizing
      the change, and including a link back to the PR.

- [x] The PennyLane source code conforms to
      [PEP8 standards](https://www.python.org/dev/peps/pep-0008/).
      We check all of our code against [Pylint](https://www.pylint.org/).
      To lint modified files, simply `pip install pylint`, and then
      run `pylint pennylane/path/to/file.py`.

When all the above are checked, delete everything above the dashed
line and fill in the pull request template.


------------------------------------------------------------------------------------------------------------

**Context:** When Torch has a GPU-backed data buffer, failures can occur
when attempting to make autoray-dispatched calls to Torch methods with
mixed CPU and GPU data. In this case, with probabilities on the GPU and
eigenvalues on the host (read from the observables), failures appeared
in `qml.dot`, and can be reproduced with:

```python
import pennylane as qml
import torch
import numpy as np

torch_device="cuda"
dev = qml.device("default.qubit.torch", wires=2, torch_device=torch_device)
ham = qml.Hamiltonian(torch.tensor([0.1, 0.2], requires_grad=True), [qml.PauliX(0), qml.PauliZ(1)])

@qml.qnode(dev, diff_method="backprop", interface="torch")
def circuit():
    qml.RX(np.zeros(5), 0)  # Broadcast the state by applying a broadcasted identity
    return qml.expval(ham)

res = circuit()
assert qml.math.allclose(res, 0.2)
```

This PR modifies the registered `coerce` method for Torch to
automatically migrate mixed CPU-GPU data, always favouring the
associated GPU. In addition, the method now also catches multi-GPU
data, where tensors do not reside on the same device index, and fails
outright. As a longer-term solution, moving the Torch GPU dispatch
calls to earlier in the stack would be more sound, but this fixes the
aforementioned issue, at the expense of always migrating from CPU to
GPU.
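
A minimal sketch of the intended behaviour after the change (illustrative
only; it assumes a CUDA-capable Torch build is available):

```python
import pennylane as qml
import torch

# Device-resident probabilities paired with host-resident eigenvalues.
probs = torch.tensor([0.25, 0.75], device="cuda:0")
eigvals = torch.tensor([1.0, -1.0])  # CPU tensor

# The registered coerce method now migrates the CPU operand to cuda:0
# before the contraction, instead of raising a device-mismatch error.
res = qml.math.dot(probs, eigvals)
print(res.device)  # cuda:0
```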

**Description of the Change:** As above.

**Benefits:** Allows automatic data migration from host to device when
using a GPU-backed tensor. In addition, it will catch multi-GPU tensor
data when using Torch and fail due to non-local representations.

**Possible Drawbacks:** Auto-migration may not always be wanted. The
alternative solution is to always be explicit about locality, and to
move the eigenvalue data onto the device at a higher layer in the
stack.
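
A sketch of that explicit alternative (hypothetical; the eigenvalues are
assumed to arrive as host-side data read from the observable):

```python
import torch

probs = torch.tensor([0.25, 0.75], device="cuda:0")  # device-resident probabilities

# Explicitly migrate the host-side eigenvalues to the tensor's device
# before combining, rather than relying on coerce to auto-migrate.
eigvals = torch.as_tensor([1.0, -1.0], device=probs.device)
expval = torch.dot(probs, eigvals)
```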

**Related GitHub Issues:** #5269 introduced changes that resulted in GPU
errors.
mlxd authored Mar 27, 2024
1 parent 30f69b0 commit 1bb10be
Showing 3 changed files with 31 additions and 9 deletions.
3 changes: 3 additions & 0 deletions doc/releases/changelog-dev.md

```diff
@@ -291,6 +291,9 @@

 <h3>Bug fixes 🐛</h3>

+* Fix Torch tensor locality with autoray-registered coerce method.
+  [(#5438)](https://github.com/PennyLaneAI/pennylane/pull/5438)
+
 * `jax.jit` now works with `qml.sample` with a multi-wire observable.
   [(#5422)](https://github.com/PennyLaneAI/pennylane/pull/5422)
```
1 change: 1 addition & 0 deletions pennylane/math/multi_dispatch.py

```diff
@@ -333,6 +333,7 @@ def dot(tensor1, tensor2, like=None):
     x, y = np.coerce([tensor1, tensor2], like=like)

     if like == "torch":
+
         if x.ndim == 0 and y.ndim == 0:
             return x * y
```
36 changes: 27 additions & 9 deletions pennylane/math/single_dispatch.py

```diff
@@ -599,16 +599,34 @@ def _coerce_types_torch(tensors):
     torch = _i("torch")

     # Extract existing set devices, if any
-    device_set = set(t.device for t in tensors if isinstance(t, torch.Tensor))
-    if len(device_set) > 1:  # pragma: no cover
-        # GPU specific case
-        device_names = ", ".join(str(d) for d in device_set)
-        raise RuntimeError(
-            f"Expected all tensors to be on the same device, but found at least two devices, {device_names}!"
-        )
+    device_set = set()
+    dev_indices = set()
+    for t in tensors:
+        if isinstance(t, torch.Tensor):
+            device_set.add(t.device.type)
+            dev_indices.add(t.device.index)
+        else:
+            device_set.add("cpu")
+            dev_indices.add(None)

-    device = device_set.pop() if len(device_set) == 1 else None
-    tensors = [torch.as_tensor(t, device=device) for t in tensors]
+    if len(device_set) > 1:  # pragma: no cover
+        # If data exists on two separate GPUs, outright fail
+        if len([i for i in dev_indices if i is not None]) > 1:
+            device_names = ", ".join(str(d) for d in device_set)
+
+            raise RuntimeError(
+                f"Expected all tensors to be on the same device, but found at least two devices, {device_names}!"
+            )
+        # Otherwise, automigrate data from CPU to GPU and carry on.
+        dev_indices.remove(None)
+        dev_id = dev_indices.pop()
+        tensors = [
+            torch.as_tensor(t, device=torch.device(f"cuda:{dev_id}"))
+            for t in tensors  # pragma: no cover
+        ]
+    else:
+        device = device_set.pop()
+        tensors = [torch.as_tensor(t, device=device) for t in tensors]

     dtypes = {i.dtype for i in tensors}
```
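
For reference, a minimal sketch of the new failure path (hypothetical; it
assumes at least two CUDA devices and that `qml.math.coerce` dispatches to
the Torch handler shown above):

```python
import pennylane as qml
import torch

tensors = [
    torch.tensor([0.1]),                   # CPU data
    torch.tensor([0.2], device="cuda:0"),  # GPU 0
    torch.tensor([0.3], device="cuda:1"),  # GPU 1
]

# CPU data mixed with two distinct GPU indices: no single target device
# can be chosen, so coercion now fails outright.
try:
    qml.math.coerce(tensors, like="torch")
except RuntimeError as err:
    print(err)  # Expected all tensors to be on the same device, ...
```

Note that only the device *type* is recorded in `device_set`, so this branch
is reached when device types differ (CPU data mixed with GPU data); the inner
index check then distinguishes the recoverable single-GPU case from the
multi-GPU failure.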
