Fix Torch tensor locality with autoray-registered coerce method (#5438)

### Before submitting

Please complete the following checklist when submitting a PR:

- [x] All new features must include a unit test. If you've fixed a bug or added
      code that should be tested, add a test to the test directory!

- [x] All new functions and code must be clearly commented and documented. If
      you do make documentation changes, make sure that the docs build and
      render correctly by running `make docs`.

- [x] Ensure that the test suite passes, by running `make test`.

- [x] Add a new entry to the `doc/releases/changelog-dev.md` file, summarizing
      the change, and including a link back to the PR.

- [x] The PennyLane source code conforms to
      [PEP8 standards](https://www.python.org/dev/peps/pep-0008/).
      We check all of our code against [Pylint](https://www.pylint.org/).
      To lint modified files, simply `pip install pylint`, and then
      run `pylint pennylane/path/to/file.py`.

When all the above are checked, delete everything above the dashed
line and fill in the pull request template.


------------------------------------------------------------------------------------------------------------

**Context:** When Torch has a GPU-backed data buffer, failures can occur
when attempting to make autoray-dispatched calls to Torch methods with
mixed CPU and GPU data. In this case, with probabilities on the GPU and
eigenvalues on the host (read from the observables), failures appeared
in `qml.dot`, and can be reproduced with:

```python
import pennylane as qml
import torch
import numpy as np

torch_device="cuda"
dev = qml.device("default.qubit.torch", wires=2, torch_device=torch_device)
ham = qml.Hamiltonian(torch.tensor([0.1, 0.2], requires_grad=True), [qml.PauliX(0), qml.PauliZ(1)])

@qml.qnode(dev, diff_method="backprop", interface="torch")
def circuit():
    qml.RX(np.zeros(5), 0)  # Broadcast the state by applying a broadcasted identity
    return qml.expval(ham)

res = circuit()
assert qml.math.allclose(res, 0.2)
```

This PR modifies the registered `coerce` method for Torch to
automatically migrate mixed CPU-GPU data, always favouring the
associated GPU. In addition, the method now also catches multi-GPU
data, where tensors do not reside on the same device index, and fails
outright. As a longer-term solution, moving the Torch GPU dispatch
calls to earlier in the stack would be more sound, but this fixes the
aforementioned issue, at the expense of always migrating from CPU to
GPU.
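
A minimal sketch of the intended behaviour after the change (illustrative
only; it assumes a CUDA-capable Torch build is available):

```python
import pennylane as qml
import torch

# Device-resident probabilities paired with host-resident eigenvalues.
probs = torch.tensor([0.25, 0.75], device="cuda:0")
eigvals = torch.tensor([1.0, -1.0])  # CPU tensor

# The registered coerce method now migrates the CPU operand to cuda:0
# before the contraction, instead of raising a device-mismatch error.
res = qml.math.dot(probs, eigvals)
print(res.device)  # cuda:0
```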

**Description of the Change:** As above.

**Benefits:** Allows automatic data migration from host to device when
using a GPU-backed tensor. In addition, it will catch multi-GPU tensor
data when using Torch and fail due to non-local representations.

**Possible Drawbacks:** Auto-migration may not always be wanted. The
alternative solution is to always be explicit about locality, and to
move the eigenvalue data onto the device at a higher layer in the
stack.
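
A sketch of that explicit alternative (hypothetical; the eigenvalues are
assumed to arrive as host-side data read from the observable):

```python
import torch

probs = torch.tensor([0.25, 0.75], device="cuda:0")  # device-resident probabilities

# Explicitly migrate the host-side eigenvalues to the tensor's device
# before combining, rather than relying on coerce to auto-migrate.
eigvals = torch.as_tensor([1.0, -1.0], device=probs.device)
expval = torch.dot(probs, eigvals)
```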

**Related GitHub Issues:** #5269 introduced changes that resulted in GPU
errors.
mlxd authored Mar 27, 2024
1 parent 30f69b0 commit 1bb10be
Showing 3 changed files with 31 additions and 9 deletions.
3 changes: 3 additions & 0 deletions doc/releases/changelog-dev.md

```diff
@@ -291,6 +291,9 @@

 <h3>Bug fixes 🐛</h3>

+* Fix Torch tensor locality with autoray-registered coerce method.
+  [(#5438)](https://github.com/PennyLaneAI/pennylane/pull/5438)
+
 * `jax.jit` now works with `qml.sample` with a multi-wire observable.
   [(#5422)](https://github.com/PennyLaneAI/pennylane/pull/5422)
```
1 change: 1 addition & 0 deletions pennylane/math/multi_dispatch.py

```diff
@@ -333,6 +333,7 @@ def dot(tensor1, tensor2, like=None):
     x, y = np.coerce([tensor1, tensor2], like=like)

     if like == "torch":
+
         if x.ndim == 0 and y.ndim == 0:
             return x * y
```
36 changes: 27 additions & 9 deletions pennylane/math/single_dispatch.py

```diff
@@ -599,16 +599,34 @@ def _coerce_types_torch(tensors):
     torch = _i("torch")

     # Extract existing set devices, if any
-    device_set = set(t.device for t in tensors if isinstance(t, torch.Tensor))
-    if len(device_set) > 1:  # pragma: no cover
-        # GPU specific case
-        device_names = ", ".join(str(d) for d in device_set)
-        raise RuntimeError(
-            f"Expected all tensors to be on the same device, but found at least two devices, {device_names}!"
-        )
+    device_set = set()
+    dev_indices = set()
+    for t in tensors:
+        if isinstance(t, torch.Tensor):
+            device_set.add(t.device.type)
+            dev_indices.add(t.device.index)
+        else:
+            device_set.add("cpu")
+            dev_indices.add(None)

-    device = device_set.pop() if len(device_set) == 1 else None
-    tensors = [torch.as_tensor(t, device=device) for t in tensors]
+    if len(device_set) > 1:  # pragma: no cover
+        # If data exists on two separate GPUs, outright fail
+        if len([i for i in dev_indices if i is not None]) > 1:
+            device_names = ", ".join(str(d) for d in device_set)
+
+            raise RuntimeError(
+                f"Expected all tensors to be on the same device, but found at least two devices, {device_names}!"
+            )
+        # Otherwise, automigrate data from CPU to GPU and carry on.
+        dev_indices.remove(None)
+        dev_id = dev_indices.pop()
+        tensors = [
+            torch.as_tensor(t, device=torch.device(f"cuda:{dev_id}"))
+            for t in tensors  # pragma: no cover
+        ]
+    else:
+        device = device_set.pop()
+        tensors = [torch.as_tensor(t, device=device) for t in tensors]

     dtypes = {i.dtype for i in tensors}
```
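
For reference, a minimal sketch of the new failure path (hypothetical; it
assumes at least two CUDA devices and that `qml.math.coerce` dispatches to
the Torch handler shown above):

```python
import pennylane as qml
import torch

tensors = [
    torch.tensor([0.1]),                   # CPU data
    torch.tensor([0.2], device="cuda:0"),  # GPU 0
    torch.tensor([0.3], device="cuda:1"),  # GPU 1
]

# CPU data mixed with two distinct GPU indices: no single target device
# can be chosen, so coercion now fails outright.
try:
    qml.math.coerce(tensors, like="torch")
except RuntimeError as err:
    print(err)  # Expected all tensors to be on the same device, ...
```

Note that only the device *type* is recorded in `device_set`, so this branch
is reached when device types differ (CPU data mixed with GPU data); the inner
index check then distinguishes the recoverable single-GPU case from the
multi-GPU failure.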
