Hi Intel Team
I am seeing a "could not create an engine" error when running the demo.py example from "oneCCL Bindings for PyTorch Getting Started Sample*". The code runs on a Sapphire Rapids node with 4 PVC GPUs on a TACC system. Any suggestions for identifying the cause and fixing it?
(base) c551-003pvc$ mpirun -n 2 -l python demo.py -dev xpu
[0] Runing Iteration: 0 on device xpu:0
[0] Runing forward: 0 on device xpu:0
[0] Traceback (most recent call last):
[0] File "/scratch/05231/aruhela/demo.py", line 67, in <module>
[1] Runing Iteration: 0 on device xpu:1
[1] Runing forward: 0 on device xpu:1
[1] Traceback (most recent call last):
[1] File "/scratch/05231/aruhela/demo.py", line 67, in <module>
[0] res = model(input)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[1] res = model(input)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[0] return self._call_impl(*args, **kwargs)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[1] return self._call_impl(*args, **kwargs)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[0] return forward_call(*args, **kwargs)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1519, in forward
[1] return forward_call(*args, **kwargs)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1519, in forward
[0] else self._run_ddp_forward(*inputs, **kwargs)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1355, in _run_ddp_forward
[1] else self._run_ddp_forward(*inputs, **kwargs)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1355, in _run_ddp_forward
[0] return self.module(*inputs, **kwargs) # type: ignore[index]
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[1] return self.module(*inputs, **kwargs) # type: ignore[index]
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[0] return self._call_impl(*args, **kwargs)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[1] return self._call_impl(*args, **kwargs)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[0] return forward_call(*args, **kwargs)
[0] File "/scratch/05231/aruhela/demo.py", line 26, in forward
[1] return forward_call(*args, **kwargs)
[1] File "/scratch/05231/aruhela/demo.py", line 26, in forward
[0] return self.linear(input)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[1] return self.linear(input)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[0] return self._call_impl(*args, **kwargs)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[1] return self._call_impl(*args, **kwargs)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[0] return forward_call(*args, **kwargs)
[0] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
[1] return forward_call(*args, **kwargs)
[1] File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
[0] return F.linear(input, self.weight, self.bias)
[0] RuntimeError: could not create an engine
[1] return F.linear(input, self.weight, self.bias)
[1] RuntimeError: could not create an engine
(base) c551-003pvc$
Notes: The oneAPI release is 2024.2.
Install command (AI Selector Tool):
conda install -c intel -c conda-forge --override-channels intel/label/oneapi::intel-extension-for-pytorch=2.1.20 intel/label/oneapi::pytorch=2.1.0 intel/label/oneapi::oneccl_bind_pt=2.1.200 intel/label/oneapi::torchvision=0.16.0 intel/label/oneapi::torchaudio=2.1.0 conda-forge::deepspeed=0.14.0 python=3.9
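To help narrow this down, here is a minimal sketch of an environment check that could be run on the same node, once directly and once under `mpirun`. It only reports what each rank sees: the versions of the packages from the install command above, a few oneAPI-related environment variables, and whether `torch.xpu` reports any devices. The helper names `probe` and `collect_environment` are made up for illustration, and the choice of environment variables is an assumption about what might be relevant, not a confirmed cause of the error.

```python
import importlib
import os


def probe(module_name):
    """Return the module's version string, or a short note if the import fails."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or OSError from a missing shared library
        return f"import failed: {type(exc).__name__}: {exc}"
    return getattr(module, "__version__", "unknown")


def collect_environment():
    """Gather package versions, selected environment variables, and XPU visibility."""
    # torch is probed first; importing intel_extension_for_pytorch afterwards
    # is what registers the xpu backend in this PyTorch 2.1 setup.
    report = {
        name: probe(name)
        for name in (
            "torch",
            "intel_extension_for_pytorch",
            "oneccl_bindings_for_pytorch",
        )
    }
    # Assumption: these variables are worth inspecting; none is known to be the culprit.
    for var in ("ONEAPI_ROOT", "ZE_AFFINITY_MASK", "ONEAPI_DEVICE_SELECTOR"):
        report[var] = os.environ.get(var, "<unset>")
    try:
        import torch

        xpu = getattr(torch, "xpu", None)
        report["xpu_available"] = bool(xpu and xpu.is_available())
        report["xpu_device_count"] = xpu.device_count() if xpu else 0
    except Exception as exc:
        report["xpu_available"] = f"check failed: {type(exc).__name__}: {exc}"
    return report


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

If the package versions import cleanly and the devices are visible outside `mpirun` but not inside it, that would point toward the launcher environment rather than the installation; if an import fails or no device is reported at all, the mismatch is more likely between the conda packages and the oneAPI 2024.2 runtime on the node.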
Thanks
Amit Ruhela