Transformers gpt2, mt5, t5 and umt5 models don't support model parallelism when running on the PyTorch XPU backend (on systems with multiple GPU devices), as can be observed by running the Transformers tests - see the logs below.
Can model parallelism be supported for the XPU backend?
$ cat spec.py
import torch
DEVICE_NAME = 'xpu'
MANUAL_SEED_FN = torch.xpu.manual_seed
EMPTY_CACHE_FN = torch.xpu.empty_cache
DEVICE_COUNT_FN = torch.xpu.device_count
$ TRANSFORMERS_TEST_DEVICE_SPEC=spec.py python3 -m pytest -rsf tests/models/ -k "test_model_parallelization or test_model_parallel_equal_results"
<...>
FAILED tests/models/gpt2/test_modeling_gpt2.py::GPT2ModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/gpt2/test_modeling_gpt2.py::GPT2ModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/mt5/test_modeling_mt5.py::MT5ModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/mt5/test_modeling_mt5.py::MT5ModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/mt5/test_modeling_mt5.py::MT5EncoderOnlyModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/mt5/test_modeling_mt5.py::MT5EncoderOnlyModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/t5/test_modeling_t5.py::T5ModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/t5/test_modeling_t5.py::T5ModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/t5/test_modeling_t5.py::T5EncoderOnlyModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/t5/test_modeling_t5.py::T5EncoderOnlyModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/umt5/test_modeling_umt5.py::UMT5EncoderOnlyModelTest::test_model_parallel_equal_results - AttributeError: 'UMT5EncoderModel' object has no attribute 'parallelize'
FAILED tests/models/umt5/test_modeling_umt5.py::UMT5EncoderOnlyModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
=============================== 12 failed, 682 skipped, 76163 deselected, 5 warnings in 24.79s ================================
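For context, these tests exercise the legacy per-model parallelize() API, which places layers on CUDA devices explicitly. A rough sketch of what the tests do (illustrative only; the layer split shown is an arbitrary example, and the call assumes CUDA GPUs, which is why it fails on an XPU-only setup):

```python
# Illustrative sketch of the legacy, CUDA-specific model-parallel API that
# the failing tests exercise; parallelize() is deprecated and hard-codes
# "cuda:<idx>" device placement internally.
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")

# Spread the transformer blocks across two devices; the mapping is
# device index -> list of block indices (this split is just an example).
device_map = {0: list(range(0, 6)), 1: list(range(6, 12))}
model.parallelize(device_map)  # asserts "Torch not compiled with CUDA enabled" without CUDA

inputs = torch.tensor([[1, 2, 3]]).to("cuda:0")
outputs = model(inputs)

model.deparallelize()  # moves everything back to CPU
```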
dvrogozh changed the title from "xpu: model parallelism not supported for PyTorch XPU backend" to "xpu: model_parallel not supported for PyTorch XPU backend" on Dec 13, 2024.
dvrogozh added a commit to dvrogozh/transformers that referenced this issue on Dec 13, 2024:
The `parallelize()` API is deprecated in favor of accelerate's `device_map="auto"`
and therefore is not accepting new features. At the same time, the `parallelize()`
implementation is currently CUDA-specific. This commit marks the respective
CI tests with `@require_torch_gpu`.
Fixes: huggingface#35252
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
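For reference, the marking described in the commit amounts to decorating the parallelism tests so they only run on CUDA machines and are reported as skipped elsewhere. A minimal sketch (illustrative; the actual test classes and bodies follow the existing Transformers test suite):

```python
# Illustrative sketch: marking a model-parallelism test as CUDA-only.
# require_torch_gpu and require_torch_multi_gpu are existing decorators
# in transformers.testing_utils; the test body below is a placeholder.
from transformers.testing_utils import require_torch_gpu, require_torch_multi_gpu


class GPT2ModelTest:
    @require_torch_gpu          # skip on non-CUDA backends such as XPU
    @require_torch_multi_gpu    # parallelize() needs at least two devices
    def test_model_parallelization(self):
        ...  # legacy parallelize()-based checks, unchanged
```

With that marking, XPU runs skip these tests instead of failing; the supported way to spread a model across devices remains Accelerate's `device_map="auto"` passed to `from_pretrained()`, which the deprecation notice points to.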
dvrogozh changed the title from "xpu: model_parallel not supported for PyTorch XPU backend" to "xpu: parallelize() not supported for PyTorch XPU backend" on Dec 13, 2024.
As discussed in #35253 (review), the parallelize() API is deprecated, so it is not reasonable to extend it with new features (such as XPU backend support). Instead, we agreed to mark such tests as CUDA-specific (#35253 (comment)). See the following PR for that:
With https://github.com/huggingface/transformers/releases/tag/v4.47.0.
CC: @ArthurZucker @SunMarc