
xpu: parallelize() not supported for PyTorch XPU backend #35252

Open
dvrogozh opened this issue Dec 13, 2024 · 1 comment · May be fixed by #35269
dvrogozh commented Dec 13, 2024

With https://github.com/huggingface/transformers/releases/tag/v4.47.0.

Transformers gpt2, mt5, t5 and umt5 models don't support model parallelism when running on the PyTorch XPU backend (on multiple GPU devices), as can be observed by running the Transformers tests - see logs below.

Can model parallelism be supported for XPU backend?

$ cat spec.py
import torch
# Custom device spec for the test suite, passed via TRANSFORMERS_TEST_DEVICE_SPEC:
# it points the generic device hooks at the PyTorch XPU backend.
DEVICE_NAME = 'xpu'
MANUAL_SEED_FN = torch.xpu.manual_seed
EMPTY_CACHE_FN = torch.xpu.empty_cache
DEVICE_COUNT_FN = torch.xpu.device_count

$ TRANSFORMERS_TEST_DEVICE_SPEC=spec.py python3 -m pytest -rsf tests/models/ -k "test_model_parallelization or test_model_parallel_equal_results"
<...>
FAILED tests/models/gpt2/test_modeling_gpt2.py::GPT2ModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/gpt2/test_modeling_gpt2.py::GPT2ModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/mt5/test_modeling_mt5.py::MT5ModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/mt5/test_modeling_mt5.py::MT5ModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/mt5/test_modeling_mt5.py::MT5EncoderOnlyModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/mt5/test_modeling_mt5.py::MT5EncoderOnlyModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/t5/test_modeling_t5.py::T5ModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/t5/test_modeling_t5.py::T5ModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/t5/test_modeling_t5.py::T5EncoderOnlyModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/t5/test_modeling_t5.py::T5EncoderOnlyModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/umt5/test_modeling_umt5.py::UMT5EncoderOnlyModelTest::test_model_parallel_equal_results - AttributeError: 'UMT5EncoderModel' object has no attribute 'parallelize'
FAILED tests/models/umt5/test_modeling_umt5.py::UMT5EncoderOnlyModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
=============================== 12 failed, 682 skipped, 76163 deselected, 5 warnings in 24.79s ================================
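
Both failure modes point at parallelize() being hard-coded to CUDA. A minimal sketch of the likely mechanism, assuming the layer-splitting helper behaves like transformers' get_device_map (an illustrative reconstruction, not the verbatim source):

from math import ceil
import torch

def get_device_map(n_layers, devices):
    # Split layer indices into contiguous blocks, one block per device.
    layers = list(range(n_layers))
    n_blocks = int(ceil(n_layers / len(devices)))  # ZeroDivisionError when devices is empty
    layers_list = [layers[i : i + n_blocks] for i in range(0, n_layers, n_blocks)]
    return dict(zip(devices, layers_list))

# parallelize() enumerates CUDA devices only:
device_map = get_device_map(12, list(range(torch.cuda.device_count())))
# On an XPU-only host torch.cuda.device_count() == 0, so the division above
# raises ZeroDivisionError; code paths that call torch.cuda APIs directly fail
# with "Torch not compiled with CUDA enabled" instead.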

CC: @ArthurZucker @SunMarc

dvrogozh changed the title from "xpu: model parallelism not supported for PyTorch XPU backend" to "xpu: model_parallel not supported for PyTorch XPU backend" on Dec 13, 2024
dvrogozh added a commit to dvrogozh/transformers that referenced this issue Dec 13, 2024
The `parallelize()` API is deprecated in favor of accelerate's `device_map="auto"`
and is therefore not accepting new features. At the same time, the `parallelize()`
implementation is currently CUDA-specific. This commit marks the respective
CI tests with `@require_torch_gpu`.

Fixes: huggingface#35252
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
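
Marking a test CUDA-only, as the commit message describes, looks roughly like this (a minimal sketch: the decorator comes from transformers.testing_utils, while the class and test body here are illustrative placeholders):

import unittest
from transformers.testing_utils import require_torch_gpu

class ParallelizeTest(unittest.TestCase):
    @require_torch_gpu  # auto-skips when CUDA is unavailable, e.g. on XPU-only hosts
    def test_model_parallelization(self):
        pass  # placeholder; the real tests exercise model.parallelize()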
dvrogozh linked a pull request (#35269) Dec 13, 2024 that will close this issue
dvrogozh changed the title from "xpu: model_parallel not supported for PyTorch XPU backend" to "xpu: parallelize() not supported for PyTorch XPU backend" on Dec 13, 2024
dvrogozh (author) commented:

As discussed in #35253 (review), the parallelize() API is deprecated. As such, it's not reasonable to add new features to it (such as XPU backend support). Instead, we've agreed to mark such tests as CUDA-specific (#35253 (comment)). See PR #35269 for that.
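
For completeness, the non-deprecated way to split a model across available accelerators (including XPU) is accelerate's device map. A minimal sketch, with "t5-small" as an illustrative checkpoint:

from transformers import AutoModelForSeq2SeqLM

# Requires the accelerate package. Shards the model across all visible
# accelerators (CUDA, XPU, ...) instead of calling model.parallelize().
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small", device_map="auto")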
