
xpu: parallelize() not supported for PyTorch XPU backend #35252

Open
dvrogozh opened this issue Dec 13, 2024 · 1 comment · May be fixed by #35269
dvrogozh commented Dec 13, 2024

With https://github.com/huggingface/transformers/releases/tag/v4.47.0.

Transformers gpt2, mt5, t5 and umt5 models don't support model parallelism when running on the PyTorch XPU backend (on multiple GPU devices), as can be observed by running the Transformers tests - see logs below.

Can model parallelism be supported for XPU backend?

$ cat spec.py
import torch
# Custom device spec for the test suite, passed via TRANSFORMERS_TEST_DEVICE_SPEC:
# it points the generic device hooks at the PyTorch XPU backend.
DEVICE_NAME = 'xpu'
MANUAL_SEED_FN = torch.xpu.manual_seed
EMPTY_CACHE_FN = torch.xpu.empty_cache
DEVICE_COUNT_FN = torch.xpu.device_count

$ TRANSFORMERS_TEST_DEVICE_SPEC=spec.py python3 -m pytest -rsf tests/models/ -k "test_model_parallelization or test_model_parallel_equal_results"
<...>
FAILED tests/models/gpt2/test_modeling_gpt2.py::GPT2ModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/gpt2/test_modeling_gpt2.py::GPT2ModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/mt5/test_modeling_mt5.py::MT5ModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/mt5/test_modeling_mt5.py::MT5ModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/mt5/test_modeling_mt5.py::MT5EncoderOnlyModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/mt5/test_modeling_mt5.py::MT5EncoderOnlyModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/t5/test_modeling_t5.py::T5ModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/t5/test_modeling_t5.py::T5ModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/t5/test_modeling_t5.py::T5EncoderOnlyModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/t5/test_modeling_t5.py::T5EncoderOnlyModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/umt5/test_modeling_umt5.py::UMT5EncoderOnlyModelTest::test_model_parallel_equal_results - AttributeError: 'UMT5EncoderModel' object has no attribute 'parallelize'
FAILED tests/models/umt5/test_modeling_umt5.py::UMT5EncoderOnlyModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
=============================== 12 failed, 682 skipped, 76163 deselected, 5 warnings in 24.79s ================================
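
Both failure modes point at parallelize() being hard-coded to CUDA. A minimal sketch of the likely mechanism, assuming the layer-splitting helper behaves like transformers' get_device_map (an illustrative reconstruction, not the verbatim source):

from math import ceil
import torch

def get_device_map(n_layers, devices):
    # Split layer indices into contiguous blocks, one block per device.
    layers = list(range(n_layers))
    n_blocks = int(ceil(n_layers / len(devices)))  # ZeroDivisionError when devices is empty
    layers_list = [layers[i : i + n_blocks] for i in range(0, n_layers, n_blocks)]
    return dict(zip(devices, layers_list))

# parallelize() enumerates CUDA devices only:
device_map = get_device_map(12, list(range(torch.cuda.device_count())))
# On an XPU-only host torch.cuda.device_count() == 0, so the division above
# raises ZeroDivisionError; code paths that call torch.cuda APIs directly fail
# with "Torch not compiled with CUDA enabled" instead.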

CC: @ArthurZucker @SunMarc

dvrogozh changed the title from "xpu: model parallelism not supported for PyTorch XPU backend" to "xpu: model_parallel not supported for PyTorch XPU backend" on Dec 13, 2024
dvrogozh added a commit to dvrogozh/transformers that referenced this issue Dec 13, 2024
The `parallelize()` API is deprecated in favor of accelerate's `device_map="auto"`
and is therefore not accepting new features. At the same time, the `parallelize()`
implementation is currently CUDA-specific. This commit marks the respective
CI tests with `@require_torch_gpu`.

Fixes: huggingface#35252
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
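
Marking a test CUDA-only, as the commit message describes, looks roughly like this (a minimal sketch: the decorator comes from transformers.testing_utils, while the class and test body here are illustrative placeholders):

import unittest
from transformers.testing_utils import require_torch_gpu

class ParallelizeTest(unittest.TestCase):
    @require_torch_gpu  # auto-skips when CUDA is unavailable, e.g. on XPU-only hosts
    def test_model_parallelization(self):
        pass  # placeholder; the real tests exercise model.parallelize()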
dvrogozh linked a pull request (#35269) Dec 13, 2024 that will close this issue
dvrogozh changed the title from "xpu: model_parallel not supported for PyTorch XPU backend" to "xpu: parallelize() not supported for PyTorch XPU backend" on Dec 13, 2024
dvrogozh (author) commented:

As discussed in #35253 (review), the parallelize() API is deprecated. As such, it's not reasonable to add new features to it (such as XPU backend support). Instead, we've agreed to mark such tests as CUDA-specific (#35253 (comment)). See PR #35269 for that.
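
For completeness, the non-deprecated way to split a model across available accelerators (including XPU) is accelerate's device map. A minimal sketch, with "t5-small" as an illustrative checkpoint:

from transformers import AutoModelForSeq2SeqLM

# Requires the accelerate package. Shards the model across all visible
# accelerators (CUDA, XPU, ...) instead of calling model.parallelize().
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small", device_map="auto")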
