Fix XPU inference #2383
Conversation
cc @SunMarc
Thanks for fixing XPU support for big model inference. Could you have a look, @abhilash1910, since you did the initial XPU integration?
src/accelerate/utils/modeling.py (outdated)

```python
target_device = device

if is_xpu_available() and isinstance(device, int):
    target_device = f"xpu:{device}"

with safe_open(checkpoint_file, framework="pt", device=target_device) as f:
```
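The mapping in that diff can be sketched as a standalone helper (the function name here is hypothetical, for illustration only; in the PR the logic lives inline before the `safe_open` call):

```python
def resolve_target_device(device, xpu_available):
    # Mirror of the change in this PR: a bare integer device index becomes
    # an explicit "xpu:N" string when an XPU is available, so safetensors
    # does not misinterpret the integer as a CUDA index. Any other device
    # spec ("cpu", "mps", ...) passes through unchanged.
    if xpu_available and isinstance(device, int):
        return f"xpu:{device}"
    return device
```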
Just to be sure: the model won't be loaded on the correct XPU device if we don't use the "xpu:" prefix, right? cc @statelesshz since you added NPU support and this might also be required there.
Yes, you can see it in huggingface/safetensors#428, file `bindings/python/src/lib.rs`, line 263:

```rust
name if name.starts_with("xpu:") => {
    let tokens: Vec<_> = name.split(':').collect();
    if tokens.len() == 2 {
        let device: usize = tokens[1].parse()?;
        Ok(Device::Xpu(device))
    } else {
        Err(SafetensorError::new_err(format!(
            "device {name} is invalid"
        )))
    }
}
```
If you supply only a number, it will be treated as a CUDA device (same file, line 278):

```rust
} else if let Ok(number) = ob.extract::<usize>() {
    Ok(Device::Cuda(number))
}
```
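To make the dispatch above concrete, here is a toy Python re-implementation of that Rust matching (illustrative only, not the safetensors API): an "xpu:N" string maps to an XPU device, while a bare integer always maps to a CUDA index.

```python
def parse_device(ob):
    # Toy sketch of safetensors' device parsing for illustration.
    # A bare integer is always interpreted as a CUDA device index.
    if isinstance(ob, int):
        return ("cuda", ob)
    # An explicit "xpu:N" string is required to target an XPU device.
    if isinstance(ob, str) and ob.startswith("xpu:"):
        tokens = ob.split(":")
        if len(tokens) == 2:
            return ("xpu", int(tokens[1]))
        raise ValueError(f"device {ob} is invalid")
    # Other strings ("cpu", "mps", ...) name the device directly.
    return (str(ob), None)
```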
Yes, I think the changes are correct. An "xpu:" device string has been required since the safetensors integration.
Perfect! Thanks for the explanation!
LGTM! Thanks @notsyncing for adding this; with safetensors support in place, this was something in the pipeline. Thanks @SunMarc for highlighting.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks again @notsyncing, and thanks @abhilash1910 for reviewing. Before merging this, could you add a check that raises an error if the user doesn't have the right version of safetensors for XPU, using `compare_versions` and `safetensors_version = version.parse(importlib.metadata.version("safetensors"))`? Also, don't forget to run `make style`.
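The requested guard can be sketched as follows, with a naive dotted-version parse standing in for accelerate's `compare_versions` helper (the function name below is hypothetical):

```python
def meets_minimum_version(version_str, minimum="0.4.2"):
    # Naive dotted-version comparison, for illustration only: XPU loading
    # via safetensors needs safetensors >= 0.4.2. The actual PR uses
    # importlib.metadata.version("safetensors") to read the installed
    # version and accelerate's compare_versions to compare it.
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(version_str) >= parse(minimum)
```

In the real check, a failed comparison would raise an error telling the user to upgrade safetensors before using XPU big model inference.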
Force-pushed fd785b7 to f8ea86f
Though it will complain about "Device xpu is not recognized, available devices are integers (for GPU/XPU), 'mps', 'cpu' and 'disk'", you cannot just put 0 as the device, or it will treat 0 as a CUDA device and then complain that torch is not compiled with CUDA enabled. You will need safetensors >= 0.4.2 if using safetensors files.
@SunMarc I have added the safetensors version check and fixed the style errors.
LGTM! Thanks for iterating.
What does this PR do?

With `device_map = {"xpu": "16GB", "cpu": "32GB"}` on an Intel XPU, it will complain about "Device xpu is not recognized, available devices are integers (for GPU/XPU), 'mps', 'cpu' and 'disk'", but you cannot just put 0 as the device, or it will treat 0 as a CUDA device and then complain that torch is not compiled with CUDA enabled.

You will need safetensors >= 0.4.2 if using safetensors files (huggingface/safetensors#428).
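A hedged sketch of what the description implies works after this fix: use an integer device index instead of the string "xpu", and accelerate resolves it to an "xpu:N" device string internally (the exact keys below are illustrative, not prescribed by the PR):

```python
# Illustrative only: with this PR, an integer key is a valid device index
# and is resolved to "xpu:0" on an Intel XPU before safetensors loads the
# weights; the string key "xpu" remains an unrecognized device specifier.
device_map = {0: "16GB", "cpu": "32GB"}
```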
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.