
[BUG/Help] RuntimeError: MPS backend out of memory (MPS allocated: 18.05 GB, other allocations: 4.48 MB, max allowed: 18.13 GB). Tried to allocate 192.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure). #311

Closed
1 task done
dragononly opened this issue Mar 31, 2023 · 9 comments

Comments

@dragononly

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

It just won't run.
Is it because I don't have enough memory?
I have 16 GB of RAM.
It runs fine on CPU and in the regular mode.
The environment itself is fine.
M1 Pro chip.

Expected Behavior

No response

Steps To Reproduce

It just won't run.
Is it because I don't have enough memory?
I have 16 GB of RAM.
It runs fine on CPU and in the regular mode.
The environment itself is fine.
M1 Pro chip.

Environment

- OS: macOS 13.3
- Python: 3.8
- Transformers: 27
- PyTorch: 2.1 nightly
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): true
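As an aside on the environment form: the issue template asks about CUDA, but on Apple silicon the GPU is reached through the MPS backend instead. A minimal check (my suggestion, not part of the template) would be:

```python
import torch

# On Apple silicon the GPU is exposed through the MPS backend, not CUDA,
# so torch.cuda.is_available() is normally False there. The relevant
# checks for this bug report are:
print(torch.backends.mps.is_available())  # an MPS device can be used now
print(torch.backends.mps.is_built())      # this PyTorch build includes MPS
```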

Anything else?

No response

@YIZXIY
Contributor

YIZXIY commented Apr 1, 2023

Yes, you're running out of memory.
Also, that `CUDA Support: true` of yours is quite interesting.
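For scale, a back-of-envelope estimate (my numbers, not from the thread) of the raw weight footprint of a 6-billion-parameter model shows why 16 GB of unified memory is tight:

```python
# Rough weight sizes for a 6B-parameter model. Activations, KV cache,
# and allocator overhead come on top of this, so actual usage is higher.
params = 6_000_000_000
fp16_gb = params * 2 / 1024**3  # ~11.2 GB at 2 bytes per fp16 weight
fp32_gb = params * 4 / 1024**3  # ~22.4 GB at 4 bytes per fp32 weight
print(round(fp16_gb, 1), round(fp32_gb, 1))
```

The fp16 weights alone nearly fill the ~18 GB MPS allocation cap quoted in the error, and fp32 exceeds 16 GB of RAM outright.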

@qitao052

qitao052 commented Apr 1, 2023

I suspect this is a macOS 13.3 problem. My MacBook is on 13.2 and my Mac Studio is on 13.3, and I found:

  1. With the model in `.half()`, it runs fine on the MacBook: results come back in about a second and the Python process uses around 14 GB. On the Mac Studio it never produces a result and never raises an error, while Python's memory usage climbs straight to 70 GB.
  2. On the 13.3 Mac Studio, using `.float()` instead of `.half()` does run and does produce results, but total memory usage reaches about 90%. Far too high.
  3. I foolishly upgraded the Mac Studio from 13.2 to 13.3. On 13.2 I had already verified everything worked; the problem only appeared after the upgrade.
  4. Both machines run Python 3.9.16 with identical PyTorch and Transformers versions: '2.1.0.dev20230324' and '4.26.1'.

So I suspect 13.3 is the culprit. I upgraded to 13.3 in the first place because the program emitted this warning:
UserWarning: MPS: no support for int64 for min_max, downcasting to a smaller data type (int32/float32). Native support for int64 has been added in macOS 13.3.
But after upgrading to 13.3, things actually got worse.

@chenguokai

> (Quoting @qitao052's comment above.)

Same behavior (1 and 2) on my MacBook Pro M1Pro with macOS 13.3

@wukaiyu

wukaiyu commented Apr 2, 2023

I hit the same problem, exactly as qitao052 described: on 13.3 only the CPU works, GPU acceleration fails, and it reports out of memory.

@vvanglro

vvanglro commented Apr 7, 2023

32 GB, macOS 13.3.
At first I used `.half()`: it ran for ages, memory shot past 40 GB with no result, so I killed it.
After switching to `.float()` I did get a result, but slowly, in 148 s. There was also the following output; I don't know whether it matters:

The dtype of attention mask (torch.int64) is not bool
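That line is only a warning: the attention mask tensor is int64 where a bool mask is expected. A minimal illustration (my sketch, and not a fix for the slowness) of the harmless cast it refers to:

```python
import torch

# An int64 0/1 mask like the one the warning complains about.
mask = torch.ones(1, 4, dtype=torch.int64)

# Casting to bool gives the dtype attention code expects; values are preserved.
bool_mask = mask.bool()
print(bool_mask.dtype)  # torch.bool
```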

@kanson1996

+1. Using MPS raises an error and the process exits: RuntimeError: MPS backend out of memory (MPS allocated: 11.82 GB, other allocations: 6.30 GB, max allowed: 18.13 GB). Tried to allocate 5.72 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
The only way I can get it to run at the moment is to avoid MPS:

model = AutoModel.from_pretrained("./chatglm-6b", trust_remote_code=True).float()  # .half().to('mps')

Downside: responses are extremely slow, coming back almost one character at a time.
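The two knobs in this thread can be combined into one hedged sketch: the watermark override the error message itself suggests (with its own stability caveat), plus the fp16-on-MPS versus fp32-on-CPU fallback the comments converge on. `choose_dtype_and_device` is a hypothetical helper, not part of the repo.

```python
import os

# From the error text: 0.0 disables the MPS allocation cap entirely, and the
# message warns this "may cause system failure". The variable must be set
# before PyTorch initializes MPS, so set it before importing torch.
os.environ.setdefault("PYTORCH_MPS_HIGH_WATERMARK_RATIO", "0.0")

import torch

def choose_dtype_and_device():
    # The fallback described in this thread: fp16 on MPS when it works,
    # otherwise fp32 on CPU (stable but very slow).
    if torch.backends.mps.is_available():
        return torch.float16, torch.device("mps")
    return torch.float32, torch.device("cpu")
```

With transformers this would plug in roughly as `AutoModel.from_pretrained("./chatglm-6b", trust_remote_code=True).to(device=device, dtype=dtype)`, mirroring the workaround line above.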
(Screenshots: WX20230409-124037, WX20230409-123815)

Hardware: 16 GB RAM, M1 chip, macOS 13.3.1 (22E261), 24.87 GB free storage
Software: Python 3.10.10, torch 2.1.0.dev20230407, transformers 4.27.1
Model: chatglm-6b

@fquirin

fquirin commented Apr 9, 2023

> (Quoting @vvanglro's comment above:)
The dtype of attention mask (torch.int64) is not bool

Same error/warning(?) on Intel 12th Gen x86 CPU, 16 GB RAM.
Model: "THUDM/chatglm-6b-int4-qe"
Answering a question takes about 2-5 minutes.

@duzx16
Member

duzx16 commented Apr 9, 2023

The previous implementation triggered a PyTorch bug; it has now been fixed. See #462.

@zhuyeqingqing

I selected trust-remote-code and that resolved it, but I am using the facebook_galactica-6.7b model.

10 participants