Merge pull request #377 from askmyteapot/Fix-Multi-gpu-GPTQ-Llama-no-tokens

Update GPTQ_Loader.py
oobabooga authored Mar 17, 2023
2 parents ee164d1 + 53b6a66 commit 4c13067
Showing 1 changed file with 1 addition and 1 deletion.
modules/GPTQ_loader.py (1 addition, 1 deletion)
@@ -61,7 +61,7 @@ def load_quantized(model_name):
             max_memory[i] = f"{shared.args.gpu_memory[i]}GiB"
         max_memory['cpu'] = f"{shared.args.cpu_memory or '99'}GiB"
 
-        device_map = accelerate.infer_auto_device_map(model, max_memory=max_memory, no_split_module_classes=["LLaMADecoderLayer"])
+        device_map = accelerate.infer_auto_device_map(model, max_memory=max_memory, no_split_module_classes=["LlamaDecoderLayer"])
         model = accelerate.dispatch_model(model, device_map=device_map)
 
     # Single GPU
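
Context for the one-line fix: when LLaMA support was merged upstream into transformers, the decoder-layer class was renamed from LLaMADecoderLayer to LlamaDecoderLayer. accelerate.infer_auto_device_map simply ignores entries in no_split_module_classes that match no module class, so the stale name let individual decoder layers be split across devices, which is consistent with the broken multi-GPU generation that the branch name (Fix-Multi-gpu-GPTQ-Llama-no-tokens) describes. Below is a minimal sketch of the same dispatch pattern that derives the class name from the loaded model instead of hardcoding it; the model path and memory budget are illustrative placeholders, not the repository's values.

    # Sketch of the multi-GPU dispatch pattern from the diff above; the model
    # path and memory budget are hypothetical placeholders.
    import accelerate
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("models/llama-13b")  # hypothetical path

    # Derive the decoder-layer class name from the loaded model instead of
    # hardcoding it, so a rename such as LLaMADecoderLayer -> LlamaDecoderLayer
    # cannot silently stop matching.
    decoder_layer_cls = model.model.layers[0].__class__.__name__  # "LlamaDecoderLayer"

    max_memory = {0: "10GiB", 1: "10GiB", "cpu": "99GiB"}  # illustrative budget
    device_map = accelerate.infer_auto_device_map(
        model,
        max_memory=max_memory,
        no_split_module_classes=[decoder_layer_cls],
    )
    model = accelerate.dispatch_model(model, device_map=device_map)

Hardcoding the string works once it matches the real class; deriving it from the model just guards against future renames of the same kind.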
