fix: handle device in GPT model properly #143

aaronkl · 2024-10-21T12:27:38Z

What does this implement/fix? Explain your changes.

The GPT module automatically sets its device to CUDA if CUDA is available. This causes a runtime error if we move the model to another device via .to(device) after initialization, because the RoPE cache remains on the original device.
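To illustrate the failure mode described above, here is a minimal sketch of why a cache tensor can be left behind by .to(device). It is not the code changed in this PR: `build_rope_cache`, `PlainAttrCache`, and `BufferCache` are hypothetical names, and registering the cache as a buffer is just one common way to keep it in sync with the module, not necessarily the approach taken here. The "meta" device stands in for a second device so the example runs without a GPU.

```python
import torch
import torch.nn as nn


def build_rope_cache(seq_len: int, head_dim: int, base: float = 10000.0):
    """Return (cos, sin) tables of shape (seq_len, head_dim // 2)."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)
    return torch.cos(angles), torch.sin(angles)


class PlainAttrCache(nn.Module):
    """Hypothetical module: cache stored as plain tensor attributes."""

    def __init__(self, seq_len: int, head_dim: int):
        super().__init__()
        self.proj = nn.Linear(head_dim, head_dim)
        # Plain attributes are invisible to nn.Module.to(), so they stay put.
        self.cos, self.sin = build_rope_cache(seq_len, head_dim)


class BufferCache(nn.Module):
    """Hypothetical module: cache registered as buffers."""

    def __init__(self, seq_len: int, head_dim: int):
        super().__init__()
        self.proj = nn.Linear(head_dim, head_dim)
        cos, sin = build_rope_cache(seq_len, head_dim)
        # Buffers follow the module on .to(); persistent=False keeps them
        # out of the state_dict.
        self.register_buffer("cos", cos, persistent=False)
        self.register_buffer("sin", sin, persistent=False)


# Move both modules to another device after initialization.
plain = PlainAttrCache(seq_len=8, head_dim=4).to("meta")
buffered = BufferCache(seq_len=8, head_dim=4).to("meta")

print(plain.proj.weight.device, plain.cos.device)        # weights moved, cache did not
print(buffered.proj.weight.device, buffered.cos.device)  # weights and cache moved together
```

In the first module the weights and the cache end up on different devices, which is the kind of mismatch that raises a runtime error as soon as the two are combined in a forward pass.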

Any other comments?

This also fixes the unit test and handles devices properly, as discussed here.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the
terms of your choice.

aaronkl added 2 commits October 21, 2024 14:20

handle device

cdfc500

pass device to input/output

2a406cd

aaronkl requested a review from rheasukthanker October 21, 2024 12:27

add missing whitespace

e6ff338

rheasukthanker approved these changes Oct 21, 2024

aaronkl added 2 commits October 22, 2024 16:24

Merge branch 'main' into device_handling

864505c

Merge branch 'main' into device_handling

6d874fc

aaronkl merged commit 4cdfcca into main Oct 23, 2024
7 checks passed

aaronkl deleted the device_handling branch October 23, 2024 12:29
