Feature Description
Add the ability to split context between multiple GPUs, much as model layers can currently be split.
Motivation
Currently, with multi-GPU setups, LCPP only stores and processes the context on the "first" GPU. This is fine for most models, which natively handle only 4K context tokens (or double that with RoPE scaling). But as more and more large-context models are released, this limitation is becoming an issue. For example, the new Yi 34B 200K models are limited to however much context fits on the first GPU alone (about 64K on a 24 GB card), regardless of the total VRAM available. If the context could be split across multiple cards, a larger context window could be used.
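To give a rough sense of why the single-GPU limit bites, here is a minimal back-of-the-envelope sketch of KV-cache size versus context length. The model parameters used (60 layers, 8 KV heads via GQA, head dim 128, fp16 cache) are assumptions chosen to roughly match a Yi-34B-class model, not llama.cpp's exact accounting:

```python
# Sketch: estimate KV-cache memory for a given context length.
# Assumed (illustrative) model shape: 60 layers, 8 KV heads (GQA),
# head dim 128, fp16 cache elements. Not exact llama.cpp numbers.

def kv_cache_bytes(n_ctx, n_layers=60, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Bytes needed to hold the K and V caches for n_ctx tokens."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # K + V
    return n_ctx * per_token

for ctx in (4_096, 65_536, 200_000):
    print(f"{ctx:>7} tokens -> {kv_cache_bytes(ctx) / 2**30:.1f} GiB")
```

Under these assumptions, 64K of context already needs roughly 15 GiB for the KV cache alone, and the full 200K window needs on the order of 45 GiB, which cannot fit on any single consumer card regardless of how many GPUs hold the model weights.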
This problem becomes even more apparent with more, smaller GPUs. I, for instance, have two Tesla K80s, giving me 4× 12 GB of VRAM. When running a large model like Dolphin-mixtral-2.6 Q5, I can only use up to 36 GB of the theoretical 48 GB, because the first GPU runs out of memory to store the context.