fix: memory and lora cache improvements #379

Open

wants to merge 81 commits into main

Conversation

tazlin
Member

@tazlin tazlin commented Dec 24, 2024

Time has shown me that the readme was confusingly laid out. I am hoping that this condensed and hopefully easier-to-read version will prevent onboarding friction.
This avoids an active bug on the API and also ensures consistency on the off chance the API and the worker model references disagree.
With the recent changes it becomes possible for this function to take longer than 15 seconds, at least in the case of a lora.json which predates those changes. To reduce potential friction and crashes for users who want to "just" update, I'm going to increase this value.

Additionally, I'm reordering the call to purge loras to try to ensure that any migrations/ad-hoc updates are in play before we start deleting things.
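
Roughly, the intent looks like this; the constant and helper names below are placeholders for illustration, not the actual identifiers in the worker:

```python
# Illustrative sketch only: the constant and helpers are hypothetical.

AUX_MODEL_SETUP_TIMEOUT = 60  # raised from 15; migrating an old lora.json can exceed 15s


def run_lora_migrations() -> None:
    """Apply lora.json migrations / ad-hoc updates (stubbed for illustration)."""


def purge_unused_loras() -> None:
    """Delete loras that are no longer referenced (stubbed for illustration)."""


def setup_aux_models() -> None:
    # Migrations run first so the purge step sees the post-migration state
    # and doesn't delete anything a migration would have kept.
    run_lora_migrations()
    purge_unused_loras()
```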
Any errors which make it to this point are probably terrible, so we should just end the process.
- By moving lora/ti downloading to before preloading the main model off disk, we should have more time to download at better times and improve the chances of a job skip when models are actively downloading.
- Piggybacking off of this change, the job skip logic needed to be tweaked to account for the rework of the job steps.
Because of the swap of the `download_aux_models` and `preload_model` steps, it was possible for models to endlessly load and unload on a single process. This prevents that from happening and removes some of the legacy (and now moot) logic that was originally intended to make sure lora preloading happened earlier, which now happens as a property of 5cc9974.
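
A very rough sketch of the reordered steps and the thrash guard; the enum and function here are illustrative stand-ins rather than the worker's actual job-step machinery:

```python
from enum import Enum, auto


class JobStep(Enum):
    DOWNLOAD_AUX_MODELS = auto()  # loras/tis now come first...
    PRELOAD_MODEL = auto()        # ...and the main model is pulled off disk afterwards
    INFERENCE = auto()


def next_step(current: JobStep, aux_downloads_active: bool) -> JobStep | None:
    """Advance a job one step, returning None when it should wait instead."""
    if current is JobStep.DOWNLOAD_AUX_MODELS:
        # With the new ordering, the skip logic waits here while aux models are
        # still downloading instead of letting the process preload and then
        # unload the main model over and over.
        return None if aux_downloads_active else JobStep.PRELOAD_MODEL
    if current is JobStep.PRELOAD_MODEL:
        return JobStep.INFERENCE
    return None
```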
High-VRAM cards do not currently seem to be clearing VRAM as I intend; this is an attempt at triggering that more often.
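
Assuming the cleanup ultimately goes through torch's allocator, the change amounts to calling the cache-clearing path more often; a minimal sketch (the threshold and function name are made up for illustration):

```python
import torch


def maybe_clear_vram(threshold_fraction: float = 0.5) -> None:
    """Release cached VRAM more eagerly than before (illustrative threshold).

    High-VRAM cards rarely hit allocator pressure on their own, so without an
    explicit trigger the cache may never be handed back to the driver.
    """
    if not torch.cuda.is_available():
        return
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    if free_bytes / total_bytes < threshold_fraction:
        torch.cuda.empty_cache()
```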
Processes which have models that are still queued should not be killed
This should prevent corner cases where models that loaded within a few ticks of a SIGINT aren't registered properly; in that situation the process would be killed, but the accounting of the model states would fall out of sync, leaving the worker unable to ever end (with a job stuck in queue and never attempted).
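
The shutdown filter, in spirit, looks something like the following sketch (types and names are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class ProcessInfo:
    """Minimal stand-in for the worker's per-process bookkeeping."""

    process_id: int
    loaded_model: str | None = None


def processes_safe_to_kill(
    processes: list[ProcessInfo],
    queued_models: set[str],
) -> list[ProcessInfo]:
    # A process whose model still has queued jobs is left alone, even if that
    # model only finished loading a tick or two before the SIGINT arrived.
    # Killing it would leave the model-state accounting out of sync and a job
    # stuck in queue that is never attempted.
    return [p for p in processes if p.loaded_model not in queued_models]
```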
In the main loop, `get_next_job_and_process` is called for informational purposes only; if the function returns `None`, that's fine, and it probably just means the gears haven't turned elsewhere yet. The sub-function `handle_process_missing` tries to recover when there *should* be a return value, such as when called from `start_inference_`; in that case it is appropriate to try to fix the model and process maps.

Additionally, this prevents the "cleared RAM" message from triggering twice.
Prior to this change, individual processes could be commanded to unload from RAM repeatedly, many times a second. This prevents the command from being issued multiple times in a row.
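
In effect the change is a simple debounce; a sketch of the idea (class and method names are invented for illustration, not the actual implementation):

```python
import time


class UnloadDebouncer:
    """Only issue the 'unload from RAM' command once until it takes effect."""

    def __init__(self) -> None:
        self._pending: dict[int, float] = {}

    def should_send_unload(self, process_id: int) -> bool:
        # Refuse to re-issue the command while one is already in flight.
        if process_id in self._pending:
            return False
        self._pending[process_id] = time.monotonic()
        return True

    def mark_unloaded(self, process_id: int) -> None:
        # Called when the process reports it actually cleared its RAM, which is
        # also why the "cleared RAM" log line can now only fire once per unload.
        self._pending.pop(process_id, None)
```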
This can just lead to trading one blocking download for another
The LRU cache used in this way is now moot due to checks done elsewhere that have the same practical effect.
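
Concretely this is just dropping the memoisation; a sketch with a hypothetical lookup helper:

```python
# Illustrative before/after; the function name is hypothetical.
#
# Before: results were memoised even though callers already guard against
# redundant lookups elsewhere.
#
#   @functools.lru_cache(maxsize=None)
#   def get_process_for_model(model_name: str) -> int | None: ...
#
# After: a plain call, so a stale cache entry can no longer mask a changed
# process/model mapping.
def get_process_for_model(model_name: str) -> int | None:
    """Look up the process currently holding model_name (stubbed)."""
    return None
```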
This is still an indication that something is wrong, which is why I am leaving it as a warning, but with all of the 1-tick or timing-related issues that can lead to this, I don't see a reason to cast fear into operators' hearts when it will often be able to actually recover.

Additionally, the second log message changed was for debugging purposes only and does not necessarily indicate a critical/error condition.
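
Assuming loguru-style logging, the change is roughly a severity downgrade like this sketch; the function and message text are illustrative:

```python
from loguru import logger


def report_recoverable_mismatch(model_name: str) -> None:
    # Still worth surfacing, but 1-tick / timing races frequently self-correct,
    # so a warning is enough; an error would needlessly alarm operators.
    logger.warning(f"Model state mismatch for {model_name}; attempting to recover.")
```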
Prior to this, the overloaded nature of `get_processes_with_model_for_queued_job` led to some confusion on shutdown, where it could hang indefinitely with a never-killed process, and models wouldn't be proactively preloaded as intended
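
One way to read the fix is splitting the overloaded lookup into two single-purpose helpers, roughly like this sketch (names and types are hypothetical, not the functions actually introduced):

```python
from dataclasses import dataclass


@dataclass
class WorkerProcess:
    """Minimal stand-in for a tracked inference process."""

    process_id: int
    loaded_model: str | None = None


def processes_with_loaded_model(
    processes: list[WorkerProcess], model_name: str
) -> list[WorkerProcess]:
    """Processes that already have the model in memory (dispatch decisions)."""
    return [p for p in processes if p.loaded_model == model_name]


def processes_available_to_preload(
    processes: list[WorkerProcess],
) -> list[WorkerProcess]:
    """Idle processes that are candidates to proactively preload a model."""
    return [p for p in processes if p.loaded_model is None]
```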
Once upon a time, this seemed reasonable, but the current status quo no longer fits with relying on it. The outgoing states (`HordeControlFlag`) are useful for knowing whether or not to resend a `HordeControlFlag`, but they are not accurate (at least, as of recent changes) when it comes to measuring the reported process state.
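
The distinction, as a sketch (the field names and enum members are invented; `HordeControlFlag` is the real name referenced above):

```python
from dataclasses import dataclass
from enum import Enum, auto


class HordeControlFlag(Enum):
    """Outgoing commands (values here are illustrative)."""

    UNLOAD_FROM_RAM = auto()
    END_PROCESS = auto()


@dataclass
class ProcessTracking:
    # What we last asked the process to do: only useful for deciding whether
    # a HordeControlFlag needs to be resent...
    last_flag_sent: HordeControlFlag | None = None
    # ...while the authoritative state is whatever the process last reported,
    # never what we asked of it.
    last_reported_state: str = "unknown"
```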