fix: memory and lora cache improvements #379
Open
tazlin wants to merge 81 commits into main from memory-fixes-loracache
Conversation
- See Haidra-Org/hordelib#377 for more information
- See Haidra-Org/hordelib#382 for more information
Time has shown me that the readme was confusingly laid out. I am hoping that this condensed and hopefully easier-to-read version will reduce onboarding friction.
This avoids an active bug on the API and also ensures consistency in the off chance the API and the worker model references disagree.
See Haidra-Org/hordelib#387 for more info
With the recent changes, it becomes possible for this function to take longer than 15 seconds, at least in the case of a lora.json which predates those changes. To reduce potential friction and crashes for users who want to "just" update, I'm increasing this value. Additionally, I'm reordering the call to purge loras to try and ensure that any migrations/ad-hoc updates are in play before we start deleting things.
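Roughly the shape of both changes, as a sketch with illustrative names (the timeout constant and helpers below are assumptions, not the actual hordelib/worker API):

```python
# Sketch only; these names are illustrative, not the real hordelib/worker API.

# Previously 15 seconds; a lora.json predating the recent changes can take longer.
LORA_INIT_TIMEOUT_SECONDS = 60


def apply_lora_json_migrations() -> None:
    """Placeholder: migrate/ad-hoc update the local lora.json metadata."""


def purge_unused_loras() -> None:
    """Placeholder: delete loras that are no longer referenced."""


def initialise_lora_cache() -> None:
    # Run migrations first so the purge operates on up-to-date metadata;
    # purging against a stale lora.json could delete entries a migration
    # would have kept.
    apply_lora_json_migrations()
    purge_unused_loras()
```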
Any errors which make it to this point are probably terrible and so we should just end the process
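The pattern, as a minimal sketch (the `run_worker`/`main` names are placeholders, not the worker's real entry point):

```python
import logging
import sys

logger = logging.getLogger(__name__)


def run_worker() -> None:
    """Placeholder for the worker main loop."""


def main() -> None:
    try:
        run_worker()
    except Exception:
        # Anything that escapes to this point is likely unrecoverable; log it
        # and end the process instead of limping along in a broken state.
        logger.exception("Fatal error in worker; exiting.")
        sys.exit(1)


if __name__ == "__main__":
    main()
```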
See Haidra-Org/hordelib#390 for more information
- By moving lora/ti downloading to before preloading the main model from disk, we should have more time to download at better times and improve the chances of a job skip when models are actively downloading.
- Piggybacking off of this change, the job skip logic needed to be tweaked to account for the rework of the job steps (see the sketch below).
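A sketch of the reordered steps; the step names mirror `download_aux_models`/`preload_model` from these notes, but the `Job` structure and skip check are illustrative assumptions:

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    model_name: str
    loras: list[str] = field(default_factory=list)
    aux_downloads_in_flight: bool = False


def download_aux_models(job: Job) -> None:
    # Placeholder: start lora/ti downloads; real code would do this
    # asynchronously and clear the flag once everything is on disk.
    job.aux_downloads_in_flight = bool(job.loras)


def preload_model(job: Job) -> None:
    """Placeholder: read the main model off disk into RAM."""


def should_skip_job(job: Job) -> bool:
    # With downloads now starting before preload, a job whose aux models are
    # still downloading can be skipped rather than blocking the pipeline.
    return job.aux_downloads_in_flight


def run_job_steps(job: Job) -> None:
    download_aux_models(job)  # now runs before preload_model
    if should_skip_job(job):
        return
    preload_model(job)
```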
Because of the switch of the download_aux_models and preload_model steps, there was a possibility for models to endlessly load and unload on a single process. This prevents that from happening and removes some of the legacy (and now moot) logic that was originally intended to make sure lora preloading happened earlier, which now happens as a property of 5cc9974.
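One way to picture the guard; this is purely illustrative bookkeeping, and the threshold and names are assumptions rather than the worker's real state machine:

```python
import time

# Remember when each (process, model) pair last finished loading so we don't
# immediately command an unload that would be followed by another load.
_last_loaded_at: dict[tuple[int, str], float] = {}

MIN_RESIDENCY_SECONDS = 30.0


def record_model_loaded(process_id: int, model_name: str) -> None:
    _last_loaded_at[(process_id, model_name)] = time.monotonic()


def may_unload(process_id: int, model_name: str) -> bool:
    loaded_at = _last_loaded_at.get((process_id, model_name))
    if loaded_at is None:
        return True
    # Refusing to unload a freshly loaded model is what breaks the
    # load -> unload -> load cycle on a single process.
    return time.monotonic() - loaded_at >= MIN_RESIDENCY_SECONDS
```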
High-VRAM cards do not currently seem to be clearing VRAM as I intend; this is an attempt at triggering that more often.
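A hedged sketch of "trigger it more often" using PyTorch's cache-clearing call; the threshold and call site are assumptions, not the worker's actual logic:

```python
import torch

# Illustrative threshold: treat roughly 20 GB+ cards as "high VRAM".
HIGH_VRAM_BYTES = 20 * 1024**3


def maybe_clear_vram() -> None:
    if not torch.cuda.is_available():
        return
    total = torch.cuda.get_device_properties(0).total_memory
    if total >= HIGH_VRAM_BYTES:
        # On large cards the allocator can hold freed blocks for a long time;
        # clearing the cache more often keeps reported VRAM usage honest.
        torch.cuda.empty_cache()
```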
Processes which have models that are still queued should not be killed
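Roughly the shape of the check (the mapping and function name are hypothetical):

```python
def processes_safe_to_kill(
    loaded_models_by_process: dict[int, set[str]],
    queued_models: set[str],
) -> list[int]:
    # A process still holding a model that a queued job needs is not killable;
    # ending it would strand the job or force an immediate reload.
    return [
        process_id
        for process_id, models in loaded_models_by_process.items()
        if not (models & queued_models)
    ]
```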
This should prevent corner cases where models that loaded within a few ticks of a SIGINT aren't registered properly; in that situation the process would be killed, but the accounting of the model states would fall out of sync and the worker would never end (with a job stuck in the queue and never attempted).
In the main loop, `get_next_job_and_process` is called for informational purposes only; if the function returns `None`, that's fine, and it probably just means the gears haven't turned elsewhere yet. The sub-function `handle_process_missing` tries to recover only when there *should* be a return value, such as when called from `start_inference_`; in that case it is appropriate to try and fix the model and process maps. Additionally, this prevents the "cleared RAM" message from triggering twice.
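A sketch of the distinction; `get_next_job_and_process` and `handle_process_missing` are only referenced by name here, and the `attempt_recovery` parameter is an invented illustration, not the actual signature:

```python
from typing import Optional


def handle_process_missing() -> None:
    """Placeholder: attempt to repair the model/process maps."""


def get_next_job_and_process(attempt_recovery: bool = False) -> Optional[tuple[str, int]]:
    job_and_process = None  # placeholder lookup; the real code consults its maps

    if job_and_process is None and attempt_recovery:
        # Only callers that *should* have gotten a result (e.g. start_inference_)
        # ask for recovery; the informational call in the main loop does not, so
        # a None there simply means "nothing to do yet".
        handle_process_missing()

    return job_and_process
```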
Prior to this change, individual processes could be commanded to unload from ram repeatedly, many times a second. This prevents the command from being issued multiple times in a row
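A minimal sketch of the debounce (hypothetical names, not the worker's real message layer):

```python
_unload_pending: set[int] = set()


def send_unload_from_ram(process_id: int) -> None:
    """Placeholder: actually message the child process."""


def request_unload_from_ram(process_id: int) -> None:
    if process_id in _unload_pending:
        # An unload command is already outstanding; re-issuing it many times a
        # second adds nothing and just churns the message queue.
        return
    _unload_pending.add(process_id)
    send_unload_from_ram(process_id)


def on_unload_complete(process_id: int) -> None:
    _unload_pending.discard(process_id)
```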
This can just lead to trading one blocking download for another
The lru cache used in this way is moot now due to checks done elsewhere that have the same practical effect
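For illustration of the pattern only (the decorated function is hypothetical, not the one actually removed):

```python
import functools
from pathlib import Path


# Before: memoised, so an answer could go stale once files were downloaded or purged.
@functools.lru_cache(maxsize=None)
def _is_on_disk_cached(path_str: str) -> bool:
    return Path(path_str).exists()


# After: call the check directly; the callers already guard against repeated
# work elsewhere, so the memoisation had no remaining practical effect.
def is_on_disk(path_str: str) -> bool:
    return Path(path_str).exists()
```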
This is still an indication that something is wrong, which is why I am leaving it as a warning, but with all of the 1-tick or timing-related issues that can lead to this, I don't see the reason to cast fear into the operators' hearts when the worker will often be able to actually recover. Additionally, the second log message changed was for debugging purposes only and does not necessarily indicate a critical/error condition.
Prior to this, the overloaded nature of `get_processes_with_model_for_queued_job` led to some confusion on shutdown, where it could hang indefinitely with a never-killed process, and models wouldn't be proactively preloaded as intended
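The general idea of the split, with invented helper names (the original `get_processes_with_model_for_queued_job` was answering both of these questions at once):

```python
from dataclasses import dataclass, field


@dataclass
class ProcessInfo:
    process_id: int
    loaded_models: set[str] = field(default_factory=set)


def processes_needed_for_queued_jobs(
    processes: list[ProcessInfo],
    queued_models: set[str],
) -> list[ProcessInfo]:
    """Shutdown question: which processes must stay alive for queued work."""
    return [p for p in processes if p.loaded_models & queued_models]


def models_to_proactively_preload(
    processes: list[ProcessInfo],
    queued_models: set[str],
) -> set[str]:
    """Preload question: which queued models no process has loaded yet."""
    loaded_anywhere: set[str] = set().union(*(p.loaded_models for p in processes))
    return queued_models - loaded_anywhere
```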
Once upon a time this seemed reasonable, but the current status quo no longer fits with relying on it as such. The outgoing states (`HordeControlFlag`) are useful for knowing whether or not to resend a `HordeControlFlag`, but they are not accurate (at least as of recent changes) when it comes to measuring the reported process state.
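A sketch of the separation; `HordeControlFlag` is named in the note, but the members and bookkeeping below are illustrative assumptions:

```python
import enum
import time
from dataclasses import dataclass
from typing import Optional


class HordeControlFlag(enum.Enum):
    # Illustrative members only; the real enum lives in the worker codebase.
    UNLOAD_MODELS_FROM_RAM = "unload_models_from_ram"
    END_PROCESS = "end_process"


@dataclass
class ProcessBookkeeping:
    # What we last told the process to do: good for deciding whether to resend...
    last_flag_sent: Optional[HordeControlFlag] = None
    last_flag_sent_at: float = 0.0
    # ...and, separately, what the process last reported about itself. Only this
    # should be trusted when reasoning about the process's actual state.
    last_reported_state: str = "unknown"

    def should_resend(self, flag: HordeControlFlag, resend_after: float = 5.0) -> bool:
        if self.last_flag_sent != flag:
            return True
        return time.monotonic() - self.last_flag_sent_at > resend_after
```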