feat: Rework second-copy scaling logic (#49)
#### Motivation

Model-mesh currently ensures that there are two copies of any model which has been used "recently", to make it "HA" and minimize the chance of disruption if a single pod dies. However, this has recently proved problematic in one case involving a periodic invocation of a large number of models (for example once per day). Second copies of all of them are loaded within a small timeframe, filling the cache and evicting other recently-used models. In this case there's little value in having second copies since the usage is isolated. A better heuristic is needed for deciding when redundant copies should be added.

#### Modifications

- Remove the current logic related to second-copy triggering. This includes methods invoked on the inference request path and from the regular janitor task.
- Piggy-back on the existing frequent rate-tracking task to keep track of the "iteration numbers" in which single-copy models are loaded, and only trigger a second copy when there's a prior usage more than 7 minutes but less than 40 minutes ago (see the sketch after this message). If the usage is confined to a < 7 min window it could be isolated; if usages are > 40 min apart, the value of a second copy is lower (the probability of a pod death causing disruption is minimal).
- Bypass the second-copy triggering altogether if the cache is full and the LRU entry is recent.
- More aggressively scale down second copies of inactive models, especially if the cache is full / the LRU entry is recent.
- Update unit tests to reflect the new behaviour.

#### Result

More effective use of available model cache space, which should result in lower model memory requirements for many use cases.

Note that this does not affect auto-scaling behaviour based on request load. A single copy may now scale up immediately if subjected to sufficient load, independent of the aforementioned redundant-copy logic.

Signed-off-by: Nick Hill <nickhill@us.ibm.com>
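Below is a minimal sketch (not taken from the commit) of the time-window heuristic described in the second modification bullet. All class, method, and field names are hypothetical; the actual implementation tracks iteration numbers of the rate-tracking task rather than raw timestamps, but the decision logic is analogous.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of the reworked second-copy heuristic.
 * Names and structure are illustrative only.
 */
public class SecondCopyHeuristic {
    // A prior usage must fall inside this window to justify a second copy
    private static final long MIN_GAP_MS = 7 * 60 * 1000L;   // <= 7 min apart: usage may be an isolated burst
    private static final long MAX_GAP_MS = 40 * 60 * 1000L;  // >= 40 min apart: disruption risk considered minimal

    // Last recorded usage time for models that currently have a single copy
    private final Map<String, Long> lastSingleCopyUse = new ConcurrentHashMap<>();

    /**
     * Called from the periodic rate-tracking task for each single-copy model
     * used since the previous iteration. Returns true if a second copy
     * should be triggered.
     */
    public boolean shouldAddSecondCopy(String modelId, long nowMs,
                                       boolean cacheFull, boolean lruIsRecent) {
        // Bypass second-copy triggering entirely when the cache is full
        // and even the least-recently-used entry was used recently
        if (cacheFull && lruIsRecent) {
            lastSingleCopyUse.put(modelId, nowMs);
            return false;
        }
        // Record this usage and retrieve the prior one, if any
        Long prior = lastSingleCopyUse.put(modelId, nowMs);
        if (prior == null) {
            return false; // first recorded usage, nothing to compare against
        }
        long gap = nowMs - prior;
        // Only usages spaced more than 7 min but less than 40 min apart
        // indicate sustained (non-isolated) use worth a redundant copy
        return gap > MIN_GAP_MS && gap < MAX_GAP_MS;
    }
}
```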