feat: Rework second-copy scaling logic (#49)
#### Motivation

Model-mesh currently ensures that there are two copies of any model that has been used "recently", to make it "HA" and minimize the chance of disruption if a single pod dies. However, this has recently proved problematic in one case involving a periodic invocation of a large number of models (for example, once per day). Second copies of all of these models are loaded within a short timeframe, filling the cache and evicting other recently used models. In this case there is little value in having second copies, since the usage is isolated.

A better heuristic is needed for deciding when redundant copies should be added.

#### Modifications

- Remove the current logic related to second-copy triggering. This includes methods invoked on the inference request path and from the regular janitor task.
- Piggy-back on the existing frequent rate-tracking task to keep track of the "iteration numbers" in which single-copy models are loaded, and only trigger a second copy when there was a prior usage more than 7 minutes but less than 40 minutes ago. Usage confined to a window shorter than 7 minutes could be isolated. If usages are more than 40 minutes apart, a second copy is worth less, because the probability of a pod death causing disruption in between is minimal.
- Bypass the second-copy triggering altogether if the cache is full and the LRU age is low (i.e. the least-recently-used model was itself used recently)
- Scale down second copies of inactive models more aggressively, especially when the cache is full and the LRU age is low
- Update unit test to reflect new behaviour
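
The second-copy triggering rule above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not model-mesh's actual (Java) implementation. Only the 7- and 40-minute bounds come from this commit. The class and method names (`SecondCopyTracker`, `tick`, `record_use`, `should_add_second_copy`), the 10-second task interval, and the 30-minute "low LRU age" threshold are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical values: only the 7/40-minute bounds are stated in the commit.
TASK_INTERVAL_SECS = 10                               # assumed rate-tracking task period
MIN_GAP_ITERATIONS = (7 * 60) // TASK_INTERVAL_SECS   # 7 minutes in task iterations
MAX_GAP_ITERATIONS = (40 * 60) // TASK_INTERVAL_SECS  # 40 minutes in task iterations
RECENT_LRU_SECS = 30 * 60                             # assumed "LRU age is low" threshold


@dataclass
class ModelUsage:
    """Iteration numbers in which a single-copy model was used (hypothetical)."""
    iterations: list[int] = field(default_factory=list)


class SecondCopyTracker:
    """Hypothetical tracker driven by a periodic rate-tracking task."""

    def __init__(self) -> None:
        self.iteration = 0
        self.usage: dict[str, ModelUsage] = {}

    def tick(self) -> None:
        """Advance the iteration counter; called once per task run."""
        self.iteration += 1

    def record_use(self, model_id: str) -> None:
        """Record that the model was used during the current iteration."""
        u = self.usage.setdefault(model_id, ModelUsage())
        # Store at most one entry per iteration.
        if not u.iterations or u.iterations[-1] != self.iteration:
            u.iterations.append(self.iteration)
        # Drop entries that are already too old to ever satisfy the < 40 min bound.
        cutoff = self.iteration - MAX_GAP_ITERATIONS
        u.iterations = [i for i in u.iterations if i > cutoff]

    def should_add_second_copy(self, model_id: str, cache_full: bool,
                               lru_age_secs: float) -> bool:
        # Bypass entirely when the cache is full and the LRU model was used recently.
        if cache_full and lru_age_secs < RECENT_LRU_SECS:
            return False
        u = self.usage.get(model_id)
        if u is None:
            return False
        # Trigger only if some prior use was > 7 min and < 40 min ago.
        return any(MIN_GAP_ITERATIONS < self.iteration - i < MAX_GAP_ITERATIONS
                   for i in u.iterations)


if __name__ == "__main__":
    t = SecondCopyTracker()
    t.record_use("model-a")            # iteration 0
    for _ in range(5):
        t.tick()
    t.record_use("model-a")            # iteration 5 (~50 s later)
    print(t.should_add_second_copy("model-a", cache_full=False, lru_age_secs=0))  # False
    for _ in range(60):
        t.tick()
    t.record_use("model-a")            # iteration 65 (~10.8 min after the first use)
    print(t.should_add_second_copy("model-a", cache_full=False, lru_age_secs=0))  # True
```

Counting task iterations instead of storing timestamps keeps the per-model state to a short list of small integers. Pruning on each use bounds that list to the 40-minute window. A burst of uses within a few minutes, such as a once-per-day batch, never produces a qualifying gap, so it does not trigger a second copy.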

#### Result

More effective use of the available model cache space, which should lower model memory requirements for many use cases.

Note that this does not affect auto-scaling behaviour based on request load: a single-copy model may now scale up immediately when subjected to sufficient load, independently of the redundant-copy logic described above.

Signed-off-by: Nick Hill <nickhill@us.ibm.com>
njhill authored Aug 2, 2022
1 parent 77dbeab commit 2790ef2
Showing 2 changed files with 252 additions and 281 deletions.