fix(ml): limit load retries #10494

mertalev · 2024-06-19T19:48:04Z

Description

The ML service tries to load a model, then on failure clears the cache and retries once. However, a new request will mean another two attempts to load. Given that the service can receive many requests for a model, failing to load/download can cause an excessive number of attempts.

This PR limits a single model instance to one retry across its lifetime. After a model is unloaded or the service restarts, it will be able to re-attempt it.

How Has This Been Tested?

Tested by making a model throw an error on the first attempt, but not the second, and confirming that it still loaded. If I made it throw an error every time, it didn't try to load the model repeatedly.

retry once

1a9d628

github-actions bot added the 🧠machine-learning label Jun 19, 2024

fix import

9c0a362

jrasm91 approved these changes Jun 20, 2024

View reviewed changes

mertalev merged commit a42af06 into main Jun 20, 2024
22 checks passed

mertalev deleted the fix/ml-limit-retries branch June 20, 2024 18:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(ml): limit load retries #10494

fix(ml): limit load retries #10494

mertalev commented Jun 19, 2024

fix(ml): limit load retries #10494

fix(ml): limit load retries #10494

Conversation

mertalev commented Jun 19, 2024

Description

How Has This Been Tested?