-
Notifications
You must be signed in to change notification settings - Fork 38
Handling cells with hardware failure #23
Comments
Here is the psudo-code of the above algorithm. First, we define
If level-n is the highest level, then num_splittable[i] can be calculated recursively as follows:
Thus, the sharing safety guarantee is equivalent to guarantee num_splittable[i] >= 0 for all level i at any time.
We need to prove the above algorithm's correctness and completeness. [Correctness] The algorithm will not violate sharing safety guarantee. [Completeness] Using this algorithm, there is no such case that the allocation of a level-k cell request fails but it is possible to obtain a healthy level-k cell without violating sharing safety. Proof: The proof of compeleteness can be done by contradiction. If the algorithm is not complete, it means there is a "Case" that:
If k’ = k, it is impossible that num_splittable[k’] > 0 at level-k because "we will firstly allocate the healthier cell". If k’ > k, the Case is impossible. Because the algorithm will split all cells that num_splittable[k’’] > 0 (for all k’’ >= k) if the algorithm fails. There will be no cells that num_splittable[k’] > 0, which contradicts with the Case. |
Thanks a lot, Zhenhua! |
Yes. The safety-check provides the necessary condition of allocation. A possible use case is to avoid unnecessary preemption of low-priority jobs. More generally, if we need to support customized physical scheduling policy, we need to provide an interface of this condition in some form (e.g., as a list of splittable/allocable cells at all levels). |
Agree that the safety-check provides a necessary condition of arbitrary cell allocation. Looks like we can generalize the algorithm and prove it necessary and sufficient, and looks like we can reuse the proof you guys did a long time ago. |
The text was updated successfully, but these errors were encountered: