scheduler: optimize numa affinity store #2209
Merged
Ⅰ. Describe what this PR does
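Currently, the best NUMA affinity for each node is determined in the Filter phase and then saved in CycleState. This works in ordinary scheduling scenarios, but it breaks in the following one. Call the preempting pod the preemptor and the preempted pod the victim. Both request NUMA-aware scheduling, CPU binding, and 8 cards, and both have affinity to the same 8-card node.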
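a. Filter pass with nominated pods: Filter first calls state.Clone and then runs the Filter plugins against the cloned state, so the NUMA affinity is saved into the cloned state.
b. Filter pass without nominated pods: because no actual AddPod was performed in a, the logic treats the Filter result from a as valid and returns immediately.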
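As a result, the NUMA affinity is never actually saved in the original state, so a request that should have been allocated resources across two NUMA nodes receives resources from only a single NUMA node. It looks as if NUMA-aware scheduling were not enabled at all.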
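The sequence above can be reproduced in a small, self-contained sketch. It does not use the real scheduler framework or the code in this PR: `cycleState`, `affinityStore`, the `shared` flag, and `affinitySeenAfterFilter` are hypothetical stand-ins chosen for illustration. It contrasts a store that is deep-copied on Clone with one whose Clone returns the same underlying store.

```go
package main

import "fmt"

// affinityStore is a hypothetical stand-in for the per-cycle NUMA affinity
// store. When shared is true, Clone returns the same store instead of a copy.
type affinityStore struct {
	shared   bool
	affinity map[string]string // node name -> chosen NUMA affinity
}

// Clone either deep-copies the map or hands back the receiver itself.
func (s *affinityStore) Clone() *affinityStore {
	if s.shared {
		return s
	}
	cp := &affinityStore{affinity: make(map[string]string, len(s.affinity))}
	for node, hint := range s.affinity {
		cp.affinity[node] = hint
	}
	return cp
}

// cycleState is a simplified stand-in for the scheduler's CycleState.
type cycleState struct {
	store *affinityStore
}

// Clone copies the state by cloning the data it holds.
func (c *cycleState) Clone() *cycleState {
	return &cycleState{store: c.store.Clone()}
}

// affinitySeenAfterFilter replays steps a and b from the description and
// reports what a later phase would read from the original state.
func affinitySeenAfterFilter(shared bool) (string, bool) {
	state := &cycleState{store: &affinityStore{
		shared:   shared,
		affinity: map[string]string{},
	}}

	// Step a: Filter runs against a clone and records the affinity there.
	cloned := state.Clone()
	cloned.store.affinity["node-1"] = "numa0,numa1"

	// Step b: the second pass is skipped, so nothing is written to state.

	// Later phases read from the original state.
	hint, ok := state.store.affinity["node-1"]
	return hint, ok
}

func main() {
	hint, ok := affinitySeenAfterFilter(false)
	fmt.Printf("deep copy:   hint=%q found=%v\n", hint, ok)

	hint, ok = affinitySeenAfterFilter(true)
	fmt.Printf("shared copy: hint=%q found=%v\n", hint, ok)
}
```

With the deep copy, the write made in step a is lost once the clone is discarded. With the shared store, it stays visible through the original state. A store shared this way would likely need its own synchronization if Filter runs for several nodes concurrently, which this sketch leaves out.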
This PR changes the clone method of affinityStore to a shallow copy, and:
Ⅱ. Does this pull request fix one issue?
Ⅲ. Describe how to verify it
Ⅳ. Special notes for reviews
Ⅴ. Checklist
make test