Withdraw resource when VM creation fails #289
Conversation
…ource-scheduler into feature/resource
@@ -217,12 +215,12 @@ func (p *Process) SendPodToCluster(pod *v1.Pod) {
	go func() {
		instanceId, err := openstack.ServerCreate(host, token, &pod.Spec)
		if err == nil {
-			klog.V(3).Infof("The openstack vm for the pod %v has been created at the host %v", pod.ObjectMeta.Name, host)
+			klog.Infof("The openstack vm for the pod %v has been created at the host %v", pod.ObjectMeta.Name, host)
The dispatcher process logs come with a log level. Why change this one to Infof without a level?
This happened by accident. It has been restored.
@@ -80,6 +80,13 @@ type ScheduleResult struct {
	FeasibleSites int // Number of feasible site on one stack scheduled
}

+type PodSiteResourceAllocation struct {
The name is confusing. There is no allocation object at all.
The struct name has been changed to "PodSiteResource".
if pod, ok := t.Obj.(*v1.Pod); ok {
	return failedToSchedule(pod) && responsibleForPod(pod, sched.SchedulerName)
}
utilruntime.HandleError(fmt.Errorf("unable to convert object %T to *v1.Pod in %T", obj, sched))
%T is the type of the object. Is this a typo of %v?
%T is the type of the variable.
So it is equivalent to ".(type)", right? We already know it is cache.DeletedFinalStateUnknown. Do we need to know sched's type as well?
This part is from HQ, and I avoid changing it when possible.
I am not sure we can copy code like that. We don't need sched's type at all.
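For readers following this thread, here is a minimal, self-contained Go illustration of the %T verb and the .(type) switch being discussed; the pod type and names here are invented for the demo, not the project's code:

package main

import "fmt"

type pod struct{ name string }

func main() {
	var obj interface{} = &pod{name: "demo"}

	// A type switch matches on the dynamic type, like the handler above.
	switch t := obj.(type) {
	case *pod:
		fmt.Printf("matched *pod: %v\n", t.name)
	default:
		// %T prints the dynamic type of the value (e.g. "*main.pod");
		// %v prints its value.
		fmt.Printf("unable to handle object of type %T: %v\n", obj, obj)
	}

	fmt.Printf("%T\n", obj) // prints: *main.pod
}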
	utilruntime.HandleError(fmt.Errorf("unable to convert object %T to *v1.Pod in %T", obj, sched))
	return false
default:
	utilruntime.HandleError(fmt.Errorf("unable to handle object in %T: %T", sched, obj))
%T is the type of the object. Is this a typo of %v?
%T is the type of the variable.
We don't need to know sched's type.
This code is from HQ.
verified = false
name := pod.Name
flavors := pod.Spec.VirtualMachine.Flavors
if pod.Name == "" || flavors == nil {
A pod can be a container pod without any flavor information.
The flavor check has been removed.
@@ -310,7 +310,7 @@ func (f *Flavor) Filter(ctx context.Context, cycleState *interfaces.CycleState,
	var isCommonMatch, _ = isComFlavorMatch(flavorMap, siteCacheInfo)
	var isSpotMatch, _ = isSpotFlavorMatch(spotFlavorMap, siteCacheInfo)
	if isCommonMatch && isSpotMatch {
-		klog.Infof("*** isCommonMatch:%v, isSpotMatch:%v ", isCommonMatch, isSpotMatch)
+		klog.Infof("isCommonMatch:%v, isSpotMatch:%v ", isCommonMatch, isSpotMatch)
There is an Infof with a log level. I am not sure the scheduler is started with any log level set, though.
Not yet. I tried, but it didn't work yet.
In the rest of your code you add a log level, while this part still remains Infof without a level. I am not sure the other parts work.
Infof() is again from the HQ original source code. If we decide to change it, that is not an issue; we can discuss it in today's meeting. But then we might have to change 800 files, and I am not sure that is fine.
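For reference, a minimal sketch of the difference being discussed, assuming the standard k8s.io/klog API: whether the V(3) line appears depends on the -v flag the process is started with.

package main

import (
	"flag"

	"k8s.io/klog"
)

func main() {
	// klog verbosity is controlled by the -v flag registered here.
	klog.InitFlags(nil)
	flag.Parse()

	klog.Infof("always printed, regardless of verbosity")            // unleveled
	klog.V(3).Infof("printed only when started with -v=3 or higher") // leveled
	klog.Flush()
}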
@@ -643,20 +653,28 @@ func (n *SiteCacheInfo) updateSiteFlavor(resourceTypes []string, regionFlavors m
	}
}

-func (n *SiteCacheInfo) deductFlavor() {
+func (n *SiteCacheInfo) updateFlavorCount(deduct bool) {
	var m int64
My big concern here is that maps are not thread safe in Go. Do we need to add a mutex here? We might add a mutex in the code that calls this function, but people might not be cautious enough and could update the map directly.
A mutex is used for this map.
A Go map does not come with a mutex. If I call updateFlavorCount twice at the same time, does it work as expected?
updateFlavorCount() is a private function; it is called only by updateSiteFlavor(), and the mutex is used there.
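A minimal, self-contained sketch of the mutex-guarded pattern described above; SiteCache and its fields are hypothetical names, not the actual SiteCacheInfo code:

package main

import (
	"fmt"
	"sync"
)

// SiteCache guards its map with a mutex: the map itself is not
// thread safe, so every access goes through a locking method.
type SiteCache struct {
	mu     sync.Mutex
	counts map[string]int64
}

// UpdateFlavorCount is safe to call concurrently because all map
// access happens while mu is held.
func (c *SiteCache) UpdateFlavorCount(flavor string, delta int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[flavor] += delta
}

func main() {
	c := &SiteCache{counts: map[string]int64{}}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.UpdateFlavorCount("m1.small", -1) // deduct one instance
		}()
	}
	wg.Wait()
	fmt.Println(c.counts["m1.small"]) // always -100 thanks to the mutex
}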
//siteSelectedInfo is type of SiteSelectorInfo at cycle_state.go
siteSelectedInfo, err := interfaces.GetSiteSelectorState(state, siteID)
if err != nil {
	klog.Errorf("Gettng site selector state failed! err: %s", err)
Getting?
It has been changed to "Getting".
//siteSelectedInfo is type of SiteSelectorInfo at cycle_state.go
siteSelectedInfo, err := interfaces.GetSiteSelectorState(state, siteID)
if err != nil {
	klog.Errorf("Gettng site selector state failed! err: %s", err)
err is not a string. Please use %v, not %s.
It has been changed to %v.
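A small self-contained illustration of the convention: %v is the idiomatic verb for error values (in practice %s also prints err.Error(), since error exposes an Error() string method, but %v is the standard choice):

package main

import (
	"errors"
	"fmt"
)

func main() {
	err := errors.New("site selector state not found")
	// %v is the conventional verb for formatting errors.
	fmt.Printf("Getting site selector state failed! err: %v\n", err)
}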
stack.Resources[i].FlavorIDSelected = flavorID
klog.V(4).Infof("GetFlavor - flavorID: %s, region: %s", flavorID, region)
flv, ok := cache.FlavorCache.GetFlavor(flavorID, region)
if !ok {
I am not very sure about the logic here. If the pod flavor does not exist in that region, do we need to return a scheduling failure?
In that case, it assigns the best site based on score.
Tried to reopen.

Reopened.
What type of PR is this?
What this PR does / why we need it:
This PR withdraws the site (cluster) resources allocated to a pod when the pod's VM creation fails.
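A minimal, hypothetical sketch of the intended flow, based on the SendPodToCluster hunk quoted above; withdrawPodSiteResource is an illustrative name, not the actual function in this PR:

go func() {
	instanceId, err := openstack.ServerCreate(host, token, &pod.Spec)
	if err != nil {
		klog.Errorf("Creating the openstack vm for the pod %v failed at the host %v: %v", pod.Name, host, err)
		// Return the site (cluster) resources reserved for this pod
		// so later scheduling decisions can reuse them.
		withdrawPodSiteResource(pod) // hypothetical rollback call
		return
	}
	klog.V(3).Infof("The openstack vm for the pod %v has been created at the host %v (instance %v)", pod.Name, host, instanceId)
}()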
Which issue(s) this PR fixes:
Fixes #286
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
NONE
How to test
$ cd ~/go/src/k8s.io/arktos
$ ./hack/globalscheduler/globalscheduler-up.sh

Open a new terminal:

$ cd ~/go/src/k8s.io/arktos/globalscheduler/test/yaml
$ kubectl apply -f sample_6_clusters.yaml
$ kubectl apply -f sample_2_schedulers.yaml
$ kubectl apply -f sample_2_distributors.yaml
$ kubectl apply -f sample_2_dispatchers.yaml
$ kubectl apply -f sample_6_pods.yaml
$ kubectl get pods