
Withdraw resource when vm creation is failed #289

Closed
kimeunju108 wants to merge 11 commits into from

Conversation

kimeunju108
Collaborator

What type of PR is this?

/kind feature

What this PR does / why we need it:
This PR proposes to withdraw the site (cluster) resource allocated to a pod when the pod's VM creation fails.

Which issue(s) this PR fixes:
Fixes #286

Special notes for your reviewer:

Does this PR introduce a user-facing change?:
NONE

How to test
~/go/src/k8s.io/arktos$ ./hack/globalscheduler/globalscheduler-up.sh
Open a new terminal, then run:
$ cd ~/go/src/k8s.io/arktos/globalscheduler/test/yaml
$ kubectl apply -f sample_6_clusters.yaml
$ kubectl apply -f sample_2_schedulers.yaml
$ kubectl apply -f sample_2_distributors.yaml
$ kubectl apply -f sample_2_dispatchers.yaml
$ kubectl apply -f sample_6_pods.yaml
$ kubectl get pods

@@ -217,12 +215,12 @@ func (p *Process) SendPodToCluster(pod *v1.Pod) {
 	go func() {
 		instanceId, err := openstack.ServerCreate(host, token, &pod.Spec)
 		if err == nil {
-			klog.V(3).Infof("The openstack vm for the pod %v has been created at the host %v", pod.ObjectMeta.Name, host)
+			klog.Infof("The openstack vm for the pod %v has been created at the host %v", pod.ObjectMeta.Name, host)
Collaborator

The dispatcher process log comes with a log level. Why change this to Infof without a log level?

Collaborator Author

That happened by accident; the original line has been restored.
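
For context, a minimal sketch of the failure path this PR is about, built around the diff above. The withdrawSiteResource helper is a hypothetical name used only for illustration, not the actual implementation:

go func() {
	instanceId, err := openstack.ServerCreate(host, token, &pod.Spec)
	if err != nil {
		// VM creation failed: give the site (cluster) resource that was
		// allocated to this pod back, so other pods can use it.
		klog.Errorf("creating the openstack vm for pod %v failed: %v", pod.ObjectMeta.Name, err)
		withdrawSiteResource(pod) // hypothetical helper, not the actual code
		return
	}
	klog.V(3).Infof("The openstack vm for the pod %v has been created at the host %v", pod.ObjectMeta.Name, host)
	_ = instanceId // success path continues here
}()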

@@ -80,6 +80,13 @@ type ScheduleResult struct {
 	FeasibleSites int // Number of feasible site on one stack scheduled
 }
 
+type PodSiteResourceAllocation struct {
Collaborator
@jshaofuturewei May 18, 2021

The name is confusing. There is no allocation object at all

Collaborator Author

The struct name has been changed to "PodSiteResource".
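
As a rough illustration, such a struct needs just enough information to return the allocation later; the field names below are assumptions, not the actual definition:

// PodSiteResource ties a pod to the site (cluster) resource reserved for it,
// so the reservation can be withdrawn if VM creation fails.
// Field names are illustrative only.
type PodSiteResource struct {
	PodName  string // pod the resource was reserved for
	SiteID   string // site (cluster) the resource was taken from
	FlavorID string // flavor that determined the reserved amount
}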

	if pod, ok := t.Obj.(*v1.Pod); ok {
		return failedToSchedule(pod) && responsibleForPod(pod, sched.SchedulerName)
	}
	utilruntime.HandleError(fmt.Errorf("unable to convert object %T to *v1.Pod in %T", obj, sched))
Collaborator

%T prints the type of the object. Is this a typo for %v?

Collaborator Author

%T is the type of the variable.

Collaborator

So it is equivalent to ".(type)", right? We already know it is cache.DeletedFinalStateUnknown. Do we need to know the sched type as well?

Collaborator Author

This part is from HQ, and I would rather not change it if possible.

Collaborator

I am not sure we should copy code like that. We don't need the sched type at all.
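
As a standalone illustration of the verbs under discussion (not project code): %T prints a value's dynamic type, while %v prints the value itself.

package main

import "fmt"

type scheduler struct{ SchedulerName string }

func main() {
	var obj interface{} = 42
	sched := &scheduler{SchedulerName: "gs-1"}

	fmt.Printf("%T\n", obj)   // int
	fmt.Printf("%v\n", obj)   // 42
	fmt.Printf("%T\n", sched) // *main.scheduler
	fmt.Printf("%v\n", sched) // &{gs-1}
}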

		utilruntime.HandleError(fmt.Errorf("unable to convert object %T to *v1.Pod in %T", obj, sched))
		return false
	default:
		utilruntime.HandleError(fmt.Errorf("unable to handle object in %T: %T", sched, obj))
Collaborator

%T prints the type of the object. Is this a typo for %v?

Collaborator Author

%T is the type of the variable.

Collaborator

We don't need to know sched's type.

Collaborator Author

This code is from HQ.

	verified = false
	name := pod.Name
	flavors := pod.Spec.VirtualMachine.Flavors
	if pod.Name == "" || flavors == nil {
Collaborator

A pod can be a container pod without any flavor information

Collaborator Author

The flavor check has been removed.
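
A hedged sketch of the kind of check being discussed, assuming Spec.VirtualMachine is a pointer that is nil for container pods (the error messages and surrounding function are illustrative, not the actual code):

if pod.Name == "" {
	return fmt.Errorf("pod name must not be empty")
}
// A container pod legitimately has no VirtualMachine section, so only require
// flavors when the pod actually describes a VM.
if pod.Spec.VirtualMachine != nil && pod.Spec.VirtualMachine.Flavors == nil {
	return fmt.Errorf("vm pod %s has no flavor information", pod.Name)
}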

@@ -310,7 +310,7 @@ func (f *Flavor) Filter(ctx context.Context, cycleState *interfaces.CycleState,
 	var isCommonMatch, _ = isComFlavorMatch(flavorMap, siteCacheInfo)
 	var isSpotMatch, _ = isSpotFlavorMatch(spotFlavorMap, siteCacheInfo)
 	if isCommonMatch && isSpotMatch {
-		klog.Infof("*** isCommonMatch:%v, isSpotMatch:%v ", isCommonMatch, isSpotMatch)
+		klog.Infof("isCommonMatch:%v, isSpotMatch:%v ", isCommonMatch, isSpotMatch)
Collaborator

There is Infof with a log level elsewhere. I am not sure the scheduler starts with any log level set, though.

Collaborator Author

Not yet. I tried, but it didn't work yet.

Collaborator

In the rest of your code you add a log level, while this part still uses Infof without a level. I am not sure whether the other parts work.

Collaborator Author
@kimeunju108 May 18, 2021

Infof() here is again from the original HQ source code. If we decide to change it, that is not a problem; we can discuss it in today's meeting. But then we may have to change 800 files, and I am not sure that is fine.
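
For reference, a standalone sketch of how klog verbosity interacts with the -v flag (klog v1 import path assumed; not project code):

package main

import (
	"flag"

	"k8s.io/klog"
)

func main() {
	// klog reads its verbosity from the -v flag, e.g. ./scheduler -v=3
	klog.InitFlags(nil)
	flag.Parse()

	klog.Infof("always printed, no level attached")
	klog.V(3).Infof("printed only when started with -v=3 or higher")
	klog.Flush()
}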

@@ -643,20 +653,28 @@ func (n *SiteCacheInfo) updateSiteFlavor(resourceTypes []string, regionFlavors m
 	}
 }
 
-func (n *SiteCacheInfo) deductFlavor() {
+func (n *SiteCacheInfo) updateFlavorCount(deduct bool) {
 	var m int64
Collaborator

My big concern here is that maps are not thread safe in Go. Do we need to add a mutex here? We could add a mutex in the code that calls this function, but people might not be cautious enough and could update the map directly.

Collaborator Author

A mutex is used for this map.

Collaborator
@jshaofuturewei May 18, 2021

A Go map does not come with a mutex. If I call updateFlavorCount twice at the same time, does it work as expected?

Collaborator Author

updateFlavorCount() is a private function; it is only called by updateSiteFlavor(), and the mutex is taken there.
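
A standalone sketch of that locking pattern, where the caller takes the lock and the private helper assumes it is held (field and method names are illustrative, not the real SiteCacheInfo):

package main

import (
	"fmt"
	"sync"
)

type siteCache struct {
	mu      sync.Mutex
	flavors map[string]int64
}

// UpdateSiteFlavor is the entry point; it takes the lock so the helper below
// can touch the map safely.
func (c *siteCache) UpdateSiteFlavor(flavorID string, deduct bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.updateFlavorCount(flavorID, deduct)
}

// updateFlavorCount must only be called with c.mu held.
func (c *siteCache) updateFlavorCount(flavorID string, deduct bool) {
	if deduct {
		c.flavors[flavorID]--
	} else {
		c.flavors[flavorID]++
	}
}

func main() {
	c := &siteCache{flavors: map[string]int64{"m1.small": 10}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.UpdateSiteFlavor("m1.small", true)
		}()
	}
	wg.Wait()
	fmt.Println(c.flavors["m1.small"]) // 6
}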

	//siteSelectedInfo is type of SiteSelectorInfo at cycle_state.go
	siteSelectedInfo, err := interfaces.GetSiteSelectorState(state, siteID)
	if err != nil {
		klog.Errorf("Gettng site selector state failed! err: %s", err)
Collaborator

Getting?

Collaborator Author

It has been changed to "Getting".

	//siteSelectedInfo is type of SiteSelectorInfo at cycle_state.go
	siteSelectedInfo, err := interfaces.GetSiteSelectorState(state, siteID)
	if err != nil {
		klog.Errorf("Gettng site selector state failed! err: %s", err)
Collaborator
@jshaofuturewei May 18, 2021

err is not a string. Please use %v not %s

Collaborator Author

It has been changed to %v.
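
A quick standalone check of the difference (not project code): for a non-nil error both verbs print the same text, but %v degrades more gracefully when the error is nil.

package main

import (
	"errors"
	"fmt"
)

func main() {
	var err error = errors.New("site selector state not found")
	fmt.Printf("%s\n", err) // site selector state not found
	fmt.Printf("%v\n", err) // site selector state not found

	err = nil
	fmt.Printf("%v\n", err) // <nil>
	fmt.Printf("%s\n", err) // %!s(<nil>)
}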

	stack.Resources[i].FlavorIDSelected = flavorID
	klog.V(4).Infof("GetFlavor - flavorID: %s, region: %s", flavorID, region)
	flv, ok := cache.FlavorCache.GetFlavor(flavorID, region)
	if !ok {
Collaborator

I am not very sure about the logic here. If the pod flavor does not exist in that region, do we need to return scheduling failure?

Collaborator Author

In that case, it assigns the best site based on its score.

@kimeunju108
Collaborator Author

Tried to reopen.


@kimeunju108
Collaborator Author

kimeunju108 commented May 25, 2021

Reopen
Duplicate of #298

@kimeunju108 changed the title from "Withdraw resource when vm creation failed" to "Withdraw resource when vm creation is failed" on May 25, 2021
Development

Successfully merging this pull request may close these issues.

Revoke cluster resources when scheduling failed to create a vm