The best-effort pod should be immediately dispatched in allocate function of session , without considering the gang constraint #646

Closed
sivanzcw opened this issue Dec 26, 2019 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@sivanzcw
Contributor

/kind bug

Applying the gang constraint in the backfill action can leave a job unscheduled when the job has a gang configuration and contains best-effort pods.

In the backfill action, if the predicates succeed for a task, the scheduler calls the ssn.Allocate function to dispatch the pod:

for _, node := range ssn.Nodes {
	// TODO (k82cn): predicates did not consider pod number for now, there'll
	// be ping-pong case here.
	if err := ssn.PredicateFn(task, node); err != nil {
		klog.V(3).Infof("Predicates failed for task <%s/%s> on node <%s>: %v",
			task.Namespace, task.Name, node.Name, err)
		fe.SetNodeError(node.Name, err)
		continue
	}

	klog.V(3).Infof("Binding Task <%v/%v> to node <%v>", task.Namespace, task.Name, node.Name)
	if err := ssn.Allocate(task, node.Name); err != nil {
		klog.Errorf("Failed to bind Task %v on %v in Session %v", task.UID, node.Name, ssn.UID)
		fe.SetNodeError(node.Name, err)
		continue
	}

	allocated = true
	break
}

In the ssn.Allocate function, a task can be dispatched only when its job is ready:

if ssn.JobReady(job) {
	for _, task := range job.TaskStatusIndex[api.Allocated] {
		if err := ssn.dispatch(task); err != nil {
			klog.Errorf("Failed to dispatch task <%v/%v>: %v",
				task.Namespace, task.Name, err)
			return err
		}
	}
}

If the other pods in the job are not scheduled, the best-effort pod cannot be scheduled either. If all pods in the job are best-effort, the job will not be scheduled.

In addition, best-effort pods have no resource requests, so it is not necessary to apply the gang constraint to them.
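
The behavior requested above can be illustrated with a small standalone model. This is only a sketch and not Volcano code or the actual fix: the `Task`, `Job`, and `allocate` names are hypothetical, and the model assumes that best-effort tasks do not count toward the gang's minimum. Best-effort tasks are dispatched as soon as they are allocated, while other tasks still wait until the job is ready.

```go
package main

import "fmt"

// Task is a hypothetical, simplified stand-in for a scheduler task.
type Task struct {
	Name       string
	BestEffort bool // true when the pod declares no resource requests
	Dispatched bool
}

// Job is a hypothetical job with a gang constraint: MinAvailable tasks
// must be allocated before any of them is dispatched.
type Job struct {
	MinAvailable int
	Allocated    []*Task
}

// ready reports whether the gang constraint is satisfied.
func (j *Job) ready() bool {
	return len(j.Allocated) >= j.MinAvailable
}

// allocate dispatches a best-effort task immediately. Any other task is
// recorded on the job and dispatched only once the job is ready.
// Assumption: best-effort tasks are not counted toward MinAvailable.
func allocate(job *Job, task *Task) {
	if task.BestEffort {
		task.Dispatched = true
		return
	}
	job.Allocated = append(job.Allocated, task)
	if job.ready() {
		for _, t := range job.Allocated {
			t.Dispatched = true
		}
	}
}

func main() {
	job := &Job{MinAvailable: 2}
	launcher := &Task{Name: "launcher", BestEffort: true}
	worker := &Task{Name: "worker"}

	allocate(job, launcher) // dispatched without waiting for the gang
	allocate(job, worker)   // still waiting: only 1 of 2 tasks allocated

	fmt.Println(launcher.Name, launcher.Dispatched)
	fmt.Println(worker.Name, worker.Dispatched)
}
```

In this model the best-effort task never blocks on the rest of the job, which is the case described above where a best-effort pod would otherwise stay pending behind the gang check.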

@volcano-sh-bot volcano-sh-bot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 26, 2019
@sivanzcw sivanzcw changed the title The best-effiort pod should be immediately dispatched in allocate function of session , without considering the gang constraint The best-effort pod should be immediately dispatched in allocate function of session , without considering the gang constraint Dec 26, 2019
@carmark
Contributor

carmark commented Dec 26, 2019

This may leave an MPIJob pending when the launcher does not configure resource requests/limits.

@k82cn
Member

k82cn commented Dec 26, 2019

/cc @carmark

@sivanzcw
Contributor Author

sivanzcw commented Jan 6, 2020

closed by #647

@sivanzcw sivanzcw closed this as completed Jan 6, 2020

4 participants