
Data mover backup node black list - Don't run in specified node #7036

Closed
Lyndon-Li opened this issue Oct 31, 2023 · 4 comments

Comments

@Lyndon-Li
Contributor

The data mover backup exposer generally has the ability to select the node on which the data movement runs. This enables us to fulfill the following user requirement:
Sometimes, when a node is running very critical workloads, users don't want data movements to run on that node.

We can develop a node black list mechanism: nodes in the list will not host data movements.
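For illustration, such a black list could be expressed with standard Kubernetes node affinity and the NotIn operator; this is only a sketch of the idea, and the node names below are examples, not part of the design:

```yaml
# Illustrative sketch only: a node black list expressed with standard
# Kubernetes node affinity, using the NotIn operator.
# The listed node names are examples.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: NotIn
          values:
          - critical-node-1
          - critical-node-2
```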

Lyndon-Li added the area/datamover and Enhancement/User (End-User Enhancement to Velero) labels on Oct 31, 2023
Lyndon-Li self-assigned this on Oct 31, 2023
Lyndon-Li changed the title from "Data mover backup black list - Don't run in specified node" to "Data mover backup node black list - Don't run in specified node" on Oct 31, 2023
@Lyndon-Li
Contributor Author

Another use case for this requirement: #7185

@Lyndon-Li
Contributor Author

Another use case for this requirement: #7243

@balbiv

balbiv commented Jan 31, 2024

@Lyndon-Li thank you for adding this to the milestones. Because the helm deployment allows setting nodeSelectors for the node-agent, I created a dedicated node pool for it so that data movement (CSI snapshot) has no impact on critical production applications. However, I noticed that the pod responsible for mounting the backup PVC (running /velero-helper pause) is allowed to schedule on every node. If the backup pod could take over the nodeSelectors/tolerations from the node-agent daemonset, that would already be a big improvement. It would also fix the case where the backup pod cannot start because its image pull policy is set to Never.
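Roughly, this is the kind of node-pool pinning I mean for the node-agent daemonset, and what the backup pod would ideally inherit; the label and taint names here are only illustrative, not specific values from any particular setup:

```yaml
# Sketch of node pool pinning applied to the node-agent daemonset pod spec
# (label and taint names are hypothetical); ideally the backup pod running
# /velero-helper pause would inherit the same nodeSelector/tolerations.
nodeSelector:
  pool: velero-data-mover          # hypothetical dedicated node pool label
tolerations:
- key: dedicated
  operator: Equal
  value: velero-data-mover         # hypothetical taint on the dedicated pool
  effect: NoSchedule
```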

@Lyndon-Li
Contributor Author

Lyndon-Li commented Feb 5, 2024

@balbiv
Thanks for the suggestion. The current plan is to create a dedicated loadAffinity configMap instead of coupling data mover backup pod scheduling to node-agent pod scheduling, for the following reasons:

  • The node-agent not only runs snapshot data movement but also serves many other purposes, e.g., it runs fs-backup/restore. So users may still not want data mover backups to run on every node where a node-agent pod resides.
  • The other plan was to automatically inherit the node-agent pods' node-selection configurations for the backupPods. However, while we are clear about some of these configurations, i.e., nodeSelectors, we are not confident about others. Specifically, we are not confident that simply inheriting all the configurations from the node-agent would produce the same scheduling result for the backupPod as for the node-agent pod, since the daemonset scheduler behaves differently from the plain pod scheduler.

Finally, we decided not to bring the node-agent's node selection into consideration; if users have such configurations, they must apply them to the loadAffinity configMap in an appropriate way. See the design PR #7383.
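As a rough sketch of the idea (the exact ConfigMap name, format, and field names are defined in the design PR and may still change), excluding nodes via the loadAffinity configMap could look like this:

```yaml
# Rough sketch only; the exact ConfigMap name/format follows the design PR.
# Nodes matched by the NotIn expression would not host data mover backupPods.
loadAffinity:
- nodeSelector:
    matchExpressions:
    - key: kubernetes.io/hostname
      operator: NotIn
      values:
      - critical-node-1
```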

This is the initial plan; we may make changes according to comments on the PR, so if you have any ideas, please comment in the PR.
