Don't put down LAG interface when it starts in WR mode #2257

pavel-shirshov · 2018-11-14T21:44:47Z

- What I did
I changed the teamd carrier logic for the LAG interface in WR mode.

- How I did it
When teamd starts, it reads the carrier state of the LAG interface. If the state is down, no special logic is needed, and teamd handles the carrier in normal mode. If the state is up, we enable wr_carrier mode. In this mode we prevent the LAG interface from going down. We exit this mode when either the teamd logic decides it's time to bring the interface up, or a timer expires.
The current timer value is 3 seconds. We introduce the timer to make sure we eventually exit this mode.
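A minimal sketch of the wr_carrier mode described above, written in C to match the libteam patch. The struct, the function names, and passing the current time in as a parameter are hypothetical and not taken from the patch; only the 3-second timeout and the two exit conditions come from the description.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* 3 s is the value quoted in the PR; everything else here is illustrative. */
#define WR_CARRIER_TIMEOUT_SEC 3

struct wr_carrier {
    bool active;              /* true while the LAG carrier is held up */
    struct timespec deadline; /* when to leave the mode unconditionally */
};

/* Enter wr_carrier mode only if the carrier was already up at startup. */
static void wr_carrier_start(struct wr_carrier *wr, bool carrier_up_at_start,
                             struct timespec now)
{
    assert(wr != NULL);
    wr->active = carrier_up_at_start;
    wr->deadline = now;
    wr->deadline.tv_sec += WR_CARRIER_TIMEOUT_SEC;
}

/* True once `now` is at or past `deadline`. */
static bool deadline_reached(struct timespec now, struct timespec deadline)
{
    if (now.tv_sec != deadline.tv_sec)
        return now.tv_sec > deadline.tv_sec;
    return now.tv_nsec >= deadline.tv_nsec;
}

/* Return the carrier state to apply, given what the normal teamd logic
 * wants. The mode is left for good once the normal logic wants the
 * carrier up or the timer expires; until then the carrier is held up. */
static bool wr_carrier_apply(struct wr_carrier *wr, bool normal_wants_up,
                             struct timespec now)
{
    assert(wr != NULL);
    if (wr->active && (normal_wants_up || deadline_reached(now, wr->deadline)))
        wr->active = false;
    return wr->active ? true : normal_wants_up;
}
```

In a real event loop, `now` would come from `clock_gettime(CLOCK_MONOTONIC, &now)`; a monotonic clock keeps wall-clock adjustments from stretching or shortening the timeout.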

- How to verify it

1. Build the code.
2. Install it on your DUT.
3. Run `pkill teamd`.
4. Enable WR mode.
5. Restart the teamd container. The bgpd session must remain up.

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

zhenggen-xu · 2018-11-14T23:20:28Z

src/libteam/0005-libteam-Add-warm_reboot-mode.patch

```diff
+
+-	/* initialize carrier control */
+-	err = team_carrier_set(ctx->th, false);
+	err = team_carrier_set(ctx->th, ctx->warm_start);
```


In case of a warm start, should we keep the carrier state as is, i.e., not set the carrier in the kernel?

When we start teamd after a system restart, the carrier is down. We need it to be up (if the carrier was up before the reboot).

I was thinking more about a teamd warm restart. If the carrier was down before the warm restart of the teamd Docker container, should we unconditionally set the carrier to true after the warm restart?

zhenggen-xu · 2018-11-14T23:21:41Z

src/libteam/0005-libteam-Add-warm_reboot-mode.patch

```diff
+ 	}
+-
+-	lacp->carrier_up = false;
+	lacp->carrier_up = ctx->warm_start;
```


Could we read the carrier state back and save it in `lacp->carrier_up`?

When we don't do a system reboot, your approach will work. When we do a system reboot, the carrier state will read as 'down'.

Same as above.
If we want to support Docker-level warm restart, it sounds like we may need to save the carrier state before the warm restart or system warm reboot, and restore it afterwards?
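A minimal sketch of the save-and-restore idea raised in this thread, under stated assumptions: the live carrier is read from a Linux sysfs-style file such as `/sys/class/net/<lag>/carrier`, and the saved state goes to a hypothetical file containing a single `carrier=up` or `carrier=down` line. Neither the state file nor the helper names come from the patch; a real implementation might keep the state somewhere else. Saving before shutdown matters because, as noted above, after a full system reboot the recreated LAG reads as down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Read a sysfs-style carrier file ("1\n" = up, "0\n" = down).
 * Returns 1, 0, or -1 on error (missing file or failed read; the read
 * may fail, for example, when the interface is administratively down). */
static int read_carrier_file(const char *path)
{
    int c;
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    c = fgetc(f);
    fclose(f);
    if (c == '1')
        return 1;
    if (c == '0')
        return 0;
    return -1;
}

/* Hypothetical: persist the carrier state before teamd stops. */
static int save_carrier_state(const char *state_path, bool carrier_up)
{
    FILE *f;

    assert(state_path != NULL);
    f = fopen(state_path, "w");
    if (!f)
        return -1;
    fprintf(f, "carrier=%s\n", carrier_up ? "up" : "down");
    return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical: restore the saved state at startup. Missing or malformed
 * data falls back to "down", i.e. normal teamd behaviour. The file is
 * removed after reading so a stale state is never reused later. */
static bool restore_carrier_state(const char *state_path)
{
    char value[8] = "";
    FILE *f;

    assert(state_path != NULL);
    f = fopen(state_path, "r");
    if (!f)
        return false;
    if (fscanf(f, "carrier=%7s", value) != 1)
        value[0] = '\0';
    fclose(f);
    remove(state_path);
    return strcmp(value, "up") == 0;
}
```

Before stopping teamd, something like `save_carrier_state(state_path, read_carrier_file(carrier_path) == 1)` would record the state; at startup, `restore_carrier_state(state_path)` could seed the initial carrier value instead of assuming it from the WR flag alone.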

zhenggen-xu · 2018-11-14T23:27:30Z

src/libteam/0005-libteam-Add-warm_reboot-mode.patch

```diff
+ 			return lacp_set_carrier(lacp, true);
+		}
+ 	}
+	if (lacp->ctx->warm_start && !lacp->ctx->warm_start_started)
```


Is there any reason we need these two lines here?

We want to bring the LAG interface up only once, when teamd starts. Once teamd determines that the interface is legitimately allowed to be up, we disable our shortcut here. So the shortcut mode works only once, at teamd startup.

Thanks for your explanation. I probably missed something here: isn't `lacp_carrier_init` what sets the carrier state after teamd starts, while this function (`lacp_update_carrier`) is only reached later, e.g. on a port update? `lacp_carrier_init` should handle the different reboot modes, but I am not sure we need to do anything special in this function for the warm reboot/restart case.


pavel-shirshov · 2018-11-15T20:12:18Z

Address PR comments

zhenggen-xu · 2018-11-15T21:41:18Z

src/libteam/0005-libteam-Add-warm_reboot-mode.patch

```diff
+ 			return lacp_set_carrier(lacp, true);
+		}
+ 	}
+	if (lacp->ctx->warm_start && !lacp->ctx->warm_start_started)
```


Thanks for the change. Even if teamd was started in warmboot mode, shouldn't we still set the carrier according to the enabled state of the ports at run time to keep the state correct (e.g., if the number of enabled ports is less than min_ports, we should put it down)? Do you think `lacp_carrier_init` should be able to restore whatever state existed before warmboot, so that we don't need to do anything special here?

When we start teamd in WR mode, we don't have any information about the current desired state of the LAG interface.
If the box wasn't restarted, we could read the current state of teamd and assume that no interfaces were put down during the transient period. Then teamd reads the saved LACP PDUs and restores its previous state. If some state changed after the previous teamd instance stopped, the new teamd instance will eventually catch up (within 90 seconds in the worst case).
If the box was restarted, we must create the LAG interface and restore its saved state as soon as possible. This patch doesn't implement that behavior. The current patch doesn't change the state of the LAG interface on its own:

if the previous state was down, we don't bring the LAG interface up just because we're in WR mode.

But if the previous state was up, we bring the interface up as soon as we enter the `lacp_update_carrier` function, which happens quickly. Once the interface is legitimately up because there are enough links in the group, we switch teamd back to normal mode.

What is bad about my patch: if the previous state of the LAG was down, and the LAG is down after start, it will come up as soon as we add one interface to the group, so there is no min_links check. I'm currently thinking about how to fix that.
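A sketch of a carrier decision that keeps the min_ports check in WR mode, addressing the gap described above: the WR shortcut only holds up a LAG that was already up and never raises a down LAG on its first member port. The function names and parameters are hypothetical, and min_ports is assumed to be at least 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: the LAG may be up only when at least min_ports
 * member ports are enabled. min_ports is assumed to be >= 1. */
static bool lag_carrier_allowed(unsigned int enabled_ports,
                                unsigned int min_ports)
{
    assert(min_ports >= 1);
    return enabled_ports >= min_ports;
}

/* The WR hold applies only to a LAG that was already up; a LAG that was
 * down always goes through the min_ports check. */
static bool wr_carrier_with_min_ports(bool wr_hold_active, bool prev_carrier_up,
                                      unsigned int enabled_ports,
                                      unsigned int min_ports)
{
    if (wr_hold_active && prev_carrier_up)
        return true;
    return lag_carrier_allowed(enabled_ports, min_ports);
}
```

With this rule, a previously down LAG with `min_ports = 2` stays down after one member port is added and comes up only when the second port is enabled.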

pavel-shirshov · 2018-11-19T22:02:20Z

Introduced the WR carrier logic. Updated PR overview


Don't put down LAG interface when it starts in WR mode

ce9f9f5

pavel-shirshov added the Bug 🐛 label Nov 14, 2018

pavel-shirshov self-assigned this Nov 14, 2018

pavel-shirshov requested review from lguohan, stcheng and yxieca November 14, 2018 21:44

zhenggen-xu reviewed Nov 14, 2018


lguohan approved these changes Nov 15, 2018


Change logic. Don't touch carrier in WR mode. Until it could be in UP mode

e994148

zhenggen-xu reviewed Nov 15, 2018


Change control plane restore logic in WR mode

0d30a80

pavel-shirshov merged commit f6f8880 into master Nov 20, 2018

pavel-shirshov deleted the pavelsh/fix_wr_start branch November 20, 2018 02:29
