Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't put down LAG interface when it starts in WR mode #2257

Merged
merged 3 commits into from
Nov 20, 2018
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
59 changes: 46 additions & 13 deletions src/libteam/0005-libteam-Add-warm_reboot-mode.patch
Original file line number Diff line number Diff line change
Expand Up @@ -125,20 +125,21 @@ index c987333..53aec1d 100644
daemon_retval_send(errno);
err = -errno;
diff --git a/teamd/teamd.h b/teamd/teamd.h
index ef0fb1c..b1b6dfe 100644
index ef0fb1c..bd88aea 100644
--- a/teamd/teamd.h
+++ b/teamd/teamd.h
@@ -125,6 +125,9 @@ struct teamd_context {
@@ -125,6 +125,10 @@ struct teamd_context {
char * hwaddr;
uint32_t hwaddr_len;
bool hwaddr_explicit;
+ bool warm_start;
+ bool warm_start_started;
+ bool keep_ports;
+ char * lacp_directory;
struct {
struct list_item callback_list;
int ctrl_pipe_r;
@@ -191,12 +194,15 @@ struct teamd_event_watch_ops {
@@ -191,12 +195,15 @@ struct teamd_event_watch_ops {
struct teamd_port *tdport, void *priv);
void (*refresh)(struct teamd_context *ctx,
struct teamd_port *tdport, void *priv);
Expand Down Expand Up @@ -179,7 +180,7 @@ index 5c2ef56..50e5a08 100644
struct teamd_port *tdport)
{
diff --git a/teamd/teamd_runner_lacp.c b/teamd/teamd_runner_lacp.c
index 81324de..519f5e2 100644
index 81324de..6a52fcb 100644
--- a/teamd/teamd_runner_lacp.c
+++ b/teamd/teamd_runner_lacp.c
@@ -174,6 +174,8 @@ struct lacp_port {
Expand All @@ -191,7 +192,22 @@ index 81324de..519f5e2 100644
struct {
uint32_t speed;
uint8_t duplex;
@@ -994,6 +996,13 @@ static int lacp_port_set_state(struct lacp_port *lacp_port,
@@ -496,9 +498,13 @@ static int lacp_update_carrier(struct lacp *lacp)
err = teamd_port_enabled(lacp->ctx, tdport, &state);
if (err)
return err;
- if (state && ++ports_enabled >= lacp->cfg.min_ports)
+ if (state && ++ports_enabled >= lacp->cfg.min_ports) {
+ lacp->ctx->warm_start_started = true;
return lacp_set_carrier(lacp, true);
+ }
}
+ if (lacp->ctx->warm_start && !lacp->ctx->warm_start_started)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason we need this two lines here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to put the LAG interface up only once, when teamd started. After teamd calculate the interface is allowed to be up in legit way, we disable our shortcut here. So the shortcut mode should work only once, on the start of teamd.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your explanation. I probably missed something here, Isn't "lacp_carrier_init" to set the carrier state after teamd started and this function (lacp_update_carrier) is only hit if we do port update etc later? "lacp_carrier_init" should handle the different reboot modes, but I am not sure if we need do something special in this function for warm reboot/restart case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change, even if teamd was started with warmboot mode, we should still set the carrier according to the port enabled state (e,g, number of enabled port less than the min_ports , we should put it down. etc.) during the run time to keep the state right? Do you think lacp_carrier_init should be able to restore to whatever state before warmboot and we don't need do anything special here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we start teamd in WR mode we don't have any information about current desired state of the LAG interface.
If the box wasn't restarted, we could read the current state of teamd, and consider that no interfaces wasn't put down during transient period. Then teamd will read saved LACP PDUs, and restore previous state of itself. If some state changed since stop of previous instance of teamd, the new instance of teamd will catch it up eventually (up to 90 seconds in the worst case)
If the box was restarted, we must create LAG interface and change it state to the saved state ASAP. This patch doesn't have this behavior. Current patch doesn't change state the LAG interface

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because, if the previous state was down, we don't put the LAG interface up, only because we're in WR mode.

But if the previous state was up, we put the interface up as soon as we get into lacp_update_carrier function, which is fast. And after the interface is up after we have enough links in the group we put teamd behavior to the normal mode.

What is bad in my patch. If the previous state of LAG was down, and after start we have LAG in down state, it'd be up as soon as we add one interface to a group, so no min_links check. I'm thinking currently how I can fix that.

+ return lacp_set_carrier(lacp, true);

return lacp_set_carrier(lacp, false);
}
@@ -994,6 +1000,13 @@ static int lacp_port_set_state(struct lacp_port *lacp_port,
break;
}

Expand All @@ -205,7 +221,7 @@ index 81324de..519f5e2 100644
teamd_log_info("%s: Changed port state: \"%s\" -> \"%s\"",
lacp_port->tdport->ifname,
lacp_port_state_name[lacp_port->state],
@@ -1084,26 +1093,23 @@ static int lacpdu_send(struct lacp_port *lacp_port)
@@ -1084,26 +1097,23 @@ static int lacpdu_send(struct lacp_port *lacp_port)
return err;
}

Expand Down Expand Up @@ -240,7 +256,7 @@ index 81324de..519f5e2 100644
err = lacp_port_partner_update(lacp_port);
if (err)
return err;
@@ -1118,7 +1124,7 @@ static int lacpdu_recv(struct lacp_port *lacp_port)
@@ -1118,7 +1128,7 @@ static int lacpdu_recv(struct lacp_port *lacp_port)

/* Check if the other side has correct info about us */
if (!lacp_port->periodic_on &&
Expand All @@ -249,7 +265,7 @@ index 81324de..519f5e2 100644
sizeof(struct lacpdu_info))) {
err = lacpdu_send(lacp_port);
if (err)
@@ -1133,6 +1139,59 @@ static int lacpdu_recv(struct lacp_port *lacp_port)
@@ -1133,6 +1143,59 @@ static int lacpdu_recv(struct lacp_port *lacp_port)
return 0;
}

Expand Down Expand Up @@ -309,7 +325,7 @@ index 81324de..519f5e2 100644
static int lacp_callback_timeout(struct teamd_context *ctx, int events,
void *priv)
{
@@ -1299,6 +1358,12 @@ static int lacp_port_added(struct teamd_context *ctx,
@@ -1299,6 +1362,12 @@ static int lacp_port_added(struct teamd_context *ctx,
lacp_port_actor_init(lacp_port);
lacp_port_link_update(lacp_port);

Expand All @@ -322,7 +338,7 @@ index 81324de..519f5e2 100644
teamd_loop_callback_enable(ctx, LACP_SOCKET_CB_NAME, lacp_port);
return 0;

@@ -1321,7 +1386,11 @@ static void lacp_port_removed(struct teamd_context *ctx,
@@ -1321,7 +1390,11 @@ static void lacp_port_removed(struct teamd_context *ctx,
{
struct lacp_port *lacp_port = priv;

Expand All @@ -335,7 +351,7 @@ index 81324de..519f5e2 100644
teamd_loop_callback_del(ctx, LACP_TIMEOUT_CB_NAME, lacp_port);
teamd_loop_callback_del(ctx, LACP_PERIODIC_CB_NAME, lacp_port);
teamd_loop_callback_del(ctx, LACP_SOCKET_CB_NAME, lacp_port);
@@ -1413,6 +1482,31 @@ static void lacp_event_watch_refresh(struct teamd_context *ctx, struct teamd_por
@@ -1413,6 +1486,31 @@ static void lacp_event_watch_refresh(struct teamd_context *ctx, struct teamd_por
(void) lacpdu_send(lacp_port);
}

Expand Down Expand Up @@ -367,15 +383,32 @@ index 81324de..519f5e2 100644
static const struct teamd_event_watch_ops lacp_event_watch_ops = {
.hwaddr_changed = lacp_event_watch_hwaddr_changed,
.port_added = lacp_event_watch_port_added,
@@ -1420,6 +1514,7 @@ static const struct teamd_event_watch_ops lacp_event_watch_ops = {
@@ -1420,20 +1518,19 @@ static const struct teamd_event_watch_ops lacp_event_watch_ops = {
.port_changed = lacp_event_watch_port_changed,
.admin_state_changed = lacp_event_watch_admin_state_changed,
.refresh = lacp_event_watch_refresh,
+ .port_flush_data = lacp_event_watch_port_flush_data,
};

static int lacp_carrier_init(struct teamd_context *ctx, struct lacp *lacp)
@@ -1946,7 +2041,7 @@ static void lacp_fini(struct teamd_context *ctx, void *priv)
{
int err;

- /* initialize carrier control */
- err = team_carrier_set(ctx->th, false);
+ err = team_carrier_set(ctx->th, ctx->warm_start);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of warm start, should we keep the carrier state as is? I,E, don't set the carrier to kernel?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when we start teamd after system restart, the carrier is down. We need it to be up (if the carrier was up before reboot).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking more about teamd warm-restart. If the carrier was down before warm-restart of teamd docker, should we unconditionally set carrier to true after warm restart?

if (err && err != -EOPNOTSUPP) {
- teamd_log_err("Failed to set carrier down.");
+ teamd_log_err("Failed to set carrier %s.", ctx->warm_start ? "up": "down");
return err;
}
-
- lacp->carrier_up = false;
+ lacp->carrier_up = ctx->warm_start;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably could read the carrier state back and save it to "lacp->carrier_up"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we don't do system reboot your approach will work. When we do the system reboot the carrier state would be 'down' on reading

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.
If we want to support docker level warm restart, sounds like we may need save the carrier state before warm restart or system warm reboot, and restore them afterwards?


return 0;
}
@@ -1946,7 +2043,7 @@ static void lacp_fini(struct teamd_context *ctx, void *priv)
teamd_state_val_unregister(ctx, &lacp_state_vg, lacp);
teamd_balancer_fini(lacp->tb);
teamd_event_watch_unregister(ctx, &lacp_event_watch_ops, lacp);
Expand Down