Handle subnet lease getting expired #29

eyakubovich · 2014-08-29T20:56:37Z

Although flannel starts renewing the lease an hour before expiration, the lease can still be lost, e.g. if the VM gets suspended. Flannel should try to get the same subnet assignment if it's still available, but fall back to a new lease and signal that fact.
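A minimal sketch of the requested behavior, for illustration only. It is not flannel's code: `LeaseResult`, `acquire_lease`, and the in-memory set of free subnets are hypothetical stand-ins for whatever the real lease store provides.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class LeaseResult:
    subnet: str
    reused: bool  # False is the "signal" that the host ended up on a different subnet


def acquire_lease(previous: Optional[str], free_subnets: Set[str]) -> LeaseResult:
    """Prefer the previously held subnet; otherwise fall back to any free one."""
    if previous is not None and previous in free_subnets:
        free_subnets.remove(previous)
        return LeaseResult(previous, reused=True)
    if not free_subnets:
        raise RuntimeError("no free subnets left in the pool")
    subnet = min(free_subnets)  # deterministic pick, chosen only to keep the sketch simple
    free_subnets.remove(subnet)
    return LeaseResult(subnet, reused=False)
```

A caller would check `reused` and, when it is false, take whatever action is needed for containers still configured with the old subnet.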

macb · 2016-03-07T20:18:46Z

Is there any work under way for this? It'd be incredibly useful: right now, if a machine loses its lease and gets a new one, every container on the machine is left with no network connectivity.

tomdee · 2017-04-27T23:15:56Z

One implementation idea for this is in #610

tomdee · 2017-04-27T23:18:38Z

Also see #520 for some good questions about how flannel handles this at the moment.

tomdee · 2017-04-27T23:18:56Z

When fixing this, we should make sure this failure scenario is discussed clearly in the docs.

rosenhouse · 2017-04-28T00:02:41Z

FWIW, the system design that we've converged on for Cloud Foundry is that hosts are preferentially assigned their prior lease, even if it "expired." And if a new host appears, it is assigned a lease in the following priority order:

1. Prefer subnets that have never been given out before, or subnets which were explicitly relinquished by a cleanly-terminating host.
2. If none of those exist, only then does the new host take over an expired lease, and in that case it chooses the oldest such lease.
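The assignment order described above could be sketched roughly as follows. This is an illustration under stated assumptions, not Cloud Foundry's or flannel's implementation: the `Lease` record, the `assign` function, and the choice to treat "oldest" as "earliest expiry time" are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lease:
    subnet: str
    host: Optional[str]  # None: never given out, or explicitly relinquished
    expires_at: float    # only meaningful while host is not None


def assign(host: str, leases: List[Lease], now: float, ttl: float) -> Optional[Lease]:
    # A returning host keeps its prior lease, even if that lease has expired.
    for lease in leases:
        if lease.host == host:
            lease.expires_at = now + ttl
            return lease

    # New host: prefer never-assigned or cleanly relinquished subnets.
    candidates = [l for l in leases if l.host is None]
    if not candidates:
        # Only then take over an expired lease, earliest expiry first
        # (assumption: "oldest" means the lease that expired longest ago).
        expired = [l for l in leases if l.expires_at <= now]
        candidates = sorted(expired, key=lambda l: l.expires_at)
    if not candidates:
        return None  # pool exhausted: nothing free and nothing expired

    chosen = candidates[0]
    chosen.host, chosen.expires_at = host, now + ttl
    return chosen
```

Note that a host whose lease was taken over no longer matches any record, so it is treated like a new host on its next request.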

This is meant to minimize the probability that a lease is "stolen" from a live, but partitioned, container host. But if that does occur, once the partition heals and the "victim" host re-connects, it will discover that its lease is no longer valid. In this case, the victim host falls into a special, noisy failure mode which will (1) prevent any new workloads from being scheduled and (2) trigger the orchestration system to evacuate any existing workloads. Once the evacuation is complete, the host will clean up any leftover networking state (e.g. remove the VXLAN device), acquire a new lease for itself and begin accepting new workloads.
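The recovery sequence for a "victim" host can be pictured as a small state machine. The sketch below is only one way to model the description; the state names and the `next_state` and `accepts_new_workloads` helpers are hypothetical and do not correspond to any real API.

```python
from enum import Enum, auto


class HostState(Enum):
    HEALTHY = auto()      # lease valid, accepting workloads
    DRAINING = auto()     # lease lost: no new workloads, existing ones being evacuated
    CLEANING_UP = auto()  # evacuated: remove leftover networking state, get a new lease


def next_state(state: HostState, lease_valid: bool, workloads_remaining: int) -> HostState:
    if state is HostState.HEALTHY:
        return HostState.HEALTHY if lease_valid else HostState.DRAINING
    if state is HostState.DRAINING:
        # Stay in the noisy failure mode until evacuation completes.
        return HostState.DRAINING if workloads_remaining > 0 else HostState.CLEANING_UP
    # CLEANING_UP: assumes cleanup and acquiring a new lease succeeded.
    return HostState.HEALTHY


def accepts_new_workloads(state: HostState) -> bool:
    return state is HostState.HEALTHY
```

Keeping the host out of `HEALTHY` until cleanup finishes is what prevents new workloads from landing on a subnet the host no longer owns.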

We think this is the right plan. Feedback welcome.

Added feature to allow flannel to restart in case of etcd failures and still keep the same subnet address for the hosts. Fixes flannel-io#610 flannel-io#29

tomdee · 2017-07-12T00:32:41Z

This is now fixed in v0.8.0

eyakubovich added the enhancement label Aug 29, 2014

eyakubovich added this to the 1.0 milestone Aug 21, 2015

This was referenced Apr 27, 2017:

Feature idea: on startup, read subnet.env and attempt to acquire that lease #610 (Closed)

What should flannel do when it loses a lease? #520 (Closed)

tomdee added the area/documentation label Apr 27, 2017

mgleung mentioned this issue Jun 19, 2017:

flannel reads from created subnet.env file on startup #752 (Merged)

tomdee closed this as completed Jul 12, 2017
