Add host-local IPAM GC on startup #5660
/test-all
29e53b5 to 3b3c873
3b3c873 to 4bb14d3
return fmt.Errorf("path '%s' is not a directory: %w", dir, err)
}

lk, err := NewFileLock(dataDir)
It seems https://github.com/containernetworking/plugins/blob/93a1b3d0e71b3a4b5c463de357876fe8d383012f/plugins/ipam/host-local/backend/disk/backend.go#L53 uses dir instead of its parent dir dataDir?
That's a pretty bad typo, fixed
return fmt.Errorf("error when gathering IP filenames in the host-local data directory: %w", err)
}

allocatedIPs := sets.New[string]()
nit: it seems enough to use a bool to track if it ever fails to release an IP.
done
"github.com/alexflint/go-filemutex"
)

// This code was copied from https://github.com/containernetworking/plugins/blob/v1.3.0/plugins/ipam/host-local/backend/disk/lock.go
Could we import github.com/containernetworking/plugins/plugins/ipam/host-local/backend given the package is already a dependency?
Line 22 in 4a52bca:
github.com/containernetworking/plugins v1.1.1
Yes, we actually use it pretty extensively. Changed to an import.
During CNIServer reconciliation, we perform host-local IPAM garbage collection (GC) by comparing the set of IPs allocated to local Pods and the set of IPs currently reserved by the plugin. We release any IP reserved by the plugin that is not in use by a local Pod. The purpose is to avoid leaking IP addresses when there is a bug in the container runtime, which has happened in the past.

Two key design choices were made:

* We do not invoke CNI DEL to release IPs; instead, we access the host-local data persisted on the Node and modify it as needed.
* We do not rely on the interface store (as persisted to OVSDB) to determine the set of IPs that may have been leaked. In case of an Antrea bug, it could be possible (although unlikely) for an IP to still be allocated by host-local but be missing from the interface store. Instead, we list all allocated IPs from the host-local data (an allocated IP corresponds to one disk file).

This approach is essentially the same as our existing script: https://github.com/antrea-io/antrea/blob/main/hack/gc-host-local.sh

Fixes antrea-io#4326

Signed-off-by: Antonin Bas <abas@vmware.com>
37a66c4 to 23c2725
6b33047 to 17f523c
LGTM
Signed-off-by: Antonin Bas <abas@vmware.com>
17f523c to 4223995
LGTM
/test-all
Fixes #4326