-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable IPSec encryption of tunnel traffic #209
Conversation
Thanks for your PR. The following commands are available:
|
@wenyingd: I changed the L3 and L2 tables as we discussed offline. Please check. |
pkg/agent/openflow/client_test.go
Outdated
@@ -37,7 +37,7 @@ func installNodeFlows(ofClient Client, cacheKey string) (int, error) { | |||
gwMAC, _ := net.ParseMAC("AA:BB:CC:DD:EE:FF") | |||
IP, IPNet, _ := net.ParseCIDR("10.0.1.1/24") | |||
peerNodeIP := net.ParseIP("192.168.1.1") | |||
err := ofClient.InstallNodeFlows(hostName, gwMAC, IP, *IPNet, peerNodeIP) | |||
err := ofClient.InstallNodeFlows(hostName, gwMAC, IP, *IPNet, peerNodeIP, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you just want to not block original UT, you could use types.DefaultTunOFPort for parameter of ofport
, or the case might fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems the test does not fail with 0. But I could change to types.DefaultTunOFPort.
2c95a45
to
f812877
Compare
c5ba8ec
to
a83fd04
Compare
Thanks for your PR. The following commands are available:
These commands can only be run by members of the vmware-tanzu organization. |
I still plan to add e2e tests before merging the PR, but the current comments are ready for review. |
57be6d7
to
c81ce7c
Compare
/test-e2e |
/test-e2e |
@@ -45,6 +48,8 @@ const ( | |||
maxRetryDelay = 300 * time.Second | |||
// Default number of workers processing a node change | |||
defaultWorkers = 4 | |||
|
|||
ovsExternalIDNodeName = "node-name" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now ovs external ID constants are spread in files, so do ParseXXXInterfaceConfig. Do you think we could put them together and closer to interfaceCache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then where you suggest to put them (and buildXXXExternalIDs)? I do not think externalIDs belong to InterfaceStore, which is just a data store for interface data, and should not know anything about OVS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong opinion, just felt a place having all external ID constants and the parsing functions might be convenient to know what data antrea persist in ovsdb and save class like AgentInitializer to depend on all packages that have parsing function. I didn't mean putting them in interfaceCache, but a separate package or file closer to interfaceCache as they are the things stored and read in interfaceCache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally do not like to create too many small pkgs, but do not have a good idea which existing pkg for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was considering a separate file in pkg/agent/interfacestore
. But I'm not sure if it makes sense in other perspectives like the interface scope you mentioned. I'm fine with keeping them as they are until we are clear that the other way were more appropriate.
// multiple calls to GenerateContainerInterfaceName with the same parameters | ||
// return the same value). The output has the length of interfaceNameLength(15). | ||
// The probability of collision should be neglectable. | ||
func GenerateContainerInterfaceName(podName string, podNamespace string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it cause existing interface names change? Would upgrading be affected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I should first ask which format you like.
To handle upgrade, we could change to use Pod name + Namespace name to be the key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would not supporting upgrade make users hard to use new features and get bugs in their clusters fixed? I think handling upgrade in code to consider previously used key might be complicated and hard to describe the compatibility in doc when we want to remove it. Do you think we can just use major release number to indicate compatibility like some software do? If it's 0.2.0 to 0.3.0, maybe we should avoid changes that break compatibility and leave it to major release change if it's necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it depends on how many deployments will use 0.2 and 0.1 and would expect upgrade without impact to existing Pods. If we want to make sure nothing breaks across versions, guess we need to add upgrade tests.
But for this change there is another way to handle it - if we believe not safe to use the generated interface name as the key we could instead use Pod + NS or Node name to be the key for InterfaceStore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree it's ok to ignore existing Pods at this stage.
If it's just changing key of InterfaceStore, I think it doesn't break compatibility? My concern was GenerateContainerInterfaceName is used in setupInterfaces for host veth name, not sure what would happen if it changes. Anyway, I agree it's fine to not handle upgrade case until we declare it and test it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought over. Maybe let me revert the name change in this PR, and we could make a separate one to change it.
Do you support the new format I used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I restored the original format.
/test-e2e |
For each remote Node, create a separate tunnel port and interface on the OVS bridge, and set the remote IP and PSK options to the interface. Refactore InterfaceStore structs and funcs to add tunnel interface attributes and group InterfaceConfig fields by interface types. For packets to a tunnel, change the l3Forwarding table flow to set the output port to the tunnel interface, and resubmit to the l2ForwardingOutput table (bypassing the l2ForwardingCalcTable and the ingress rules tables). When IPSec is enabled, for every remote Node set the output port to the Node's tunnel interface, and also add a separate classification flow in the in port classification table for the tunnel.
Change Makefile to generate antrea-ipsec.yml that deploys Antrea with the IPSec encryption enabled. Add a config option to antrea-agent.conf for enabling IPSec encryption. Add yaml patches for creating a K8s Secret to save IKE PSK and adding an environment variable to the antrea-agent container to pass the Secret value.
/test-e2e |
/test-e2e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a question about duplicate error logs.
if err := c.ovsBridgeClient.DeletePort(interfaceConfig.PortUUID); err != nil { | ||
klog.Errorf("Failed to delete OVS tunnel port %s for Node %s: %v", | ||
interfaceConfig.InterfaceName, nodeName, err) | ||
return fmt.Errorf("failed to delete OVS tunnel port for Node %s", nodeName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not unified before, but want to discuss it here, finally these errors will be logged in processNextWorkItem
:
klog.Errorf("Error syncing Node %s, requeuing. Error: %v", key, err)
Do you think we can remove all klog.Errorf
to avoid duplicate error logs? I'm thinking maybe we should only use klog.Errorf
when the code is going to swallow the error and not pop up it, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I thought about it too. Just feel might be easier to debug if log closer to where the failure happens, and probably log with more details there.
For example, if we do not log here, should we always append the previously returned error to the new error and return to the caller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if not log here, previously returned error should be appended, like L246 (I think it's the most common way to return error in Go). That would guarantee no information will be missing. I raised it because there were only little difference between the error logs and the returned errors, while there should be no reason to not include details in returned errors too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Here I still feel easier to debug if log closer to the failure. But no strong opinion. If you believe we should always log in the upper layer, I could change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just personal opinion, not strong too. Feel feel to ignore.
/test-conformance |
/test-e2e |
/test-conformance |
/test-e2e |
1 similar comment
/test-e2e |
Add a test that redeloys Antrea with IPSec tunnel enabled, and tests connectivity between two Pods on two different Nodes.
/test-e2e |
/test-all |
Will add e2e tests in the same PR.