-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add packet capture feature #5821
Conversation
@luolanzone Here is the current PR for packet sampling. If you think more information was needed to help review this, let me know and I will update this ASAP. currently i'm actively working on the remaining tests and unitest/docs. Appreciate any inputs on the current implemention. |
f676355
to
72e7d8e
Compare
274ebd6
to
110f542
Compare
@luolanzone Hi lan, all updated. will make sure the golangci-fix pass and import order is correct in the following commits. Please have a review again |
0c1a0f1
to
0d665c3
Compare
8afa17e
to
f2d1f4c
Compare
if !c.packetSamplingSynced() { | ||
return errors.New("PacketSampling controller is not started") | ||
} | ||
oldPS, samplingState, shouldSkip, err := c.parsePacketIn(pktIn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's your thoughts about the name?
err = retry.RetryOnConflict(retry.DefaultRetry, func() error { | ||
ps, err := c.packetSamplingInformer.Lister().Get(oldPS.Name) | ||
if err != nil { | ||
return fmt.Errorf("get packetsampling failed: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw you used both packetsampling
and PacketSampling
in the log. Maybe stick to PacketSampling
in logs and comments.
pkg/agent/controller/packetsampling/packetsampling_controller.go
Outdated
Show resolved
Hide resolved
pkg/agent/controller/packetsampling/packetsampling_controller.go
Outdated
Show resolved
Hide resolved
b37b94c
to
6cea563
Compare
@hangyan please check and add e2e test as traceflow_test.go for this new feature. Thanks. |
9bf9515
to
e6f0ee1
Compare
pkg/agent/controller/packetsampling/packetsampling_controller.go
Outdated
Show resolved
Hide resolved
pkg/agent/controller/packetsampling/packetsampling_controller.go
Outdated
Show resolved
Hide resolved
pkg/agent/controller/packetsampling/packetsampling_controller.go
Outdated
Show resolved
Hide resolved
pkg/agent/controller/packetsampling/packetsampling_controller.go
Outdated
Show resolved
Hide resolved
55d376a
to
d67f023
Compare
packetDirectoryUnix = "/tmp/packetsampling/packets" | ||
packetDirectoryWindows = "C:\\packetsampling\\packets" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we not create directory under TEMP on Windows?
} | ||
|
||
packet.DestinationIP = net.ParseIP(dstSvc.Spec.ClusterIP) | ||
if !packet.IsIPv6 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not move this check to the logic in validation webhook when creating the CR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- updated to use golang function for temp dir.
- updated to use webhook
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, please use webhook to do the validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated on 4.7
Accoding to Quan's suggestions. controller was removed, so i think the validation webhook can keep the old way? because it only do some simple validation on the fields, not calling k8s apis.
also @tnqn . Is it ok to keep the validation code in controller/<feature-name>/...
when there are no controller?
Is it possible that the pod/service object exist during the webhook validation but was deleted when the packetsapmpling was running? We still need to add error check in the code, right?
d67f023
to
fd2a318
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not finished yet
fd2a318
to
3a19e85
Compare
ddfe4fb
to
41cfb13
Compare
78d6212
to
1ee2069
Compare
07604b7
to
13d3d42
Compare
cc @jianjuns @antoninbas also continue help review this. (there some quan's comment left on the pipeline impl i'm current working on a fix. but the other parts has been updated according to the api change. |
f5daf42
to
0183504
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not finished yet.
// genEndpointMatchPackets generates match packets (with destination Endpoint's IP/port info) besides the normal match packet. | ||
// these match packets will help the pipeline to capture the pod -> svc traffic. | ||
// TODO: 1. support name based port name 2. dual-stack support | ||
func (c *Controller) genEndpointMatchPackets(pc *crdv1alpha1.PacketCapture) ([]binding.Packet, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would agree not to support Pod-to-Service at the moment, @jianjuns @antoninbas please input if you have a different view. Thanks.
ofPort = uint32(podInterfaces[0].OFPort) | ||
senderPacket = packet | ||
klog.V(2).InfoS("PacketCapture sender packet", "packet", *packet) | ||
if senderOnly && pc.Spec.Destination.Service != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems you missed another case that senderOnly && pc.Spec.Destination.IP !=""
. I feel senderOnly is not aproper name, maybe emptyDestinationPod
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was only for the service case to generated flows for endpoints. Since we are planning to remove this, will see after that settled
d10cfd0
to
3e56a16
Compare
} | ||
} | ||
|
||
c.runningPacketCapturesMutex.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are multiple returns between c.runningPacketCapturesMutex.Lock() and c.runningPacketCapturesMutex.Unlock(), there will be dead locks if the function returned but Unlock() is not executed. Please wrap them into a function and use defer to run unlock:
func(){
c.runningPacketCapturesMutex.Lock()
defer c.runningPacketCapturesMutex.Unlock()
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pcState.shouldSyncPackets = len(podInterfaces) > 0 | ||
if !pcState.shouldSyncPackets { | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean you only capture the packets on the destination Node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not exactly. we first select a target pod, then use this pod to select the target node. The pod could be on the source or the dest, so the node also is.
var file afero.File | ||
filePath := uidToPath(string(pc.UID)) | ||
if _, err := os.Stat(filePath); err == nil { | ||
return fmt.Errorf("packet file already exists. this may be due to an unexpected termination") | ||
} else if os.IsNotExist(err) { | ||
file, err = defaultFS.Create(filePath) | ||
if err != nil { | ||
return fmt.Errorf("failed to create pcapng file: %w", err) | ||
} | ||
} else { | ||
return fmt.Errorf("couldn't check if the file exists: %w", err) | ||
} | ||
writer, err := pcapgo.NewNgWriter(file, layers.LinkTypeEthernet) | ||
if err != nil { | ||
return fmt.Errorf("couldn't initialize pcap writer: %w", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably wrap into a function as getFileAndWriter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
packet.IsIPv6 = pc.Spec.Packet.IPFamily == v1.IPv6Protocol | ||
|
||
if receiverOnly { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
receiverOnly = pc.Spec.Destination.Pod != nil && pc.Spec.Source.Pod == nil
When it's receiverOnly, why the destination Pod in the spec is ignored in this case? here only assigned source IP and destinationMAC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have an ofPort matcher in the flow to serve this purpose in this case.
MatchDstMAC(packet.DestinationMAC).
Action().LoadToRegField(TargetOFPortField, ofPort).
} else if inputProto.StrVal == "UDP" { | ||
return protocol.Type_UDP | ||
} else { | ||
return protocol.Type_IPv6ICMP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not correct, I didn't see you limit the protocol scope in the manifest:
protocol:
x-kubernetes-int-or-string: true
I can see we set port
as x-kubernetes-int-or-string in most cases, we should limit the protocol to three string types: TCP/UDP/ICMP as you mentioned in the usage document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, waiting for the updates from the API MR comments.
cc @antoninbas to confirm this.
authConfig := getDefaultFileServerAuth() | ||
serverAuth, err := ftp.ParseBundleAuth(*authConfig, c.kubeClient) | ||
if err != nil { | ||
klog.ErrorS(err, "Failed to get authentication defined in the PacketCapture CR", "name", pc.Name, "authentication", authConfig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this log correct? it's not defined by the CR but a pre-defined secret, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
pkg/agent/supportbundlecollection/support_bundle_controller_test.go
Outdated
Show resolved
Hide resolved
@hangyan don't forget to resolve code conflicts in pkg/agent/supportbundlecollection/support_bundle_controller.go |
Signed-off-by: Hang Yan <yhang@vmware.com>
Signed-off-by: Hang Yan <yhang@vmware.com>
d6fd2d3
to
c36a376
Compare
Done. All fixed. Please take a look again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make the changes for API related into a dedicated commit? and the details should be added to the commit message of data path change, this can help to simplify code review. Please do the same if you plan to have a new PR for the new data path implementation.
packet file from the sftp server(or from local antrea-agent pod) and analyze its contents with network diagnose tools | ||
like Wireshark or tcpdump. | ||
|
||
Currently we support max to `15` concurrrent PacketCapture session running at the same time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we support max to `15` concurrrent PacketCapture session running at the same time. | |
Currently, we support a maximum of 15 concurrent sessions running at the same time. |
dstPort: 8080 # Destination port needs to be set when the protocol is TCP/UDP. | ||
status: | ||
numCapturedPackets: 5 | ||
# path format: <pod-name>:<filepath>. If this file was uploaded to the target file server, filename format is <uid>.pcapng |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# path format: <pod-name>:<filepath>. If this file was uploaded to the target file server, filename format is <uid>.pcapng | |
# path format: <pod-name>:<filepath>. If this file was uploaded to the target file server, the filename format is <uid_of_packetcapture>.pcapng |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we say "The path of the packets file that is uploaded to the target file server, in the format of :. The PacketCapture CR UID is used as the file name".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CR above starts a new packet capture of TCP flows from a Pod named `frontend` | ||
to the port 8080 of a Pod named `backend` using TCP protocol. It will capture the first 5 packets | ||
that meet this criterion and upload them to the specified sftp server. Users can download the | ||
packet file from the sftp server(or from local antrea-agent pod) and analyze its contents with network diagnose tools |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should provide the path info in the antrea-agent pod, and it's not clear about "local antrea-agent pod", which antrea-agent Pod it will be? The Pod on the Node where the target Pod is running?
Closing in favor of #6756 |
This commit adds the packet capture feature to Antrea.
Fix #5443
Feature Introduction
Traceflow works well for network flow diagnose, but sometimes users may want to take a look into the raw packets in the flow. Currently, Antrea lacks the ability to capture raw packet in live traffic.
CRD Example
Notice
Vefied cases
known issue for now
Links: